HarbingerX committed on
Commit
3fe83aa
·
verified ·
1 Parent(s): e82e3b3

Update README.md

  1. README.md +12 -0
README.md CHANGED
@@ -278,6 +278,18 @@ https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
278
  |MuSR (0-shot) | 4.54|
279
  |MMLU-PRO (5-shot) | 22.33|
280
 
281
+
282
+ Our main goal here was to check BBH and MuSR. We're not going to "cheat" on those benchmarks. IFEval is good for a 3B model and BBH is not so bad, but I really
283
+ expected an increase on MuSR.
284
+ Since our model is an experiment in language, REAL language (after all, language models are trained for language), we do not care much about math and similar tasks. I mean
285
+ they are less important to us than the benchmarks mentioned above. Why? Math and GPQA are at high-school level. If we had to go for them, we would want academic level.
286
+ MMLU-PRO is not what someone looking for a nice conversational and roleplay model cares about.
287
+ AT LEAST, across the board, our model is not the worst on BBH and MuSR among Llama 3.2 3B models.
288
+ While our model is somewhat successful in breaking previous BIAS, I (we) will keep training it in small doses to improve its language capacity. Mind you, the goal is not to
289
+ be the leader of the Open LLM Leaderboard, but the evaluation matters a lot, and at least now we have an idea of where we stand.
290
+
291
+ Mind you, an RP model scoring low on MuSR and BBH is not really a good RP model. Ours is, say, average or even good considering it's a 3B model.
292
+
293
  ---
294
  # Risk Disclaimer
295
  By using this model, you acknowledge that you understand and assume the risks associated with its use. You are solely responsible for ensuring compliance with all applicable laws and regulations. We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages. We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose. Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.