HarbingerX/Zeitgeist-3b-V1

Text Generation

text-generation-inference

Not-For-All-Audiences

Inference Endpoints

HarbingerX committed 7 days ago

Commit 3fe83aa · verified · 1 Parent(s): e82e3b3

Update README.md

Files changed (1)

README.md +12 -0

@@ -278,6 +278,18 @@ https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
 |MuSR (0-shot)      |     4.54|
 |MMLU-PRO (5-shot)  |    22.33|
+Our main goal here was to check BBH and MuSR, and we're not going to "cheat" on those benchmarks. IFEval is good for a 3B model and BBH is not bad, but I really
+expected an increase on MuSR.
+Since our model is an experiment in language, REAL language (after all, language models are trained for language), we do not care much about Math and the rest. I mean
+they are less important to us than the benchmarks mentioned above. Why? Math and GPQA are at high-school level; if we went after them, we would want academic level.
+MMLU is not what someone looking for a nice conversational and roleplay model cares about.
+AT LEAST, across the board, our model is not the worst on BBH and MuSR among Llama 3.2 3B models.
+While our model is somewhat successful in breaking previous BIAS, I (we) will keep training it in small doses to improve its language capacity. Mind you, the aim is not to
+lead the Open LLM Leaderboard, but evaluation matters a lot, and at least now we have an idea of where we stand.
+Mind you, an RP model scoring low on MuSR and BBH is not really a good RP model. Ours is, say, average or even good considering it's a 3B model.
 ---
 # Risk Disclaimer
 By using this model, you acknowledge that you understand and assume the risks associated with its use. You are solely responsible for ensuring compliance with all applicable laws and regulations. We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages. We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose. Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.