This is athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1, further pretrained for 1 epoch on the dirty stories from nothingiisreal/Reddit-Dirty-And-WritingPrompts, with all stories scoring below 2 dropped.
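A minimal sketch of that score cutoff, for anyone who wants to reproduce the filtering. The records and the `text` / `score` field names are hypothetical stand-ins; the real dataset's columns may be named differently.

```python
# Hypothetical records: field names are assumptions, not the dataset's real schema.
stories = [
    {"text": "story A", "score": 1},
    {"text": "story B", "score": 2},
    {"text": "story C", "score": 5},
]

MIN_SCORE = 2  # scores below 2 are dropped, so 2 itself is kept


def keep_story(record, min_score=MIN_SCORE):
    """Return True when the record's score meets the cutoff."""
    return record["score"] >= min_score


# Keep only the stories that pass the cutoff.
filtered = [record for record in stories if keep_story(record)]
print([record["text"] for record in filtered])
```

The same predicate could be passed to whatever filtering call your data loader provides.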
Why do this? I have a niche use case where I cannot go above 8B in compute, and L3/3.1 are the only models in this size category that meet my needs for logic. However, both L3 and L3.1 have the damn repetition/token overconfidence problem, and this is meant to disrupt that certainty without disrupting the model's ability to function.
By the way, I think it's the lm_head that is causing the looping, but it might be the embeddings being too separated. I'm not going to pay two more times to test them separately, however :p
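For anyone who does want to test the two suspects separately, one possible setup is to leave only one of them trainable per run. The sketch below just picks parameter names by prefix; it assumes the naming used by the Hugging Face Llama implementation (`lm_head.weight`, `model.embed_tokens.weight`), and `trainable_names` is a hypothetical helper, not something used for this model.

```python
def trainable_names(param_names, target_prefixes):
    """Return the parameter names that start with any of the given prefixes."""
    prefixes = tuple(target_prefixes)
    return [name for name in param_names if name.startswith(prefixes)]


# A shortened, illustrative list of parameter names (assumed naming scheme).
param_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]

# Run 1 would train only the output head, run 2 only the input embeddings.
head_only = trainable_names(param_names, ["lm_head."])
embed_only = trainable_names(param_names, ["model.embed_tokens."])

print(head_only)
print(embed_only)
```

In a real run, the names would come from the model's `named_parameters()`, with every parameter outside the selected list frozen before training.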
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 20.74 |
| IFEval (0-Shot) | 45.21 |
| BBH (3-Shot) | 28.02 |
| MATH Lvl 5 (4-Shot) | 8.84 |
| GPQA (0-shot) | 5.59 |
| MuSR (0-shot) | 8.30 |
| MMLU-PRO (5-shot) | 28.50 |
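The Avg. row is consistent with a plain, equally weighted mean of the six benchmark scores. A quick check, assuming that is how it was computed:

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 45.21,
    "BBH (3-Shot)": 28.02,
    "MATH Lvl 5 (4-Shot)": 8.84,
    "GPQA (0-shot)": 5.59,
    "MuSR (0-shot)": 8.30,
    "MMLU-PRO (5-shot)": 28.50,
}

# Unweighted mean over the six benchmarks (assumption: equal weights).
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 20.74
```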
Model tree for athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit