athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1, further pretrained on 1 epoch of the dirty stories from nothingiisreal/Reddit-Dirty-And-WritingPrompts, with all stories scoring below 2 dropped.
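
The score cutoff described above can be sketched roughly as follows. This is only an illustration: the records and the `text` / `score` field names are made up here, and the real dataset's columns may be named differently.

```python
# Hypothetical records; the real dataset's column names may differ.
stories = [
    {"text": "story A", "score": 1},
    {"text": "story B", "score": 2},
    {"text": "story C", "score": 57},
    {"text": "story D", "score": 0},
]

MIN_SCORE = 2  # stories scoring below this are dropped


def keep_story(record, min_score=MIN_SCORE):
    """Return True when the story's score meets the threshold."""
    return record["score"] >= min_score


kept = [s for s in stories if keep_story(s)]
print(len(kept), "of", len(stories), "stories kept")
```

A story with a score of exactly 2 is kept, since only scores below 2 are dropped.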

Why do this? I have a niche use case where I cannot go above 8B parameters, and L3/3.1 are the only models in this size category that meet my needs for logic. However, both L3 and L3.1 have the damn repetition/token overconfidence problem, and this extra pretraining is meant to disrupt that certainty without disrupting the model's ability to function.

By the way, I think it's the lm_head that is causing the looping, but it might be the embeddings being too separated. I'm not going to pay two more times to test them separately, however :p
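
For anyone who does want to test the two suspects separately, one possible setup is to leave only one of them trainable per run. The sketch below just picks parameter names by prefix. The names follow the usual Hugging Face Llama layout, only a handful are listed, and the experiment labels and `trainable_params` helper are hypothetical. This is not how this model was trained.

```python
# A few parameter names in the Hugging Face Llama layout (not the full list).
PARAM_NAMES = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]

# Hypothetical experiments: which name prefixes stay trainable in each run.
EXPERIMENTS = {
    "lm_head_only": ("lm_head.",),
    "embeddings_only": ("model.embed_tokens.",),
}


def trainable_params(names, prefixes):
    """Return the parameter names that start with any of the given prefixes."""
    return [n for n in names if n.startswith(prefixes)]


for label, prefixes in EXPERIMENTS.items():
    print(label, "->", trainable_params(PARAM_NAMES, prefixes))
```

In a real PyTorch run, every parameter not selected would have `requires_grad` set to `False`; that step is left out here to keep the sketch dependency-free.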

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	20.74
IFEval (0-Shot)	45.21
BBH (3-Shot)	28.02
MATH Lvl 5 (4-Shot)	8.84
GPQA (0-shot)	5.59
MuSR (0-shot)	8.30
MMLU-PRO (5-shot)	28.50
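
The Avg. row appears to be the plain unweighted mean of the six benchmark scores. A quick check, using only the values from the table above:

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 45.21,
    "BBH (3-Shot)": 28.02,
    "MATH Lvl 5 (4-Shot)": 8.84,
    "GPQA (0-shot)": 5.59,
    "MuSR (0-shot)": 8.30,
    "MMLU-PRO (5-shot)": 28.50,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 20.74
```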

Evaluation results

All values are from the Open LLM Leaderboard.

Benchmark	Metric	Value
IFEval (0-Shot)	strict accuracy	45.21
BBH (3-Shot)	normalized accuracy	28.02
MATH Lvl 5 (4-Shot)	exact match	8.84
GPQA (0-shot)	acc_norm	5.59
MuSR (0-shot)	acc_norm	8.30
MMLU-PRO (5-shot), test set	accuracy	28.50
