ajagota71/pythia-70m-s-nlp-detox-checkpoint-epoch-100 • Reinforcement Learning • 0.1B • Updated 30 days ago • 69
mradermacher/LongWriter-Zero-32B-i1-GGUF • Reinforcement Learning • 33B • Updated 27 days ago • 999 • 2
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-20 • Reinforcement Learning • 0.4B • Updated 27 days ago • 7
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-40 • Reinforcement Learning • 0.4B • Updated 27 days ago • 7
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-60 • Reinforcement Learning • 0.4B • Updated 27 days ago • 6
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-80 • Reinforcement Learning • 0.4B • Updated 27 days ago • 8
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-100 • Reinforcement Learning • 0.4B • Updated 27 days ago • 25
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated 26 days ago • 4
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated 26 days ago • 2
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated 26 days ago • 2
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated 26 days ago • 9
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated 26 days ago • 4
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated 26 days ago • 2
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated 26 days ago • 2
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated 26 days ago • 2
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated 26 days ago • 3
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated 26 days ago • 3