💡 DICE - a sail Collection

updated Jul 28, 2024

Self-alignment with DPO Implicit Rewards

Bootstrapping Language Models with DPO Implicit Rewards
Paper • 2406.09760 • Published Jun 14, 2024 • 39

sail/Llama-3-Base-8B-DICE-Iter1
Text Generation • Updated Jul 11, 2024 • 17 • 1

sail/Llama-3-Base-8B-DICE-Iter2
Text Generation • Updated Jul 11, 2024 • 13 • 2

sail/Zephyr-7B-DICE-Iter1
Text Generation • Updated Jul 11, 2024 • 13

sail/Zephyr-7B-DICE-Iter2
Text Generation • Updated Jul 11, 2024 • 16