CataLlama was trained on roughly **445 million new tokens** in three separate stages:
- **Language enhancement** with raw text - we could also call this "continued pre-training" at a very small scale.
- **Supervised fine-tuning** on instructions consisting of 70% Catalan and 30% English.
- **DPO fine-tuning** on preferences consisting of 70% Catalan and 30% English.
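
The three-stage recipe above can be sketched as a small, hypothetical training plan. The stage names and the `tokens_per_language` helper below are illustrative assumptions, not the actual training code; only the stage order and the 70/30 Catalan/English mix come from this README.

```python
# Hypothetical sketch of the three-stage recipe described above.
# Stage names and the token-budget helper are illustrative assumptions;
# only the stage order and the 70/30 mix come from the README.

STAGES = [
    # Stage 1: continued pre-training on raw Catalan text.
    {"name": "language_enhancement", "mix": {"catalan": 1.0}},
    # Stage 2: supervised fine-tuning on instructions.
    {"name": "supervised_fine_tuning", "mix": {"catalan": 0.70, "english": 0.30}},
    # Stage 3: DPO fine-tuning on preference pairs.
    {"name": "dpo_fine_tuning", "mix": {"catalan": 0.70, "english": 0.30}},
]

def tokens_per_language(total_tokens: int, mix: dict) -> dict:
    """Split a stage's token budget across languages by mix weight."""
    return {lang: round(total_tokens * w) for lang, w in mix.items()}

# Example: a purely illustrative 10M-token budget for the SFT stage.
print(tokens_per_language(10_000_000, STAGES[1]["mix"]))
# → {'catalan': 7000000, 'english': 3000000}
```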
**Note:** This model is not intended to beat benchmarks, but to demonstrate techniques for adapting LLMs to new languages and to preserve rare languages as part of our world heritage.