CataLlama was trained on roughly **445 million new tokens** in three separate stages:
- **Language enhancement** with raw text - we could also call this "continued pre-training" at a very small scale.
- **Supervised fine-tuning** on instructions consisting of 70% Catalan and 30% English.
- **DPO fine-tuning** on preferences consisting of 70% Catalan and 30% English.
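
The three-stage recipe above can be sketched as a small, hypothetical training plan. The stage names and the `tokens_per_language` helper below are illustrative assumptions, not the actual training code; only the stage order and the 70/30 Catalan/English mix come from this README.

```python
# Hypothetical sketch of the three-stage recipe described above.
# Stage names and the token-budget helper are illustrative assumptions;
# only the stage order and the 70/30 mix come from the README.

STAGES = [
    # Stage 1: continued pre-training on raw Catalan text.
    {"name": "language_enhancement", "mix": {"catalan": 1.0}},
    # Stage 2: supervised fine-tuning on instructions.
    {"name": "supervised_fine_tuning", "mix": {"catalan": 0.70, "english": 0.30}},
    # Stage 3: DPO fine-tuning on preference pairs.
    {"name": "dpo_fine_tuning", "mix": {"catalan": 0.70, "english": 0.30}},
]

def tokens_per_language(total_tokens: int, mix: dict) -> dict:
    """Split a stage's token budget across languages by mix weight."""
    return {lang: round(total_tokens * w) for lang, w in mix.items()}

# Example: a purely illustrative 10M-token budget for the SFT stage.
print(tokens_per_language(10_000_000, STAGES[1]["mix"]))
# → {'catalan': 7000000, 'english': 3000000}
```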
**Note:** This model is not intended to beat benchmarks, but to demonstrate techniques for adapting LLMs to new languages and to preserve rare languages as part of our world heritage.