Update README.md
README.md
CHANGED
@@ -70,6 +70,10 @@ To transfer knowledge from English model to Czech, we developed a simple method
 <img src="figures/tllama_test.png" width="900"/>
 Figure 4: Test perplexity over the course of training for the vocabulary swap (swapping 1.7K tokens) method on TinyLLAMA. Our method (green curve) vs. TinyLLAMA trained from scratch (blue curve).
 
+We also verify that finetuning from English to Czech is beneficial for the MPT-7B model, compared to training a new model from scratch, at least over the first 10K steps. Training also appears to be more stable (note the yellow spike around 10K steps).
+Figure 5: Test cross-entropy over the course of training for CSMPT7B (yellow-red), compared with TinyLLAMA (blue-green). Our method (red & green curves) vs. training from scratch (yellow & blue curves).
+<img src="figures/csmpt_tllama_test.png" width="900"/>
+
 The vocabulary swap was done the same way as for our [Czech-GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) model (check it out for a comprehensive description).
 For CSMPT7b, we managed to align 4,177 English tokens with corresponding Czech tokens.
 
@@ -97,7 +101,6 @@ Not mentioned hyperparameters were kept the same as for MPT.
 | scheduler_steps | 170,000 | |
 | scheduler_alpha | 0.1 | So LR on last step is 0.1*(vanilla LR) |
 
-
 # Usage
 ## How to Setup Environment
 ```bash
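
The vocabulary-swap step described in the first hunk can be pictured with the short sketch below. This is a minimal illustration, not the authors' pipeline: it assumes the alignment is simply an exact token-string match between the English and Czech vocabularies, and the checkpoint/tokenizer names (`mosaicml/mpt-7b` as the English source, `BUT-FIT/csmpt7b` as the Czech target) are assumptions; see the linked Czech-GPT-2 model card for the actual procedure.

```python
# Minimal sketch of a vocabulary swap (assumptions: alignment = exact
# token-string match; model/tokenizer names are illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src_tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")                # English tokenizer
tgt_tok = AutoTokenizer.from_pretrained("BUT-FIT/csmpt7b")                # Czech tokenizer (assumed name)
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)

src_emb = model.get_input_embeddings().weight.data                        # (src_vocab, d_model)
new_emb = torch.empty(len(tgt_tok), src_emb.size(1))
new_emb.normal_(mean=src_emb.mean().item(), std=src_emb.std().item())     # random init for unmatched rows

src_vocab = src_tok.get_vocab()                                            # token string -> id
aligned = 0
for token, tgt_id in tgt_tok.get_vocab().items():
    src_id = src_vocab.get(token)
    if src_id is not None:                                                 # token exists in both vocabularies
        new_emb[tgt_id] = src_emb[src_id]                                  # reuse the English embedding
        aligned += 1

model.resize_token_embeddings(len(tgt_tok))                                # switch to the Czech vocabulary size
model.get_input_embeddings().weight.data.copy_(new_emb)
print(f"aligned {aligned} tokens between the two vocabularies")
```

Continued pretraining on Czech data then starts from these partially reused embeddings rather than from a fully random vocabulary.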
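For the `scheduler_alpha` row in the second hunk, the table states only the endpoint: the LR on the last scheduler step is 0.1 × the vanilla LR. Below is a small sketch of one schedule consistent with that endpoint; the cosine shape is an assumption, not something stated in the README.

```python
import math

# Sketch of a decay schedule whose final LR equals alpha * base LR after
# scheduler_steps. Only the endpoint (0.1 * vanilla LR on the last step)
# comes from the table; the cosine shape is an assumption.
def lr_at_step(step: int, base_lr: float,
               scheduler_steps: int = 170_000, alpha: float = 0.1) -> float:
    progress = min(step, scheduler_steps) / scheduler_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))   # 1.0 at step 0 -> 0.0 at the last step
    return base_lr * (alpha + (1.0 - alpha) * cosine)

assert abs(lr_at_step(170_000, 1e-4) - 0.1 * 1e-4) < 1e-12  # last step: 0.1 * vanilla LR
```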