Update README.md
README.md
CHANGED
@@ -72,7 +72,7 @@ Figure 4: Test perplexity over the course of training for vocabulary swap (swapp
We also verify that finetuning from English to Czech is beneficial for the MPT-7B model, compared to training a new model from scratch, at least over the first 10K steps. The training also seems to be more stable (note the yellow spike around 10K steps).

<img src="figures/csmpt_tllama_test.png" width="900"/>

-Figure 5: Test cross-entropy over the course of training on CSMPT7B (yellow-red). Comparison with TinyLLAMA (blue-green). Our method (red&green curve) vs
+Figure 5: Test cross-entropy over the course of training on CSMPT7B (yellow-red). Comparison with TinyLLAMA (blue-green). Our method (red&green curve) vs training from scratch (yellow&blue curve).

The vocabulary swap was done the same way as for our [Czech-GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) model (check it out for a comprehensive description).
For CSMPT7b, we managed to align 4,177 English tokens with corresponding Czech tokens.
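
For illustration only, a minimal sketch of what such a vocabulary-swap initialization can look like, under the simplifying assumption that an "aligned" token is one spelled identically in both vocabularies and that it reuses its English embedding row. The model and tokenizer identifiers below are placeholders, not the exact training setup; the actual procedure used for CSMPT7B is the one described in the Czech-GPT-2 card.

```python
# Illustrative sketch (assumption: "aligned" = identical token string in both vocabularies;
# identifiers are placeholders, not the exact CSMPT7B setup).
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")             # English tokenizer
new_tok = AutoTokenizer.from_pretrained("BUT-FIT/csmpt7b")             # Czech tokenizer (placeholder)
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)

old_emb = model.get_input_embeddings().weight.data.clone()             # [old_vocab_size, d_model]
model.resize_token_embeddings(len(new_tok))                            # switch to the Czech vocab size
new_emb = model.get_input_embeddings().weight.data                     # [new_vocab_size, d_model]

# Unmatched tokens start from the mean English embedding; aligned tokens copy their old row.
new_emb[:] = old_emb.mean(dim=0)
old_vocab = old_tok.get_vocab()
aligned = 0
for token, new_id in new_tok.get_vocab().items():
    old_id = old_vocab.get(token)
    if old_id is not None:
        new_emb[new_id] = old_emb[old_id]
        aligned += 1
print(f"Copied embeddings for {aligned} aligned tokens.")
```

Only the (tied) embedding matrix is re-initialized this way; the remaining weights are kept from the English checkpoint, which is what gives the warm start visible in Figure 5.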