mfajcik committed
Commit 7574425
1 Parent(s): 17f6699

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -70,6 +70,10 @@ To transfer knowledge from English model to Czech, we developed a simple method
 <img src="figures/tllama_test.png" width="900"/>
 Figure 4: Test perplexity over the course of training for the vocabulary swap (swapping 1.7K tokens) method on TinyLLAMA. Our method (green curve) vs. TinyLLAMA trained from scratch (blue curve).
 
+We also verify that finetuning from English to Czech is beneficial for the MPT-7B model, compared to training a new model from scratch, at least over the first 10K steps. The training also seems to be more stable (notice the yellow spike around 10K steps).
+Figure 5: Test cross-entropy over the course of training for CSMPT7B (yellow & red), compared with TinyLLAMA (blue & green). Our method (red & green curves) vs. training from scratch (yellow & blue curves).
+<img src="figures/csmpt_tllama_test.png" width="900"/>
+
 The vocabulary swap was done the same way as for our [Czech-GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) model (check it out for a comprehensive description).
 For CSMPT7b, we managed to align 4,177 English tokens with corresponding Czech tokens.
 
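The vocabulary swap referenced in the hunk above amounts to reusing the English model's embedding rows for tokens that also appear in the new Czech vocabulary, while the remaining rows start from a fresh initialization before further training on Czech data. Below is a minimal sketch of that idea using the standard `transformers` API; the repository ids (`mosaicml/mpt-7b`, `BUT-FIT/csmpt7b`) and the exact string-match alignment rule are illustrative assumptions, not the authors' published procedure, which is described in the linked Czech-GPT-2 model card.

```python
# Illustrative sketch of a vocabulary swap: copy embedding rows for tokens
# shared between the English and Czech vocabularies; Czech-only tokens keep
# the initialization produced by resize_token_embeddings().
# Repository ids below are assumptions, not necessarily the exact ones used.
from transformers import AutoModelForCausalLM, AutoTokenizer

src_tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")        # English base tokenizer
tgt_tok = AutoTokenizer.from_pretrained("BUT-FIT/csmpt7b")        # Czech tokenizer (assumed id)
model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)

src_vocab = src_tok.get_vocab()                                   # token string -> id
tgt_vocab = tgt_tok.get_vocab()

old_emb = model.get_input_embeddings().weight.data.clone()        # English embedding matrix
model.resize_token_embeddings(len(tgt_tok))                       # switch to the Czech vocabulary size
new_emb = model.get_input_embeddings().weight.data

aligned = 0
for token, tgt_id in tgt_vocab.items():
    src_id = src_vocab.get(token)                                 # same surface form in both vocabularies
    if src_id is not None:
        new_emb[tgt_id] = old_emb[src_id]
        aligned += 1
print(f"Copied embeddings for {aligned} aligned tokens")
```

Per the README text above, 4,177 tokens could be aligned this way for CSMPT7b before continued pretraining on Czech data.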
@@ -97,7 +101,6 @@ Not mentioned hyperparameters were kept the same as for MPT.
 | scheduler_steps | 170,000 | |
 | scheduler_alpha | 0.1 | So LR on last step is 0.1*(vanilla LR) |
 
-
 # Usage
 ## How to Setup Environment
 ```bash
 