Update README.md
README.md CHANGED
@@ -89,8 +89,6 @@ When adapting a model from English to other languages the tokenizer plays a cruc
If the tokenizer does not include the target language in its training data, the resulting model will need many more tokens to perform the same task.
We solve this problem by creating a new tokenizer in the target languages (Spanish and Catalan) and adapting the embedding layer to it.
-WHAT IS THIS DOING HERE??
-It is a fine-tuned version of [/bscdata/models/falcon_7b_balanced_tokenizer_fp16/](https://huggingface.co//bscdata/models/falcon_7b_balanced_tokenizer_fp16/) on the /bscdata/data/open_data_26B_tokens_balanced_es_ca/open_data_26B_tokens_balanced_es_ca.py default dataset.
### New Tokenizer
We trained a new BPE tokenizer for the Catalan and Spanish languages (equal representation). We shuffled a small amount of English into the mixture, since English is in the model's training data.
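
The point above about needing many more tokens can be checked directly. Here is a minimal sketch of that comparison; `gpt2` and `xlm-roberta-base` are used purely as stand-ins for an English-centric and a multilingual tokenizer, and neither is a checkpoint from this repo:

```python
# Rough illustration of tokenizer "fertility": the same Spanish sentence usually
# costs more tokens under an English-centric tokenizer than under one whose
# training data covered the target language. Checkpoints below are stand-ins.
from transformers import AutoTokenizer

english_centric = AutoTokenizer.from_pretrained("gpt2")
multilingual = AutoTokenizer.from_pretrained("xlm-roberta-base")

sentence = "La tokenización es un paso crucial al adaptar un modelo a otro idioma."

print("gpt2:", len(english_centric.tokenize(sentence)))
print("xlm-roberta-base:", len(multilingual.tokenize(sentence)))
# The English-centric tokenizer typically produces noticeably more tokens here,
# which is exactly the overhead the new Spanish/Catalan tokenizer is meant to avoid.
```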
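For reference, a tokenizer of the kind described under "New Tokenizer" could be trained with the Hugging Face `tokenizers` library roughly as follows. This is only a sketch: the vocabulary size, special token, and corpus file names are assumptions, not values taken from this repo.

```python
# Sketch of training a byte-level BPE tokenizer on a balanced Spanish/Catalan
# corpus with a small English share. All paths and hyperparameters are assumed.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=65_024,                                   # assumed vocabulary size
    special_tokens=["<|endoftext|>"],                    # assumed special token
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)

# Equal representation of Spanish and Catalan, plus a small amount of English,
# as described in the README; the file names are placeholders.
corpus_files = ["es.txt", "ca.txt", "en_small.txt"]
tokenizer.train(files=corpus_files, trainer=trainer)
tokenizer.save("tokenizer_es_ca.json")
```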
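
The README does not spell out how the embedding layer was adapted to the new tokenizer. One common recipe is to resize the embedding matrix to the new vocabulary and initialise each new row from the original embeddings of the sub-tokens the old tokenizer would have produced. A rough sketch under that assumption, with placeholder model and tokenizer paths:

```python
# Hypothetical embedding re-initialisation: every entry of the new vocabulary is
# mapped to the mean of the original Falcon embeddings of its old sub-tokens.
# This is one common approach, not necessarily the one used for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")   # original tokenizer
new_tok = AutoTokenizer.from_pretrained("./tokenizer_es_ca")  # placeholder path
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")

old_emb = model.get_input_embeddings().weight.detach().clone()
model.resize_token_embeddings(len(new_tok))  # resize to the new vocabulary size
new_emb = model.get_input_embeddings().weight

with torch.no_grad():
    for new_id in range(len(new_tok)):
        # Recover the surface form of the new token, re-tokenize it with the
        # original tokenizer, and average the corresponding old embedding rows.
        text = new_tok.convert_tokens_to_string([new_tok.convert_ids_to_tokens(new_id)])
        old_ids = old_tok.encode(text, add_special_tokens=False)
        if old_ids:
            new_emb[new_id] = old_emb[old_ids].mean(dim=0)
```

After this re-initialisation, the adapted model is continually pre-trained on the Spanish/Catalan data so the new embeddings and the rest of the network converge together.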