Update README.md
README.md CHANGED
@@ -89,8 +89,6 @@ When adapting a model from English to other languages the tokenizer plays a cruc
If the tokenizer does not include the target language in its training data, the resulting model will need many more tokens to perform the same task.
We solve this problem by creating a new tokenizer in the target languages (Spanish and Catalan) and adapting the embedding layer to it.
-WHAT IS THIS DOING HERE??
-It is a fine-tuned version of [/bscdata/models/falcon_7b_balanced_tokenizer_fp16/](https://huggingface.co//bscdata/models/falcon_7b_balanced_tokenizer_fp16/) on the /bscdata/data/open_data_26B_tokens_balanced_es_ca/open_data_26B_tokens_balanced_es_ca.py default dataset.
### New Tokenizer
We trained a new BPE tokenizer for the Catalan and Spanish languages (equal representation). We shuffled a small amount of English into the mixture, since English is in the model's training data.
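
The point above about needing many more tokens can be checked directly. Here is a minimal sketch of that comparison; `gpt2` and `xlm-roberta-base` are used purely as stand-ins for an English-centric and a multilingual tokenizer, and neither is a checkpoint from this repo:

```python
# Rough illustration of tokenizer "fertility": the same Spanish sentence usually
# costs more tokens under an English-centric tokenizer than under one whose
# training data covered the target language. Checkpoints below are stand-ins.
from transformers import AutoTokenizer

english_centric = AutoTokenizer.from_pretrained("gpt2")
multilingual = AutoTokenizer.from_pretrained("xlm-roberta-base")

sentence = "La tokenización es un paso crucial al adaptar un modelo a otro idioma."

print("gpt2:", len(english_centric.tokenize(sentence)))
print("xlm-roberta-base:", len(multilingual.tokenize(sentence)))
# The English-centric tokenizer typically produces noticeably more tokens here,
# which is exactly the overhead the new Spanish/Catalan tokenizer is meant to avoid.
```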
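For reference, a tokenizer of the kind described under "New Tokenizer" could be trained with the Hugging Face `tokenizers` library roughly as follows. This is only a sketch: the vocabulary size, special token, and corpus file names are assumptions, not values taken from this repo.

```python
# Sketch of training a byte-level BPE tokenizer on a balanced Spanish/Catalan
# corpus with a small English share. All paths and hyperparameters are assumed.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=65_024,                                   # assumed vocabulary size
    special_tokens=["<|endoftext|>"],                    # assumed special token
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)

# Equal representation of Spanish and Catalan, plus a small amount of English,
# as described in the README; the file names are placeholders.
corpus_files = ["es.txt", "ca.txt", "en_small.txt"]
tokenizer.train(files=corpus_files, trainer=trainer)
tokenizer.save("tokenizer_es_ca.json")
```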
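
The README does not spell out how the embedding layer was adapted to the new tokenizer. One common recipe is to resize the embedding matrix to the new vocabulary and initialise each new row from the original embeddings of the sub-tokens the old tokenizer would have produced. A rough sketch under that assumption, with placeholder model and tokenizer paths:

```python
# Hypothetical embedding re-initialisation: every entry of the new vocabulary is
# mapped to the mean of the original Falcon embeddings of its old sub-tokens.
# This is one common approach, not necessarily the one used for this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")   # original tokenizer
new_tok = AutoTokenizer.from_pretrained("./tokenizer_es_ca")  # placeholder path
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")

old_emb = model.get_input_embeddings().weight.detach().clone()
model.resize_token_embeddings(len(new_tok))  # resize to the new vocabulary size
new_emb = model.get_input_embeddings().weight

with torch.no_grad():
    for new_id in range(len(new_tok)):
        # Recover the surface form of the new token, re-tokenize it with the
        # original tokenizer, and average the corresponding old embedding rows.
        text = new_tok.convert_tokens_to_string([new_tok.convert_ids_to_tokens(new_id)])
        old_ids = old_tok.encode(text, add_special_tokens=False)
        if old_ids:
            new_emb[new_id] = old_emb[old_ids].mean(dim=0)
```

After this re-initialisation, the adapted model is continually pre-trained on the Spanish/Catalan data so the new embeddings and the rest of the network converge together.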