Severino committed on
Commit a2b9740
1 parent: 00d3d0b

Update README.md

Files changed (1)
  1. README.md +0 -2
README.md CHANGED
@@ -89,8 +89,6 @@ When adapting a model from English to other languages the tokenizer plays a cruc
 
 If the tokenizer does not include the target language in its training data, the resulting model will need many more tokens to perform the same task.
 We solve this problem by creating a new tokenizer in the target languages (Spanish and Catalan) and adapting the embedding layer to it.
-ESTO QUE HACE AQUI?? (Spanish: "What is this doing here??")
-It is a fine-tuned version of [/bscdata/models/falcon_7b_balanced_tokenizer_fp16/](https://huggingface.co//bscdata/models/falcon_7b_balanced_tokenizer_fp16/) on the /bscdata/data/open_data_26B_tokens_balanced_es_ca/open_data_26B_tokens_balanced_es_ca.py default dataset.
 
 ### New Tokenizer
 We trained a new BPE Tokenizer for the Catalan and Spanish languages (equal representation). We shuffle a small amount of English in the mixture (since English is in the model training data).
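The README states that a tokenizer trained without the target language needs many more tokens for the same text, and that a new BPE tokenizer was trained on Spanish and Catalan with a little English mixed in. The following is a minimal pure-Python sketch of BPE training and encoding that illustrates that effect on toy data. It is not the authors' tokenizer or training setup: the corpora, the merge budget of 200, and the function names `merge_pair`, `train_bpe` and `encode` are hypothetical.

```python
from collections import Counter

END = "</w>"  # end-of-word marker, as in the classic BPE formulation


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every left-to-right occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = Counter(tuple(w) + (END,) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word is already a single token
            break
        # Most frequent pair wins; comparing the pair itself breaks ties deterministically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def encode(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize `text` by replaying the learned merges, in order, on each word."""
    tokens = []
    for w in text.split():
        symbols = tuple(w) + (END,)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


# Hypothetical toy corpora standing in for the real training data.
spanish = "el modelo necesita menos tokens para la misma tarea"
catalan = "el model necessita menys tokens per a la mateixa tasca"
english = "the model needs fewer tokens for the same task"

# Equal Spanish/Catalan share plus a small amount of English.
target_merges = train_bpe(" ".join([spanish] * 3 + [catalan] * 3 + [english]), 200)
english_merges = train_bpe(" ".join([english] * 3), 200)

sentence = "el modelo necesita menos tokens"
print(len(encode(sentence, target_merges)))  # 5: one token per word
print(len(encode(sentence, english_merges)))  # more than 5
```

Because every word in the test sentence appears in the target-language training text, replaying the learned merges turns each word into a single token, while the English-trained merges leave Spanish-only pieces (for example the final `o` of `modelo`) unmerged, so the same sentence costs more tokens.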
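The README says the embedding layer is adapted to the new tokenizer but does not say how. One common heuristic, shown here only as an assumption, is to copy the trained vector for every token that exists in both vocabularies and to initialize each new token with the average embedding of the pieces the old tokenizer splits it into. The vocabularies, vectors, the `greedy_tokenize` stand-in tokenizer and the `adapt_embeddings` function below are hypothetical, and a real model would operate on its embedding tensor rather than on Python lists.

```python
from typing import Callable

Vector = list[float]


def greedy_tokenize(word: str, vocab: dict[str, int]) -> list[str]:
    """Toy longest-prefix-match tokenizer standing in for the original tokenizer."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # character not covered by the vocabulary
            i += 1
    return pieces


def adapt_embeddings(
    old_vocab: dict[str, int],
    old_embeddings: list[Vector],
    new_vocab: dict[str, int],
    old_tokenize: Callable[[str], list[str]],
) -> list[Vector]:
    """Build an embedding matrix indexed by the new vocabulary (hypothetical heuristic)."""
    dim = len(old_embeddings[0])
    mean_all = [sum(col) / len(old_embeddings) for col in zip(*old_embeddings)]
    new_embeddings: list[Vector] = [[0.0] * dim for _ in range(len(new_vocab))]
    for token, new_id in new_vocab.items():
        if token in old_vocab:
            # Token exists in both vocabularies: reuse its trained vector.
            new_embeddings[new_id] = list(old_embeddings[old_vocab[token]])
            continue
        pieces = [p for p in old_tokenize(token) if p in old_vocab]
        if pieces:
            # Average the vectors of the old-tokenizer pieces that spell this token.
            vecs = [old_embeddings[old_vocab[p]] for p in pieces]
            new_embeddings[new_id] = [sum(col) / len(vecs) for col in zip(*vecs)]
        else:
            # No usable pieces: fall back to the mean of all old embeddings.
            new_embeddings[new_id] = list(mean_all)
    return new_embeddings


old_vocab = {"the": 0, "model": 1, "o": 2, "ta": 3, "sca": 4}
old_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0], [0.0, 4.0]]
new_vocab = {"model": 0, "modelo": 1, "tasca": 2, "ñ": 3}

new_embeddings = adapt_embeddings(
    old_vocab, old_embeddings, new_vocab, lambda w: greedy_tokenize(w, old_vocab)
)
print(new_embeddings)  # [[0.0, 1.0], [1.0, 1.5], [2.0, 2.0], [1.4, 1.4]]
```

Starting new rows from related old vectors, rather than from random values, keeps them in the region of embedding space the pretrained model already understands; whether this matches the approach used for this model is not stated in the README.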