projecte-aina/FLOR-1.3B-Instructed

mmarimon committed on Mar 19

Commit 5d8b61d • 1 Parent(s): 38fd0ed

Update README.md

Files changed (1)

README.md +2 -2

@@ -21,8 +21,8 @@ license: apache-2.0
 ## Model description
-**FLOR-1.3B-Instructed** is a 1.3B-parameter transformer-based causal language model for Catalan, Spanish, and English, trained on a combined dataset from (InstruCAT)[https://huggingface.co/datasets/BSC-LT/InstruCat], a Catalan language set of instruction generated automatically from prject-aina task orientated dataset, a subset of the [Dolly](databricks/databricks-dolly-15k) dataset for English, and [MENTOR_ES](https://huggingface.co/datasets/projecte-aina/MENTOR_ES) and [MENTOR_CA](https://huggingface.co/datasets/projecte-aina/MENTOR_CA), a Spanish and Catalan sets of instructions commisioned by the BSC Language Technologies Unit.
-It is the result of a language adaptation technique performed on [BLOOM-7.1B](https://huggingface.co/bigscience/bloom-7b1),
+**FLOR-1.3B-Instructed** is a 1.3B-parameter transformer-based causal language model for Catalan, Spanish, and English, trained on a combined dataset from [InstruCat](https://huggingface.co/datasets/BSC-LT/InstruCat), a Catalan language set of instruction generated automatically from prject-aina task orientated dataset, a subset of the [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset for English, and [MENTOR_ES](https://huggingface.co/datasets/projecte-aina/MENTOR_ES) and [MENTOR_CA](https://huggingface.co/datasets/projecte-aina/MENTOR_CA), a Spanish and Catalan sets of instructions commisioned by the BSC Language Technologies Unit.
+It is th result of a language adaptation technique performed on [BLOOM-7.1B](https://huggingface.co/bigscience/bloom-7b1),
 which involves modifying the model's vocabulary and embedding layer, and continuously pre-training the model with 140B tokens in our target languages.
 Blog post describing the base model with more parameters: [flor-6-3b, a chinchilla compliant model](https://medium.com/@mpamies247/flor-6-3b-a-chinchilla-compliant-model-for-catalan-spanish-and-english-7cdb389a9aac)
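
The two link fixes in this hunk (the reversed `(InstruCAT)[...]` syntax and the relative `databricks/databricks-dolly-15k` target) are the kind of thing a small check can catch before committing. The following is a minimal sketch, not part of the repository: the function names and regular expressions are hypothetical, and the patterns only cover simple inline links like the ones in this README.

```python
import re

# Assumption: reversed links look like "(text)[http(s)://url]" with no nesting.
REVERSED_LINK = re.compile(r"\(([^()\[\]]+)\)\[(https?://[^\]\s]+)\]")
# Assumption: well-formed inline links look like "[text](target)" with no spaces in the target.
INLINE_LINK = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def fix_reversed_links(text: str) -> str:
    """Rewrite "(text)[url]" as "[text](url)"; other text is left unchanged."""
    return REVERSED_LINK.sub(r"[\1](\2)", text)


def relative_targets(text: str) -> list[str]:
    """Return inline-link targets that are not absolute http(s) URLs."""
    return [
        target
        for _, target in INLINE_LINK.findall(text)
        if not target.startswith(("http://", "https://"))
    ]


# Excerpt of the removed line from the hunk above.
old = (
    "from (InstruCAT)[https://huggingface.co/datasets/BSC-LT/InstruCat], "
    "a subset of the [Dolly](databricks/databricks-dolly-15k) dataset"
)

fixed = fix_reversed_links(old)
print(fixed)
print(relative_targets(fixed))  # ['databricks/databricks-dolly-15k']
```

The sketch only swaps the brackets; it does not change the link text (the commit also renames `InstruCAT` to `InstruCat`) and it only reports relative targets rather than guessing the full URL, since the correct prefix depends on where the page is rendered.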