---
license: apache-2.0
---
<h1 align="center">mT5 small spanish es</h1>
This is a version of Google's [mT5-small](https://huggingface.co/google/mt5-small) model fine-tuned for Spanish, using the Spanish and English datasets listed below.
# Datasets
The following datasets were used for fine-tuning. Each task is selected by prepending its prefix to the input text:

| Dataset | Task prefix |
|---|---|
| MultiNLI (English) | `multi nli premise:[Text] hypo:[Text]` |
| MultiNLI (Spanish) | `multi nli premise:[Text] hypo:[Text]` |
| PAWS-X (English) | `pawx sentence1:[Text] sentence2:[Text]` |
| PAWS-X (Spanish) | `pawx sentence1:[Text] sentence2:[Text]` |
| SQuAD (English) | `question:[Text] context:[Text]` |
| SQuAD (Spanish) | `question:[Text] context:[Text]` |
| Translations (English-Spanish) | `translate English to Spanish:[Text]` |
| Translations (Spanish-English) | `translate Spanish to English:[Text]` |
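Because every task is selected purely by its text prefix, prompts can be assembled from a small set of templates transcribed from the table above. This is a minimal sketch: `PROMPT_TEMPLATES`, `build_prompt` and the task keys are hypothetical names, the `pawx` prefix is kept exactly as spelled in the table, and the single space between fields follows the question-answering example in this card.

```python
# Hypothetical helper: templates transcribed from the Datasets table.
PROMPT_TEMPLATES = {
    "multinli": "multi nli premise:{premise} hypo:{hypothesis}",
    "pawsx": "pawx sentence1:{sentence1} sentence2:{sentence2}",
    "squad": "question:{question} context:{context}",
    "en-es": "translate English to Spanish:{text}",
    "es-en": "translate Spanish to English:{text}",
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill the template for `task` with the given text fields."""
    try:
        template = PROMPT_TEMPLATES[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(PROMPT_TEMPLATES)}"
        ) from None
    # Strip surrounding whitespace from each field; str.format raises
    # KeyError if a field required by the template is missing.
    return template.format(**{name: value.strip() for name, value in fields.items()})


print(build_prompt("es-en", text="Esta frase es para probar el modelo"))
# translate Spanish to English:Esta frase es para probar el modelo
```

The resulting string can be used directly as the `task` value in the inference examples.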
# Inference
The following examples show how to run the model. Other tasks work the same way: only the prefix in `task` changes.
## Translations

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "HURIDOCS/mt5-small-spanish-es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The prefix selects the task (see the Datasets table).
task = "translate Spanish to English:Esta frase es para probar el modelo"

# Tokenize, keeping the attention mask so padding tokens are ignored.
inputs = tokenizer(
    [task],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)

# Beam search decoding; returns a batch, so take the first sequence.
output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4
)[0]

result_text = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(result_text)
```
## Question answering

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "HURIDOCS/mt5-small-spanish-es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# SQuAD-style prompt: "question:<question> context:<context>"
task = '''question:En qué país se encuentra Normandía? context:Los normandos (normandos: Nourmann; Francés: Normandos; Normanni)
fue el pueblo que en los siglos X y XI dio su nombre a Normandía, una región de Francia.
Eran descendientes de invasores nórdicos ("normandos" viene de "Norseman") y piratas de Dinamarca, Islandia y Noruega que,
bajo su líder Rollo, acordaron jurar lealtad al rey Carlos III de Francia Occidental. A través de generaciones de asimilación
y mezcla con las poblaciones nativas francas y galas romanas, sus descendientes se fusionarían gradualmente con las culturas
carolingias de Francia Occidental. La identidad cultural y étnica distintiva de los normandos surgió inicialmente en la
primera mitad del siglo X, y continuó evolucionando durante los siglos siguientes.'''

# Tokenize, keeping the attention mask so padding tokens are ignored.
inputs = tokenizer(
    [task],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)

# Beam search decoding; returns a batch, so take the first sequence.
output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4
)[0]

result_text = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(result_text)
```
# Fine-tuning
For fine-tuning, see the [question-answering examples](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) in the Transformers library.
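Whichever training script is used, SQuAD records need to be turned into text-to-text pairs that use the `question:`/`context:` prefix from the Datasets table. The sketch below assumes the Hugging Face `squad`/`squad_v2` record schema; the function name `squad_to_text2text`, the output keys, and the empty target for unanswerable questions are hypothetical choices, not taken from this model's training code.

```python
def squad_to_text2text(example: dict) -> dict:
    """Convert one SQuAD-style record into an (input, target) text pair.

    Assumes the Hugging Face `squad`/`squad_v2` schema:
    {"question": str, "context": str,
     "answers": {"text": [str, ...], "answer_start": [int, ...]}}
    """
    # Same prefix format as in the Datasets table: "question:<q> context:<c>".
    source = f"question:{example['question'].strip()} context:{example['context'].strip()}"
    answers = example["answers"]["text"]
    # Assumption: unanswerable SQuAD v2 questions (no answers) get an empty target.
    target = answers[0] if answers else ""
    return {"input_text": source, "target_text": target}


record = {
    "question": "¿En qué país se encuentra Normandía?",
    "context": "Normandía es una región de Francia.",
    "answers": {"text": ["Francia"], "answer_start": [27]},
}
print(squad_to_text2text(record))
```

With the `datasets` library, a function like this can be applied to a whole split with `Dataset.map`.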
# Performance
Evaluation on Spanish SQuAD v2 (512-token inputs):

| Rank | Model | Exact match | F1 |
|---|---|---|---|
| 1 | mrm8488/distill-bert-base-spanish-wwm-cased | 50.43% | 71.45% |
| 2 | **mT5 small spanish es** | 48.35% | 62.03% |
| 3 | flan-t5-small | 41.44% | 56.48% |
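The Exact match and F1 columns are SQuAD-style metrics. The sketch below reimplements the answer normalization and token-overlap F1 of the official SQuAD v2 evaluation script. This card does not state which evaluation code produced the table, and the official normalization is English-oriented (it strips only ASCII punctuation and the articles "a", "an", "the", so marks such as "¿" are kept), so other implementations may give slightly different Spanish scores. The function names are our own.

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)  # ASCII punctuation only


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> int:
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # SQuAD v2 convention: if either side is empty (unanswerable question),
    # the score is 1 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_scores(prediction: str, gold_answers: list[str]) -> tuple[int, float]:
    """Best EM and F1 over all gold answers; no answers means unanswerable."""
    golds = [g for g in gold_answers if normalize_answer(g)] or [""]
    em = max(exact_match(prediction, g) for g in golds)
    f1 = max(f1_score(prediction, g) for g in golds)
    return em, f1


print(best_scores("en Francia", ["Francia"]))
```

Dataset-level figures such as those in the table are the mean of these per-question scores, multiplied by 100.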