---
datasets:
- assin2
language:
- pt
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- nli
---
# Model Card for bertimbau_large_plue_mnli_fine_tuned
This is a **[BERTimbau-large](https://huggingface.co/neuralmind/bert-large-portuguese-cased) model fine-tuned** on 5K (premise, hypothesis) sentence pairs from
the **PLUE/MNLI (Portuguese translation of the MNLI corpus from the GLUE benchmark)** corpus. The original references are
[Unsupervised Cross-Lingual Representation Learning At Scale](https://arxiv.org/pdf/1911.02116) and [PLUE](https://huggingface.co/datasets/dlb/plue), respectively. This model is intended for Portuguese text.
## Model Details
### Model Description
- **Developed by:** Giovani Tavares and Felipe Ribas Serras
- **Advised by:** Felipe Ribas Serras, Renata Wassermann and Marcelo Finger
- **Model type:** Transformer-based text classifier
- **Language(s) (NLP):** Portuguese
- **License:** MIT
- **Finetuned from model:** [BERTimbau-large](https://huggingface.co/neuralmind/bert-large-portuguese-cased)
### Model Sources
- **Repository:** [Natural-Portuguese-Language-Inference](https://github.com/giogvn/Natural-Portuguese-Language-Inference)
- **Paper:** This is ongoing research. We are currently writing a paper that fully describes our experiments.
## Uses
### Direct Use
This fine-tuned version of [BERTimbau-large](https://huggingface.co/neuralmind/bert-large-portuguese-cased) performs Natural
Language Inference (NLI), a text classification task.
It uses the same *(premise, hypothesis)* entailment definition as Salvatore's paper [1].
Accordingly, it classifies sentence pairs of the form *(premise, hypothesis)* into the classes *entailment*, *neutral* and *contradiction*.
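To make the three-way decision concrete, the following sketch turns the raw classifier scores (logits) for one sentence pair into a single predicted class with a softmax followed by an argmax. It is a minimal pure-Python illustration, not code from the authors' repository: the `ID2LABEL` index order is a hypothetical assumption, and the real mapping should be read from the checkpoint's `model.config.id2label`.

```python
import math

# HYPOTHETICAL label order: read the real mapping from the checkpoint's
# `model.config.id2label`, which this card does not list.
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits: list[float], id2label: dict[int, str] = ID2LABEL) -> tuple[str, float]:
    """Return the most probable NLI class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for a single (premise, hypothesis) pair
label, prob = predict_label([3.1, 0.4, -2.0])
print(label, round(prob, 3))
```

Subtracting the maximum logit before exponentiating does not change the softmax result but avoids overflow for large scores.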
## Demo
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "giotvr/bertimbau_large_plue_mnli_fine_tuned"
premise = "As mudanças climáticas são uma ameaça séria para a biodiversidade do planeta."
hypothesis = "A biodiversidade do planeta é seriamente ameaçada pelas mudanças climáticas."

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Encode the (premise, hypothesis) pair as a single sentence-pair input
input_pair = tokenizer(premise, hypothesis, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**input_pair).logits

# Convert logits to probabilities and list classes from most to least likely
probs = torch.nn.functional.softmax(logits, dim=-1)
probs, sorted_indices = torch.sort(probs, descending=True)
for score, idx in zip(probs[0], sorted_indices[0]):
    label = model.config.id2label[idx.item()]
    print(f"Class {idx.item()} ({label}): {score.item():.4f}")
```
### Recommendations
This model is intended for scientific purposes only. It has not been tested in production environments.
## Fine-Tuning Details
### Fine-Tuning Data
---
- **Train Dataset**: [PLUE/MNLI](https://huggingface.co/datasets/dlb/plue)
- **Evaluation Dataset used for Hyperparameter Tuning:** [PLUE/MNLI](https://huggingface.co/datasets/dlb/plue)'s validation split
- **Test Datasets:**
  - [ASSIN](https://huggingface.co/datasets/assin)'s test split
  - [ASSIN2](https://huggingface.co/datasets/assin2)'s test split
  - [PLUE/MNLI](https://huggingface.co/datasets/dlb/plue/viewer/mnli_matched)'s validation matched split
---
This is a fine-tuned version of [BERTimbau-large](https://huggingface.co/neuralmind/bert-large-portuguese-cased) using the [ASSIN2 (Avaliação de Similaridade Semântica e Inferência textual)](https://huggingface.co/datasets/assin2) dataset. [ASSIN2](https://huggingface.co/datasets/assin2) is a corpus of Portuguese premise/hypothesis sentence pairs, annotated for detecting whether the two sentences of each pair hold an entailment or a neutral
relationship. The corpus is balanced and contains 7k *ptbr* (Brazilian Portuguese) sentence pairs.
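Because ASSIN2 distinguishes only two relationships while the model predicts three classes, scoring the model on ASSIN2's test split requires collapsing its output. The card does not state how the authors do this, so the following sketch hypothetically folds *contradiction* into *neutral* and then computes accuracy, the metric declared in the card's metadata. `THREE_TO_TWO`, `collapse` and `accuracy` are illustrative names, not part of the authors' code.

```python
# HYPOTHETICAL mapping: the card does not specify how three-way predictions
# are compared against ASSIN2's two-way (entailment / neutral) gold labels.
THREE_TO_TWO = {
    "entailment": "entailment",
    "neutral": "neutral",
    "contradiction": "neutral",
}


def collapse(predictions: list[str]) -> list[str]:
    """Map entailment/neutral/contradiction onto the two ASSIN2-style classes."""
    return [THREE_TO_TWO[p] for p in predictions]


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of positions where the predicted and gold labels agree."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example with made-up predictions and gold labels
preds = collapse(["entailment", "contradiction", "neutral", "entailment"])
gold = ["entailment", "neutral", "entailment", "entailment"]
print(accuracy(preds, gold))
```

Other collapsing rules are possible; this one treats any non-entailment prediction as the negative class.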
### Fine-Tuning Procedure
The model's fine-tuning procedure can be summarized in three major sequential steps: