CentraleSupélec - Natural Language Processing
Practical session n°7
Natural Language Inference (NLI):
Natural Language Inference (NLI) is a classical NLP (Natural Language Processing) problem: given two sentences, the premise and the hypothesis, decide how they are related, i.e. whether the premise entails the hypothesis, contradicts it, or neither (neutral).
Examples:
Premise | Label | Hypothesis |
---|---|---|
A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
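In the Hugging Face copy of SNLI, each example has a `premise`, a `hypothesis` and an integer `label`, encoded as 0 = entailment, 1 = neutral, 2 = contradiction, with -1 for pairs that have no gold label. The sketch below stores the three table rows in that format. The names `LABEL_NAMES`, `LABEL2ID` and `label_name` are hypothetical helpers for illustration only.

```python
# SNLI label names in the order used by the Hugging Face "snli" dataset:
# 0 = entailment, 1 = neutral, 2 = contradiction (-1 marks pairs with no gold label).
LABEL_NAMES = ["entailment", "neutral", "contradiction"]
LABEL2ID = {name: i for i, name in enumerate(LABEL_NAMES)}

# The three example rows from the table, in SNLI's (premise, hypothesis, label) schema.
examples = [
    {"premise": "A man inspects the uniform of a figure in some East Asian country.",
     "hypothesis": "The man is sleeping.",
     "label": LABEL2ID["contradiction"]},
    {"premise": "An older and younger man smiling.",
     "hypothesis": "Two men are smiling and laughing at the cats playing on the floor.",
     "label": LABEL2ID["neutral"]},
    {"premise": "A soccer game with multiple males playing.",
     "hypothesis": "Some men are playing a sport.",
     "label": LABEL2ID["entailment"]},
]


def label_name(label_id: int) -> str:
    """Map an integer SNLI label back to its name (hypothetical helper)."""
    if label_id == -1:
        return "no gold label"
    return LABEL_NAMES[label_id]


for ex in examples:
    print(f"{label_name(ex['label']):>13} | {ex['hypothesis']}")
```

With the real dataset loaded, `snli["train"].features["label"].names` should list the same three names in the same order, so it is safer to read the mapping from there than to hard-code it.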
Stanford NLI (SNLI) corpus
In this lab, I propose using the Stanford NLI (SNLI) corpus (https://nlp.stanford.edu/projects/snli/), which is available through the Hugging Face Datasets library.
from datasets import load_dataset
# Download (or load from the local cache) the train/validation/test splits
snli = load_dataset("snli")
# Remove sentence pairs with no gold label (label == -1)
snli = snli.filter(lambda example: example['label'] != -1)
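The `filter` call above keeps only rows whose `label` is not -1 (pairs for which the annotators did not agree on a majority label). The pure-Python sketch below mimics that step on a few hypothetical toy rows using SNLI's schema, then counts the remaining labels, which is the kind of summary one would plot when visualizing the corpus. It downloads nothing and is only an illustration of the predicate, not of the Datasets implementation.

```python
from collections import Counter

LABEL_NAMES = ["entailment", "neutral", "contradiction"]

# Hypothetical toy rows using SNLI's schema; -1 means no gold (majority) label.
rows = [
    {"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": 0},
    {"premise": "A dog runs.", "hypothesis": "A cat sleeps.", "label": 2},
    {"premise": "A dog runs.", "hypothesis": "The dog is happy.", "label": 1},
    {"premise": "A dog runs.", "hypothesis": "A dog is outside.", "label": -1},
    {"premise": "Kids play.", "hypothesis": "Children are playing.", "label": 0},
]

# Same predicate as `snli.filter(lambda example: example['label'] != -1)`.
kept = [ex for ex in rows if ex["label"] != -1]

# Label distribution of the filtered rows, keyed by label name.
counts = Counter(LABEL_NAMES[ex["label"]] for ex in kept)
print(len(rows) - len(kept), "pair(s) removed")
print(counts.most_common())
```

On the real corpus, the same count can be obtained with something like `Counter(snli["train"]["label"])`, and the result passed to any plotting library.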
Quick summary of the model
This model was built by Youssef Adarrab, Othmane Baziz and Alain Malige.
- First, we import the corpus and do some visualization.
- Second, we fine-tune DistilBERT for sequence classification.
- Finally, we show the code used for training; to obtain better results, the training should be run for more epochs.
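For sequence classification, DistilBERT reads the premise and the hypothesis as a single sequence pair: a BERT-style tokenizer called on two texts produces `[CLS] premise [SEP] hypothesis [SEP]`, truncated and padded to a fixed length, and the classification head predicts one of the three labels from the `[CLS]` position. The sketch below reproduces only this packing logic, assuming a hypothetical whitespace tokenizer (`toy_tokenize`) and a simplified longest-first truncation, so that it runs without downloading a model. The real `distilbert-base-uncased` tokenizer additionally splits words into WordPiece sub-tokens and maps them to integer ids, and its exact truncation rule may differ in edge cases.

```python
from typing import Dict, List

CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for WordPiece: lowercase and split on whitespace."""
    return text.lower().split()


def encode_pair(premise: str, hypothesis: str, max_length: int = 16) -> Dict[str, List]:
    """Pack a premise/hypothesis pair as [CLS] p [SEP] h [SEP], truncate, then pad."""
    if max_length < 3:
        raise ValueError("max_length must leave room for [CLS] and two [SEP] tokens")
    p, h = toy_tokenize(premise), toy_tokenize(hypothesis)
    budget = max_length - 3  # room left after [CLS] and the two [SEP] tokens
    # Simplified longest-first truncation: trim the longer segment one token at a time.
    while len(p) + len(h) > budget:
        if len(p) > len(h):
            p.pop()
        else:
            h.pop()
    tokens = [CLS] + p + [SEP] + h + [SEP]
    attention_mask = [1] * len(tokens)
    # Pad to a fixed length so pairs can be batched; padded positions are masked out.
    n_pad = max_length - len(tokens)
    tokens += [PAD] * n_pad
    attention_mask += [0] * n_pad
    return {"tokens": tokens, "attention_mask": attention_mask}


enc = encode_pair("A soccer game with multiple males playing.",
                  "Some men are playing a sport.", max_length=16)
print(enc["tokens"])
print(enc["attention_mask"])
```

With the real library, the equivalent step would typically be `tokenizer(example["premise"], example["hypothesis"], truncation=True)` applied through `snli.map(..., batched=True)`, followed by `DistilBertForSequenceClassification` (or `AutoModelForSequenceClassification`) loaded with `num_labels=3`.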