CentraleSupelec - Natural language processing

Practical session n°7

Natural Language Inferencing (NLI):

NLI is a classical NLP (Natural Language Processing) problem that takes two sentences (the premise and the hypothesis) and decides how they are related: the premise entails the hypothesis, contradicts it, or neither.

Example:

Premise	Label	Hypothesis
A man inspects the uniform of a figure in some East Asian country.	contradiction	The man is sleeping.
An older and younger man smiling.	neutral	Two men are smiling and laughing at the cats playing on the floor.
A soccer game with multiple males playing.	entailment	Some men are playing a sport.

Stanford NLI (SNLI) corpus

In this lab, I propose to use the Stanford NLI (SNLI) corpus (https://nlp.stanford.edu/projects/snli/), available in the Datasets library by Hugging Face.

from datasets import load_dataset
snli = load_dataset("snli")
# Remove sentence pairs with no label (label == -1)
snli = snli.filter(lambda example: example['label'] != -1)
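
To make the filtering step concrete, here is a minimal pure-Python sketch of what it does on a few rows. It assumes the label encoding used by the Hugging Face version of SNLI (0 = entailment, 1 = neutral, 2 = contradiction, -1 = no label). The names `LABEL_NAMES`, `toy_rows`, `kept_rows` and `label_counts` are illustrative and not part of the library, and the fourth row is invented for the example.

```python
from collections import Counter

# Assumed label encoding of the Hugging Face SNLI dataset.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Toy rows mimicking the dataset fields (premise, hypothesis, label).
toy_rows = [
    {"premise": "A soccer game with multiple males playing.",
     "hypothesis": "Some men are playing a sport.",
     "label": 0},
    {"premise": "An older and younger man smiling.",
     "hypothesis": "Two men are smiling and laughing at the cats playing on the floor.",
     "label": 1},
    {"premise": "A man inspects the uniform of a figure in some East Asian country.",
     "hypothesis": "The man is sleeping.",
     "label": 2},
    # Hypothetical unlabeled pair, invented for this sketch.
    {"premise": "A dog runs in a field.",
     "hypothesis": "The dog is happy.",
     "label": -1},
]

# Same predicate as the snli.filter(...) call above.
kept_rows = [row for row in toy_rows if row["label"] != -1]

# Label distribution after filtering, a first step before any visualization.
label_counts = Counter(LABEL_NAMES[row["label"]] for row in kept_rows)
print(len(kept_rows), dict(label_counts))
```

On the real corpus, the same count could be computed per split to check that the three classes are roughly balanced before training.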

Quick summary of the model

This model is by Youssef Adarrab, Othmane Baziz and Alain Malige.

First, we import the corpus and do some visualization.
Second, we apply DistilBERT for sequence classification.
Our work illustrates the code used for training; to obtain better results, one should run the training for more epochs.
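
The shape of such a training loop can be sketched as follows. This is not the authors' code. To avoid any download, it builds a tiny, randomly initialised DistilBERT from a config and feeds it random token ids, whereas a real run would presumably load pretrained weights and tokenize premise/hypothesis pairs. The model sizes, the learning rate, the batch and `num_epochs` are arbitrary assumptions chosen to keep the example fast.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

torch.manual_seed(0)

# Tiny config (assumed sizes) with 3 output classes:
# entailment, neutral, contradiction.
config = DistilBertConfig(
    vocab_size=100,
    max_position_embeddings=32,
    n_layers=1,
    n_heads=2,
    dim=32,
    hidden_dim=64,
    num_labels=3,
)
model = DistilBertForSequenceClassification(config)

# Fake batch of 2 sequences of 8 token ids; in practice these would come
# from a tokenizer applied to (premise, hypothesis) pairs.
input_ids = torch.randint(1, 100, (2, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 2])

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
num_epochs = 3  # assumption; more epochs generally means more training

losses = []
model.train()
for epoch in range(num_epochs):
    optimizer.zero_grad()
    # Passing labels makes the model return a cross-entropy loss.
    outputs = model(input_ids=input_ids,
                    attention_mask=attention_mask,
                    labels=labels)
    outputs.loss.backward()
    optimizer.step()
    losses.append(outputs.loss.item())

# Inference: one logit per class, the arg-max is the predicted label id.
model.eval()
with torch.no_grad():
    logits = model(input_ids=input_ids,
                   attention_mask=attention_mask).logits
predictions = logits.argmax(dim=-1)
print(losses, predictions.tolist())
```

Increasing `num_epochs` and iterating over real mini-batches of the training split is where the extra epochs mentioned above would come in.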