ITALIC is an intent classification dataset for the Italian language, the first of its kind. It includes spoken and written utterances annotated with 60 intents. The dataset is available on Zenodo, and connectors are available for the HuggingFace Hub.
This is the facebook/wav2vec2-xls-r-300m model fine-tuned on the "Massive" split of ITALIC.
You can use the model directly as follows:
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
## Load an audio file
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)
## Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/xls-r-128-italic-massive")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")
## Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")
## Compute logits (gradients are not needed for inference)
with torch.no_grad():
    logits = model(**inputs).logits
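The snippet above stops at the raw logits. As a minimal sketch of one way to turn them into ranked intent predictions, the helper below applies a softmax and returns the top-scoring labels. The function name `top_intents`, the parameter `k`, and the three-class label mapping are hypothetical and used only for illustration; they are not part of the model or the dataset.

```python
import math


def top_intents(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs.

    `logits` is a flat sequence of floats, one score per class.
    `id2label` maps class indices to label names.
    Both names are illustrative, not part of the model's API.
    """
    # Numerically stable softmax: subtract the maximum before exponentiating.
    highest = max(logits)
    exps = [math.exp(score - highest) for score in logits]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Rank class indices from most to least probable and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical three-class example; the real model predicts 60 intents.
example_logits = [0.2, 2.5, -1.0]
example_labels = {0: "intent_a", 1: "intent_b", 2: "intent_c"}
print(top_intents(example_logits, example_labels, k=2))
```

With the model loaded as shown above, the same helper could be called as `top_intents(logits[0].tolist(), model.config.id2label)`, assuming the checkpoint's configuration has its `id2label` mapping populated with the intent names.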
For more information about the dataset and the model, please refer to the paper.
If you use this model in your research, please cite the following paper:
@inproceedings{koudounas2023italic,
title={ITALIC: An Italian Intent Classification Dataset},
author={Koudounas, Alkis and La Quatra, Moreno and Vaiani, Lorenzo and Colomba, Luca and Attanasio, Giuseppe and Pastor, Eliana and Cagliero, Luca and Baralis, Elena},
booktitle={Proc. Interspeech 2023},
pages={2153--2157},
year={2023}
}