wav2vec 2.0 XLS-R 128 (300m) fine-tuned on ITALIC - "Massive"

ITALIC is an intent classification dataset for the Italian language, the first of its kind. It includes spoken and written utterances annotated with 60 intents. The dataset is available on Zenodo, and connectors are available for the HuggingFace Hub.

This is the facebook/wav2vec2-xls-r-300m model fine-tuned on the "Massive" split.

Usage

You can use the model directly in the following manner:

import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file, resampled to the 16 kHz rate the model expects
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/xls-r-128-italic-massive")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")

# Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

# Compute logits (no gradients are needed for inference)
with torch.no_grad():
    logits = model(**inputs).logits
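
The logits returned by the model are unnormalized scores, one per intent class. As a minimal sketch of one way to turn them into ranked predictions, the helper below applies a softmax and looks up label names. The function name top_intents and its parameters are hypothetical and not part of the model's API; it also assumes the checkpoint's config.id2label holds meaningful intent names, which is not confirmed here.

```python
import math

def top_intents(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits.

    logits   : list of floats, one score per class
    id2label : dict mapping class index -> label name
    (Hypothetical helper for illustration; not part of transformers.)
    """
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Fall back to the numeric index when a label name is missing
    return [(id2label.get(i, str(i)), probs[i]) for i in ranked[:k]]
```

With the variables from the snippet above, this could be called as top_intents(logits[0].detach().tolist(), model.config.id2label), which would yield the three highest-scoring intents for the first (and only) audio clip in the batch.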

For more information about the dataset and the model, please refer to the paper.

Citation

If you use this model in your research, please cite the following paper:

@inproceedings{koudounas2023italic,
  title={ITALIC: An Italian Intent Classification Dataset},
  author={Koudounas, Alkis and La Quatra, Moreno and Vaiani, Lorenzo and Colomba, Luca and Attanasio, Giuseppe and Pastor, Eliana and Cagliero, Luca and Baralis, Elena},
  booktitle={Proc. Interspeech 2023},
  pages={2153--2157},
  year={2023}
}
