A speech emotion recognition (SER) model trained on the litagin/Galgame_Speech_SER_16kHz dataset.

Usage

from transformers import pipeline

REPO_ID = "litagin/anime_speech_emotion_classification"
pipe = pipeline(
    "audio-classification",
    model=REPO_ID,
    feature_extractor=REPO_ID,
    trust_remote_code=True,
    device="cuda",  # assumes a CUDA GPU; use "cpu" if none is available
)

audio_path = "path/to/audio.wav"
result = pipe(audio_path)
print(result)

Result:

[{'score': 0.5655683279037476, 'label': 'Angry'},
 {'score': 0.12489483505487442, 'label': 'Disgusted'},
 {'score': 0.11449059844017029, 'label': 'Embarrassed'},
 {'score': 0.06627542525529861, 'label': 'Surprised'},
 {'score': 0.06157735362648964, 'label': 'Sad'},
 {'score': 0.031055787578225136, 'label': 'Neutral'},
 {'score': 0.022820966318249702, 'label': 'Happy'},
 {'score': 0.00791135337203741, 'label': 'Fearful'},
 {'score': 0.00540440296754241, 'label': 'Sexual1'},
 {'score': 8.61035857724346e-07, 'label': 'Sexual2'}]
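The result is a list of dicts with `score` and `label` keys, so picking a single prediction is a small post-processing step. The sketch below is only an illustration: the `top_label` helper and its `min_score` threshold are hypothetical and not part of this repository, and the input is the sample output shown above.

```python
# Sample output copied from the "Result" section above.
result = [
    {"score": 0.5655683279037476, "label": "Angry"},
    {"score": 0.12489483505487442, "label": "Disgusted"},
    {"score": 0.11449059844017029, "label": "Embarrassed"},
    {"score": 0.06627542525529861, "label": "Surprised"},
    {"score": 0.06157735362648964, "label": "Sad"},
    {"score": 0.031055787578225136, "label": "Neutral"},
    {"score": 0.022820966318249702, "label": "Happy"},
    {"score": 0.00791135337203741, "label": "Fearful"},
    {"score": 0.00540440296754241, "label": "Sexual1"},
    {"score": 8.61035857724346e-07, "label": "Sexual2"},
]


def top_label(predictions, min_score=0.5):
    """Return the highest-scoring label, or None if it scores below min_score.

    Hypothetical helper; min_score=0.5 is an arbitrary illustrative threshold.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


print(top_label(result))                 # Angry
print(top_label(result, min_score=0.9))  # None
```

Returning `None` for low-confidence clips lets the caller decide how to handle ambiguous audio instead of forcing a label.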

Label

  "id2label": {
    "0": "Angry",
    "1": "Disgusted",
    "2": "Embarrassed",
    "3": "Fearful",
    "4": "Happy",
    "5": "Sad",
    "6": "Surprised",
    "7": "Neutral",
    "8": "Sexual1",  # NSFW erotic voices such as 喘ぎ (moaning)
    "9": "Sexual2"  # Blowjob Oral Slurp SFX (チュパ音)
  },
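
Because two of the ten labels are NSFW, a common need is a single NSFW score per clip. The following is a minimal sketch under that assumption: `NSFW_LABELS`, `nsfw_score`, and `is_nsfw` are hypothetical names, the 0.5 threshold is arbitrary, and `sample` is an abbreviated copy of the sample result above.

```python
# Labels 8 and 9 in id2label are the NSFW classes.
NSFW_LABELS = {"Sexual1", "Sexual2"}


def nsfw_score(predictions):
    """Sum the scores of the NSFW labels in a pipeline result."""
    return sum(p["score"] for p in predictions if p["label"] in NSFW_LABELS)


def is_nsfw(predictions, threshold=0.5):
    """Flag a clip when the combined NSFW score reaches the threshold.

    The default threshold is an illustrative assumption, not a tuned value.
    """
    return nsfw_score(predictions) >= threshold


# Abbreviated copy of the sample result shown in the "Result" section.
sample = [
    {"score": 0.5655683279037476, "label": "Angry"},
    {"score": 0.00540440296754241, "label": "Sexual1"},
    {"score": 8.61035857724346e-07, "label": "Sexual2"},
]

print(is_nsfw(sample))  # False
```

Summing the two scores treats `Sexual1` and `Sexual2` as one category for filtering; keep them separate if the distinction matters downstream.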

Model size: 165M params (Safetensors, tensor type F32)