emotion2vec
/

emotion2vec_base


BoJack committed on May 15, 2024

Commit

18a317b

verified ·

1 Parent(s): 2a72be2

Upload 5 files


Files changed (6)

.gitattributes +1 -0
README.md +107 -3
config.yaml +107 -0
configuration.json +12 -0
example/test.wav +0 -0
logo.png +3 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+logo.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED Viewed

@@ -1,3 +1,107 @@
----
-license: apache-2.0
----

+---
+license: other
+license_name: model-license
+license_link: https://github.com/alibaba-damo-academy/FunASR
+frameworks:
+- Pytorch
+tasks:
+- emotion-recognition
+---
+<div align="center">
+    <h1>
+    EMOTION2VEC
+    </h1>
+    <p>
+    emotion2vec: universal speech emotion representation model <br>
+    <b><em>emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation</em></b>
+    </p>
+    <p>
+    <img src="logo.png" style="width: 200px; height: 200px;">
+    </p>
+    <p>
+    </p>
+</div>
+# Guides
+emotion2vec is the first universal speech emotion representation model. Through self-supervised pre-training, emotion2vec can extract emotion representations across different tasks, languages, and scenarios.
+This version is a pre-trained representation model without fine-tuning, which can be used for feature extraction.
+# Model Card
+GitHub Repo: [emotion2vec](https://github.com/ddlBoJack/emotion2vec)
+|Model|⭐Model Scope|🤗Hugging Face|Fine-tuning Data (Hours)|
+|:---:|:-------------:|:-----------:|:-------------:|
+|emotion2vec|[Link](https://www.modelscope.cn/models/iic/emotion2vec_base/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_base)|/|
+|emotion2vec+ seed|[Link](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_plus_seed)|201|
+|emotion2vec+ base|[Link](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_plus_base)|4788|
+|emotion2vec+ large|[Link](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_plus_large)|42526|
+# Installation
+`pip install -U funasr modelscope`
+# Usage
+input: 16 kHz speech recording
+granularity:
+- "utterance": Extract features from the entire utterance
+- "frame": Extract frame-level features (50 Hz)
+extract_embedding: Whether to extract features
+## Inference based on ModelScope
+```python
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+inference_pipeline = pipeline(
+    task=Tasks.emotion_recognition,
+    model="iic/emotion2vec_base")
+rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav', output_dir="./outputs", granularity="utterance", extract_embedding=True)
+print(rec_result)
+```
+## Inference based on FunASR
+```python
+from funasr import AutoModel
+model = AutoModel(model="iic/emotion2vec_base")
+res = model.generate(input='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav', output_dir="./outputs", granularity="utterance", extract_embedding=True)
+print(res)
+```
+Note: The model is downloaded automatically.
+A file list in Kaldi-style wav.scp format is also supported as input:
+```cat wav.scp
+wav_name1 wav_path1.wav
+wav_name2 wav_path2.wav
+...
+```
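The wav.scp layout above (one `name path` pair per line) can be generated from a directory of recordings. The following is a minimal sketch, not part of the commit: the helper name `write_wav_scp` is hypothetical, and using the file stem as the utterance name is an assumption made for illustration.

```python
from pathlib import Path


def write_wav_scp(wav_dir, scp_path):
    """Write one '<name> <path>' line per .wav file found in wav_dir.

    Hypothetical helper: the utterance name is taken from the file stem,
    which is an illustrative choice, not something the README prescribes.
    """
    lines = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        # Kaldi-style entry: utterance name, one space, then the audio path
        lines.append(f"{wav.stem} {wav}")
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(scp_path).write_text(text, encoding="utf-8")
    return lines
```

Calling `write_wav_scp("recordings", "wav.scp")` on a folder of .wav files would produce a list file in this format. Names containing spaces would break the two-column layout, so they are best avoided.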
+Outputs are emotion representations, saved in `output_dir` in NumPy format (they can be loaded with `np.load()`).
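Reading the saved features back might look like the sketch below. It rests on assumptions the README does not state: that each input yields one `.npy` file in `output_dir`, that utterance-level features are 1-D arrays, and that frame-level features are 2-D arrays of shape (frames, dim). The helper names `load_embeddings` and `describe` are hypothetical.

```python
from pathlib import Path

import numpy as np


def load_embeddings(output_dir):
    """Return {file stem: array} for every .npy file in output_dir."""
    return {p.stem: np.load(p) for p in sorted(Path(output_dir).glob("*.npy"))}


def describe(emb):
    """Summarize an embedding by its number of dimensions (assumed layout)."""
    if emb.ndim == 1:
        # Assumed: one vector for the whole utterance
        return f"utterance-level, dim={emb.shape[0]}"
    if emb.ndim == 2:
        # Assumed: one vector per frame, frames along the first axis
        return f"frame-level, frames={emb.shape[0]}, dim={emb.shape[1]}"
    return f"unexpected shape {emb.shape}"
```

With `embed_dim: 768` in config.yaml, a feature dimension of 768 would be the natural expectation, though that is worth checking against the actual files.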
+# Note
+This repository is the Hugging Face version of emotion2vec, with model parameters identical to those of the original model and the ModelScope version.
+Original repository: [https://github.com/ddlBoJack/emotion2vec](https://github.com/ddlBoJack/emotion2vec)
+Model Scope repository: [https://github.com/alibaba-damo-academy/FunASR](https://github.com/alibaba-damo-academy/FunASR/tree/funasr1.0/examples/industrial_data_pretraining/emotion2vec)
+Hugging Face repository: [https://huggingface.co/emotion2vec](https://huggingface.co/emotion2vec)
+# Citation
+```BibTeX
+@article{ma2023emotion2vec,
+  title={emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation},
+  author={Ma, Ziyang and Zheng, Zhisheng and Ye, Jiaxin and Li, Jinchao and Gao, Zhifu and Zhang, Shiliang and Chen, Xie},
+  journal={arXiv preprint arXiv:2312.15185},
+  year={2023}
+}
+```

config.yaml ADDED Viewed

@@ -0,0 +1,107 @@

+# network architecture
+model: Emotion2vec
+model_conf:
+    loss_beta: 0.0
+    loss_scale: null
+    depth: 8
+    start_drop_path_rate: 0.0
+    end_drop_path_rate: 0.0
+    num_heads: 12
+    norm_eps: 1e-05
+    norm_affine: true
+    encoder_dropout: 0.1
+    post_mlp_drop: 0.1
+    attention_dropout: 0.1
+    activation_dropout: 0.0
+    dropout_input: 0.0
+    layerdrop: 0.05
+    embed_dim: 768
+    mlp_ratio: 4.0
+    layer_norm_first: false
+    average_top_k_layers: 8
+    end_of_block_targets: false
+    clone_batch: 8
+    layer_norm_target_layer: false
+    batch_norm_target_layer: false
+    instance_norm_target_layer: true
+    instance_norm_targets: false
+    layer_norm_targets: false
+    ema_decay: 0.999
+    ema_same_dtype: true
+    log_norms: true
+    ema_end_decay: 0.99999
+    ema_anneal_end_step: 20000
+    ema_encoder_only: false
+    max_update: 100000
+    extractor_mode: layer_norm
+    shared_decoder: null
+    min_target_var: 0.1
+    min_pred_var: 0.01
+    supported_modality: AUDIO
+    mae_init: false
+    seed: 1
+    skip_ema: false
+    cls_loss: 1.0
+    recon_loss: 0.0
+    d2v_loss: 1.0
+    decoder_group: false
+    adversarial_training: false
+    adversarial_hidden_dim: 128
+    adversarial_weight: 0.1
+    cls_type: chunk
+    normalize: true
+    modalities:
+        audio:
+            type: AUDIO
+            prenet_depth: 4
+            prenet_layerdrop: 0.05
+            prenet_dropout: 0.1
+            start_drop_path_rate: 0.0
+            end_drop_path_rate: 0.0
+            num_extra_tokens: 10
+            init_extra_token_zero: true
+            mask_noise_std: 0.01
+            mask_prob_min: null
+            mask_prob: 0.5
+            inverse_mask: false
+            mask_prob_adjust: 0.05
+            keep_masked_pct: 0.0
+            mask_length: 5
+            add_masks: false
+            remove_masks: false
+            mask_dropout: 0.0
+            encoder_zero_mask: true
+            mask_channel_prob: 0.0
+            mask_channel_length: 64
+            ema_local_encoder: false
+            local_grad_mult: 1.0
+            use_alibi_encoder: true
+            alibi_scale: 1.0
+            learned_alibi: false
+            alibi_max_pos: null
+            learned_alibi_scale: true
+            learned_alibi_scale_per_head: true
+            learned_alibi_scale_per_layer: false
+            num_alibi_heads: 12
+            model_depth: 8
+            decoder:
+                decoder_dim: 384
+                decoder_groups: 16
+                decoder_kernel: 7
+                decoder_layers: 4
+                input_dropout: 0.1
+                add_positions_masked: false
+                add_positions_all: false
+                decoder_residual: true
+                projection_layers: 1
+                projection_ratio: 2.0
+            extractor_mode: layer_norm
+            feature_encoder_spec: '[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]'
+            conv_pos_width: 95
+            conv_pos_groups: 16
+            conv_pos_depth: 5
+            conv_pos_pre_ln: false
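The `feature_encoder_spec` value can be checked against the 50 Hz frame rate and 16 kHz input stated in the README. The sketch below assumes each tuple means (channels, kernel size, stride), as in wav2vec 2.0-style convolutional front ends, and that the convolutions use no padding. Neither assumption is spelled out in the config itself, and the function names are hypothetical.

```python
def parse_spec(spec):
    # The spec string is a Python list expression; evaluate it with builtins disabled.
    return eval(spec, {"__builtins__": {}}, {})


def total_stride(layers):
    """Product of all strides: input samples consumed per output frame."""
    stride = 1
    for _dim, _kernel, s in layers:
        stride *= s
    return stride


def receptive_field(layers):
    """Number of input samples that influence a single output frame."""
    rf, jump = 1, 1
    for _dim, k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf


def num_frames(num_samples, layers):
    """Output length after each unpadded convolution (assumed: no padding)."""
    n = num_samples
    for _dim, k, s in layers:
        n = (n - k) // s + 1
    return n


SPEC = "[(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512,2,2)] + [(512,2,2)]"
layers = parse_spec(SPEC)
print(total_stride(layers))          # 320
print(16000 / total_stride(layers))  # 50.0
print(receptive_field(layers))       # 400
print(num_frames(16000, layers))     # 49
```

Under these assumptions the front end downsamples by 320, which at 16 kHz gives the 50 Hz frame rate, and each frame sees 400 samples (25 ms). One second of audio yields 49 frames rather than 50 because the unpadded convolutions trim the edges.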

configuration.json ADDED Viewed

@@ -0,0 +1,12 @@

+{
+  "framework": "pytorch",
+  "task" : "emotion-recognition",
+  "pipeline": {"type":"funasr-pipeline"},
+  "model": {"type" : "funasr"},
+  "file_path_metas": {
+    "init_param":"emotion2vec_base.pt",
+    "config":"config.yaml"},
+  "model_name_in_hub": {
+    "ms":"iic/emotion2vec_base",
+    "hf":""}
+}

example/test.wav ADDED Viewed

Binary file (131 kB).

logo.png ADDED Viewed

Git LFS Details

SHA256: 8a1aa31431bfb2bf126d7cf383c8b681b2372c333f1328b342bab5969dc0a569
Pointer size: 132 Bytes
Size of remote file: 1.85 MB