---
title: README
emoji: 📈
colorFrom: pink
colorTo: red
sdk: streamlit
pinned: false
sdk_version: 1.43.2
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/629e1b71bb6419817ed7566c/jeUU2sPSuMRP9IIqVnufk.png
---

- GenSEC: Text-based Generative Audio & Speech Recognition with Cascaded ASR-LLMs
  - Task 1: ASR N-best hypotheses correction
  - Task 2: Speaker Tagging from N-best hypotheses
  - Task 3: Emotion Recognition from N-best hypotheses

- Open Source Model

  - Llama2-7b baseline pre-trained for ASR correction (Task 1)
    - https://huggingface.co/GenSEC-LLM/SLT-Task1-Llama2-7b-HyPo-baseline


- IEEE SLT 2024 reference: [Paper](https://arxiv.org/abs/2409.09785). See the resources below for baseline models and datasets.

```bib
@inproceedings{yang2024large,
  title={Large language model based generative error correction: A challenge and baselines for speech recognition, speaker tagging, and emotion recognition},
  author={Yang, Chao-Han Huck and Park, Taejin and Gong, Yuan and Li, Yuanchao and Chen, Zhehuai and Lin, Yen-Ting and Chen, Chen and Hu, Yuchen and Dhawan, Kunal and {\.Z}elasko, Piotr and others},
  booktitle={2024 IEEE Spoken Language Technology Workshop (SLT)},
  pages={371--378},
  year={2024},
  organization={IEEE}
}
```
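
As a rough illustration of the Task 1 setup listed above, the sketch below shows one way an N-best hypothesis list could be turned into a plain-text prompt for an LLM. It is a minimal sketch only: the function name `build_correction_prompt`, the prompt wording, and the example hypotheses are all hypothetical and are not taken from the GenSEC baselines or datasets.

```python
from typing import List


def build_correction_prompt(hypotheses: List[str]) -> str:
    """Format an N-best list as a numbered text prompt.

    The prompt wording is an assumption made for illustration; it is
    not the template used by the GenSEC baseline models.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    # Number the hypotheses from 1 (best) to N, one per line.
    numbered = "\n".join(
        f"{rank}. {text.strip()}" for rank, text in enumerate(hypotheses, start=1)
    )
    return (
        "Below are the N-best hypotheses from a speech recognizer.\n"
        f"{numbered}\n"
        "Corrected transcription:"
    )


# Made-up hypotheses, not drawn from any GenSEC dataset.
n_best = [
    "i want to recognize speech",
    "i want to wreck a nice beach",
    "i won to recognize speech",
]
prompt = build_correction_prompt(n_best)
print(prompt)
```

The resulting string could then be passed to any text-generation model; how the actual baselines format their inputs should be checked against the model card linked above.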