meetween/Llama-speechlmm-1.0-xl

Transformers

Safetensors

llava

Generated from Trainer


stp99 committed 24 days ago

Commit 8012980 (verified) · 1 parent: 469e9fd

Update README.md


README.md +26 -26


@@ -80,38 +80,38 @@ Important: before you can use this model, you must follow these steps:
 3. Download the Auto-AVSR video encoder weights from [here](https://drive.google.com/file/d/1shcWXUK2iauRhW9NbwCc25FjU1CoMm8i/view?usp=sharing) and put them in `path/to/some_directory_2`
 4. Go to `config.json` and change the `video_encoder._name_or_path` to `path/to/some_directory_2/vsr_trlrs3vox2_base.pth`
-## Training data
 ### Monolingual
-| TASK     | Task name                    | Dataset            | License         | Metric(s)                 |
-| -------- | ---------------------------- | ------------------ | --------------- | ------------------------- |
-| **ASR**  | Automatic Speech Recognition | **LibriHeavy**     | CC-BY-4.0       | WER                       |
-|          |                              | **CommonVoice**    | Apache-2.0      |                           |
-|          |                              | **LibriTTS**       | CC BY 4.0       |                           |
-|          |                              | **Spoken SQUAD**   | CC-BY-SA-4.0    |                           |
-|          |                              | **Speech-Massive** | CC-BY-NC-SA-4.0 |                           |
-| **VSR**  | Visual Speech Recognition    | **LRS2-BBC**       | Custom          | WER                   |
-| **SSUM** | Speech Summarization         | **AMI**            | CC-BY-4.0       | Rouge-1, Rouge-2, Rouge-L |
-|          |                              | **ICSI**           | CC-BY-4.0       |                           |
-| **SQA**  | Spoken Question Answering    | **Spoken SQUAD**   | CC-BY-SA-4.0    | Accuracy, Exact Match, F1 |
 ### Multilingual
-| TASK             | Task name                     | Dataset                              | License                                    | Metric(s)           |
-| ---------------- | ----------------------------- | ------------------------------------ | ------------------------------------------ | ------------------- |
-| **ST**           | Speech-to-text Translation    | **CoVoST2**                          | CC0                                        | BLEU, COMET, BLEURT |
-|                  |                               | **FLEURS**                           | CC-BY-4.0                                  |                     |
-|                  |                               | **EuroParl-ST**                      | CC-BY-NC-4.0                               |                     |
-|                  |                               | **ACL 60/60**                        | CC-BY-4.0                                  |                     |
-| **MT**           | Machine Translation           | **FLORES**                           | CC-BY-SA-4.0                               | BLEU, COMET, BLEURT |
-|                  |                               | **ACL 60/60**                        | CC-BY-4.0                                  |                     |
-|                  |                               | **EuroParl-ST**                      | CC-BY-NC-4.0                               |                     |
-| **TextInstruct** | Text Instruction Following    | **Everything_Instruct_Multilingual** | Apache-2.0                                 | MMLU                |
-| **SLU**          | Spoken Language Understanding | **Speech-Massive**                   | CC-BY-NC-SA-4.0                            | Intent Accuracy     |
-|                  |                               | **SLURP**                            | CC BY 4.0 (text) <br> CC BY-NC 4.0 (audio) |                     |
-## Evaluation data
 coming soon...
 ## Framework versions

 3. Download the Auto-AVSR video encoder weights from [here](https://drive.google.com/file/d/1shcWXUK2iauRhW9NbwCc25FjU1CoMm8i/view?usp=sharing) and put them in `path/to/some_directory_2`
 4. Go to `config.json` and change the `video_encoder._name_or_path` to `path/to/some_directory_2/vsr_trlrs3vox2_base.pth`
+## Training Data
 ### Monolingual
+| TASK     | Task name                    | Dataset            | License         |
+| -------- | ---------------------------- | ------------------ | --------------- |
+| **ASR**  | Automatic Speech Recognition | **LibriHeavy**     | CC-BY-4.0       |
+|          |                              | **CommonVoice**    | Apache-2.0      |
+|          |                              | **LibriTTS**       | CC BY 4.0       |
+|          |                              | **Spoken SQUAD**   | CC-BY-SA-4.0    |
+|          |                              | **Speech-Massive** | CC-BY-NC-SA-4.0 |
+| **VSR**  | Visual Speech Recognition    | **LRS2-BBC**       | Custom          |
+| **SSUM** | Speech Summarization         | **AMI**            | CC-BY-4.0       |
+|          |                              | **ICSI**           | CC-BY-4.0       |
+| **SQA**  | Spoken Question Answering    | **Spoken SQUAD**   | CC-BY-SA-4.0    |
 ### Multilingual
+| TASK             | Task name                     | Dataset                              | License                                    |
+| ---------------- | ----------------------------- | ------------------------------------ | ------------------------------------------ |
+| **ST**           | Speech-to-text Translation    | **CoVoST2**                          | CC0                                        |
+|                  |                               | **FLEURS**                           | CC-BY-4.0                                  |
+|                  |                               | **EuroParl-ST**                      | CC-BY-NC-4.0                               |
+|                  |                               | **ACL 60/60**                        | CC-BY-4.0                                  |
+| **MT**           | Machine Translation           | **FLORES**                           | CC-BY-SA-4.0                               |
+|                  |                               | **ACL 60/60**                        | CC-BY-4.0                                  |
+|                  |                               | **EuroParl-ST**                      | CC-BY-NC-4.0                               |
+| **TextInstruct** | Text Instruction Following    | **Everything_Instruct_Multilingual** | Apache-2.0                                 |
+| **SLU**          | Spoken Language Understanding | **Speech-Massive**                   | CC-BY-NC-SA-4.0                            |
+|                  |                               | **SLURP**                            | CC BY 4.0 (text) <br> CC BY-NC 4.0 (audio) |
+## Evaluation Results
 coming soon...
 ## Framework versions
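
Step 4 in the context lines of this diff (pointing `video_encoder._name_or_path` at the downloaded Auto-AVSR weights) can also be done with a short script instead of editing `config.json` by hand. The following is a minimal sketch, not part of the README: the helper name `set_video_encoder_path` is hypothetical, and it assumes `config.json` stores a nested `video_encoder` object, as the dotted key in step 4 suggests.

```python
import json
from pathlib import Path


def set_video_encoder_path(config_path, weights_path):
    """Set video_encoder._name_or_path in a config.json file and return the updated config."""
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))

    # Assumption: the config has a nested "video_encoder" object (step 4 refers
    # to the key as `video_encoder._name_or_path`). Fail loudly if it is absent
    # rather than silently creating a new section.
    if "video_encoder" not in config:
        raise KeyError("config.json has no 'video_encoder' section")

    config["video_encoder"]["_name_or_path"] = str(weights_path)

    # Write the file back, leaving every other key untouched.
    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `set_video_encoder_path("config.json", "path/to/some_directory_2/vsr_trlrs3vox2_base.pth")` would apply the change described in step 4, with the placeholder directory replaced by wherever the weights were actually saved.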