nectec/Pathumma-llm-audio-1.0.0

@@ -63,12 +63,13 @@ with torch.no_grad():
 print(response[0])
 ```
 ## Evaluation Performance
-| Model                        |  ASR-th CV18 Th (WER↓)   | ASR-en CV18 En (WER↓)    |   ASR-en Librispeech En (WER↓) | ThaiSER Emotion (Acc↑, F1↑)|  ThaiSER Gender (Acc↑, F1↑)  |
+Additional information is needed
+<!-- | Model                        |  ASR-th CV18 Th (WER↓)   | ASR-en CV18 En (WER↓)    |   ASR-en Librispeech En (WER↓) | ThaiSER Emotion (Acc↑, F1↑)|  ThaiSER Gender (Acc↑, F1↑)  |
 |:----------------------------:|:------------------------:|:------------------------:|:------------------------------:|:------------------:|:--------------------:|
 | Typhoon-Audio-Preview        | 13.26                    | 13.34 (partial result)   | 5.07 (partial result)          |    41.50, 33.48    |       96.20, 96.69   |
 | DIVA                         | 69.15 (partial result)   | 37.40                    | 49.06                          |    18.64, 8.16     |       47.50, 35.90   |
 | Gemini-1.5-Pro               | 16.49                    | 12.94                    | 25.83                          |    26.00, 18.26    |       79.66, 77.32   |
-| Pathumma-llm-audio-1.0.0     | 12.03                    | 12.20                    | 11.36                          |    42.30, 36.88    |       90.30, 92.07   |
+| Pathumma-llm-audio-1.0.0     | 12.03                    | 12.20                    | 11.36                          |    42.30, 36.88    |       90.30, 92.07   | -->
 ## Limitations and Future Work
 At present, our model remains in the experimental research phase and is not yet fully suitable for practical applications as an assistant. Future work will focus on upgrading the language model to a newer version [Pathumma-llm-text-1.0.0](https://huggingface.co/nectec/Pathumma-llm-text-1.0.0), and curating more refined and robust datasets to improve performance. Additionally, we aim to address and prioritize the safety and reliability of the model's outputs.