pyf98 committed on
Commit 582bb63 (1 parent: d49740e)

Update text

Files changed (1)
  1. app.py +32 -5
app.py CHANGED
@@ -23,19 +23,46 @@ OWSM v3.1 has 1.02B parameters and is trained on 180k hours of paired speech dat
 - Long-form transcription
 - Language identification
 
+As a demo, the input speech should not exceed 2 minutes. We also limit the maximum number of tokens to be generated.
+Please try our [Colab demo](https://colab.research.google.com/drive/1zKI3ZY_OtZd6YmVeED6Cxy1QwT1mqv9O?usp=sharing) if you want to explore more features.
+
+Disclaimer: OWSM has not been thoroughly evaluated in all tasks. Due to limited training data, it may not perform well for certain language directions.
+
+Please consider citing the following related papers if you find our work helpful.
+
+<details><summary>citations</summary>
+<p>
+
 ```
-@article{peng2023owsm,
+@inproceedings{peng2023owsm,
 title={Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data},
 author={Yifan Peng and Jinchuan Tian and Brian Yan and Dan Berrebbi and Xuankai Chang and Xinjian Li and Jiatong Shi and Siddhant Arora and William Chen and Roshan Sharma and Wangyou Zhang and Yui Sudo and Muhammad Shakeel and Jee-weon Jung and Soumi Maiti and Shinji Watanabe},
-journal={arXiv preprint arXiv:2309.13876},
+booktitle={Proc. ASRU},
 year={2023}
 }
+@inproceedings{peng23b_interspeech,
+author={Yifan Peng and Kwangyoun Kim and Felix Wu and Brian Yan and Siddhant Arora and William Chen and Jiyang Tang and Suwon Shon and Prashant Sridhar and Shinji Watanabe},
+title={{A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks}},
+year=2023,
+booktitle={Proc. INTERSPEECH},
+}
+@inproceedings{kim2023branchformer,
+title={E-branchformer: Branchformer with enhanced merging for speech recognition},
+author={Kim, Kwangyoun and Wu, Felix and Peng, Yifan and Pan, Jing and Sridhar, Prashant and Han, Kyu J and Watanabe, Shinji},
+booktitle={2022 IEEE Spoken Language Technology Workshop (SLT)},
+year={2023},
+}
+@InProceedings{pmlr-v162-peng22a,
+title={Branchformer: Parallel {MLP}-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding},
+author={Peng, Yifan and Dalmia, Siddharth and Lane, Ian and Watanabe, Shinji},
+booktitle={Proceedings of the 39th International Conference on Machine Learning},
+year={2022},
+}
 ```
 
-As a demo, the input speech should not exceed 2 minutes. We also limit the maximum number of tokens to be generated.
-Please try our [Colab demo](https://colab.research.google.com/drive/1zKI3ZY_OtZd6YmVeED6Cxy1QwT1mqv9O?usp=sharing) if you want to explore more features.
+</p>
+</details>
 
-Disclaimer: OWSM has not been thoroughly evaluated in all tasks. Due to limited training data, it may not perform well for certain language directions.
 '''
 
 if not torch.cuda.is_available():