update layout
app.py CHANGED
@@ -6,13 +6,15 @@ from espnet2.bin.s2t_inference_language import Speech2Language
 from espnet2.bin.s2t_inference import Speech2Text
 
 
-TITLE="
+TITLE="Open Whisper-style Speech Model from CMU WAVLab"
 
 DESCRIPTION='''
 OWSM (pronounced as "awesome") is a series of Open Whisper-style Speech Models from [CMU WAVLab](https://www.wavlab.org/).
 We reproduce Whisper-style training using publicly available data and an open-source toolkit [ESPnet](https://github.com/espnet/espnet).
-For more details, please check our [website](https://www.wavlab.org/activities/2024/owsm/)
+For more details, please check our [website](https://www.wavlab.org/activities/2024/owsm/).
+'''
 
+ARTICLE = '''
 The latest demo uses OWSM v3.1 based on [E-Branchformer](https://arxiv.org/abs/2210.00077).
 OWSM v3.1 has 1.02B parameters and is trained on 180k hours of labelled data. It supports various speech-to-text tasks:
 - Speech recognition in 151 languages
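The hunk above only touches strings, but the two `espnet2` imports in its header are the whole inference stack behind this Space. A hedged sketch of how those entry points are typically used on their own; the model tag `espnet/owsm_v3.1_ebf`, the 16 kHz mono input, and the tuple layout of the results are assumptions on my part, not taken from this diff:

```python
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

# Assumed checkpoint name; check the OWSM website for the released model tag.
s2t = Speech2Text.from_pretrained(
    "espnet/owsm_v3.1_ebf",
    device="cpu",        # or "cuda" if available
    beam_size=5,
    lang_sym="<eng>",    # language token; OWSM uses <xxx>-style symbols
    task_sym="<asr>",    # task token; <asr> is speech recognition
)

speech, rate = sf.read("sample.wav")  # assumed to already be 16 kHz mono
results = s2t(speech)
# If the usual ESPnet hypothesis layout applies, the decoded text is the
# first element of the top hypothesis.
print(results[0][0])
```

The other import in the hunk header, `Speech2Language`, is the analogous entry point for language identification.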
@@ -24,12 +26,9 @@ OWSM v3.1 has 1.02B parameters and is trained on 180k hours of labelled data. It
 As a demo, the input speech should not exceed 2 minutes. We also limit the maximum number of tokens to be generated.
 Please try our [Colab demo](https://colab.research.google.com/drive/1zKI3ZY_OtZd6YmVeED6Cxy1QwT1mqv9O?usp=sharing) if you want to explore more features.
 
-Disclaimer
-
-Please consider citing the following related papers if you find our work helpful.
+**Disclaimer:** OWSM has not been thoroughly evaluated in all tasks. Due to limited training data, it may not perform well for certain languages.
 
-
-<p>
+Please consider citing the following papers if you find our work helpful.
 
 ```
 @inproceedings{peng2024owsm31,
@@ -45,10 +44,6 @@ Please consider citing the following related papers if you find our work helpful
 year={2023}
 }
 ```
-
-</p>
-</details>
-
 '''
 
 if not torch.cuda.is_available():
@@ -168,6 +163,7 @@ demo = gr.Interface(
     ],
     title=TITLE,
     description=DESCRIPTION,
+    article=ARTICLE,
     allow_flagging="never",
 )
 
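The layout change hinges on where Gradio renders each string: `description` appears above the input/output components and `article` below them, which is why the long notes, disclaimer, and BibTeX move out of `DESCRIPTION`. A minimal sketch of the same pattern, with a hypothetical `predict` standing in for the OWSM pipeline:

```python
import gradio as gr

TITLE = "Open Whisper-style Speech Model from CMU WAVLab"
DESCRIPTION = "Short intro rendered above the components."
ARTICLE = "Long-form notes, disclaimer, and citations rendered below the components."

def predict(audio):
    # Placeholder; the real app runs OWSM inference here.
    return "transcript"

demo = gr.Interface(
    fn=predict,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
    title=TITLE,              # page heading
    description=DESCRIPTION,  # shown above the inputs/outputs, supports Markdown
    article=ARTICLE,          # shown below the inputs/outputs, supports Markdown
    allow_flagging="never",
)
```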
@@ -176,5 +172,5 @@ if __name__ == "__main__":
     demo.launch(
         show_api=False,
         share=True,
-        ssr_mode=
+        ssr_mode=True,
     )
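For the launch flags, a brief annotated sketch; the `ssr_mode` comment reflects my understanding of Gradio 5's server-side rendering feature rather than anything stated in this diff:

```python
if __name__ == "__main__":
    demo.launch(
        show_api=False,  # hide the auto-generated "Use via API" page
        share=True,      # request a public share link
        ssr_mode=True,   # server-side rendering (Gradio 5); renders the first page view on the server
    )
```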