Spaces: BUT-FIT/DeCRED-ASR

Lakoc committed on Oct 23, 2024

Commit 643ae12 (1 parent: 8529a0b)

Added more info about this space.

Files changed (1)

app.py +32 -10

@@ -153,7 +153,9 @@ yt_transcribe = gr.Interface(
     outputs=["html", "text"],
     title="Transcribe YouTube",
     description=(
-        "Transcribe long-form YouTube videos with the click of a button! Select a model from the dropdown."
     ),
     allow_flagging="never",
 )
@@ -161,21 +163,41 @@ yt_transcribe = gr.Interface(
 with demo:
     gr.TabbedInterface([mf_transcribe, file_transcribe, yt_transcribe], ["Microphone", "Audio file", "YouTube"])
-    gr.Markdown(
-        "Disclaimer: This space currently runs on basic CPU hardware, so generation might take a bit longer. "
-        "You can clone the repository and run it locally for better performance. "
-        "Please refer to the [Hugging Face documentation](https://huggingface.co/docs/hub/spaces-overview#clone-the-repository) "
-        "on how to clone the repository and run it locally. "
-        "The model is not perfect and may make errors, so please use responsibly."
-    )
     gr.Markdown(
         """
-        ### Explore the Models:
         - [DeCRED Base](https://huggingface.co/BUT-FIT/DeCRED-base)
         - [DeCRED Small](https://huggingface.co/BUT-FIT/DeCRED-small)
         - [ED Base](https://huggingface.co/BUT-FIT/ED-base)
         - [ED Small](https://huggingface.co/BUT-FIT/ED-small)
         """
     )

     outputs=["html", "text"],
     title="Transcribe YouTube",
     description=(
+        """
+        ### *Currently only works on local instances of this space, as youtube-dl does not function from Hugging Face servers.*
+        Transcribe long-form YouTube videos with the click of a button! Select a model from the dropdown."""
     ),
     allow_flagging="never",
 )
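
Review note on the hunk above: the new `description` is a triple-quoted string whose lines carry the source indentation. In Markdown, a line indented by four or more spaces can be parsed as an indented code block, so the `###` heading only renders as a heading if that leading whitespace is stripped first. Whether Gradio performs this normalization itself is not shown in the diff. The sketch below is a standard-library illustration of the normalization, and `normalize_markdown` is a hypothetical helper name, not part of `app.py`.

```python
import inspect


def normalize_markdown(text: str) -> str:
    """Strip common leading indentation and blank edge lines (hypothetical helper)."""
    return inspect.cleandoc(text)


# Same shape as the string added in the diff: opening quotes, newline, indented lines.
description = """
        ### *Currently only works on local instances of this space, as youtube-dl does not function from Hugging Face servers.*
        Transcribe long-form YouTube videos with the click of a button! Select a model from the dropdown."""

cleaned = normalize_markdown(description)

# After normalization no line starts with whitespace, so "###" is parsed as a heading.
for line in cleaned.splitlines():
    print(repr(line[:20]))
```

If the rendered YouTube tab shows the notice as a code block rather than a heading, applying a helper like this before passing the string to `gr.Interface` would be one possible fix.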
 with demo:
     gr.TabbedInterface([mf_transcribe, file_transcribe, yt_transcribe], ["Microphone", "Audio file", "YouTube"])
     gr.Markdown(
         """
+        ## Overview
+        This space demonstrates the performance of **DeCRED** (**De**coder-**C**entric **R**egularization in **E**ncoder-**D**ecoder) for automatic speech recognition (ASR).
+        DeCRED enhances model robustness and generalization, particularly in out-of-domain scenarios, by introducing auxiliary classifiers in the decoder layers of encoder-decoder ASR architectures.
+        ## Key Features
+        - **Auxiliary Classifiers**: DeCRED integrates auxiliary classifiers in the decoder module to regularize training, improving the model’s ability to generalize across domains.
+        - **Enhanced Decoding**: It proposes two new decoding strategies that leverage auxiliary classifiers to re-estimate token probabilities, resulting in more accurate ASR predictions.
+        - **Strong Baseline**: Built on the **E-branchformer** architecture, DeCRED achieves competitive word error rates (WER) compared to Whisper-medium and OWSM v3 while requiring significantly less training data and a smaller model size.
+        - **Out-of-Domain Performance**: DeCRED demonstrates strong generalization, reducing WERs by 2.7 and 2.9 points on the AMI and Gigaspeech datasets, respectively.
+        ## Disclaimer
+        This space currently runs on basic CPU hardware, so transcription can take approximately four times the length of the audio.
+        You can clone the repository and run it locally for better performance.
+        Please refer to the [Hugging Face documentation](https://huggingface.co/docs/hub/spaces-overview#clone-the-repository)
+        for instructions on how to clone the repository and run it locally.
+        The model is not perfect and may make errors, so please use it responsibly.
+        ## Explore the Models
         - [DeCRED Base](https://huggingface.co/BUT-FIT/DeCRED-base)
         - [DeCRED Small](https://huggingface.co/BUT-FIT/DeCRED-small)
         - [ED Base](https://huggingface.co/BUT-FIT/ED-base)
         - [ED Small](https://huggingface.co/BUT-FIT/ED-small)
+        ## Citation
+        If you use DeCRED in your research, please cite the following paper:
+        ```bibtex
+        @misc{polok_2024_decred,
+          title={Improving Automatic Speech Recognition with Decoder-Centric Regularization in Encoder-Decoder Models},
+          author={Alexander Polok and Santosh Kesiraju and Karel Beneš and Lukáš Burget and Jan Černocký},
+          year={2024},
+        }
+        ```
         """
     )
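
Review note on the new Disclaimer: the text states that transcription on the CPU-backed space takes roughly four times the length of the audio. A minimal sketch of what that means for a given clip follows. The function names are hypothetical, and the factor of 4.0 is taken from the disclaimer as a rough estimate, not a measured benchmark.

```python
def estimate_cpu_seconds(audio_seconds: float, slowdown: float = 4.0) -> float:
    """Estimate wall-clock transcription time from audio length.

    `slowdown` defaults to the approximate factor quoted in the disclaimer.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * slowdown


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as 'M min SS s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes} min {secs:02d} s"


# A 90-second recording would take about six minutes on the hosted space.
print(format_duration(estimate_cpu_seconds(90)))  # 6 min 00 s
```

Showing an estimate like this next to the upload widget could set expectations better than the phrase "a bit longer", though that is a suggestion rather than something this commit does.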