Binarybardakshat committed on
Commit 80f132f · verified · 1 Parent(s): 0289adf

Update README.md

Files changed (1)
  1. README.md +1 -35
README.md CHANGED
@@ -49,16 +49,12 @@ model-index:
 
  # SWRA (SWARA)
 
- `SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR). The S2T model was proposed in [this paper](https://arxiv.org/abs/2010.05171) and released in [this repository](https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text).
+ `SWRA (SWARA)` is a Speech to Text Transformer (S2T) model trained by @binarybardakshat for automatic speech recognition (ASR).
 
  ## Model Description
 
  SWRA (SWARA) is an end-to-end sequence-to-sequence transformer model. It is trained with standard autoregressive cross-entropy loss and generates the transcripts autoregressively.
 
- ## Intended Uses & Limitations
-
- This model can be used for end-to-end speech recognition (ASR). See the [model hub](https://huggingface.co/models?filter=speech_to_text) to look for other S2T checkpoints.
-
  ### How to Use
 
  As this is a standard sequence-to-sequence transformer model, you can use the `generate` method to generate the transcripts by passing the speech features to the model.
@@ -134,33 +130,3 @@ print("WER:", wer.compute(predictions=result["transcription"], references=result
 
  The S2T-SMALL-LIBRISPEECH-ASR is trained on [LibriSpeech ASR Corpus](https://www.openslr.org/12), a dataset consisting of
  approximately 1000 hours of 16kHz read English speech.
-
-
- ## Training procedure
-
- ### Preprocessing
-
- The speech data is pre-processed by extracting Kaldi-compliant 80-channel log mel-filter bank features automatically from
- WAV/FLAC audio files via PyKaldi or torchaudio. Further utterance-level CMVN (cepstral mean and variance normalization)
- is applied to each example.
-
- The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 10,000.
-
-
- ### Training
-
- The model is trained with standard autoregressive cross-entropy loss and using [SpecAugment](https://arxiv.org/abs/1904.08779).
- The encoder receives speech features, and the decoder generates the transcripts autoregressively.
-
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @inproceedings{wang2020fairseqs2t,
- title = {fairseq S2T: Fast Speech-to-Text Modeling with fairseq},
- author = {Changhan Wang and Yun Tang and Xutai Ma and Anne Wu and Dmytro Okhonko and Juan Pino},
- booktitle = {Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations},
- year = {2020},
- }
-
- ```