Update README.md
README.md
CHANGED
@@ -15,18 +15,19 @@ https://arxiv.org/abs/2308.08155
 Whisper:
 https://arxiv.org/abs/2212.04356

-
-
+# Q & A Using VectorDB FAISS GPT Queries:
+
+## Eight key features of a robust AI speech recognition pipeline:
+1. Scaling: The pipeline should be capable of scaling compute, models, and datasets to improve performance. This includes leveraging GPU acceleration and increasing the size of the training dataset.
+2. Deep Learning Approaches: The pipeline should utilize deep learning approaches, such as deep neural networks, to improve speech recognition performance.
+3. Weak Supervision: The pipeline should be able to leverage weakly supervised learning to increase the size of the training dataset. This involves using large amounts of transcripts of audio from the internet.
+4. Zero-shot Transfer Learning: The resulting models from the pipeline should be able to generalize well to standard benchmarks without the need for any fine-tuning in a zero-shot transfer setting.
+5. Accuracy and Robustness: The models generated by the pipeline should approach the accuracy and robustness of human speech recognition.
+6. Pre-training Techniques: The pipeline should incorporate unsupervised pre-training techniques, such as Wav2Vec 2.0, which enable learning directly from raw audio without the need for handcrafted features.
+7. Broad Range of Environments: The goal of the pipeline should be to work reliably "out of the box" in a broad range of environments without requiring supervised fine-tuning for every deployment distribution.
+8. Combining Multiple Datasets: The pipeline should combine multiple existing high-quality speech recognition datasets to improve robustness and effectiveness of the models.

-Ten Teaching Examples:
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10.

+
+ChatDev:
+https://arxiv.org/pdf/2307.07924.pdf