Update README.md

README.md

@@ -3,14 +3,14 @@ license: mpl-2.0
 datasets:
 - mozilla-foundation/common_voice_17_0
 base_model:
-- meta-llama/Llama-3.
+- meta-llama/Llama-3.2-1B-Instruct
 ---
 # Model Card for Diva Llama 3
 
 <!-- Provide a quick summary of what the model is/does. [Optional] -->
 This is an end-to-end Voice Assistant Model which can handle speech and text as inputs. It is trained using distillation loss. More details in the [pre-print](https://arxiv.org/abs/2410.02678) here.
 
-See the model in action at [diva-audio.github.io](https://diva-audio.github.io) or look at the full training logs on [Weights&Biases](https://wandb.ai/i18nlp/
+See the model in action at [diva-audio.github.io](https://diva-audio.github.io) or look at the full training logs on [Weights&Biases](https://wandb.ai/i18nlp/levanter/runs/jnxp463y?nw=nwuserheld).
 
 ## Citation
 **BibTeX:**
@@ -40,7 +40,7 @@ filename = wget.download(
 
 speech_data, _ = librosa.load(filename, sr=16_000)
 
-model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3-
+model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3.2-1b", trust_remote_code=True)
 
 print(model.generate([speech_data]))
 print(model.generate([speech_data], ["Reply Briefly Like A Pirate"]))
@@ -86,19 +86,19 @@ This model was trained on the [CommonVoice](https://huggingface.co/datasets/mozi
 
 ### Training Procedure
 
-This model was trained for
+This model was trained for 4.3k gradient steps with a batch size of 512 recordings and a linearly decaying learning rate from 5e-4 to zero, with a linear warmup of 70 steps.
 
 ### Environmental Impact
 
-- **Hardware Type:** V4-
-- **Hours used:**
+- **Hardware Type:** V4-64 TPU
+- **Hours used:** 3 Hours
 - **Cloud Provider:** Google Cloud.
 - **Compute Region:** US Central C
 
 
 ### Hardware
 
-This model was trained on a V4-
+This model was trained on a V4-64 TPU on Google Cloud.
 
 ### Software
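The learning-rate schedule added in the last hunk (linear warmup over the first 70 steps, then linear decay from 5e-4 to zero across 4.3k total steps) can be sketched as below. This is a minimal illustration of the schedule as described; the function name and exact endpoint handling are assumptions, not taken from the actual Levanter training configuration:

```python
def diva_lr(step: int, peak: float = 5e-4, warmup: int = 70, total: int = 4300) -> float:
    """Illustrative linear warmup + linear decay schedule.

    Assumes the rate rises 0 -> peak over `warmup` steps, then falls
    linearly back to zero at `total` steps; the real config may differ
    in edge-case details.
    """
    if step < warmup:
        return peak * step / warmup                      # linear warmup from 0
    if step >= total:
        return 0.0                                       # schedule finished
    return peak * (total - step) / (total - warmup)      # linear decay to 0
```

For example, `diva_lr(35)` is halfway up the warmup ramp (2.5e-4), the peak 5e-4 is reached at step 70, and the rate hits zero at step 4300.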