WillHeld committed
Commit e83a546 · verified · 1 Parent(s): f754f08

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
```diff
@@ -3,14 +3,14 @@ license: mpl-2.0
 datasets:
 - mozilla-foundation/common_voice_17_0
 base_model:
-- meta-llama/Llama-3.1-8B-Instruct
+- meta-llama/Llama-3.2-1B-Instruct
 ---
 # Model Card for Diva Llama 3
 
 <!-- Provide a quick summary of what the model is/does. [Optional] -->
 This is an end-to-end Voice Assistant Model which can handle speech and text as inputs. It is trained using distillation loss. More details in the [pre-print](https://arxiv.org/abs/2410.02678).
 
-See the model in action at [diva-audio.github.io](https://diva-audio.github.io) or look at the full training logs on [Weights&Biases](https://wandb.ai/i18nlp/DiVA%20Training%20Runs/runs/gqpwnd99?nw=nwuserheld).
+See the model in action at [diva-audio.github.io](https://diva-audio.github.io) or look at the full training logs on [Weights&Biases](https://wandb.ai/i18nlp/levanter/runs/jnxp463y?nw=nwuserheld).
 
 ## Citation
 **BibTeX:**
@@ -40,7 +40,7 @@ filename = wget.download(
 
 speech_data, _ = librosa.load(filename, sr=16_000)
 
-model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3-v0-8b", trust_remote_code=True)
+model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3.2-1b", trust_remote_code=True)
 
 print(model.generate([speech_data]))
 print(model.generate([speech_data], ["Reply Briefly Like A Pirate"]))
@@ -86,19 +86,19 @@ This model was trained on the [CommonVoice](https://huggingface.co/datasets/mozi
 
 ### Training Procedure
 
-This model was trained for 7k gradient steps with a batch size of 512 recordings and a linearly decaying learning rate from 5e-5 to zero, with a linear warmup of 70 steps.
+This model was trained for 4.3k gradient steps with a batch size of 512 recordings and a linearly decaying learning rate from 5e-4 to zero, with a linear warmup of 70 steps.
 
 ### Environmental Impact
 
-- **Hardware Type:** V4-256 TPU
-- **Hours used:** 11 Hours
+- **Hardware Type:** V4-64 TPU
+- **Hours used:** 3 Hours
 - **Cloud Provider:** Google Cloud.
 - **Compute Region:** US Central C
 
 
 ### Hardware
 
-This model was trained on a V4-256 TPU on Google Cloud.
+This model was trained on a V4-64 TPU on Google Cloud.
 
 ### Software
 
```
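For reference, the post-commit usage snippet shown in the hunks above assembles into a single runnable block. Everything below comes from the README itself except the audio URL, which is a placeholder standing in for the sample clip the README fetches with wget:

```python
# Assembled from the README hunks above; AUDIO_URL is a placeholder.
import librosa
import wget
from transformers import AutoModel

AUDIO_URL = "https://example.com/sample.wav"  # placeholder, not from the README
filename = wget.download(AUDIO_URL)

# The README loads audio at 16 kHz, so any clip is resampled to sr=16_000.
speech_data, _ = librosa.load(filename, sr=16_000)

# trust_remote_code=True pulls in the repo's custom DiVA modeling code.
model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3.2-1b", trust_remote_code=True)

# Speech-only input, then speech paired with a text instruction.
print(model.generate([speech_data]))
print(model.generate([speech_data], ["Reply Briefly Like A Pirate"]))
```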
 
 
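The card's summary line says the model is trained with a distillation loss. As a rough illustration only (the actual objective is specified in the linked pre-print, not here), token-level distillation is commonly a KL divergence between the teacher's and student's next-token distributions:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # KL(teacher || student) over the vocabulary; a generic sketch,
    # not DiVA's exact training objective.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
```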
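The updated training procedure is also easy to state precisely: linear warmup over 70 steps to a peak learning rate of 5e-4, then linear decay to zero by step 4.3k. A small illustrative helper follows; the step counts and peak come from the card, but this is not the project's training code, whose logs live in the linked Weights&Biases run:

```python
# Sketch of the schedule described in the card: 70 warmup steps to 5e-4,
# then linear decay to zero by step 4300.
PEAK_LR = 5e-4
WARMUP_STEPS = 70
TOTAL_STEPS = 4300

def learning_rate(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return max(0.0, PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

# e.g. learning_rate(0) == 0.0, learning_rate(70) == 5e-4, learning_rate(4300) == 0.0
```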