Update README.md

README.md

@@ -3,14 +3,14 @@ license: mpl-2.0
 datasets:
 - mozilla-foundation/common_voice_17_0
 base_model:
-- meta-llama/Llama-3.
+- meta-llama/Llama-3.2-1B-Instruct
 ---
 # Model Card for Diva Llama 3
 
 <!-- Provide a quick summary of what the model is/does. [Optional] -->
 This is an end-to-end Voice Assistant Model which can handle speech and text as inputs. It is trained using distillation loss. More details in the [pre-print](https://arxiv.org/abs/2410.02678) here.
 
-See the model in action at [diva-audio.github.io](https://diva-audio.github.io) or look at the full training logs on [Weights&Biases](https://wandb.ai/i18nlp/
+See the model in action at [diva-audio.github.io](https://diva-audio.github.io) or look at the full training logs on [Weights&Biases](https://wandb.ai/i18nlp/levanter/runs/jnxp463y?nw=nwuserheld).
 
 ## Citation
 **BibTeX:**
@@ -40,7 +40,7 @@ filename = wget.download(
 
 speech_data, _ = librosa.load(filename, sr=16_000)
 
-model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3-
+model = AutoModel.from_pretrained("WillHeld/DiVA-llama-3.2-1b", trust_remote_code=True)
 
 print(model.generate([speech_data]))
 print(model.generate([speech_data], ["Reply Briefly Like A Pirate"]))
@@ -86,19 +86,19 @@ This model was trained on the [CommonVoice](https://huggingface.co/datasets/mozi
 
 ### Training Procedure
 
-This model was trained for
+This model was trained for 4.3k gradient steps with a batch size of 512 recordings and a linearly decaying learning rate from 5e-4 to zero, with a linear warmup of 70 steps.
 
 ### Environmental Impact
 
-- **Hardware Type:** V4-
-- **Hours used:**
+- **Hardware Type:** V4-64 TPU
+- **Hours used:** 3 Hours
 - **Cloud Provider:** Google Cloud.
 - **Compute Region:** US Central C
 
 
 ### Hardware
 
-This model was trained on a V4-
+This model was trained on a V4-64 TPU on Google Cloud.
 
 ### Software
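The learning-rate schedule added in the last hunk (linear warmup over the first 70 steps, then linear decay from 5e-4 to zero across 4.3k total steps) can be sketched as below. This is a minimal illustration of the schedule as described; the function name and exact endpoint handling are assumptions, not taken from the actual Levanter training configuration:

```python
def diva_lr(step: int, peak: float = 5e-4, warmup: int = 70, total: int = 4300) -> float:
    """Illustrative linear warmup + linear decay schedule.

    Assumes the rate rises 0 -> peak over `warmup` steps, then falls
    linearly back to zero at `total` steps; the real config may differ
    in edge-case details.
    """
    if step < warmup:
        return peak * step / warmup                      # linear warmup from 0
    if step >= total:
        return 0.0                                       # schedule finished
    return peak * (total - step) / (total - warmup)      # linear decay to 0
```

For example, `diva_lr(35)` is halfway up the warmup ramp (2.5e-4), the peak 5e-4 is reached at step 70, and the rate hits zero at step 4300.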