homebrewltd/llama3-s-base-v0.2

Model Details

We have developed and released the llama3-s model family, which natively understands both audio and text input.

We continually pretrain the expanded-vocabulary checkpoint homebrewltd/llama3.1-s-whispervq-init on 900M tokens from the homebrewltd/raw-speech-whispervq-v1 dataset.

Model developers: Homebrew Research.

Input: Text and sound.

Output: Text.

Model Architecture: Llama-3.

Language(s): English.
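
As a minimal sketch of how the checkpoint might be loaded, the snippet below uses the Hugging Face transformers `AutoTokenizer` and `AutoModelForCausalLM` classes. It assumes the repository follows the standard transformers layout for Llama checkpoints, which this card does not state explicitly; the `load_model` helper, the bfloat16 dtype, and `device_map="auto"` are illustrative choices, not settings taken from the authors.

```python
# Repository name taken from the title of this card.
MODEL_ID = "homebrewltd/llama3-s-base-v0.2"


def load_model(model_id: str = MODEL_ID):
    """Hypothetical helper: load the tokenizer and model.

    Downloads the weights on first call, so it needs network access
    and enough memory for a Llama-3 class model.
    """
    # Imported lazily so that defining this helper does not require
    # the heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference
        device_map="auto",           # assumption: requires `accelerate`
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` returns both objects. Since this is a base model trained on sound tokens, audio would first have to be converted into tokens from the expanded vocabulary; that step is not covered by this sketch.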

Intended Use

Intended Use Cases: This family is primarily intended for research applications. This version aims to further improve the LLM's sound understanding capabilities.

Out-of-scope: The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.

Training process

Training Metrics Image: Below is a snapshot of the training loss curve.

Hardware

GPU Configuration: Cluster of 10x NVIDIA A6000-48GB.

GPU Usage:

Continual Training: 30 hours.
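
To put these figures in perspective, the short calculation below derives the total GPU-hours and a rough throughput. It assumes that all 10 GPUs were busy for the full 30 hours and that the 900M tokens were processed exactly once (one epoch); the resulting throughput is an estimate, not a measured value.

```python
NUM_GPUS = 10                 # from the GPU configuration above
TRAINING_HOURS = 30           # continual training wall-clock time
TOTAL_TOKENS = 900_000_000    # 900M training tokens

# Total compute budget, assuming every GPU ran for the whole job.
gpu_hours = NUM_GPUS * TRAINING_HOURS

# Rough throughput estimates under the same assumption.
tokens_per_gpu_hour = TOTAL_TOKENS / gpu_hours
tokens_per_gpu_second = tokens_per_gpu_hour / 3600

print(f"GPU-hours: {gpu_hours}")
print(f"Tokens per GPU-hour: {tokens_per_gpu_hour:,.0f}")
print(f"Tokens per GPU-second: {tokens_per_gpu_second:,.0f}")
```

Under these assumptions the run costs 300 GPU-hours, or about 3M tokens per GPU-hour.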

Training Arguments

We use the torchtune library, which provides the latest FSDP2 training implementation.

Parameter	Continual Training
Epoch	1
Global batch size	480
Learning Rate	2e-4
Learning Rate Scheduler	Cosine with warmup
Optimizer	AdamW fused
Warmup Steps	50
Weight Decay	0.01
Max Sequence Length	512
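
The table above can be turned into a rough picture of the schedule. The sketch below estimates the number of optimizer steps and evaluates a cosine-with-warmup learning rate at a given step. It is a plain-Python illustration, not the torchtune implementation: it assumes every sequence is filled to the maximum length, that the learning rate warms up linearly from zero, and that the cosine decays to zero by the last step. The `lr_at` function and the constant names are hypothetical.

```python
import math

PEAK_LR = 2e-4              # Learning Rate
WARMUP_STEPS = 50           # Warmup Steps
GLOBAL_BATCH_SIZE = 480     # sequences per optimizer step
MAX_SEQ_LEN = 512           # tokens per sequence (upper bound)
TOTAL_TOKENS = 900_000_000  # 900M tokens, one epoch

# Upper bound on tokens consumed per optimizer step
# (assumes every sequence is packed to the full length).
TOKENS_PER_STEP = GLOBAL_BATCH_SIZE * MAX_SEQ_LEN

# Estimated number of optimizer steps under that assumption.
TOTAL_STEPS = math.ceil(TOTAL_TOKENS / TOKENS_PER_STEP)


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"Tokens per step (upper bound): {TOKENS_PER_STEP:,}")
print(f"Estimated steps: {TOTAL_STEPS:,}")
for step in (0, 25, 50, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

With shorter or unpacked sequences the real step count would be higher than this estimate, which would stretch the cosine phase accordingly.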

Citation Information

BibTeX:

@article{llama3s2024,
  title={Llama3-S: Sound Instruction Language Model},
  author={Homebrew Research},
  year={2024},
  month={August},
  url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}
}

Acknowledgement

WhisperSpeech
Meta-Llama-3.1-8B-Instruct
