homebrewltd
/

llama3.1-s-instruct-2024-08-15-cp2000

sound language model

Model card Files Files and versions Community

llama3.1-s-instruct-2024-08-15-cp2000 / README.md

jan-hq's picture

Create README.md

3673df3 verified 4 months ago

|

history blame contribute delete

2.82 kB

	---
	datasets:
	- homebrewltd/instruction-speech-whispervq-v2
	language:
	- en
	license: apache-2.0
	tags:
	- sound language model
	---
	## Caution

	This is an intermediate checkpoint.

	## Model Details

	We have developed and released the family [llama3s](https://huggingface.co/collections/homebrew-research/llama3-s-669df2139f0576abc6eb7405). This family is natively understanding audio and text input.

	We continue to supervised finetune our last checkpoint using WhisperVQ as a tokenizer for audio files [homebrewltd/...](...) with 2B tokens from [Instruction Speech WhisperVQ v2](https://huggingface.co/datasets/homebrewltd/instruction-speech-whispervq-v2) dataset.

	Model developers Homebrew Research.

	Input Text and sound.

	Output Text.

	Model Architecture Llama-3.

	Language(s): English.

	## Intended Use

	Intended Use Cases This family is primarily intended for research applications. This version aims to further improve the LLM on sound understanding capabilities.

	Out-of-scope The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.

	## How to Get Started with the Model

	First, we need to convert the audio file to sound tokens

	```python

	```

	Then, we can inference the model the same as any other LLM.

	```python

	```

	## Training process
	Training Metrics Image: Below is a snapshot of the training loss curve visualized.

	![training_loss](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/Mo_FGQvhkcHl3y1REf76f.png)

	### Hardware

	GPU Configuration: Cluster of 8x NVIDIA H100-SXM-80GB.
	GPU Usage:
	- Continual Training: 6 hours.

	### Training Arguments

	We utilize [torchtune](https://github.com/pytorch/torchtune) library for the latest FSDP2 training code implementation.

	\| Parameter \| Continual Training \|
	\|----------------------------\|-------------------------\|
	\| Epoch \| 1 \|
	\| Global batch size \| 128 \|
	\| Learning Rate \| 0.5e-4 \|
	\| Learning Scheduler \| Cosine with warmup \|
	\| Optimizer \| Adam torch fused \|
	\| Warmup Ratio \| 0.01 \|
	\| Weight Decay \| 0.005 \|
	\| Max Sequence Length \| 1024 \|


	## Citation Information

	BibTeX:

	```
	@article{Llama3-S: Sound Instruction Language Model 2024,
	title={Llama3-S},
	author={Homebrew Research},
	year=2024,
	month=August},
	url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}
	```

	## Acknowledgement

	- [WhisperSpeech](https://github.com/collabora/WhisperSpeech)

	- [Meta-Llama-3.1-8B-Instruct ](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)