---
datasets:
- homebrewltd/instruction-speech-whispervq-v2
language:
- en
license: apache-2.0
tags:
- sound language model
---
## Caution

This is an intermediate checkpoint.

## Model Details

We have developed and released the [llama3-s](https://huggingface.co/collections/homebrew-research/llama3-s-669df2139f0576abc6eb7405) model family. This family natively understands audio and text input.

We continue supervised fine-tuning of our last checkpoint [homebrewltd/...](...), using WhisperVQ as the tokenizer for audio files, on 2B tokens from the [Instruction Speech WhisperVQ v2](https://huggingface.co/datasets/homebrewltd/instruction-speech-whispervq-v2) dataset.

**Model developers** Homebrew Research.

**Input** Text and sound.

**Output** Text.

**Model Architecture** Llama-3.

**Language(s)** English.

## Intended Use

**Intended Use Cases** This family is primarily intended for research applications. This version aims to further improve the LLM's sound-understanding capabilities.

**Out-of-scope** The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.

## How to Get Started with the Model

First, we need to convert the audio file to sound tokens.

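The conversion snippet is not included at this checkpoint; the sketch below is one plausible way to do it with the WhisperVQ quantizer from [WhisperSpeech](https://github.com/collabora/WhisperSpeech). The checkpoint reference, the `encode_audio` call, and the `<|sound_xxxx|>` token format are assumptions, not confirmed by this card.

```python
import torch
import torchaudio
from whisperspeech.vq_stoks import RQBottleneckTransformer  # WhisperVQ quantizer (assumed API)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoint reference is an assumption; use whichever WhisperVQ model this release was trained with.
vq_model = RQBottleneckTransformer.load_model(
    "collabora/whisperspeech:whisper-vq-stoks-medium-en+pl.model"
).to(device)
vq_model.eval()

def audio_to_sound_tokens(audio_path: str) -> str:
    """Convert a single audio file into a string of discrete sound tokens."""
    wav, sr = torchaudio.load(audio_path)
    if sr != 16000:  # WhisperVQ expects 16 kHz audio
        wav = torchaudio.functional.resample(wav, sr, 16000)
    with torch.no_grad():
        codes = vq_model.encode_audio(wav.to(device))  # assumed helper returning discrete codes
    codes = codes[0].cpu().tolist()
    # The <|sound_xxxx|> wrapping is an assumed convention for this model family.
    tokens = "".join(f"<|sound_{c:04d}|>" for c in codes)
    return f"<|sound_start|>{tokens}<|sound_end|>"
```
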
Then, we can run inference on the model just as with any other LLM.

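The card leaves this snippet empty as well; below is a minimal sketch using Hugging Face `transformers`, assuming the repo id from the citation URL further down (`homebrewltd/llama3.1-s-2024-08-15`) and that the sound-token string from the previous step is placed directly in the user turn.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the citation URL below; adjust to this checkpoint's actual id.
model_id = "homebrewltd/llama3.1-s-2024-08-15"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Sound tokens produced by audio_to_sound_tokens() above.
sound_tokens = "<|sound_start|>...<|sound_end|>"
messages = [{"role": "user", "content": sound_tokens}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
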
## Training process

**Training Metrics Image**: Below is a snapshot of the training loss curve.

![training_loss](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/Mo_FGQvhkcHl3y1REf76f.png)

### Hardware

**GPU Configuration**: Cluster of 8x NVIDIA H100-SXM-80GB.
**GPU Usage**:
- **Continual Training**: 6 hours.

### Training Arguments

We use the [torchtune](https://github.com/pytorch/torchtune) library with its latest FSDP2 training implementation.

| Parameter               | Continual Training |
|-------------------------|--------------------|
| **Epoch**               | 1                  |
| **Global batch size**   | 128                |
| **Learning Rate**       | 0.5e-4             |
| **Learning Scheduler**  | Cosine with warmup |
| **Optimizer**           | Adam torch fused   |
| **Warmup Ratio**        | 0.01               |
| **Weight Decay**        | 0.005              |
| **Max Sequence Length** | 1024               |

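For readers who want to map the table onto plain PyTorch, the sketch below mirrors the optimizer and scheduler settings (fused Adam, cosine schedule with warmup). It is an illustration only: the actual run was driven by torchtune recipes, and `total_steps` plus the placeholder module are assumptions.

```python
import math
import torch

model = torch.nn.Linear(4096, 4096)  # placeholder module standing in for the Llama-3 model

lr = 0.5e-4           # Learning Rate
weight_decay = 0.005  # Weight Decay
warmup_ratio = 0.01   # Warmup Ratio
total_steps = 1000    # placeholder; one epoch over the dataset at global batch size 128

# "Adam torch fused" from the table above.
optimizer = torch.optim.Adam(
    model.parameters(), lr=lr, weight_decay=weight_decay, fused=True
)

def cosine_with_warmup(step: int) -> float:
    """Linear warmup for the first warmup_ratio of steps, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=cosine_with_warmup)
```
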
## Citation Information

**BibTeX:**

```
@article{llama3s2024,
  title={Llama3-S: Sound Instruction Language Model},
  author={Homebrew Research},
  year={2024},
  month={August},
  url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}
}
```

## Acknowledgement

- **[WhisperSpeech](https://github.com/collabora/WhisperSpeech)**
- **[Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)**