homebrewltd/llama3-s-2024-07-08

jan-hq committed on Jul 10, 2024

Commit 3398a60 (verified) · 1 Parent(s): c8bbc93

Update README.md

Files changed (1)

README.md +72 -1

@@ -32,7 +32,78 @@ We continue to expand [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-lla
 ## How to Get Started with the Model
-> TODO
+```
+import torch
+import torchaudio
+from encodec import EncodecModel
+from encodec.utils import convert_audio
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
+# Audio to Sound Tokens
+def audio_to_sound_tokens(audio_path, target_bandwidth=1.5, device="cuda"):
+    model = EncodecModel.encodec_model_24khz()
+    model.set_target_bandwidth(target_bandwidth)
+    model.to(device)
+    wav, sr = torchaudio.load(audio_path)
+    wav = convert_audio(wav, sr, model.sample_rate, model.channels)
+    wav = wav.unsqueeze(0).to(device)
+    with torch.no_grad():
+        encoded_frames = model.encode(wav)
+    codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)
+    audio_code1, audio_code2 = codes[0][0], codes[0][1]
+    flatten_tokens = torch.stack((audio_code1, audio_code2), dim=1).flatten().tolist()
+    result = ''.join(f'<|sound_{num}|>' for num in flatten_tokens)
+    return f'<|sound_start|>{result}<|sound_end|>'
+# LLM Pipeline Setup
+def setup_pipeline(model_path, use_4bit=True):
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+    model_kwargs = {"device_map": "auto"}
+    if use_4bit:
+        model_kwargs["quantization_config"] = BitsAndBytesConfig(
+            load_in_4bit=True,
+            bnb_4bit_compute_dtype=torch.bfloat16,
+            bnb_4bit_use_double_quant=True,
+            bnb_4bit_quant_type="nf4",
+        )
+    model = AutoModelForCausalLM.from_pretrained(model_path, **model_kwargs)
+    return pipeline("text-generation", model=model, tokenizer=tokenizer)
+# Text Generation
+def generate_text(pipe, messages, max_new_tokens=64, temperature=0.0, do_sample=False):
+    generation_args = {
+        "max_new_tokens": max_new_tokens,
+        "return_full_text": False,
+        "temperature": temperature,
+        "do_sample": do_sample,
+    }
+    output = pipe(messages, **generation_args)
+    return output[0]['generated_text']
+# Main process
+def audio_to_text(audio_path, model_path, use_4bit=True):
+    # Convert audio to sound tokens
+    sound_tokens = audio_to_sound_tokens(audio_path)
+    # Setup LLM pipeline
+    pipe = setup_pipeline(model_path, use_4bit)
+    # Generate text
+    messages = [{"role": "user", "content": sound_tokens}]
+    return generate_text(pipe, messages)
+# Usage example
+audio_path = "/path/to/your/audio/file"
+model_path = "jan-hq/Jan-Llama3-0708"
+generated_text = audio_to_text(audio_path, model_path)
+```
 ## Training process
 **Training Metrics Image**: Below is a snapshot of the training loss curve visualized.
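
The interleaving step in `audio_to_sound_tokens` (stacking the two EnCodec codebook streams with `dim=1` and flattening) can be illustrated without EnCodec, PyTorch, or a GPU. The following is a minimal pure-Python sketch, assuming two equal-length lists of integer codes; `format_sound_tokens` and `parse_sound_tokens` are hypothetical helper names used only for illustration and are not part of the README or of any library.

```python
import re


def format_sound_tokens(codebook_1, codebook_2):
    """Interleave two codebook streams frame by frame and wrap them in sound tags.

    Mirrors torch.stack((c1, c2), dim=1).flatten() from the README snippet:
    the output order is c1[0], c2[0], c1[1], c2[1], ...
    """
    if len(codebook_1) != len(codebook_2):
        raise ValueError("both codebook streams must have the same number of frames")
    interleaved = [code for pair in zip(codebook_1, codebook_2) for code in pair]
    body = "".join(f"<|sound_{code}|>" for code in interleaved)
    return f"<|sound_start|>{body}<|sound_end|>"


def parse_sound_tokens(text):
    """Recover the two codebook streams from a formatted sound-token string.

    Hypothetical inverse helper: only numeric <|sound_N|> tokens are matched,
    so the start and end markers are skipped.
    """
    codes = [int(match) for match in re.findall(r"<\|sound_(\d+)\|>", text)]
    return codes[0::2], codes[1::2]


prompt = format_sound_tokens([1, 2], [3, 4])
print(prompt)
print(parse_sound_tokens(prompt))
```

Interleaving keeps both codes of one audio frame adjacent in the prompt, which is the ordering the snippet in this commit passes to the model as the user message.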