Audio-LLaMA: LoRA Adapter for Audio Understanding

Model Details

  • Base Model: meta-llama/Llama-3.2-3B-Instruct
  • Audio Model: openai/whisper-large-v3-turbo
  • LoRA Rank: 32
  • Task: Audio transcription from LibriSpeech dataset
  • Training Framework: PEFT (Parameter-Efficient Fine-Tuning)

Usage

This is a PEFT (LoRA) adapter that must be loaded on top of the base Llama model:

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the LoRA configuration
config = PeftConfig.from_pretrained("cdreetz/audio-llama")

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "cdreetz/audio-llama")

# Run inference
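# Note: this prompt is text-only. Audio features from Whisper must be
# injected separately; see the Limitations section and the Audio-LLaMA repository.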
prompt = "Transcribe this audio:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Training

This model was fine-tuned with LoRA on audio transcription data from LibriSpeech. It starts from the Llama 3.2 base model and uses Whisper-processed audio features for audio understanding.
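
For reference, below is a minimal sketch of a PEFT configuration consistent with the rank listed under Model Details. The target modules, alpha, and dropout values are illustrative assumptions, not values read from this adapter.

from peft import LoraConfig

# Illustrative configuration: only the rank (r=32) comes from the model card;
# the remaining values are assumptions for demonstration.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)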

Limitations

This model requires additional code to process audio with Whisper before the resulting features are passed to the Llama model. See the Audio-LLaMA repository for full usage instructions.
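
As a rough illustration of that preprocessing step, the sketch below encodes audio with the Whisper encoder and projects the features toward Llama's embedding space. This is an assumed pattern for audio-to-LLM fusion, not the repository's verified implementation; the projection layer and dimensions are assumptions.

import torch
import torch.nn as nn
from transformers import WhisperFeatureExtractor, WhisperModel

# Encode a 16 kHz waveform with the Whisper encoder (assumed preprocessing pattern).
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v3-turbo")
whisper = WhisperModel.from_pretrained("openai/whisper-large-v3-turbo", torch_dtype=torch.float16)

def encode_audio(waveform):
    # waveform: 1-D float array at 16 kHz (e.g. loaded with torchaudio or librosa)
    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        encoder_out = whisper.encoder(inputs.input_features.to(whisper.dtype))
    return encoder_out.last_hidden_state  # shape: (1, num_frames, 1280)

# Hypothetical projector mapping Whisper's 1280-dim encoder features to
# Llama-3.2-3B's 3072-dim embedding space so they can be combined with the
# text prompt; the actual fusion code lives in the Audio-LLaMA repository.
audio_projector = nn.Linear(1280, 3072)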
