Uploaded model

  • Developed by: EpistemeAI
  • License: apache-2.0
  • Finetuned from model: EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1-16bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Fireball-Llama-3.1-V1-Instruct

How to use

This repository contains Fireball-Llama-3.1-V1-Instruct, for use with transformers and with the original llama codebase.

Use with transformers

With transformers >= 4.43.0, you can run conversational inference either through the Transformers pipeline abstraction or through the Auto classes with the generate() function.
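
The minimum version stated above can be checked before loading the model. The following is a minimal sketch rather than part of the original instructions: the helper names release_tuple and meets_minimum are hypothetical, and the comparison deliberately ignores pre-release suffixes such as .dev0.

```python
import re
from typing import Tuple

MIN_TRANSFORMERS = "4.43.0"  # minimum version stated in this model card


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric part, e.g. '4.44.0.dev0' -> (4, 44, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, required: str = MIN_TRANSFORMERS) -> bool:
    """Return True if `installed` is at least `required` (suffixes ignored)."""
    a, b = release_tuple(installed), release_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.43' and '4.43.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Illustrative version strings (not read from a real installation).
print(meets_minimum("4.42.4"))
print(meets_minimum("4.44.0"))
```

In a live environment the installed version is available as transformers.__version__, so meets_minimum(transformers.__version__) would perform the same check.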

Make sure to update your transformers installation via pip install --upgrade transformers. Example:

# Notebook cell syntax; in a regular shell, run the command without the leading "!"
!pip install -U transformers trl peft accelerate bitsandbytes
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

base_model = "EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1dpo"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

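# Named system_prompt rather than sys to avoid confusion with the standard-library sys module
system_prompt = "You are a helpful assistant " \
    "(Advanced Natural-based interaction for the language)."

messages = [
    {"role": "system", "content": system_prompt},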
    {"role": "user", "content": "What is DPO and ORPO fine tune?"},
]

# Method 1: apply the chat template, then call generate() directly
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
# Move the inputs to the model's device instead of hard-coding CUDA
inputs = inputs.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
results = tokenizer.batch_decode(outputs)[0]
print(results)

# Method 2: wrap the already loaded model and tokenizer in a pipeline
import transformers
pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the new text; LangChain expects the full text, so use True there
    task='text-generation',
    max_new_tokens=512,  # maximum number of tokens to generate in the output
    temperature=0.6,  # lower values give more focused answers, higher values more creative ones
    do_sample=True,
    top_p=0.9,
)

sequences = pipe(messages)
for seq in sequences:
    print(seq["generated_text"])
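
Method 1 prints the whole sequence, prompt included, because generate() on a decoder-only model returns the prompt tokens followed by the newly generated ones. A minimal sketch of trimming the prompt is shown here, using plain Python lists as stand-ins for token ids; the helper name new_token_ids and the toy ids are hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the tokens that follow the prompt in one generate() output row."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids standing in for one row of `outputs`; real ids come from the tokenizer.
prompt_ids = [101, 7, 8, 9]
full_output = prompt_ids + [42, 43, 2]

completion = new_token_ids(full_output, len(prompt_ids))
print(completion)
```

With the Method 1 variables, the equivalent slice would be outputs[0][inputs["input_ids"].shape[-1]:], which can then be passed to tokenizer.decode(..., skip_special_tokens=True) to print only the reply.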