Uploaded model
- Developed by: EpistemeAI
- License: apache-2.0
- Finetuned from model: EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1-16bit
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Fireball-Llama-3.1-V1-Instruct
How to use
This repository contains Fireball-Llama-3.1-V1-Instruct, for use with transformers and with the original llama codebase.
Use with transformers
With transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or the Auto classes with the generate() function.
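As a quick way to confirm the version requirement before loading the model, the following is a minimal sketch that compares the installed transformers version against 4.43.0. The helper names `version_tuple` and `meets_minimum` are hypothetical and not part of any library; only the numeric parts of the version string are compared.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a version string such as '4.43.0' into (4, 43, 0).

    Parsing stops at the first non-numeric component, so
    '4.44.0.dev0' becomes (4, 44, 0).
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="4.43.0"):
    """Return True when `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("transformers")
    status = "OK" if meets_minimum(installed) else "too old, please upgrade"
    print(installed, status)
except PackageNotFoundError:
    print("transformers is not installed")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "4.100.0" as older than "4.43.0".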
Make sure to update your transformers installation via pip install --upgrade transformers. Example:
!pip install -U transformers trl peft accelerate bitsandbytes
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
base_model = "EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1dpo"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
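# Named system_prompt rather than sys to avoid shadowing the standard-library module
system_prompt = "You are a helpful assistant " \
    "(Advanced Natural-based interaction for the language)."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is DPO and ORPO fine tune?"},
]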
# Method 1: apply the chat template and call generate() directly
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
# Move each input tensor to the device the model was loaded on
for k, v in inputs.items():
    inputs[k] = v.to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
results = tokenizer.batch_decode(outputs)[0]
print(results)
# Method 2: use the pipeline abstraction
import transformers
pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    task="text-generation",
    max_new_tokens=512,  # maximum number of tokens to generate in the output
    temperature=0.6,  # lower values give more deterministic answers
    do_sample=True,
    top_p=0.9,
)
sequences = pipe(messages)
for seq in sequences:
    print(seq["generated_text"])
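Method 1 decodes the whole output sequence, so the printed text includes the prompt as well as the reply, because generate() returns the prompt tokens followed by the new ones for this kind of model. A minimal sketch of keeping only the reply is shown here with plain lists standing in for token ids; the helper name `new_tokens_only` and the sample ids are hypothetical.

```python
def new_tokens_only(sequence, prompt_length):
    """Return the part of a generated sequence that follows the prompt."""
    return sequence[prompt_length:]


# Stand-in token ids: the first three play the role of the prompt.
prompt_ids = [101, 102, 103]
generated = prompt_ids + [7, 8, 9]

reply_ids = new_tokens_only(generated, len(prompt_ids))
print(reply_ids)  # [7, 8, 9]
```

With the objects from Method 1, the same slice would be outputs[0][inputs["input_ids"].shape[-1]:], which can then be passed to tokenizer.decode(..., skip_special_tokens=True) to get the reply as clean text.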
Model tree for EpistemeAI/Fireball-Llama-3.1-8B-Instruct-v1dpo
Base model
meta-llama/Llama-3.1-8B
Finetuned
meta-llama/Llama-3.1-8B-Instruct