🍷 Llama-3.2-Nemotron-3B

This is a finetune of meta-llama/Llama-3.2-3B (specifically, unsloth/Llama-3.2-3B-bnb-4bit).

It was trained with Unsloth on the nvidia/HelpSteer2 dataset, similar to nvidia/Llama-3.1-Nemotron-70B-Instruct-HF.

💻 Usage

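# Install dependencies (notebook syntax; drop the leading "!" in a regular shell)
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "itsnebulalol/Llama-3.2-Nemotron-3B"
messages = [{"role": "user", "content": "How many r in strawberry?"}]

# Render the chat messages into a single prompt string using the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline; device_map="auto" lets accelerate place the weights
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; generated_text holds the prompt followed by the completion
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])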
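
By default, the text-generation pipeline returns the prompt followed by the completion in `generated_text`. The snippet below is a minimal sketch of one way to keep only the model's reply. `extract_reply` is a hypothetical helper, not part of transformers, and the demo strings are stand-ins for the real `prompt` and `outputs[0]["generated_text"]` from the usage example.

```python
def extract_reply(generated_text: str, prompt: str) -> str:
    """Return only the completion when generated_text starts with the prompt."""
    if generated_text.startswith(prompt):
        # Drop the prompt prefix and trim surrounding whitespace.
        return generated_text[len(prompt):].strip()
    # The prompt is not a prefix: return the text as-is rather than guess.
    return generated_text


# Stand-in strings for illustration only (not real model output).
demo_prompt = "<user>How many r in strawberry?<assistant>"
demo_output = demo_prompt + " There are three."
print(extract_reply(demo_output, demo_prompt))
```

Alternatively, passing `return_full_text=False` to the pipeline call should make it return only the newly generated text, which avoids the post-processing step.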

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Model size: 3.21B params (Safetensors), tensor type BF16