We fine-tune nvidia/Llama-3.1-Minitron-4B-Depth-Base with the LLM-Neo method, which combines LoRA with knowledge distillation (KD). The training data consists of 100k samples drawn from BAAI/Infinity-Instruct.

This repository contains the model described in the paper LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models. The project page is available here, and the GitHub repository is available here.
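For readers curious how LoRA and KD are combined during training, the sketch below pairs a LoRA-wrapped student with a frozen teacher through a KL-divergence distillation loss. This is a minimal illustration under stated assumptions, not the training code from the paper: the teacher checkpoint, LoRA hyperparameters, and loss weighting are placeholders.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Teacher choice and precision are assumptions for illustration only.
teacher = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16).eval()
student = AutoModelForCausalLM.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Depth-Base", torch_dtype=torch.bfloat16)

# LoRA: only the low-rank adapter weights are trainable (rank and target modules are illustrative).
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
student = get_peft_model(student, lora_config)

def neo_loss(batch, alpha=0.5, temperature=1.0):
    """Cross-entropy on the labels plus KL divergence to the teacher's token distribution."""
    out = student(**batch)  # batch holds input_ids, attention_mask, labels on the same device as the models
    with torch.no_grad():
        t_logits = teacher(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]).logits
    kd = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * out.loss + (1 - alpha) * kd

Each optimizer step backpropagates neo_loss through the LoRA adapters only, since the base weights of the student stay frozen.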

Basic Usage

This example demonstrates generating text with the model. You'll need to install the necessary libraries first: pip install transformers accelerate torch.

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_path = "yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# device_map="auto" places the model on the available GPU(s); this requires accelerate
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device
generation_config = GenerationConfig(
    max_new_tokens=50, do_sample=True, temperature=0.7
)

outputs = model.generate(**inputs, generation_config=generation_config)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)

Benchmarks

In this section, we report the results for Llama-3.1-Minitron-4B-Depth-Neo-10w on standard automatic benchmarks. For all the evaluations, we use the lm-evaluation-harness library.
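Results of this form can be reproduced with the harness's Python entry point. The snippet below is a minimal sketch; the task names, few-shot setting, and batch size are assumptions and may differ across harness versions.

import lm_eval

# Run one benchmark group against the model; task identifiers depend on the installed harness version.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])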

Evaluation results

| Category | Benchmark | Version | n-shot | Metric | Value | Stderr |
|----------|-----------|---------|--------|--------|-------|--------|
| BBH | BBH (General) | N/A | 3 | exact_match | 0.4729 | ± 0.0055 |
| BBH | BBH (Boolean Expressions) | 2 | 3 | exact_match | 0.8120 | ± 0.0248 |
| BBH | BBH (Date Understanding) | 2 | 3 | exact_match | 0.6600 | ± 0.0300 |
| CEVAL | CEVAL (General) | N/A | 0 | acc | 0.4413 | ± 0.0135 |
| CEVAL | CEVAL (Accountant) | 1 | 0 | acc | 0.3469 | ± 0.0687 |
| CEVAL | CEVAL (Advanced Mathematics) | 1 | 0 | acc | 0.4737 | ± 0.1177 |
| CEVAL | CEVAL (Art Studies) | 1 | 0 | acc | 0.4545 | ± 0.0880 |
| MMLU | MMLU (General) | N/A | 0 | acc | 0.6048 | ± 0.0039 |
| MMLU | MMLU (Humanities) | N/A | 0 | acc | 0.5552 | ± 0.0067 |
| MMLU | MMLU (STEM) | N/A | 0 | acc | 0.5214 | ± 0.0086 |
| CMMLU | CMMLU (General) | N/A | 0 | acc | 0.3548 | ± 0.0044 |
| CMMLU | CMMLU (Normalized) | N/A | 0 | acc_norm | 0.3548 | ± 0.0044 |