We fine-tune nvidia/Llama-3.1-Minitron-4B-Depth-Base with the LLM-Neo method, which combines LoRA with knowledge distillation (KD). The training data consists of 100k samples drawn from BAAI/Infinity-Instruct.

This repository contains the model described in the paper LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models. The project page is available here, and the GitHub repository is available here.
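For readers curious how LoRA and KD are combined during training, the sketch below pairs a LoRA-wrapped student with a frozen teacher through a KL-divergence distillation loss. This is a minimal illustration under stated assumptions, not the training code from the paper: the teacher checkpoint, LoRA hyperparameters, and loss weighting are placeholders.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Teacher choice and precision are assumptions for illustration only.
teacher = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16).eval()
student = AutoModelForCausalLM.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Depth-Base", torch_dtype=torch.bfloat16)

# LoRA: only the low-rank adapter weights are trainable (rank and target modules are illustrative).
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
student = get_peft_model(student, lora_config)

def neo_loss(batch, alpha=0.5, temperature=1.0):
    """Cross-entropy on the labels plus KL divergence to the teacher's token distribution."""
    out = student(**batch)  # batch holds input_ids, attention_mask, labels on the same device as the models
    with torch.no_grad():
        t_logits = teacher(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]).logits
    kd = F.kl_div(
        F.log_softmax(out.logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * out.loss + (1 - alpha) * kd

Each optimizer step backpropagates neo_loss through the LoRA adapters only, since the base weights of the student stay frozen.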

Basic Usage

This example demonstrates generating text with the model. You'll need to install the necessary libraries first: pip install transformers accelerate torch.

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_path = "yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# device_map="auto" places the model on the available GPU(s); this requires accelerate
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device
generation_config = GenerationConfig(
    max_new_tokens=50, do_sample=True, temperature=0.7
)

outputs = model.generate(**inputs, generation_config=generation_config)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)

Benchmarks

In this section, we report the results for Llama-3.1-Minitron-4B-Depth-Neo-10w on standard automatic benchmarks. For all the evaluations, we use the lm-evaluation-harness library.
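Results of this form can be reproduced with the harness's Python entry point. The snippet below is a minimal sketch; the task names, few-shot setting, and batch size are assumptions and may differ across harness versions.

import lm_eval

# Run one benchmark group against the model; task identifiers depend on the installed harness version.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])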

Evaluation results

| Category | Benchmark | Version | n-shot | Metric | Value | Stderr |
|----------|-----------|---------|--------|--------|-------|--------|
| BBH | BBH (General) | N/A | 3 | exact_match | 0.4729 | ± 0.0055 |
| BBH | BBH (Boolean Expressions) | 2 | 3 | exact_match | 0.8120 | ± 0.0248 |
| BBH | BBH (Date Understanding) | 2 | 3 | exact_match | 0.6600 | ± 0.0300 |
| CEVAL | CEVAL (General) | N/A | 0 | acc | 0.4413 | ± 0.0135 |
| CEVAL | CEVAL (Accountant) | 1 | 0 | acc | 0.3469 | ± 0.0687 |
| CEVAL | CEVAL (Advanced Mathematics) | 1 | 0 | acc | 0.4737 | ± 0.1177 |
| CEVAL | CEVAL (Art Studies) | 1 | 0 | acc | 0.4545 | ± 0.0880 |
| MMLU | MMLU (General) | N/A | 0 | acc | 0.6048 | ± 0.0039 |
| MMLU | MMLU (Humanities) | N/A | 0 | acc | 0.5552 | ± 0.0067 |
| MMLU | MMLU (STEM) | N/A | 0 | acc | 0.5214 | ± 0.0086 |
| CMMLU | CMMLU (General) | N/A | 0 | acc | 0.3548 | ± 0.0044 |
| CMMLU | CMMLU (Normalized) | N/A | 0 | acc_norm | 0.3548 | ± 0.0044 |