|
--- |
|
language: |
|
- en |
|
tags: |
|
- llama |
|
- instruct |
|
- instruction |
|
- empirischtech |
|
pipeline_tag: text-generation |
|
base_model: |
|
- meta-llama/Llama-3.1-8B-Instruct |
|
license: llama3.1 |
|
--- |
|
# Llama-3.1-10B-Instruct model card
|
|
|
## Model Details |
|
|
|
* **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net) |
|
* **Backbone Model**: [Llama 3.1](https://github.com/meta-llama/llama3)
|
* **Language(s)**: English |
|
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers) |
|
* **License**: This model is derived from Llama-3.1-8B-Instruct and is governed by the Meta Llama 3.1 Community License
|
* **Where to send comments**: Feedback and comments on the model can be submitted by opening a discussion in the [community tab of the model repository](https://huggingface.co/empirischtech/Llama-3.1-10B-Instruct/discussions)
|
* **Contact**: For questions and comments about the model, please reach out via our [contact page](https://chaperoneai.net/contact)
|
|
|
## Training |
|
Bigger models, more data, and better hardware have consistently improved deep learning performance. Whether in NLP or computer vision, larger models have led to major breakthroughs. However, most cutting-edge models are still trained from scratch, meaning they start with randomly initialized weights. The problem? Training costs are skyrocketing. |
|
To address the escalating computational costs of training large-scale models, various approaches have been proposed. |
|
We present our results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
|
|
|
In this work, we take a step toward realizing such an approach. Specifically, we extend an existing **8B**-parameter model to **10B** parameters by initializing the additional layers with pretrained weights, followed by continued pretraining on a smaller dataset across multiple epochs. Due to budget constraints, we were unable to surpass the foundation model on the **EleutherAI** evaluation benchmarks. However, the average scores are very close, demonstrating the potential of cost-efficient scaling strategies in large language model development.
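
The exact layer mapping and data mix behind the 10B checkpoint are not published in this card. The snippet below is only a minimal sketch of the general idea, assuming a hypothetical slice of the 8B model's decoder layers is duplicated before continued pretraining; the slice boundaries, resulting depth, and output path are illustrative placeholders.

```python
# Minimal sketch of depth up-scaling: duplicate a slice of the pretrained decoder
# layers to deepen the 8B model, then use the result as the starting point for
# continued pretraining. The duplicated slice and final layer count are assumptions,
# not the exact recipe used for Llama-3.1-10B-Instruct.
import copy

import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

layers = base.model.layers                                  # 32 decoder layers in the 8B model
extra = [copy.deepcopy(layer) for layer in layers[12:20]]   # hypothetical slice to duplicate

# Append the copies, renumber per-layer indices, and update the config.
new_layers = list(layers) + extra
for i, layer in enumerate(new_layers):
    layer.self_attn.layer_idx = i                           # keep KV-cache indexing consistent
base.model.layers = torch.nn.ModuleList(new_layers)
base.config.num_hidden_layers = len(new_layers)

base.save_pretrained("llama-3.1-10b-init")                  # checkpoint for continued pretraining
```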
|
|
|
|
|
## Usage |
|
|
|
- Tested on A100 80GB |
|
- Our model supports a context window of up to 128K (131,072) input tokens, as provided by the Llama-3.1 architecture.
|
|
|
```python |
|
import torch |
|
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer |
|
model_id="empirischtech/Llama-3.1-10B-Instruct" |
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_id, |
|
device_map="auto", |
|
torch_dtype=torch.float16 |
|
) |
|
|
|
prompt = "### User:\nEmma feels perfectly fine, yet she still has an appointment at the hospital. What might be the reasons?\n\n### Assistant:\n" |
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
inputs.pop("token_type_ids", None)  # the Llama tokenizer does not return token_type_ids; drop them only if present
|
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True) |
|
|
|
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=1024) |
|
output_text = tokenizer.decode(output[0], skip_special_tokens=True) |
|
``` |
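
The prompt above uses a plain `### User:` / `### Assistant:` format. Since the model derives from Llama-3.1-8B-Instruct, its tokenizer should also carry the base model's chat template; the snippet below is an optional variant using `apply_chat_template`, continuing from the variables defined above and assuming that template is inherited unchanged.

```python
# Optional variant: build the prompt with the tokenizer's chat template instead of
# the manual "### User:" format (assumes the Llama-3.1 template is inherited).
messages = [
    {
        "role": "user",
        "content": "Emma feels perfectly fine, yet she still has an appointment "
                   "at the hospital. What might be the reasons?",
    }
]

chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(chat_inputs, streamer=streamer, use_cache=True, max_new_tokens=1024)
```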
|
|
|
## Hardware and Software |
|
|
|
* **Hardware**: We utilized 8× NVIDIA A100 GPUs for training our model
|
* **Training Factors**: The model was pretrained using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer); an illustrative configuration sketch is given below
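
The exact training configuration is not part of this card. The following is only a sketch of how DeepSpeed can be plugged into the HuggingFace Trainer for continued pretraining; the dataset, hyperparameters, and `ds_config.json` path are placeholders, not the settings used for this model.

```python
# Illustrative sketch only: the dataset, hyperparameters, and DeepSpeed config path
# are placeholders, not the settings used to train Llama-3.1-10B-Instruct.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "empirischtech/Llama-3.1-10B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Small public corpus as a stand-in for the (unpublished) continued-pretraining data.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
raw = raw.filter(lambda ex: ex["text"].strip() != "")
train_ds = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="./continued-pretraining",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=2,
    learning_rate=1e-5,
    bf16=True,
    deepspeed="ds_config.json",  # ZeRO configuration file consumed by the Trainer
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```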
|
|
|
## Evaluation Results |
|
|
|
<!-- |
|
The following two different evaluations are performed. |
|
|
|
|
|
### Perplexity as Evaluation Metric
|
|
|
Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A **lower perplexity** score indicates better performance (i.e., the model is more confident in its predictions). |
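
Concretely, for a tokenized sequence $x = (x_1, \ldots, x_N)$ and model distribution $p_\theta$, perplexity is the exponentiated average negative log-likelihood:

$$\mathrm{PPL}(x) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta\!\left(x_i \mid x_{<i}\right)\right)$$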
|
|
|
|
|
#### Main Results |
|
|
|
| Model | Perplexity Score | |
|
|---------------------------------------------|----------| |
|
| **Llama-3.1-8B-Instruct** | 842611366.59 | |
|
| **Llama-3.1-10B-Instruct** | 2890.31 | |
|
|
|
|
|
|
|
#### Scripts to generate evaluation results
|
```python |
|
from evaluate import load |
|
import datasets |
|
|
|
|
|
perplexity = load("perplexity", module_type="metric") |
|
input_texts = datasets.load_dataset("wikitext", |
|
"wikitext-2-raw-v1", |
|
split="test")["text"] |
|
|
|
input_texts = [s for s in input_texts if s!=''] |
|
|
|
model_path='empirischtech/Llama-3.1-10B-Instruct' |
|
results = perplexity.compute(model_id=model_path,
|
add_start_token=False, |
|
predictions=input_texts) |
|
|
|
|
|
print(round(results["mean_perplexity"], 2)) |
|
``` |
|
--> |
|
|
|
|
|
|
|
### Harness Evaluation |
|
|
|
- The performance evaluation is based on the tasks used by the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

The model is evaluated on four benchmark datasets: `ARC-Challenge`, `HellaSwag`, `MMLU-Pro`, and `IFEval`.

The evaluation was run with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.
|
|
|
|
|
#### Main Results |
|
| Benchmark | **Llama-3.1-8B-Instruct** | **Llama-3.1-10B-Instruct** | |
|
|------------|:------------------------:|:------------------------:| |
|
| ARC | 55.05 | 52.47 | |
|
| HellaSwag | 79.28 | 77.08 | |
|
| MMLU-Pro | 40.34 | 33.59 | |
|
| IFEval | 59.95 | 54.80 | |
|
| **average** | **58.66** | **54.49** | |
|
|
|
|
|
#### Scripts to generate evaluation results
|
|
|
```python |
|
# Install the evaluation harness first (from the shell):
#   pip install "lm-eval>=0.4.7"
# https://github.com/EleutherAI/lm-evaluation-harness
import json

from lm_eval import evaluator
|
|
|
tasks_list = ["arc_challenge", "ifeval", "mmlu_pro", "hellaswag"] # Benchmark dataset |
|
|
|
model_path="empirischtech/Llama-3.1-10B-Instruct" |
|
|
|
# Run evaluation |
|
results = evaluator.simple_evaluate( |
|
model="hf", # Hugging Face model |
|
cache_requests=False, |
|
model_args=f"pretrained={model_path}", |
|
tasks=tasks_list, |
|
batch_size=4, |
|
device="cuda:0" |
|
) |
|
|
|
# Extract results |
|
results = results['results'] |
|
json_string = json.dumps(results, indent=4) |
|
|
|
``` |
|
|
|
## Ethical Issues |
|
|
|
### Ethical Considerations |
|
- No ethical issues arose from data contamination, as we did not include any benchmark test or training data in the model's training process
|
|
|
## Contact Us |
|
|
|
### Why Our LLMs? |
|
- [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net) helps you unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them on your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► [Get in touch](https://chaperoneai.net/contact)