---
language:
- en
tags:
- llama
- instruct
- instruction
- empirischtech
pipeline_tag: text-generation
base_model:
- meta-llama/Llama-3.1-8B-Instruct
license: llama3.1
---
# Llama-3.1-10B-Instruct model card
## Model Details
* **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
* **Backbone Model**: [LLaMA](https://github.com/meta-llama/llama3)
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **License**: This model is under a **Non-commercial** Bespoke License and governed by the Meta license. You should only use this repository if you have been granted access to the model by filling out [this form](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform), but have either lost your copy of the weights or encountered issues converting them to the Transformers format
* **Where to send comments**: To provide feedback or comments on the model, open an issue in the [Hugging Face community's model repository](https://huggingface.co/upstage/llama-30b-instruct-2048/discussions)
* **Contact**: For questions and comments about the model, please reach out via our [contact form](https://chaperoneai.net/contact)
## Training
Bigger models, more data, and better hardware have consistently improved deep learning performance. Whether in NLP or computer vision, larger models have led to major breakthroughs. However, most cutting-edge models are still trained from scratch, starting from randomly initialized weights, and the cost of doing so is skyrocketing.
To address the escalating computational cost of training large-scale models, various approaches have been proposed.
We present results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
In this work, we take a step toward realizing such an approach. Specifically, we extend an existing **8B**-parameter model to **10B** parameters by initializing the
additional layers with pretrained weights, followed by continued pretraining on a smaller dataset across multiple epochs. Due to budget constraints, we were unable to
surpass the base model on the **EleutherAI** evaluation benchmarks. However, the average scores are very close, demonstrating the potential of cost-efficient scaling strategies in large language model development.
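As an illustration, a depth-up-scaled initialization can be built by duplicating a block of decoder layers from the pretrained 8B checkpoint before continued pretraining. The snippet below is a minimal sketch only: the layer indices and the duplication scheme are illustrative assumptions, not the exact recipe used for this model.
```python
# Minimal depth up-scaling sketch: duplicate middle decoder layers of the 8B model.
# The chosen indices (16-25) are hypothetical; Llama-3.1-8B has 32 decoder layers,
# and appending ~10 copies (~0.2B parameters each) lands near 10B parameters.
import copy
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
)

layers = base.model.layers                    # nn.ModuleList of decoder layers
for i in range(16, 26):                       # hypothetical middle-layer block
    layers.append(copy.deepcopy(layers[i]))

base.config.num_hidden_layers = len(layers)   # 42 layers after duplication
base.save_pretrained("llama-3.1-10b-init")    # starting point for continued pretraining
```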
## Usage
- Tested on A100 80GB
- Our model can handle up to 128K (131,072) input tokens, the context length supported by the Llama-3.1 architecture.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "empirischtech/Llama-3.1-10B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "### User:\nEmma feels perfectly fine, yet she still has an appointment at the hospital. What might be the reasons?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)  # drop token_type_ids only if the tokenizer returns them

# Stream the generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=1024)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
```
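Since Llama-3.1-Instruct checkpoints normally ship a chat template, the prompt can also be built with `tokenizer.apply_chat_template`. The snippet below is a sketch under that assumption and reuses `tokenizer`, `model`, and `streamer` from the block above.
```python
# Sketch: build the prompt via the tokenizer's chat template (assumes the standard
# Llama-3.1 template is bundled with this checkpoint).
messages = [
    {"role": "user",
     "content": "Emma feels perfectly fine, yet she still has an appointment at the hospital. What might be the reasons?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, streamer=streamer, use_cache=True, max_new_tokens=1024)
```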
## Hardware and Software
* **Hardware**: We utilized 8× A100 GPUs for training our model
* **Training Factors**: The model was pretrained using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer); a minimal configuration sketch follows below
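The sketch below shows how DeepSpeed is typically wired into the HuggingFace Trainer by pointing `TrainingArguments` at a DeepSpeed JSON config. The config file name and hyperparameters are illustrative assumptions, not the actual training configuration, and the snippet assumes the `deepspeed` package and the referenced config file are available.
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-10b-cpt",        # hypothetical output directory
    per_device_train_batch_size=1,         # hypothetical batch settings
    gradient_accumulation_steps=16,
    bf16=True,
    num_train_epochs=2,                    # the card mentions multiple epochs; exact value unspecified
    deepspeed="ds_config_zero3.json",      # hypothetical DeepSpeed ZeRO config file
)
# Trainer(model=..., args=training_args, train_dataset=...) then runs the
# continued pretraining on the 8-GPU A100 node.
```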
## Evaluation Results
<!--
The following two different evaluations are performed.
### Perplexity as Evaluation Metric
Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A **lower perplexity** score indicates better performance (i.e., the model is more confident in its predictions).
#### Main Results
| Model | Perplexity Score |
|---------------------------------------------|----------|
| **Llama-3.1-8B-Instruct** | 842611366.59 |
| **Llama-3.1-10B-Instruct** | 2890.31 |
#### Scripts to generate evaluation results
```python
from evaluate import load
import datasets

perplexity = load("perplexity", module_type="metric")

# Use the non-empty lines of the WikiText-2 test split as evaluation text
input_texts = datasets.load_dataset("wikitext",
                                    "wikitext-2-raw-v1",
                                    split="test")["text"]
input_texts = [s for s in input_texts if s != '']

model_path = 'empirischtech/Llama-3.1-10B-Instruct'
results = perplexity.compute(model_id=model_path,
                             add_start_token=False,
                             predictions=input_texts)
print(round(results["mean_perplexity"], 2))
```
-->
### Harness Evaluation
- The performance evaluation is based on the tasks used on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
The model is evaluated on four benchmark datasets: `ARC-Challenge`, `HellaSwag`, `MMLU-Pro`, and `IFEval`.
The evaluation is run with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.
#### Main Results
| Benchmark | **Llama-3.1-8B-Instruct** | **Llama-3.1-10B-Instruct** |
|------------|:------------------------:|:------------------------:|
| ARC | 55.05 | 52.47 |
| HellaSwag | 79.28 | 77.08 |
| MMLU-Pro | 40.34 | 33.59 |
| IFEval | 59.95 | 54.80 |
| **average** | **58.66** | **54.49** |
#### Scripts to generate evaluation results
```python
# Install the harness first: pip install "lm-eval>=0.4.7"
# (from https://github.com/EleutherAI/lm-evaluation-harness)
import json
from lm_eval import evaluator

tasks_list = ["arc_challenge", "ifeval", "mmlu_pro", "hellaswag"]  # benchmark datasets
model_path = "empirischtech/Llama-3.1-10B-Instruct"

# Run evaluation
results = evaluator.simple_evaluate(
    model="hf",                              # Hugging Face model backend
    cache_requests=False,
    model_args=f"pretrained={model_path}",
    tasks=tasks_list,
    batch_size=4,
    device="cuda:0"
)

# Extract per-task results and serialize them
results = results['results']
json_string = json.dumps(results, indent=4)
```
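For a quick look at the headline numbers, the per-task metric dictionaries can be printed directly. This is a small sketch; metric key names (e.g. `acc,none`) depend on the harness version.
```python
# Print only the numeric metrics reported for each task
for task, metrics in results.items():
    numeric = {name: value for name, value in metrics.items() if isinstance(value, float)}
    print(task, numeric)
```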
## Ethical Issues
### Ethical Considerations
- There were no ethical concerns specific to this release, as neither the benchmark test sets nor their training sets were included in the model's training data
## Contact Us
### Why Our LLMs?
- [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net): Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► [Get in touch](https://chaperoneai.net/contact)