---
language:
- en
tags:
- llama
- instruct
- instruction
- empirischtech
pipeline_tag: text-generation
base_model:
- meta-llama/Llama-3.1-8B-Instruct
license: llama3.1
---
# Llama-3.1-10B-Instruct model card
## Model Details
* **Developed by**: [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net)
* **Backbone Model**: [Llama 3.1](https://github.com/meta-llama/llama3)
* **Language(s)**: English
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **License**: This model is derived from Meta's Llama 3.1 and its use is governed by the Meta Llama 3.1 Community License of the base model
* **Where to send comments**: Feedback and comments on the model can be provided by opening a discussion in the [model repository's Community tab](https://huggingface.co/empirischtech/Llama-3.1-10B-Instruct/discussions)
* **Contact**: For questions and comments about the model, please email [contact-us](https://chaperoneai.net/contact)
## Training
Bigger models, more data, and better hardware have consistently improved deep learning performance. Whether in NLP or computer vision, larger models have driven the major breakthroughs. However, most cutting-edge models are still trained from scratch, starting from randomly initialized weights, and the cost of doing so keeps rising.

To address the escalating computational cost of training large-scale models, various approaches have been proposed. Here we present results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.

In this work, we take a step toward realizing such an approach. Specifically, we extend an existing **8B**-parameter model to **10B** parameters by initializing the additional layers with pretrained weights, followed by continued pretraining on a smaller dataset across multiple epochs. Due to budget constraints, we were unable to surpass the base model on the **EleutherAI** evaluation harness; however, the average scores are very close, demonstrating the potential of cost-efficient scaling strategies for large language model development.
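For illustration, the following is a minimal sketch of how such a depthwise expansion can be built with Transformers. The layer split (the first 20 decoder layers stacked on a copy of the last 20, giving 40 layers and roughly 10B parameters) is an assumption for the example, not the exact recipe used for this checkpoint.

```python
# Illustrative depth up-scaling sketch: duplicate pretrained decoder layers to
# deepen the 8B backbone; the chosen layer split is hypothetical, not the exact recipe.
import copy
import torch
from torch import nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

old_layers = base.model.layers                     # 32 decoder layers in the 8B backbone
new_layers = nn.ModuleList(
    [copy.deepcopy(l) for l in old_layers[:20]] +  # first 20 layers
    [copy.deepcopy(l) for l in old_layers[12:]]    # copy of the last 20 layers
)
for idx, layer in enumerate(new_layers):
    layer.self_attn.layer_idx = idx                # keep KV-cache bookkeeping consistent
base.model.layers = new_layers
base.config.num_hidden_layers = len(new_layers)

# The up-scaled (~10B-parameter) model is then continued-pretrained on additional data.
base.save_pretrained("llama-3.1-10b-upscaled-init")
```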
## Usage
- Tested on A100 80GB
- Our model can handle up to 128k input tokens, the context length supported by the Llama-3.1 architecture.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "empirischtech/Llama-3.1-10B-Instruct"

# Load the tokenizer and the model in fp16, sharded automatically across available GPUs
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "### User:\nEmma feels perfectly fine, yet she still has an appointment at the hospital. What might be the reasons?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)  # drop token_type_ids if the tokenizer returns them

# Stream tokens to stdout as they are generated, then decode the full output
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=1024)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
```
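Since the model inherits the Llama-3.1 chat format, prompts can also be built with the tokenizer's chat template rather than the raw `### User:` string above. A minimal sketch, continuing from the snippet above (the question is just an example):

```python
# Build the prompt with the Llama-3.1 chat template (reuses model, tokenizer, streamer)
messages = [
    {"role": "user", "content": "Emma feels perfectly fine, yet she still has an appointment at the hospital. What might be the reasons?"}
]
chat_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

chat_output = model.generate(chat_ids, streamer=streamer, use_cache=True, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt portion
print(tokenizer.decode(chat_output[0][chat_ids.shape[-1]:], skip_special_tokens=True))
```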
## Hardware and Software
* **Hardware**: We utilized 8x NVIDIA A100 GPUs to train our model
* **Training Factors**: The model was pretrained using a combination of the [DeepSpeed library](https://github.com/microsoft/DeepSpeed) and the [HuggingFace Trainer](https://huggingface.co/docs/transformers/main_classes/trainer); a minimal sketch of such a setup is shown below
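The following sketch shows how continued pretraining can be wired up with the HuggingFace Trainer and DeepSpeed. The dataset, hyperparameters, and DeepSpeed config path are illustrative assumptions, not the exact training setup used for this model.

```python
# Minimal continued-pretraining sketch with HF Trainer + DeepSpeed (ZeRO-3 assumed).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
    Trainer, TrainingArguments,
)

model_id = "empirischtech/Llama-3.1-10B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token          # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder corpus; the actual continued-pretraining data is not specified here
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
raw = raw.filter(lambda x: len(x["text"].strip()) > 0)
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=4096),
    batched=True, remove_columns=raw.column_names,
)

args = TrainingArguments(
    output_dir="llama31-10b-continued-pretraining",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=2,
    learning_rate=2e-5,
    bf16=True,
    deepspeed="ds_zero3_config.json",              # hypothetical DeepSpeed ZeRO-3 config file
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```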
## Evaluation Results
<!--
The following two different evaluations are performed.
### Perplexity as an Evaluation Metric
Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A **lower perplexity** score indicates better performance (i.e., the model is more confident in its predictions).
#### Main Results
| Model | Perplexity Score |
|---------------------------------------------|----------|
| **Llama-3.1-8B-Instruct** | 842611366.59 |
| **Llama-3.1-10B-Instruct** | 2890.31 |
#### Scripts to generate evaluation results
```python
from evaluate import load
import datasets

# Compute perplexity over the non-empty lines of the WikiText-2 test split
perplexity = load("perplexity", module_type="metric")
input_texts = datasets.load_dataset("wikitext",
                                    "wikitext-2-raw-v1",
                                    split="test")["text"]
input_texts = [s for s in input_texts if s != '']

model_path = 'empirischtech/Llama-3.1-10B-Instruct'
results = perplexity.compute(model_id=model_path,
                             add_start_token=False,
                             predictions=input_texts)
print(round(results["mean_perplexity"], 2))
```
-->
### Harness Evaluation
- The performance evaluation follows the tasks used on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
The model is evaluated on four benchmarks: `ARC-Challenge`, `HellaSwag`, `MMLU-Pro`, and `IFEval`.
The evaluation is run with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.
#### Main Results
| Benchmark | **Llama-3.1-8B-Instruct** | **Llama-3.1-10B-Instruct** |
|------------|:------------------------:|:------------------------:|
| ARC | 55.05 | 52.47 |
| HellaSwag | 79.28 | 77.08 |
| MMLU-Pro | 40.34 | 33.59 |
| IFEval | 59.95 | 54.80 |
| **Average** | **58.66** | **54.49** |
#### Scripts to generate evaluation results
```python
# Install the harness first: pip install "lm-eval>=0.4.7"
# (from https://github.com/EleutherAI/lm-evaluation-harness)
import json
from lm_eval import evaluator

tasks_list = ["arc_challenge", "ifeval", "mmlu_pro", "hellaswag"]  # benchmark tasks
model_path = "empirischtech/Llama-3.1-10B-Instruct"

# Run evaluation
results = evaluator.simple_evaluate(
    model="hf",                              # Hugging Face backend
    cache_requests=False,
    model_args=f"pretrained={model_path}",
    tasks=tasks_list,
    batch_size=4,
    device="cuda:0"
)

# Extract the per-task results and serialize them
results = results['results']
json_string = json.dumps(results, indent=4)
print(json_string)
```
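The same evaluation should also be reproducible from the command line via the harness's `lm_eval` entry point, e.g. `lm_eval --model hf --model_args pretrained=empirischtech/Llama-3.1-10B-Instruct --tasks arc_challenge,hellaswag,mmlu_pro,ifeval --batch_size 4 --device cuda:0`.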
## Ethical Issues
### Ethical Considerations
- No ethical issues are involved: neither the benchmark test sets nor their training splits were included in the data used to train the model
## Contact Us
### Why Our LLMs?
- [EmpirischTech](https://empirischtech.at)/[ChaperoneAI](https://chaperoneai.net): Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them on your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► [Get in touch](https://chaperoneai.net/contact)