---
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
library_name: transformers
license: llama3.1
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
---

## Model Information

The Llama 3.1 instruction-tuned, text-only 70B model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

This repository stores an experimental IQ_1S quantized GGUF build of the Llama 3.1 instruction-tuned 70B model.

**Model developer**: Meta

**Model Architecture**: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

|Model                |Training Data                               |Params|Input modalities |Output modalities         |Context length|GQA|Token count|Knowledge cutoff|
|---------------------|--------------------------------------------|------|-----------------|--------------------------|--------------|---|-----------|----------------|
|Llama 3.1 (text only)|A new mix of publicly available online data.|70B   |Multilingual Text|Multilingual Text and code|128k          |Yes|15T+       |December 2023   |

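
The 128k context length and the GQA column in the table above both affect how much memory the key/value cache needs at inference time. The following is a rough sketch of that arithmetic, not a measurement of this GGUF file. It assumes the base model's published configuration (80 layers, 64 attention heads, 8 key/value heads, head dimension 128) and an FP16 cache; the function name `kv_cache_bytes` is illustrative.

```python
def kv_cache_bytes(n_tokens, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV-cache size in bytes.

    The defaults are assumptions taken from the Llama 3.1 70B base configuration
    and an FP16 cache; they are not read from this repository's GGUF file.
    """
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


full_context = 128 * 1024  # 128k tokens

gqa_gib = kv_cache_bytes(full_context) / 2**30
mha_gib = kv_cache_bytes(full_context, n_kv_heads=64) / 2**30

print(f"With GQA (8 KV heads):     {gqa_gib:.0f} GiB")
print(f"Without GQA (64 KV heads): {mha_gib:.0f} GiB")
```

Under these assumptions the full 128k window needs about 40 GiB of cache with GQA, versus about 320 GiB if every attention head kept its own keys and values, so a smaller context size is usually a practical choice on limited hardware.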
**Supported languages**: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

## Quantization Information

|Weight Quantization| Perplexity (PPL)   |
|-------------------|--------------------|
| FP16              | 4.1892 +/- 0.01430 |
| IQ_1S             | 8.5005 +/- 0.03298 |

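
To put the two perplexity figures in the table above side by side: perplexity is the exponential of the average per-token negative log-likelihood, so lower is better, and the gap can be expressed either as a ratio or as extra cross-entropy per token. The short sketch below only re-derives those two quantities from the reported numbers; the variable names are illustrative.

```python
import math

# Perplexities copied from the table above.
fp16_ppl = 4.1892
iq1s_ppl = 8.5005

# How many times larger the quantized model's perplexity is.
ratio = iq1s_ppl / fp16_ppl

# Since PPL = exp(cross-entropy), the difference of logs is the
# additional cross-entropy (in nats per token) caused by quantization.
extra_nats = math.log(iq1s_ppl) - math.log(fp16_ppl)

print(f"Perplexity ratio: {ratio:.2f}x")
print(f"Extra cross-entropy: {extra_nats:.3f} nats/token")
```

The quantized model's perplexity is roughly twice that of FP16, which is consistent with the "experimental" label on this quantization.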
Dataset used for re-calibration: a mix of [standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/exllamav2/conversion/standard_cal_data) from ExLlamaV2.

The generated `imatrix` can be downloaded from [imatrix.dat](https://huggingface.co/npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S/resolve/main/imatrix.dat).

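
For scripted setups, the `imatrix` file can also be fetched programmatically. The sketch below uses only the Python standard library and the URL given above; the function name `download_imatrix` and the default destination path are illustrative choices, not part of this repository.

```python
import urllib.request
from pathlib import Path

# URL taken from the link above.
IMATRIX_URL = (
    "https://huggingface.co/npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S"
    "/resolve/main/imatrix.dat"
)


def download_imatrix(dest="imatrix.dat", url=IMATRIX_URL):
    """Download the importance matrix unless it already exists; return its path.

    `dest` is a hypothetical local path chosen for this example.
    """
    path = Path(dest)
    if not path.exists():
        # Network access happens only when the file is missing.
        urllib.request.urlretrieve(url, str(path))
    return path
```

Calling `download_imatrix()` saves the file as `imatrix.dat` in the current directory and skips the download on later calls.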
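**Usage**: with `llama-cpp-python`

```python
from llama_cpp import Llama

# Download the GGUF file from the Hub (requires the `huggingface-hub` package) and load it.
# "GGUF_FILE" is a placeholder: replace it with the name of the GGUF file in the repository.
llm = Llama.from_pretrained(
    repo_id="npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S",
    filename="GGUF_FILE",
)

# Run a single-turn chat completion; the result is an OpenAI-style response dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)

# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])
```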
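
A 70B model can be slow to generate, so it may be convenient to stream the reply instead of waiting for the full response. `create_chat_completion` accepts `stream=True` and then yields OpenAI-style chunks whose text lives under `choices[0]["delta"]`. The helper below is a minimal sketch of how such chunks could be joined; `collect_stream` is a hypothetical name, and the chunks in the example are hand-written stand-ins rather than real model output, so the snippet runs without loading the model.

```python
def collect_stream(chunks):
    """Join the text fragments of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Each chunk carries a "delta"; the first usually holds only the role
        # and the last may be empty, so missing content is treated as "".
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Hand-written stand-in chunks (not real model output) with the streaming shape.
fake_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Paris"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "."}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

print(collect_stream(fake_chunks))  # prints "Paris."
```

With a loaded model, the same helper would be called on the iterator returned by `llm.create_chat_completion(messages=..., stream=True)`.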