---
language:
- en
pipeline_tag: text-generation
---

# Meta-Llama-3-8B-Instruct-quantized.w8a16

## Model Overview
- **Model Architecture:** Meta-Llama-3
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Quantized:** INT8 weights
- **Release Date:** 7/2/2024
- **Version:** 1.0
- **Model Developers:** Neural Magic

Quantized version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct). It achieves an average score of 68.54% on the OpenLLM benchmark (version 1), whereas the unquantized model achieves 68.69%, corresponding to 99.78% accuracy recovery.

## Model Optimizations

This model was obtained by quantizing the weights of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) to the INT8 data type. Only the weights of the linear operators within the transformer blocks are quantized. Symmetric per-channel quantization is applied: a linear scaling per output dimension maps the INT8 and floating-point representations of the quantized weights. [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) is used for quantization. This optimization reduces the number of bits per parameter from 16 to 8, cutting the disk size and GPU memory requirements by approximately 50%.

## Evaluation

The model was evaluated with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) using the [vLLM](https://docs.vllm.ai/en/stable/) engine.

## Accuracy

### Open LLM Leaderboard evaluation scores

| Benchmark | [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | Meta-Llama-3-8B-Instruct-quantized.w8a16 (this model) |
| :------------------: | :----------------------: | :-----------------------------------: |
| arc-c (25-shot) | 62.63% | 61.52% |
| hellaswag (10-shot) | 78.81% | 78.69% |
| mmlu (5-shot) | 66.54% | 66.55% |
| truthfulqa (0-shot) | 52.49% | 52.60% |
| winogrande (5-shot) | 76.48% | 76.01% |
| gsm8k (5-shot) | 75.21% | 75.89% |
| **Average Accuracy** | **68.69%** | **68.54%** |
| **Recovery** | **100%** | **99.78%** |
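A single-task evaluation with lm-evaluation-harness on the vLLM engine can be launched roughly as follows. This is a sketch, not a verified reproduction recipe: exact task names, few-shot defaults, and flags vary across harness versions, the repository id assumes the model is published under the `neuralmagic` organization, and a CUDA GPU is required.

```shell
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a16" \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto
```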
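The symmetric per-channel scheme described under Model Optimizations can be sketched in a few lines of NumPy. This is an illustrative round-trip with a made-up weight matrix, not the AutoGPTQ implementation (GPTQ additionally adjusts weights to minimize layer output error rather than rounding each weight to nearest); variable names here are assumptions for clarity.

```python
import numpy as np

# Hypothetical weight matrix standing in for one linear layer
# (out_features x in_features).
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8)).astype(np.float32)

# Symmetric per-channel quantization: one scale per output dimension (row),
# mapping float weights onto the INT8 range [-127, 127].
scales = np.abs(weights).max(axis=1, keepdims=True) / 127.0
quantized = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)

# Dequantization is the inverse linear rescaling with the same per-row scale.
dequantized = quantized.astype(np.float32) * scales

# With round-to-nearest, the per-element error is at most half a step.
max_error = np.abs(weights - dequantized).max()
```

Storing `quantized` (1 byte/weight) plus one float scale per output channel, instead of 2-byte floats, is what yields the roughly 50% size reduction noted above.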