---
license: other
license_name: exaone
license_link: LICENSE
language:
- en
- ko
tags:
- lg-ai
- exaone
- exaone-3.5
pipeline_tag: text-generation
library_name: transformers
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/EXAONE-3.5-7.8B-Instruct-GGUF

This is a quantized version of [LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct) created using llama.cpp.

# Original Model Card

# EXAONE-3.5-7.8B-Instruct
## Introduction
We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. EXAONE 3.5 language models include: 1) **2.4B model** optimized for deployment on small or resource-constrained devices, 2) **7.8B model** matching the size of its predecessor but offering improved performance, and 3) **32B model** delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes.
For more details, please refer to our [technical report](https://arxiv.org/abs/2412.04862), [blog](https://www.lgresearch.ai/blog/view?seq=507) and [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-3.5).
This repository contains the instruction-tuned 7.8B language model with the following features:
- Number of Parameters (without embeddings): 6.98B
- Number of Layers: 32
- Number of Attention Heads: GQA with 32 Q-heads and 8 KV-heads
- Vocab Size: 102,400
- Context Length: 32,768 tokens
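
To make the effect of grouped-query attention (GQA) concrete, the figures above can be plugged into a rough key/value cache estimate. This is a back-of-the-envelope sketch, not a measurement: the per-head dimension (`HEAD_DIM = 128`) and the 16-bit cache precision are assumptions that are not stated in this model card, and real runtimes add their own overhead.

```python
# Figures taken from the feature list above.
NUM_LAYERS = 32
NUM_Q_HEADS = 32
NUM_KV_HEADS = 8
CONTEXT_LENGTH = 32_768

# Assumptions (NOT stated in this model card):
HEAD_DIM = 128        # assumed per-head dimension
BYTES_PER_VALUE = 2   # assumed 16-bit cache entries (bfloat16 / float16)


def kv_cache_bytes(num_kv_heads: int, num_tokens: int) -> int:
    """Bytes needed to cache keys and values for a single sequence."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * num_tokens * BYTES_PER_VALUE


# Cache size with the 8 KV heads listed above.
gqa_bytes = kv_cache_bytes(NUM_KV_HEADS, CONTEXT_LENGTH)
# Hypothetical comparison: one KV head per query head (plain multi-head attention).
mha_bytes = kv_cache_bytes(NUM_Q_HEADS, CONTEXT_LENGTH)

print(f"GQA (8 KV heads):  {gqa_bytes / 2**30:.1f} GiB")
print(f"MHA (32 KV heads): {mha_bytes / 2**30:.1f} GiB")
print(f"Reduction factor:  {mha_bytes // gqa_bytes}x")
```

Under these assumptions, sharing each KV head across four query heads cuts the full-context cache to a quarter of what the same architecture would need with 32 KV heads.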
## Quickstart
We recommend using `transformers` v4.43 or later.
Here is a code snippet to run conversational inference with the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"

# Load the model in bfloat16 and let accelerate place it on the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt: the second assignment overrides the first,
# so comment out the one you do not want
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"  # Korean example ("Brag about yourself")

messages = [
    {"role": "system",
     "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

# Greedy decoding; move the inputs to the same device as the model
output = model.generate(
    input_ids.to(model.device),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)

# The decoded text contains the prompt followed by the generated reply
print(tokenizer.decode(output[0]))
```
> ### Note
> The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt,
> so we highly recommend using the system prompts provided in the code snippet above.
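
Because this repository distributes GGUF files rather than `transformers` weights, the same system-plus-user conversation can also be run through a GGUF runtime. The following is a minimal sketch using the `llama-cpp-python` bindings, under several assumptions: the package is installed, the model path is a placeholder for whichever quantization file you downloaded, the context size of 4096 is an arbitrary choice below the 32,768-token maximum, and the chat template embedded in the GGUF metadata is picked up by the library.

```python
SYSTEM_PROMPT = "You are EXAONE model from LG AI Research, a helpful assistant."


def build_messages(prompt: str, system_prompt: str = SYSTEM_PROMPT) -> list:
    """Build the same system + user message list used in the Quickstart."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]


def generate_with_gguf(model_path: str, prompt: str, max_tokens: int = 128) -> str:
    """Run one chat turn against a local GGUF file (requires llama-cpp-python)."""
    # Imported lazily so build_messages can be used without the package installed.
    from llama_cpp import Llama

    llm = Llama(
        model_path=model_path,  # hypothetical local path to a downloaded .gguf file
        n_ctx=4096,             # assumed context window; raise it if you need longer inputs
        n_gpu_layers=-1,        # offload all layers to the GPU; use 0 for CPU only
    )
    result = llm.create_chat_completion(
        messages=build_messages(prompt),
        max_tokens=max_tokens,
        temperature=0.0,        # near-deterministic output, similar in spirit to do_sample=False
    )
    return result["choices"][0]["message"]["content"]
```

A call such as `generate_with_gguf("path/to/model.gguf", "Explain how wonderful you are")` would then return the assistant reply as a string; the path shown is a placeholder, not an actual file name from this repository.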
## Evaluation
The following table shows the evaluation results on real-world use cases. The full evaluation results can be found in the [technical report](https://arxiv.org/abs/2412.04862).
Models | MT-Bench | LiveBench | Arena-Hard | AlpacaEval | IFEval | KoMT-Bench[1] | LogicKor |
---|---|---|---|---|---|---|---|
EXAONE 3.5 7.8B | 8.29 | 39.8 | 68.7 | 54.2 | 78.9 | 7.96 | 9.08 |
Qwen 2.5 7B | 6.48 | 35.6 | 48.9 | 31.7 | 72.5 | 5.19 | 6.38 |
Llama 3.1 8B | 7.59 | 28.3 | 27.7 | 25.7 | 74.5 | 4.85 | 5.99 |
Gemma 2 9B | 7.64 | 32.1 | 43.6 | 47.3 | 54.7 | 7.10 | 8.05 |
Phi 3 small (7B) | 7.63 | 27.9 | 26.8 | 29.2 | 59.5 | 3.22 | 3.99 |