|
--- |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- facebook |
|
- meta |
|
- pytorch |
|
- llama |
|
- llama-3 |
|
--- |
|
|
|
# Meta-Llama-3-8B-Instruct-64k-PoSE |
|
|
|
<img src="https://huggingface.co/winglian/Llama-3-8b-64k-PoSE/resolve/main/output.png" /> |
|
|
|
This is a custom version of the Meta Llama 3 8B instruction-tuned language model with an extended context length of up to 64,000 tokens. It was created by merging [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a LoRA adapter that [Wing Lian](https://huggingface.co/winglian) finetuned using [PoSE](https://huggingface.co/papers/2309.10400) to extend Llama's context length from 8k to 64k at `rope_theta = 500000.0`.
|
The adapter was trained with PoSE via continued pretraining on 300M tokens from the RedPajama V1 dataset, using sequences between 6k and 8k tokens.
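For intuition, PoSE trains on short sequences whose position ids are shifted so that they collectively cover the longer target window. A toy sketch of that position-id relabeling (illustrative only, not the training code used for this model; the chunk split and skip values are assumptions):

```python
import torch

trained_len, target_len = 8192, 65536  # training window vs. target window (assumed values)

# Split an 8k training sequence into two chunks; the second chunk's
# position ids jump ahead by a random skip, so the model sees position
# pairs spanning the full 64k range despite only training on 8k tokens.
first = torch.arange(0, 4096)
skip = int(torch.randint(1, target_len - trained_len + 1, (1,)))
second = torch.arange(4096, 8192) + skip

position_ids = torch.cat([first, second])  # max position can reach ~64k
```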
|
|
|
After continued pretraining, rope_theta was raised to 2M to potentially extend the context past 64k.
|
|
|
The adapter is a rank-stabilized LoRA of rank 256; the training run is logged on [WandB](https://wandb.ai/oaaic/llama-3-64k/runs/tkcyjt37).
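To confirm the RoPE settings the merged checkpoint actually ships with, you can inspect its config directly (the printed values should line up with the notes above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Azma-AI/Meta-Llama-3-8B-Instruct-64k-PoSE")
print(config.rope_theta)               # RoPE base frequency (see notes above)
print(config.max_position_embeddings)  # maximum supported context length
```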
|
|
|
### Model Details |
|
- **Base Model**: Meta Llama 3 8B instruction-tuned model

- **Context Length**: Up to 64,000 tokens (increased from the original 8,192-token limit)

- **Adapter Training**: PoSE adapter finetuned on 300M tokens from the RedPajama V1 dataset using 6k-8k token sequences

- **Adapter Rank**: Rank-256 rank-stabilized LoRA adapter (a hypothetical reconstruction of the config is sketched below)
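A sketch of that adapter configuration with PEFT: only the rank (256) and the use of rank-stabilized LoRA are documented above, so the alpha and target modules here are assumptions, not the author's actual settings.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter setup described above.
lora_config = LoraConfig(
    r=256,
    lora_alpha=256,   # assumption
    use_rslora=True,  # rank-stabilized LoRA scaling (alpha / sqrt(r))
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```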
|
|
|
This extended-context model supports much longer inputs and generations than the original base model. It maintains the strong instruction-following and safety capabilities of Llama 3 while greatly broadening the applicable use cases.
|
See the [original repo by Wing Lian](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE) for more details on the adapter training process.
|
|
|
### Usage |
|
This model can be used just like the base Llama 3 8B model, with the increased context length enabling much longer prompts and outputs. Example usage with the Transformers library:
|
|
|
```python
import transformers
import torch

model_id = "Azma-AI/Meta-Llama-3-8B-Instruct-64k-PoSE"

# Load in bfloat16 and shard across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

long_prompt = "..."  # Your prompt, up to 64k tokens
output = pipeline(long_prompt, max_new_tokens=256)
```
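Since this is the instruction-tuned variant, requests can also be formatted with the Llama 3 chat template. A minimal sketch following the standard Meta-Llama-3-8B-Instruct usage pattern (the message contents are illustrative):

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the following document:\n\n..."},
]

# Render the chat template into a single prompt string
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Llama 3 uses <|eot_id|> to end assistant turns
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(prompt, max_new_tokens=512, eos_token_id=terminators)
print(outputs[0]["generated_text"][len(prompt):])
```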
|
|
|
### Citation |
|
If you use this model, please cite the original Meta Llama 3 model card and the PoSE adapter paper: |
|
|
|
```bibtex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```
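An entry for the PoSE paper ([arXiv:2309.10400](https://arxiv.org/abs/2309.10400)) along these lines:

```bibtex
@misc{zhu2023pose,
  title={PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training},
  author={Zhu, Dawei and Yang, Nan and Wang, Liang and Song, Yifan and Wu, Wenhao and Wei, Furu and Li, Sujian},
  year={2023},
  eprint={2309.10400},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2309.10400}
}
```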
|
|
|
### Acknowledgments |
|
- [Wing Lian](https://huggingface.co/winglian)
- [Meta AI](https://huggingface.co/meta-llama)