|
--- |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- facebook |
|
- meta |
|
- pytorch |
|
- llama |
|
- llama-3 |
|
--- |
|
|
|
# Meta-Llama-3-8B-Instruct-64k-PoSE |
|
|
|
<img src="https://huggingface.co/winglian/Llama-3-8b-64k-PoSE/resolve/main/output.png" /> |
|
|
|
This is a custom version of the Meta Llama 3 8B instruction-tuned language model with an extended context length of up to 64,000 tokens. It was created by merging [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with a LoRA adapter that [Wing Lian](https://huggingface.co/winglian) finetuned using [PoSE](https://huggingface.co/papers/2309.10400) to extend Llama's context length from 8k to 64k at `rope_theta = 500000.0`.
|
The adapter was trained with PoSE via continued pretraining on 300M tokens from the RedPajama V1 dataset, using sequences between 6k and 8k tokens.
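For intuition, PoSE trains on short sequences whose position ids are shifted so that they collectively cover the longer target window. A toy sketch of that position-id relabeling (illustrative only, not the training code used for this model; the chunk split and skip values are assumptions):

```python
import torch

trained_len, target_len = 8192, 65536  # training window vs. target window (assumed values)

# Split an 8k training sequence into two chunks; the second chunk's
# position ids jump ahead by a random skip, so the model sees position
# pairs spanning the full 64k range despite only training on 8k tokens.
first = torch.arange(0, 4096)
skip = int(torch.randint(1, target_len - trained_len + 1, (1,)))
second = torch.arange(4096, 8192) + skip

position_ids = torch.cat([first, second])  # max position can reach ~64k
```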
|
|
|
After continued pretraining, rope_theta was raised to 2M to potentially extend the context past 64k.
|
|
|
The adapter is a rank-stabilized LoRA of rank 256; the training run is logged on [WandB](https://wandb.ai/oaaic/llama-3-64k/runs/tkcyjt37).
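To confirm the RoPE settings the merged checkpoint actually ships with, you can inspect its config directly (the printed values should line up with the notes above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Azma-AI/Meta-Llama-3-8B-Instruct-64k-PoSE")
print(config.rope_theta)               # RoPE base frequency (see notes above)
print(config.max_position_embeddings)  # maximum supported context length
```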
|
|
|
### Model Details |
|
- **Base Model**: Meta Llama 3 8B instruction-tuned model

- **Context Length**: Up to 64,000 tokens (increased from the original 8,192-token limit)

- **Adapter Training**: PoSE adapter finetuned on 300M tokens from the RedPajama V1 dataset using 6k-8k token sequences

- **Adapter Rank**: Rank-256 rank-stabilized LoRA adapter (a hypothetical reconstruction of the config is sketched below)
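A sketch of that adapter configuration with PEFT: only the rank (256) and the use of rank-stabilized LoRA are documented above, so the alpha and target modules here are assumptions, not the author's actual settings.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter setup described above.
lora_config = LoraConfig(
    r=256,
    lora_alpha=256,   # assumption
    use_rslora=True,  # rank-stabilized LoRA scaling (alpha / sqrt(r))
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```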
|
|
|
This extended-context model supports much longer inputs and generations than the original base model. It maintains the strong instruction-following and safety capabilities of Llama 3 while greatly broadening the applicable use cases.
|
See the [original repo by Wing Lian](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE) for more details on the adapter training process.
|
|
|
### Usage |
|
This model can be used just like the base Llama 3 8B model, with the increased context length enabling much longer prompts and outputs. Example usage with the Transformers library:
|
|
|
```python
import transformers
import torch

model_id = "Azma-AI/Meta-Llama-3-8B-Instruct-64k-PoSE"

# Load in bfloat16 and shard across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

long_prompt = "..."  # Your prompt, up to 64k tokens
output = pipeline(long_prompt, max_new_tokens=256)
```
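Since this is the instruction-tuned variant, requests can also be formatted with the Llama 3 chat template. A minimal sketch following the standard Meta-Llama-3-8B-Instruct usage pattern (the message contents are illustrative):

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the following document:\n\n..."},
]

# Render the chat template into a single prompt string
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Llama 3 uses <|eot_id|> to end assistant turns
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(prompt, max_new_tokens=512, eos_token_id=terminators)
print(outputs[0]["generated_text"][len(prompt):])
```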
|
|
|
### Citation |
|
If you use this model, please cite the original Meta Llama 3 model card and the PoSE adapter paper: |
|
|
|
```bibtex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```
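An entry for the PoSE paper ([arXiv:2309.10400](https://arxiv.org/abs/2309.10400)) along these lines:

```bibtex
@misc{zhu2023pose,
  title={PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training},
  author={Zhu, Dawei and Yang, Nan and Wang, Liang and Song, Yifan and Wu, Wenhao and Wei, Furu and Li, Sujian},
  year={2023},
  eprint={2309.10400},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2309.10400}
}
```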
|
|
|
### Acknowledgments |
|
- [Wing Lian](https://huggingface.co/winglian)
- [Meta AI](https://huggingface.co/meta-llama)