---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
---
# Neuronx model for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)
This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/meta-llama/Llama-2-7b-hf).
This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.
Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.
## Usage on Amazon SageMaker
_coming soon_
## Usage with 🤗 `optimum-neuron`
```python
>>> from optimum.neuron import pipeline
>>> p = pipeline('text-generation', 'aws-neuron/Llama-2-7b-hf-neuron-latency')
>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
[{'generated_text': 'My favorite place on earth is the ocean. It is where I feel most at peace. I love to travel and see new places. I have a'}]
```
This repository contains tags specific to versions of `neuronx`. When loading the model with 🤗 `optimum-neuron`, use the repo revision that matches the version of `neuronx` installed on your instance, so that the right serialized checkpoints are loaded.
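As a minimal sketch of pinning a revision, the snippet below assumes the standard `revision` argument of `from_pretrained`. The helper name `load_model` is hypothetical, and the tag to pass is left as a parameter because the actual tag names are listed in this repository's tags and are not reproduced here.

```python
MODEL_ID = "aws-neuron/Llama-2-7b-hf-neuron-latency"


def load_model(revision: str):
    """Load the serialized checkpoints stored under a given repo revision.

    `revision` is a placeholder: pass the tag of this repository that
    matches the `neuronx` version installed on the host.
    """
    if not revision:
        raise ValueError("a non-empty revision (repo tag) is required")
    # Imported lazily so this module can be read without optimum-neuron installed.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(MODEL_ID, revision=revision)
```

Calling `load_model("<tag>")` with a tag from this repository would then fetch the checkpoints serialized for that `neuronx` version.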
## Arguments passed during export
**input_shapes**
```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```
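Because these shapes are fixed at export time, a request has to fit within them. The check below is only an illustrative sketch: it assumes that `sequence_length` bounds the total of prompt tokens plus newly generated tokens, and the function name `fits_static_shapes` is hypothetical, not part of `optimum-neuron`.

```python
BATCH_SIZE = 1          # "batch_size" from the export arguments
SEQUENCE_LENGTH = 2048  # "sequence_length" from the export arguments


def fits_static_shapes(prompt_token_counts, max_new_tokens):
    """Return True if a request fits the exported static shapes.

    prompt_token_counts: one token count per prompt in the batch.
    max_new_tokens: number of tokens requested from generation.
    """
    # The batch must be non-empty and no larger than the exported batch size.
    if not 1 <= len(prompt_token_counts) <= BATCH_SIZE:
        return False
    # Assumption: prompt + generated tokens must fit in the sequence length.
    return all(n + max_new_tokens <= SEQUENCE_LENGTH for n in prompt_token_counts)
```

For example, a single 7-token prompt with `max_new_tokens=64` passes this check, while two prompts in one call, or a 2000-token prompt with the same `max_new_tokens`, do not.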
**compiler_args**
```json
{
  "auto_cast_type": "fp16",
  "num_cores": 24
}
```
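For reference, an export with the same arguments could look roughly like the sketch below. It follows the `from_pretrained(..., export=True)` pattern shown in the `optimum-neuron` documentation, but it is not necessarily the exact command used to produce this repository. The helper name `export_model` is hypothetical, and running it assumes access to the gated base model and a host exposing at least 24 NeuronCores.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-hf"

# Values copied from the "Arguments passed during export" section.
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 2048}
COMPILER_ARGS = {"auto_cast_type": "fp16", "num_cores": 24}


def export_model():
    """Compile the base model for Neuron with the arguments listed above."""
    # Imported lazily so this module can be read without optimum-neuron installed.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, export=True, **INPUT_SHAPES, **COMPILER_ARGS
    )
```

The returned model can then be stored with `save_pretrained`, which writes the compiled artifacts alongside the configuration.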