---
license: llama2
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
---
# Neuronx model for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/codellama/CodeLlama-7b-hf).

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler_args` values listed under "Arguments passed during export" below.

It has been compiled to run on an inf2.8xlarge instance on AWS.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

## Usage on Amazon SageMaker

_coming soon_

## Usage with 🤗 `optimum-neuron`

```python
>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'jburtoft/CodeLlama-7b-hf-neuron-8xlarge')
>>> p("import socket\n\ndef ping_exponential_backoff(host: str):",
...     do_sample=True,
...     top_k=10,
...     temperature=0.1,
...     top_p=0.95,
...     num_return_sequences=1,
...     max_length=200,
... )
```
```
[{'generated_text': 'import socket\n\ndef ping_exponential_backoff(host: str):\n """\n Ping a host with exponential backoff.\n\n :param host: Host to ping\n :return: True if host is reachable, False otherwise\n """\n for i in range(1, 10):\n try:\n socket.create_connection((host, 80), 1).close()\n return True\n except OSError:\n time.sleep(2 ** i)\n return False\n\n\ndef ping_exponential_backoff_with_timeout(host: str, timeout: int):\n """\n Ping a host with exponential backoff and timeout.\n\n :param host: Host to ping\n :param timeout: Timeout in seconds\n :return: True if host is reachable, False otherwise\n """\n for'}]
```

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, load the repo revision that matches the version of `neuronx` you have installed, so that the right serialized checkpoints are used.

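Selecting a revision could look roughly like the sketch below. It is illustrative only: the tag names are not listed in this card, so the code discovers them with `huggingface_hub.list_repo_refs` instead of hard-coding one, and it assumes the repo revision can be passed through the standard `revision` argument of `from_pretrained`. The helper names `list_revision_tags` and `load_revision` are hypothetical, and loading the tokenizer from the same repo is an assumption.

```python
MODEL_ID = "jburtoft/CodeLlama-7b-hf-neuron-8xlarge"


def list_revision_tags(repo_id=MODEL_ID):
    """Return the names of the git tags published in the model repo."""
    # Imported lazily so the helper can be defined without network access.
    from huggingface_hub import list_repo_refs

    return [tag.name for tag in list_repo_refs(repo_id).tags]


def load_revision(revision, repo_id=MODEL_ID):
    """Load the serialized checkpoints stored under a given tag (hypothetical helper)."""
    # Requires an Inferentia2 instance with optimum-neuron installed.
    from optimum.neuron import NeuronModelForCausalLM
    from transformers import AutoTokenizer

    model = NeuronModelForCausalLM.from_pretrained(repo_id, revision=revision)
    # Assumption: the tokenizer files live in the same repo and revision.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    return model, tokenizer


# Example (on a Neuron instance): inspect the tags, then pick the one that
# matches the neuronx version installed locally.
# print(list_revision_tags())
# model, tokenizer = load_revision("<tag matching your neuronx version>")
```

The placeholder tag in the commented example is deliberate: choose the value from the list returned for this repository.
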
## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```

**compiler_args**

```json
{
  "auto_cast_type": "fp16",
  "num_cores": 2
}
```
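
For illustration, the two dictionaries above could be fed to an export call roughly as follows. This is a minimal sketch, assuming the export went through `NeuronModelForCausalLM.from_pretrained(..., export=True)` from `optimum-neuron`; the exact command used to produce these checkpoints is not recorded in this card, and `export_kwargs` and `export_model` are hypothetical helper names.

```python
# Values copied from the "Arguments passed during export" section.
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 2048}
COMPILER_ARGS = {"auto_cast_type": "fp16", "num_cores": 2}


def export_kwargs():
    """Merge the static input shapes and compiler arguments into one dict."""
    return {**INPUT_SHAPES, **COMPILER_ARGS}


def export_model(model_id="codellama/CodeLlama-7b-hf"):
    """Compile the base model for Neuron (hypothetical helper)."""
    # Imported lazily: this only works on a Neuron-enabled instance
    # with optimum-neuron installed.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(
        model_id, export=True, **export_kwargs()
    )


# Example (on a Neuron instance):
# model = export_model()
# model.save_pretrained("CodeLlama-7b-hf-neuron")  # local output directory, chosen here for illustration
```

Because the shapes are fixed at compile time, inputs are limited to a batch size of 1 and a total sequence length of 2048 tokens; different values require a new export.
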