---
license: llama2
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
---
Neuronx model for codellama/CodeLlama-7b-hf
This repository contains AWS Inferentia2 and neuronx-compatible checkpoints for codellama/CodeLlama-7b-hf.
You can find detailed information about the base model on its Model Card.
This model has been exported to the neuron format using the specific input_shapes and compiler parameters listed in the sections below.
It has been compiled to run on an inf2.8xlarge instance on AWS.
Please refer to the 🤗 optimum-neuron documentation for an explanation of these parameters.
Usage on Amazon SageMaker
Coming soon.
Usage with 🤗 optimum-neuron
>>> from optimum.neuron import pipeline
>>> p = pipeline('text-generation', 'jburtoft/CodeLlama-7b-hf-neuron-8xlarge')
>>> p("import socket\n\ndef ping_exponential_backoff(host: str):",
...   do_sample=True,
...   top_k=10,
...   temperature=0.1,
...   top_p=0.95,
...   num_return_sequences=1,
...   max_length=200,
... )
[{'generated_text': 'import socket\n\ndef ping_exponential_backoff(host: str):\n """\n Ping a host with exponential backoff.\n\n :param host: Host to ping\n :return: True if host is reachable, False otherwise\n """\n for i in range(1, 10):\n try:\n socket.create_connection((host, 80), 1).close()\n return True\n except OSError:\n time.sleep(2 ** i)\n return False\n\n\ndef ping_exponential_backoff_with_timeout(host: str, timeout: int):\n """\n Ping a host with exponential backoff and timeout.\n\n :param host: Host to ping\n :param timeout: Timeout in seconds\n :return: True if host is reachable, False otherwise\n """\n for'}]
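Because this checkpoint was compiled with static input shapes (batch_size 1, sequence_length 2048), a generation request has to fit within them. The helper below is a minimal, hypothetical pre-flight check (`generation_problems` is our own name, not part of optimum-neuron) that lists the ways a request would overflow those shapes. It assumes, as in transformers-style generation, that `max_length` counts prompt tokens plus generated tokens, and it assumes that the compiled batch_size caps the number of prompts multiplied by `num_return_sequences`.

```python
from typing import List

# Static shapes from "Arguments passed during export" in this card.
COMPILED_BATCH_SIZE = 1
COMPILED_SEQUENCE_LENGTH = 2048


def generation_problems(prompt_tokens: int, max_length: int,
                        num_prompts: int = 1,
                        num_return_sequences: int = 1) -> List[str]:
    """Hypothetical pre-flight check: list reasons a request would not fit the compiled shapes.

    Assumes max_length counts prompt plus generated tokens, and that the compiled
    batch_size caps num_prompts * num_return_sequences.
    """
    problems = []
    if max_length > COMPILED_SEQUENCE_LENGTH:
        problems.append(
            f"max_length={max_length} exceeds the compiled sequence_length={COMPILED_SEQUENCE_LENGTH}")
    if prompt_tokens >= max_length:
        problems.append(
            f"a {prompt_tokens}-token prompt leaves no room to generate within max_length={max_length}")
    batch = num_prompts * num_return_sequences
    if batch > COMPILED_BATCH_SIZE:
        problems.append(
            f"{batch} sequences per call exceed the compiled batch_size={COMPILED_BATCH_SIZE}")
    return problems


# Settings from the example above, with an illustrative (not measured) prompt length.
print(generation_problems(prompt_tokens=15, max_length=200))  # -> []
```

With the pipeline `p` from the example above, the real prompt length could be measured as `len(p.tokenizer(prompt)["input_ids"])` before calling `p`.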
This repository contains tags specific to versions of neuronx. When using it with 🤗 optimum-neuron, use the repo revision that matches the version of neuronx you are using, so that the right serialized checkpoints are loaded.
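Pinning a revision could look like the sketch below. `list_repo_refs` from `huggingface_hub` and the `revision` argument of `NeuronModelForCausalLM.from_pretrained` are existing APIs; the helper names are hypothetical. The tag naming scheme is not documented in this card, so `pick_revision` assumes, purely for illustration, that each tag is named after the neuronx version (optionally prefixed with `v`). Check the repository's actual tags before relying on it.

```python
from typing import Iterable, List, Optional

REPO_ID = "jburtoft/CodeLlama-7b-hf-neuron-8xlarge"


def list_revision_tags(repo_id: str = REPO_ID) -> List[str]:
    """Return the tag names of a Hub repository (needs huggingface_hub and network access)."""
    from huggingface_hub import list_repo_refs  # imported lazily

    return [ref.name for ref in list_repo_refs(repo_id).tags]


def pick_revision(tags: Iterable[str], neuronx_version: str) -> Optional[str]:
    """HYPOTHETICAL naming convention: a tag equals the neuronx version, optionally prefixed with 'v'.

    Returns the first matching tag, or None when no tag matches.
    """
    wanted = {neuronx_version, f"v{neuronx_version}"}
    for tag in tags:
        if tag in wanted:
            return tag
    return None


def load_pinned_model(revision: str, repo_id: str = REPO_ID):
    """Load the serialized Neuron checkpoint stored at a given revision (tag, branch or commit)."""
    from optimum.neuron import NeuronModelForCausalLM  # imported lazily; needs optimum-neuron

    return NeuronModelForCausalLM.from_pretrained(repo_id, revision=revision)
```

A typical flow would call `pick_revision(list_revision_tags(), installed_version)`, check that the result is not None, and pass it to `load_pinned_model`; `installed_version` stands for whatever neuronx version string your environment reports.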
Arguments passed during export
input_shapes
{
  "batch_size": 1,
  "sequence_length": 2048
}
compiler_args
{
  "auto_cast_type": "fp16",
  "num_cores": 2
}
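For reference, these arguments could be passed to optimum-neuron's `NeuronModelForCausalLM.from_pretrained(..., export=True)` as in the sketch below. It is not necessarily the exact command used to produce this repository; the `export_model` helper and its output directory name are hypothetical, and the export itself needs optimum-neuron and the Neuron SDK on an Inferentia2 host such as the inf2.8xlarge mentioned above (one Inferentia2 chip with two NeuronCores, matching `num_cores: 2`).

```python
from typing import Any, Dict

# Values copied from "Arguments passed during export" in this card.
INPUT_SHAPES: Dict[str, Any] = {"batch_size": 1, "sequence_length": 2048}
COMPILER_ARGS: Dict[str, Any] = {"auto_cast_type": "fp16", "num_cores": 2}


def export_kwargs() -> Dict[str, Any]:
    """Merge input shapes and compiler args into one kwargs dict, refusing duplicate keys."""
    overlap = INPUT_SHAPES.keys() & COMPILER_ARGS.keys()
    if overlap:
        raise ValueError(f"keys defined twice: {sorted(overlap)}")
    return {**INPUT_SHAPES, **COMPILER_ARGS}


def export_model(model_id: str = "codellama/CodeLlama-7b-hf",
                 save_dir: str = "CodeLlama-7b-hf-neuron"):  # hypothetical output directory
    """Hypothetical helper: compile model_id for Neuron and save it together with its tokenizer."""
    # Imported lazily: needs optimum-neuron and the Neuron SDK on an Inferentia2 host.
    from optimum.neuron import NeuronModelForCausalLM
    from transformers import AutoTokenizer

    model = NeuronModelForCausalLM.from_pretrained(model_id, export=True, **export_kwargs())
    model.save_pretrained(save_dir)
    AutoTokenizer.from_pretrained(model_id).save_pretrained(save_dir)
    return model
```

Keeping the two dictionaries separate mirrors the layout of this card; `export_kwargs` merges them and rejects duplicate keys so that one cannot silently override the other.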