	---
	license: llama2
	language:
	- en
	pipeline_tag: text-generation
	inference: false
	tags:
	- facebook
	- meta
	- pytorch
	- llama
	- llama-2
	- inferentia2
	- neuron
	---
	# Neuronx model for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)

	This repository contains [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf).
	You can find detailed information about the base model on its [Model Card](https://huggingface.co/codellama/CodeLlama-7b-hf).

	This model has been exported to the `neuron` format using the `input_shapes` and `compiler_args` values listed under "Arguments passed during export" below.

	It has been compiled to run on an inf2.8xlarge instance on AWS.

	Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

	## Usage on Amazon SageMaker

	_coming soon_

	## Usage with 🤗 `optimum-neuron`

	```python
	>>> from optimum.neuron import pipeline

	>>> p = pipeline('text-generation', 'jburtoft/CodeLlama-7b-hf-neuron-8xlarge')
	>>> p("import socket\n\ndef ping_exponential_backoff(host: str):",
	...     do_sample=True,
	...     top_k=10,
	...     temperature=0.1,
	...     top_p=0.95,
	...     num_return_sequences=1,
	...     max_length=200,
	... )
	```
	```
	[{'generated_text': 'import socket\n\ndef ping_exponential_backoff(host: str):\n """\n Ping a host with exponential backoff.\n\n :param host: Host to ping\n :return: True if host is reachable, False otherwise\n """\n for i in range(1, 10):\n try:\n socket.create_connection((host, 80), 1).close()\n return True\n except OSError:\n time.sleep(2 ** i)\n return False\n\n\ndef ping_exponential_backoff_with_timeout(host: str, timeout: int):\n """\n Ping a host with exponential backoff and timeout.\n\n :param host: Host to ping\n :param timeout: Timeout in seconds\n :return: True if host is reachable, False otherwise\n """\n for'}]
	```
	This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, select the repo revision that matches the version of `neuronx` you have installed, so that the correct serialized checkpoints are loaded.
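As a minimal sketch of pinning a revision, the helper below builds the keyword arguments and passes them to `from_pretrained`. The names `hub_kwargs` and `load_pinned` are hypothetical, and the sketch assumes that `NeuronModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained` both accept the usual Hub `revision` argument and that the repository ships tokenizer files. The tag names are not listed here, so look them up on the repository before choosing one.

```python
from typing import Optional

# Repo id taken from the pipeline example above.
MODEL_ID = "jburtoft/CodeLlama-7b-hf-neuron-8xlarge"


def hub_kwargs(revision: Optional[str] = None) -> dict:
    """Build keyword arguments for `from_pretrained`, pinning a revision if one is given."""
    kwargs = {}
    if revision:
        kwargs["revision"] = revision
    return kwargs


def load_pinned(revision: Optional[str] = None):
    """Load the model and tokenizer at the same revision (needs a Neuron instance)."""
    # Imported lazily so this module can be imported on machines without Neuron hardware.
    from optimum.neuron import NeuronModelForCausalLM
    from transformers import AutoTokenizer

    kwargs = hub_kwargs(revision)
    model = NeuronModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, **kwargs)
    return model, tokenizer
```

Calling `load_pinned()` without an argument falls back to the repository's default branch.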

	## Arguments passed during export

	`input_shapes`

	```json
	{
	  "batch_size": 1,
	  "sequence_length": 2048
	}
	```

	`compiler_args`

	```json
	{
	  "auto_cast_type": "fp16",
	  "num_cores": 2
	}
	```
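For reference, the sketch below shows how an export with the same arguments could be reproduced from the base model, following the pattern in the 🤗 `optimum-neuron` documentation. The helper names `export_kwargs`, `max_new_tokens`, and `export_model` are hypothetical. It also assumes that `sequence_length` caps the prompt and the generated tokens combined, and that compilation runs on an instance with the Neuron SDK installed.

```python
# Values copied from the export arguments listed above.
INPUT_SHAPES = {"batch_size": 1, "sequence_length": 2048}
COMPILER_ARGS = {"auto_cast_type": "fp16", "num_cores": 2}


def export_kwargs() -> dict:
    """Merge the compiler arguments and input shapes into one set of keyword arguments."""
    return {"export": True, **COMPILER_ARGS, **INPUT_SHAPES}


def max_new_tokens(prompt_tokens: int) -> int:
    """Tokens left for generation, assuming sequence_length covers prompt plus output."""
    remaining = INPUT_SHAPES["sequence_length"] - prompt_tokens
    if prompt_tokens < 0 or remaining <= 0:
        raise ValueError("prompt does not fit in the compiled sequence length")
    return remaining


def export_model(base_model_id: str = "codellama/CodeLlama-7b-hf"):
    """Compile the base model for Neuron (slow; needs the Neuron SDK)."""
    # Imported lazily so the helpers above work without Neuron hardware.
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(base_model_id, **export_kwargs())
```

Because the shapes are fixed at compile time, a different batch size or a longer sequence length would require a new export rather than a runtime setting.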