jburtoft committed
Commit
7e3e96c
1 Parent(s): b1d557c

Update README.md

Files changed (1)
  1. README.md +66 -0
README.md CHANGED
@@ -1,3 +1,69 @@
---
license: llama2
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
---
# Neuronx model for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/codellama/CodeLlama-7b-hf).

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the sections below.

It has been compiled to run on an inf2.8xlarge instance on AWS.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

## Usage on Amazon SageMaker

_coming soon_

## Usage with 🤗 `optimum-neuron`

```python
>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'jburtoft/CodeLlama-7b-hf-neuron-8xlarge')
>>> p("import socket\n\ndef ping_exponential_backoff(host: str):",
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    max_length=200,
)
```
```
[{'generated_text': 'import socket\n\ndef ping_exponential_backoff(host: str):\n """\n Ping a host with exponential backoff.\n\n :param host: Host to ping\n :return: True if host is reachable, False otherwise\n """\n for i in range(1, 10):\n try:\n socket.create_connection((host, 80), 1).close()\n return True\n except OSError:\n time.sleep(2 ** i)\n return False\n\n\ndef ping_exponential_backoff_with_timeout(host: str, timeout: int):\n """\n Ping a host with exponential backoff and timeout.\n\n :param host: Host to ping\n :param timeout: Timeout in seconds\n :return: True if host is reachable, False otherwise\n """\n for'}]
```
This repository contains tags specific to versions of `neuronx`. When using the model with 🤗 `optimum-neuron`, select the repo revision that matches the `neuronx` version you are running so that the correct serialized checkpoints are loaded.
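As a rough illustration, a revision can also be pinned when loading the checkpoints directly through `NeuronModelForCausalLM` instead of `pipeline`. The snippet below is a minimal sketch, not part of this repository's documented workflow, and the `revision` value shown is a hypothetical placeholder rather than an actual tag of this repo.

```python
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForCausalLM

>>> # Load the pre-compiled checkpoints at a pinned revision
>>> # ("my-neuronx-tag" is a placeholder, not a real tag in this repo).
>>> model = NeuronModelForCausalLM.from_pretrained(
...     "jburtoft/CodeLlama-7b-hf-neuron-8xlarge", revision="my-neuronx-tag"
... )
>>> tokenizer = AutoTokenizer.from_pretrained("jburtoft/CodeLlama-7b-hf-neuron-8xlarge")

>>> # Generate as usual; shapes are fixed by the compiled model (batch_size 1, sequence_length 2048).
>>> inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
>>> outputs = model.generate(**inputs, max_length=128)
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```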

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```

**compiler_args**

```json
{
  "auto_cast_type": "fp16",
  "num_cores": 2
}
```
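For reference only, a sketch of how an export with these arguments might be reproduced from the base model: recent `optimum-neuron` releases generally accept the input shapes and compiler arguments as keyword arguments to `from_pretrained` together with `export=True`. This is an assumption about the export API rather than the exact command used for this repository, and the output directory name is arbitrary.

```python
>>> from optimum.neuron import NeuronModelForCausalLM

>>> # Re-export the base model with the same input shapes and compiler arguments
>>> # (sketch only; run on a Neuron-enabled instance such as inf2.8xlarge).
>>> model = NeuronModelForCausalLM.from_pretrained(
...     "codellama/CodeLlama-7b-hf",
...     export=True,
...     batch_size=1,
...     sequence_length=2048,
...     auto_cast_type="fp16",
...     num_cores=2,
... )
>>> model.save_pretrained("CodeLlama-7b-hf-neuron")  # arbitrary local path
```

The `optimum-cli export neuron` command exposes the same parameters as command-line flags, if a CLI-based export is preferred.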