---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- functions
- function calling
- sharded
---

# Function Calling Llama 2 + Mistral Models (version 2)
- Function Calling Llama extends the Hugging Face Llama 2 models with function calling capabilities.
- The model responds with a structured JSON object containing the function name and its arguments.

**Recent Updates**
- October 11th 2023 -> added Mistral 7B with function calling
- October 11th 2023 -> new models pushed, trained on an improved underlying dataset

**Improvements with v2**
1. Shortened syntax: only function descriptions are needed for inference, and no added instruction is required.
2. Function descriptions are moved outside of the system prompt. This prevents function calling behaviour from being affected by how the system prompt had been trained to influence the model.

Available models:
- Llama-7B-chat with function calling ([Base Model](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-adapters-v2)), ([GGUF - files are in the main branch of the base model]) - Free
- Mistral-7B-Instruct-v0.1 with function calling ([Base Model](https://huggingface.co/Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Mistral-7B-Instruct-v0.1-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/cN2cNybSdgyncV25kQ)
- Llama-13B-chat with function calling ([Base Model](https://huggingface.co/Trelis/Llama-2-13b-chat-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Llama-2-13b-chat-hf-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)
- CodeLlama-34B-Instruct with function calling ([Base Model](https://huggingface.co/Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/CodeLlama-34b-Instruct-hf-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/cN27teg8t2Hx5sA8wM)
- Llama-70B-chat with function calling ([Base Model](https://huggingface.co/Trelis/Llama-2-70b-chat-hf-function-calling-v2)), ([PEFT Adapters](https://huggingface.co/Trelis/Llama-2-70b-chat-hf-function-calling-adapters-v2)) - Paid, [purchase here](https://buy.stripe.com/8wMdRC1dzci75sA4gy)

## Performance and Tips

1. Larger models are better at handling function calling. The cross-entropy training losses are approximately 0.5 for 7B, 0.4 for 13B, and 0.3 for 70B. The absolute numbers are not meaningful in themselves, but the relative values offer a sense of relative performance.
2. Provide very clear function descriptions, including whether arguments are required and what their default values should be.
3. Make sure to post-process the language model's response to check that all necessary information has been provided by the user. If not, prompt the user to let them know they need to provide more info (e.g. their name, order number etc.); a validation sketch follows this list.
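
As a loose illustration of tip 3, here is a minimal sketch of such a check. The `REQUIRED_ARGS` map and the helper are hypothetical conventions for this example, not part of the model's training format:

```
import json

# Hypothetical map of required argument names per function; you define this
# yourself for your own functions (it is not part of the model's output).
REQUIRED_ARGS = {
    "search_bing": ["query"],
    "delete_file": ["fileNames"],
}

def missing_arguments(call: dict) -> list:
    """Return the required argument names absent from a parsed function call."""
    required = REQUIRED_ARGS.get(call.get("function"), [])
    provided = call.get("arguments") or {}
    return [name for name in required if name not in provided]

call = json.loads('{"function": "search_bing", "arguments": {}}')
missing = missing_arguments(call)
if missing:
    # Ask the user for the missing information instead of executing the call.
    print(f"Please provide: {', '.join(missing)}")
```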

Check out this video overview of performance [here](https://www.loom.com/share/8d7467de95e04af29ff428c46286946c?sid=683c970e-6063-4f1e-b184-894cc1d96115).

## Licensing

Llama-7B with function calling is licensed according to the Meta Community license.

Mistral-7B, Llama-13B, CodeLlama-34B, Llama-70B and Falcon-180B with function calling require the purchase of access.
- Commercial license purchase required per user.
- Licenses are not transferable to other users/entities.

Use of all Llama models with function calling is further subject to terms in the [Meta license](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

## Dataset

The dataset used for training this model can be found at [Trelis Function Calling Extended Dataset](https://huggingface.co/datasets/Trelis/function_calling_extended).

## Inference

**Quick Start in Google Colab**
Try out this notebook: [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing)
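
For running locally, here is a minimal sketch using `transformers` with the free 7B model. The `bfloat16` dtype and the generation settings are illustrative assumptions, not settings prescribed by this card:

```
# A minimal sketch, assuming a GPU with enough memory for the 7B model.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Trelis/Llama-2-7b-chat-hf-function-calling-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Function metadata and prompt format as described in the Syntax section below.
function_list = json.dumps({
    "function": "search_bing",
    "description": "Search the web for content on Bing.",
    "arguments": [{"name": "query", "type": "string",
                   "description": "The search query string"}],
}, indent=4)
prompt = f"<FUNCTIONS>{function_list.strip()}</FUNCTIONS>\n\n[INST] Search for kite designs. [/INST]\n\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```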

**Commercial Applications**
You can use this model with [text-generation-inference](https://github.com/huggingface/text-generation-inference) and [chat-ui](https://github.com/huggingface/chat-ui).

Here is the [GitHub repo for setup](https://github.com/TrelisResearch/tgi-chat-ui-function-calling).

And here is a video showing it working with [llama-2-7b-chat-hf-function-calling-v2](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2) (note that we've now moved to v2).

Note that you'll still need to code the server-side handling of making the function calls (which obviously depends on what functions you want to use).

**Run on your laptop**
Run on your laptop: [video and Jupyter notebook](https://youtu.be/nDJMHFsBU7M)

## Syntax

### Prompt Templates

The function descriptions must be wrapped within a function block. You can place this block before or after the system message block.

Example without a system message:
```
# Define the roles and markers
B_INST, E_INST = "[INST]", "[/INST]"
B_FUNC, E_FUNC = "<FUNCTIONS>", "</FUNCTIONS>\n\n"

functionList = {function_1_metadata}{function_2_metadata}...
user_prompt = '...'

# Format your prompt template
prompt = f"{B_FUNC}{functionList.strip()}{E_FUNC}{B_INST} {user_prompt.strip()} {E_INST}\n\n"
```

Example with a system message:
```
# Define the roles and markers
B_INST, E_INST = "[INST]", "[/INST]"
B_FUNC, E_FUNC = "<FUNCTIONS>", "</FUNCTIONS>\n\n"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

# assuming functionList is defined as above
system_prompt = '...'
user_prompt = '...'

# Format your prompt template
prompt = f"{B_FUNC}{functionList.strip()}{E_FUNC}{B_INST} {B_SYS}{system_prompt.strip()}{E_SYS}{user_prompt.strip()} {E_INST}\n\n"
```

Notice that the function block is placed at the very start of the sequence, before `B_INST`.

### Function Metadata Template

`functionMetadata` should be a string representation of a JSON object, like this:

```
"functionMetadata": {
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}
```
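
In Python, each function's metadata can be kept as a dictionary and serialised with `json.dumps` to build the `functionList` string used in the prompt templates above. A minimal sketch (the `build_function_list` helper is our own illustration, not part of the card):

```
import json

search_bing_metadata = {
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {"name": "query", "type": "string", "description": "The search query string"}
    ],
}

def build_function_list(*functions) -> str:
    # Concatenate the string form of each function's metadata (hypothetical helper).
    return "".join(json.dumps(f, indent=4) + "\n\n" for f in functions)

functionList = build_function_list(search_bing_metadata)
```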

The language model should respond with a JSON object formatted like this:
```
{
    "function": "function_name",
    "arguments": {
        "argument1": "argument_value",
        "argument2": "argument_value"
    }
}
```

It is recommended to handle cases where:
- There is no JSON object in the response
- The response contains text in addition to the JSON response
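
For example, a tolerant parser covering both cases might look like this (a sketch; the regex heuristic is our assumption, not a method prescribed by the card):

```
import json
import re

def extract_function_call(response: str):
    # Handle both cases above: no JSON object at all, or JSON mixed with text.
    match = re.search(r"\{.*\}", response, re.DOTALL)  # first {...} span (simple heuristic)
    if match is None:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return call if "function" in call else None

print(extract_function_call('Sure! {"function": "search_bing", "arguments": {"query": "kites"}}'))
```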

### Sample functionList

```
{
    "function": "search_bing",
    "description": "Search the web for content on Bing. This allows users to search online/the internet/the web for content.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}

{
    "function": "search_arxiv",
    "description": "Search for research papers on ArXiv. Make use of AND, OR and NOT operators as appropriate to join terms within the query.",
    "arguments": [
        {
            "name": "query",
            "type": "string",
            "description": "The search query string"
        }
    ]
}
```

### Training Set Argument Types
Models were fine-tuned on argument types including strings, numbers and arrays. The training set includes function calls with 0, 1, 2 or 3 arguments. The larger the model, the better it will generalise beyond these types.

Here is a function call with an array:
```
{ "function": "delete_file", "arguments": { "fileNames": [ "Dissecting Transformer Length Extrapolation via The Lens of Receptive Field Analysis", "Luna- Linear Unified Nested Attention", "Substack_Inc_2021_2020_GAAP_Audited_Financials" ] } }
```
Here is a function call with three arguments:
```
{ "function": "save_chat", "arguments": { "fileName": "KiteDiscussion", "fileDescription": "Notes on one and two stringed kites", "fileContent": "--- **Types of Kite** There are one and two string kites. The two string ones are easier to control, although you can get the cords tangled. The one-stringed ones are sometimes used for kite fights, and you lose the kite and have to run after it if the string breaks. ---" } }
```

~

Below follows information on the original Llama 2 model...

~

# **Llama 2**
Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom.

## Model Details
*Note: Use of this model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the [website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License before requesting access here.*

Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM.

**Model Developers** Meta

**Variations** Llama 2 comes in a range of parameter sizes (7B, 13B, and 70B) as well as pretrained and fine-tuned variations.

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

||Training Data|Params|Content Length|GQA|Tokens|LR|
|---|---|---|---|---|---|---|
|Llama 2|*A new mix of publicly available online data*|7B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>|
|Llama 2|*A new mix of publicly available online data*|13B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>|
|Llama 2|*A new mix of publicly available online data*|70B|4k|&#10004;|2.0T|1.5 x 10<sup>-4</sup>|

*Llama 2 family of models.* Token counts refer to pretraining data only. All models are trained with a global batch size of 4M tokens. Bigger models (70B) use Grouped-Query Attention (GQA) for improved inference scalability.

**Model Dates** Llama 2 was trained between January 2023 and July 2023.

**Status** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.

**License** A custom commercial license is available at: [https://ai.meta.com/resources/models-and-libraries/llama-downloads/](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)

**Research Paper** ["Llama 2: Open Foundation and Fine-Tuned Chat Models"](https://arxiv.org/abs/2307.09288)

## Intended Use
**Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.

To get the expected features and performance for the chat versions, a specific formatting needs to be followed, including the `INST` and `<<SYS>>` tags, `BOS` and `EOS` tokens, and the whitespaces and breaklines in between (we recommend calling `strip()` on inputs to avoid double spaces). See our reference code on GitHub for details: [`chat_completion`](https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L212).
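
As a minimal sketch of that single-turn format (Meta's reference `chat_completion` linked above is the authoritative multi-turn implementation; the strings below are our own example):

```
# Markers as used in Meta's reference implementation.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

system_prompt = "You are a helpful assistant."
user_message = "How do I fly a two-string kite?"

# The tokenizer supplies the BOS token; EOS terminates each completed turn.
prompt = f"{B_INST} {B_SYS}{system_prompt.strip()}{E_SYS}{user_message.strip()} {E_INST}"
```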

**Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.

## Hardware and Software
**Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.

**Carbon Footprint** Pretraining utilized a cumulative 3.3M GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W). Estimated total emissions were 539 tCO2eq, 100% of which were offset by Meta's sustainability program.

||Time (GPU hours)|Power Consumption (W)|Carbon Emitted (tCO<sub>2</sub>eq)|
|---|---|---|---|
|Llama 2 7B|184320|400|31.22|
|Llama 2 13B|368640|400|62.44|
|Llama 2 70B|1720320|400|291.42|
|Total|3311616||539.00|

**CO<sub>2</sub> emissions during pretraining.** Time: total GPU time required for training each model. Power Consumption: peak power capacity per GPU device for the GPUs used, adjusted for power usage efficiency. 100% of the emissions are directly offset by Meta's sustainability program, and because we are openly releasing these models, the pretraining costs do not need to be incurred by others.
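
As a rough sanity check, each row of the table is consistent with energy = GPU-hours x 400 W at a fixed carbon intensity of about 0.42 kgCO2eq/kWh; that intensity is inferred from the table itself, not stated by Meta:

```
# Reproduce the emissions figures from GPU-hours and peak power draw.
GPU_POWER_KW = 0.4          # 400 W per A100-80GB
CARBON_INTENSITY = 0.4234   # kgCO2eq per kWh, inferred from the table

for name, gpu_hours in [("7B", 184320), ("13B", 368640), ("70B", 1720320)]:
    energy_kwh = gpu_hours * GPU_POWER_KW
    tco2eq = energy_kwh * CARBON_INTENSITY / 1000
    print(f"Llama 2 {name}: {energy_kwh:,.0f} kWh, ~{tco2eq:.2f} tCO2eq")
```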

## Training Data
**Overview** Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over one million new human-annotated examples. Neither the pretraining nor the fine-tuning datasets include Meta user data.

**Data Freshness** The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023.

## Evaluation Results

In this section, we report the results for the Llama 1 and Llama 2 models on standard academic benchmarks. For all the evaluations, we use our internal evaluations library.

|Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval|
|---|---|---|---|---|---|---|---|---|---|
|Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9|
|Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9|
|Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7|
|Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6|
|Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3|
|Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1|
|Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**|

**Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *Math:* We report the average of the GSM8K (8-shot) and MATH (4-shot) benchmarks at top 1.

|||TruthfulQA|ToxiGen|
|---|---|---|---|
|Llama 1|7B|27.42|23.00|
|Llama 1|13B|41.74|23.08|
|Llama 1|33B|44.19|22.57|
|Llama 1|65B|48.71|21.77|
|Llama 2|7B|33.29|**21.25**|
|Llama 2|13B|41.86|26.10|
|Llama 2|70B|**50.18**|24.60|

**Evaluation of pretrained LLMs on automatic safety benchmarks.** For TruthfulQA, we present the percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we present the percentage of toxic generations (the smaller the better).

|||TruthfulQA|ToxiGen|
|---|---|---|---|
|Llama-2-Chat|7B|57.04|**0.00**|
|Llama-2-Chat|13B|62.18|**0.00**|
|Llama-2-Chat|70B|**64.14**|0.01|

**Evaluation of fine-tuned LLMs on different safety datasets.** Same metric definitions as above.

## Ethical Considerations and Limitations
Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 2's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.

Please see the Responsible Use Guide available at [https://ai.meta.com/llama/responsible-use-guide/](https://ai.meta.com/llama/responsible-use-guide/)

## Reporting Issues
Please report any software "bug," or other problems with the models, through one of the following means:
- Reporting issues with the model: [github.com/facebookresearch/llama](http://github.com/facebookresearch/llama)
- Reporting problematic content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback)
- Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info)

## Llama Model Index
|Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf|
|---|---|---|---|---|
|7B| [Link](https://huggingface.co/llamaste/Llama-2-7b) | [Link](https://huggingface.co/llamaste/Llama-2-7b-hf) | [Link](https://huggingface.co/llamaste/Llama-2-7b-chat) | [Link](https://huggingface.co/llamaste/Llama-2-7b-chat-hf)|
|13B| [Link](https://huggingface.co/llamaste/Llama-2-13b) | [Link](https://huggingface.co/llamaste/Llama-2-13b-hf) | [Link](https://huggingface.co/llamaste/Llama-2-13b-chat) | [Link](https://huggingface.co/llamaste/Llama-2-13b-chat-hf)|
|70B| [Link](https://huggingface.co/llamaste/Llama-2-70b) | [Link](https://huggingface.co/llamaste/Llama-2-70b-hf) | [Link](https://huggingface.co/llamaste/Llama-2-70b-chat) | [Link](https://huggingface.co/llamaste/Llama-2-70b-chat-hf)|