sohi-g committed on
Commit 481ced6 · 1 Parent(s): 2209007

Update README.md

Files changed (1): README.md (+243 -15)
README.md CHANGED
---
license: apache-2.0
datasets:
- briefai/LongShort-Dataset
language:
- en
pipeline_tag: text-generation
tags:
- pytorch
- llama-2
- Gen-AI
- Finance
- KPI Extraction
---
# LongShort-Llama-2-13B

### Model Description

LongShort-Llama-2-13B is a large language model fine-tuned on earnings call documents to extract financial KPIs. It is based on the Llama-2-13B architecture.
- Model creator: [Brief AI](https://huggingface.co/briefai)
- Original model: [Llama 2 13B Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)

### Dataset Description
- Data Source: Factiva
- Data Description: 28K+ earnings call documents
- Data Scope: 1K+ public companies
- Fine-Tuning Data: Collection of 60K+ samples
 
 
 
## Prompt template: LongShort-Llama-2-13B

```
[INST]Given the context, answer the question.

### Question:
Extract all the finance-based performance indicators and evaluation metrics.

### Context:
{context}

### Answer:
[/INST]
```
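
A minimal sketch of filling the template's `{context}` placeholder in Python; the `build_prompt` helper name and example context are ours, not part of the released code:

```python
# Hypothetical helper: substitute an earnings-call excerpt into the
# prompt template shown above.
TEMPLATE = """[INST]Given the context, answer the question.

### Question:
Extract all the finance-based performance indicators and evaluation metrics.

### Context:
{context}

### Answer:
[/INST]"""


def build_prompt(context: str) -> str:
    # str.format fills the {context} slot defined in the template.
    return TEMPLATE.format(context=context)


print(build_prompt("Q3 revenue rose 12% to $1.8B; operating margin was 21%."))
```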

## Basics
*This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
*It is useful for anyone who wants to reference the model.*

**Developed by:** [Brief AI Team](https://huggingface.co/briefai)

**Model Type:** Transformer-based Large Language Model

**Version:** 1.0.0

**Languages:** English

**License:** Apache 2.0

**Release Date Estimate:** Wednesday, November 29, 2023

**Send Questions to:** [email protected]

**Cite as:** Brief AI LongShort Language Model

**Funded by:** UChicago Data Science Institute

**Mentored by:** Nick Kadochnikov

## Technical Specifications
*This section includes details about the model objective and architecture, and the compute infrastructure.*
*It is useful for people interested in model development.*

Please see [the LongShort training README](https://github.com/brief-ai-uchicago/LongShort-Dataset) for full details on replicating training.

### Model Architecture and Objective

* Modified from Llama-2-13B

**Objective:** Financial KPI extraction from earnings call documents.

### Hardware and Software - Compute Infrastructure

* 4 NVIDIA L4 GPUs & 48 vCPUs

* Environment: PyTorch (pytorch-2.0 w/ CUDA-11.8; see [GitHub link](https://github.com/pytorch/pytorch))

* CPU: GCP G2 Standard 48 (Platform: Intel Cascade Lake) (Accelerator Optimized)

* CPU memory: 192GB RAM

* GPU memory: 24GB per GPU

## Training
*This section provides information about the training.*
*It is useful for people who want to learn more about the model inputs and training footprint.*

The following `bitsandbytes` quantization config was used during training (an equivalent `BitsAndBytesConfig` is sketched after the list):

* quant_method: bitsandbytes
* load_in_8bit: False
* load_in_4bit: True
* llm_int8_threshold: 6.0
* llm_int8_skip_modules: None
* llm_int8_enable_fp32_cpu_offload: False
* llm_int8_has_fp16_weight: False
* bnb_4bit_quant_type: nf4
* bnb_4bit_use_double_quant: True
* bnb_4bit_compute_dtype: float16

Framework versions:
* PEFT 0.4.0
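
As a minimal sketch, the list above maps onto the `transformers` `BitsAndBytesConfig` API roughly as follows; values are copied from the list, not from the released training script:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization and fp16 compute,
# matching the config listed above. The llm_int8_* fields match the listed
# values, which are also the library defaults.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,        # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)
```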

### Training Data
*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

Details on the dataset can be found in the [LongShort Dataset](https://github.com/brief-ai-uchicago/LongShort-Dataset) repository.

Training data includes:

- 5,000 earnings call documents

## How to use

This model can be used and deployed through the Hugging Face ecosystem; it requires `transformers` and `accelerate` to be installed. The model can be downloaded from:

[LongShort-Llama-2-13B](https://huggingface.co/briefai/LongShort-Llama-2-13B)
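
A minimal loading-and-generation sketch, assuming a CUDA GPU with enough memory; the generation settings and example context are illustrative, not prescribed by the model card:

```python
# Load the model with transformers + accelerate (device_map="auto" requires
# accelerate) and run the prompt template from above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "briefai/LongShort-Llama-2-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

context = "Full-year revenue was $4.2B, up 7% year over year, with an EBITDA margin of 18%."
prompt = (
    "[INST]Given the context, answer the question.\n\n"
    "### Question:\n"
    "Extract all the finance-based performance indicators and evaluation metrics.\n\n"
    f"### Context:\n{context}\n\n"
    "### Answer:\n[/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```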

## Intended Use

This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pre-trained base model that can be further fine-tuned for specific tasks. The use cases below are not exhaustive.

### Direct Use

- Text generation

- Exploring characteristics of language generated by a language model

- Examples: Cloze tests, counterfactuals, generations with reframings

### Downstream Use

- Tasks that leverage language models, such as: Information Extraction, Question Answering, Summarization

#### Out-of-scope Uses

Using the model in [high-stakes](#high-stakes) settings is out of scope for this model. The model is not designed for [critical decisions](#critical-decisions) nor for uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct.

Out-of-scope uses include:

- Usage for evaluating or scoring individuals, such as for employment, education, or credit

- Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct

#### Misuse

Intentionally using the model for harm, violating [human rights](#human-rights), or engaging in other kinds of malicious activities is a misuse of this model. This includes:

- Spam generation

- Disinformation and influence operations

- Disparagement and defamation

- Harassment and abuse

- [Deception](#deception)

- Unconsented impersonation and imitation

- Unconsented surveillance

- Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)

## Intended Users

### Direct Users

- General Public

- Researchers

- Students

- Educators

- Engineers/developers

- Non-commercial entities

- Financial Industry

# Risks and Limitations
*This section identifies foreseeable harms and misunderstandings.*

The model may:

- Overrepresent some viewpoints and underrepresent others

- Contain stereotypes

- Contain [personal information](#personal-data-and-information)

- Generate:

  - Hateful, abusive, or violent language

  - Discriminatory or prejudicial language

  - Content that may not be appropriate for all settings, including sexual content

- Make errors, including producing incorrect information as if it were factual

- Generate irrelevant or repetitive outputs

- Induce users into attributing human traits to it, such as sentience or consciousness

# Evaluation
*This section describes the evaluation protocols and provides the results.*

Result: LongShort-Llama-2-13B achieves 43.4% accuracy on a validation set comprising 10% of the original training dataset.

**Train-time Evaluation:**

Final checkpoint after 700 epochs:

- Training Loss: 1.187

# Recommendations
*This section provides information on warnings and potential mitigations.*

- Indirect users should be made aware when the content they are working with was created by the LLM.

- Users should be aware of [Risks and Limitations](#risks-and-limitations) and include an appropriate age disclaimer or blocking interface as necessary.

- Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.

# Model Card Authors

Vishal Parameshwaran, Garima Sohi, Jose Gerala, Sanchit Narayan Kumar