prince-canuma committed · verified
Commit cb5f15f · 1 Parent(s): dc36d1e

Update README.md

Files changed (1):
  1. README.md +21 -10
README.md CHANGED
@@ -41,15 +41,12 @@ This is the model card of a 🤗 transformers model that has been pushed on the
  ## Uses
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
  You can use this model to build local/cloud RAG applications.
  It can serve as the:
  - Answer synthesizer,
- - Summarizer
+ - Summarizer,
  - Or query rewriter model.
 
  ### Limitations
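For reference, the RAG roles listed above (answer synthesis, summarization, query rewriting) can be exercised with a plain `transformers` generation loop. The sketch below is not part of the commit; it assumes the model id is `prince-canuma/Damysus-2.7B-Chat` and that the tokenizer ships a ChatML chat template, as the training section suggests.

```python
# Minimal query-rewriting sketch for a RAG pipeline (assumed model id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prince-canuma/Damysus-2.7B-Chat"  # assumption: adjust to the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "Rewrite the user's question as a standalone search query."},
    {"role": "user", "content": "And how large is its context window?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern covers the other two roles by swapping the system prompt (for example, "Summarize the retrieved passages" or "Answer using only the provided context").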
@@ -160,7 +157,6 @@ Output:
  I used [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) dataset, a new curated subset of our OpenOrca data.
  In the course of this study, the [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) dataset was used, representing a meticulously curated subset derived from the broader OpenOrca dataset. This release provides an efficient means of reaching performance on-par with using larger slices of the [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca), while only including ~500k GPT-4 completions.
 
-
  Subsequently, two distinct subsets were crafted, comprising 102,000 and 1,000 samples, denoted as:
 
  - [prince-canuma/SmallOrca](https://huggingface.co/datasets/prince-canuma/SmallOrca)
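As an aside (not part of the commit), the subsets linked above can be pulled directly with `datasets`; the split name below is an assumption to verify against the dataset card.

```python
from datasets import load_dataset

# Assumed split name; check the dataset card for the exact configuration.
small_orca = load_dataset("prince-canuma/SmallOrca", split="train")
print(small_orca)
```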
@@ -173,24 +169,39 @@ succinct answers to prompts are often favored, especially for straightforward qu
 
  ### Training Procedure
 
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- [TODO]
-
  #### Preprocessing
 
  1. Convert dataset to chatML format
  2. Remove all samples with more than 2048 tokens (Phi-2 context size)
  3. Mask instructions (System and User) at training time.
 
+ #### LoRA Config
+ - **lora_alpha:** 128,
+ - **lora_dropout:** 0.05,
+ - **r:** 256,
+ - **bias:** "none",
+ - **target_modules:** "all-linear",
+ - **task_type:** "CAUSAL_LM",
 
  #### Training Hyperparameters
 
- - **Training regime:** bf16 mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+ - **Training regime:** bf16 mixed precision, <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+ - **max_steps:** 100,
+ - **per_device_train_batch_size:** 2,
+ - **gradient_accumulation_steps:** 2,
+ - **optim:** "adamw_torch_fused",
+ - **learning_rate:** 2e-4,
+ - **max_grad_norm:** 0.3,
+ - **warmup_ratio:** 0.03,
+ - **lr_scheduler_type:** "constant",
 
- [TODO]
+ #### Trainer
+ - **max_seq_length:** 1744,
+ - **data_collator:** DataCollatorForCompletionOnlyLM
 
  ## Evaluation
 
+ <img src="truthfulQA.png" width="800" alt="Damysus-2.7B-chat truthfulQA benchmark results"/>
  <!-- This section describes the evaluation protocols and provides the results. -->
 
  We evaluate models on 7 key benchmarks using the Eleuther AI Language Model Evaluation Harness , a unified framework to test generative language models on a large number of different evaluation tasks.
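The three preprocessing steps listed in the diff above can be sketched as follows. This is not the author's script: the SlimOrca column layout ("conversations" with "from"/"value" turns) and the manual ChatML formatting are assumptions, and step 3 (masking the System and User turns) is deferred to the completion-only collator shown in the trainer sketch further down.

```python
# Sketch of the preprocessing steps: ChatML conversion + length filtering.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")  # Phi-2 base tokenizer

ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_chatml(example):
    # Assumption: SlimOrca stores ShareGPT-style turns under "conversations".
    text = ""
    for turn in example["conversations"]:
        role = ROLE_MAP.get(turn["from"], turn["from"])
        text += f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n"
    return {"text": text}

dataset = load_dataset("Open-Orca/SlimOrca", split="train")
dataset = dataset.map(to_chatml, remove_columns=dataset.column_names)

# Step 2: drop samples longer than the 2048-token Phi-2 context window.
dataset = dataset.filter(lambda ex: len(tokenizer(ex["text"]).input_ids) <= 2048)

# Step 3 (masking System/User turns) happens at training time via the
# completion-only data collator, not here.
```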
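The LoRA values added in this commit map one-to-one onto a `peft.LoraConfig`; a minimal sketch (`target_modules="all-linear"` requires a reasonably recent `peft` release):

```python
from peft import LoraConfig

# Values taken from the "LoRA Config" section added in this commit.
peft_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```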
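The hyperparameters and trainer settings above translate roughly into the following `transformers`/`trl` setup. This is a sketch, not the original training script: `trl`'s `SFTTrainer` signature has changed across releases (newer versions move `max_seq_length` and `dataset_text_field` into `SFTConfig`), the output directory and response template are assumptions, and `dataset` and `peft_config` are taken from the sketches above.

```python
# Sketch of the training setup described in the commit (older trl SFTTrainer API).
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
# Note: the ChatML markers (<|im_start|>, <|im_end|>) may need to be added as
# special tokens, with the model embeddings resized accordingly.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto")

training_args = TrainingArguments(
    output_dir="damysus-2.7b-chat",   # assumed output path
    bf16=True,                        # bf16 mixed precision
    max_steps=100,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    optim="adamw_torch_fused",
    learning_rate=2e-4,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
)

# Computes loss only on assistant completions, masking System/User turns
# (preprocessing step 3).
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|im_start|>assistant\n", tokenizer=tokenizer
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,            # from the preprocessing sketch
    dataset_text_field="text",
    max_seq_length=1744,
    data_collator=collator,
    peft_config=peft_config,          # from the LoRA sketch
    tokenizer=tokenizer,
)
trainer.train()
```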
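The harness run itself is not included in the commit; with the `lm-evaluation-harness` Python API it would look roughly like the sketch below. The task list is illustrative rather than the exact seven benchmarks, and the model id is an assumption.

```python
# Sketch: evaluating with the Eleuther AI LM Evaluation Harness (lm-eval >= 0.4).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=prince-canuma/Damysus-2.7B-Chat,dtype=bfloat16",  # assumed id
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2"],  # illustrative subset
    batch_size=8,
)
print(results["results"])
```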