prince-canuma
committed on
Update README.md
README.md CHANGED

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

You can use this model to build local or cloud RAG applications, where it can serve as the:

- Answer synthesizer,
- Summarizer, or
- Query rewriter model (sketched below).
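
As a quick illustration of the query-rewriter role, here is a minimal sketch using 🤗 transformers. The repo id and the chat-template assumptions are mine, not part of this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, inferred from the model name used elsewhere in this card.
model_id = "prince-canuma/Damysus-2.7B-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Rewrite a follow-up question into a standalone search query for a RAG retriever.
messages = [
    {"role": "system", "content": "Rewrite the user's question as a standalone search query."},
    {"role": "user", "content": "And what about its context window?"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```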

### Limitations

I used the [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca) dataset, a meticulously curated subset of the broader [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset. SlimOrca reaches performance on par with much larger slices of OpenOrca while including only ~500k GPT-4 completions.

Subsequently, two distinct subsets were created, comprising 102,000 and 1,000 samples respectively (a sampling sketch follows the list):

- [prince-canuma/SmallOrca](https://huggingface.co/datasets/prince-canuma/SmallOrca)
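
The card does not say how these subsets were sampled; purely as an illustration, a random draw with the 🤗 `datasets` library could look like this (the seed and the contiguous split are assumptions):

```python
from datasets import load_dataset

# Load the full SlimOrca training split (~500k conversations).
slimorca = load_dataset("Open-Orca/SlimOrca", split="train")

# Draw non-overlapping random subsets of 102,000 and 1,000 samples.
shuffled = slimorca.shuffle(seed=42)
subset_102k = shuffled.select(range(102_000))
subset_1k = shuffled.select(range(102_000, 103_000))
```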

### Training Procedure

#### Preprocessing

1. Convert the dataset to ChatML format (steps 1–2 are sketched after this list).
2. Remove all samples with more than 2048 tokens (Phi-2's context size).
3. Mask the instructions (system and user turns) at training time, so the loss is computed only on the assistant completions.
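
A minimal sketch of steps 1–2, assuming SlimOrca's ShareGPT-style `conversations` schema and a plain ChatML rendering (step 3 is handled by the collator shown in the Trainer section below):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

def to_chatml(example):
    # SlimOrca stores each sample as a list of {"from", "value"} turns.
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    text = "".join(
        f"<|im_start|>{role_map[turn['from']]}\n{turn['value']}<|im_end|>\n"
        for turn in example["conversations"]
    )
    return {"text": text}

dataset = load_dataset("Open-Orca/SlimOrca", split="train").map(to_chatml)

# Drop samples that exceed Phi-2's 2048-token context window.
dataset = dataset.filter(lambda ex: len(tokenizer(ex["text"]).input_ids) <= 2048)
```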

#### LoRA Config

- **lora_alpha:** 128
- **lora_dropout:** 0.05
- **r:** 256
- **bias:** "none"
- **target_modules:** "all-linear"
- **task_type:** "CAUSAL_LM"
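
These values map directly onto a `peft` `LoraConfig`; only the variable name below is mine:

```python
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=128,
    lora_dropout=0.05,
    r=256,
    bias="none",
    target_modules="all-linear",  # apply LoRA to every linear layer
    task_type="CAUSAL_LM",
)
```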

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision <!-- fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- **max_steps:** 100
- **per_device_train_batch_size:** 2
- **gradient_accumulation_steps:** 2
- **optim:** "adamw_torch_fused"
- **learning_rate:** 2e-4
- **max_grad_norm:** 0.3
- **warmup_ratio:** 0.03
- **lr_scheduler_type:** "constant"
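
Expressed as 🤗 `transformers` `TrainingArguments`, these would look roughly like the following (the output directory is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="damysus-2.7b-chat",   # placeholder
    bf16=True,                        # bf16 mixed precision
    max_steps=100,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    optim="adamw_torch_fused",
    learning_rate=2e-4,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
)
```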

#### Trainer

- **max_seq_length:** 1744
- **data_collator:** DataCollatorForCompletionOnlyLM
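
The card lists the collator (`DataCollatorForCompletionOnlyLM`, from TRL) and the max sequence length but not the full trainer setup; the sketch below ties the earlier pieces together. The ChatML response marker and the use of `SFTTrainer` are assumptions, and `dataset`, `peft_config`, and `training_args` refer to the sketches above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Compute the loss only on assistant turns: everything before the response
# marker (system and user instructions) is masked out.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|im_start|>assistant\n", tokenizer=tokenizer
)

trainer = SFTTrainer(
    model=model,
    args=training_args,          # TrainingArguments from the section above
    train_dataset=dataset,       # preprocessed ChatML dataset from the sketch above
    peft_config=peft_config,     # LoraConfig from the section above
    dataset_text_field="text",
    max_seq_length=1744,
    data_collator=collator,
    tokenizer=tokenizer,
)
trainer.train()
```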

## Evaluation

<img src="truthfulQA.png" width="800" alt="Damysus-2.7B-chat truthfulQA benchmark results"/>

<!-- This section describes the evaluation protocols and provides the results. -->

We evaluate models on 7 key benchmarks using the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), a unified framework to test generative language models on a large number of different evaluation tasks.
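
For instance, an evaluation like the TruthfulQA run shown above could be launched along these lines with the harness's Python API (the repo id and batch size are assumptions):

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=prince-canuma/Damysus-2.7B-chat,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"]["truthfulqa_mc2"])
```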