Update README.md
README.md
CHANGED
@@ -32,6 +32,7 @@ Please note, this is part of ongoing research and exploration, aimed at highligh
|
|
32 |
|
33 |
An overview of the finetuned models and benchmarking results are shared at [Link](TODO Link to Blogposts)
|
34 |
|
|
|
35 |
<!-- - **Developed by:** [More Information Needed]
|
36 |
- **Funded by [optional]:** [More Information Needed]
|
37 |
- **Shared by [optional]:** [More Information Needed]
|
@@ -92,28 +93,61 @@ Use the code below to get started with the model.
|
|
92 |
|
93 |
[More Information Needed] -->
|
94 |
|
95 |
-
|
96 |
|
97 |
-
### Training Data -->
|
98 |
|
99 |
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
100 |
|
101 |
-
<!-- [More Information Needed]
|
102 |
|
103 |
-
### Training Procedure
|
104 |
|
105 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
106 |
|
107 |
<!-- #### Preprocessing [optional]
|
108 |
|
109 |
[More Information Needed]
|
110 |
-->
|
111 |
|
112 |
-
|
113 |
|
114 |
-
- **Training regime:**
|
115 |
<!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
116 |
-
|
117 |
<!-- #### Speeds, Sizes, Times [optional] -->
|
118 |
|
119 |
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
|
|
32 |
|
33 |
An overview of the finetuned models and benchmarking results are shared at [Link](TODO Link to Blogposts)
|
34 |
|
35 |
+
|
36 |
<!-- - **Developed by:** [More Information Needed]
|
37 |
- **Funded by [optional]:** [More Information Needed]
|
38 |
- **Shared by [optional]:** [More Information Needed]
|
|
|
93 |
|
94 |
[More Information Needed] -->
|
95 |
|
96 |
+
## Training Details
|
97 |
|
98 |
+
<!-- ### Training Data -->
|
99 |
|
100 |
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
101 |
|
102 |
+
<!-- [More Information Needed]-->
|
103 |
|
104 |
+
### Training Procedure
|
105 |
|
106 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
107 |
+
Used RunPod with the following setup:
|
108 |
+
|
109 |
+
* 1 x A100 PCIe
|
110 |
+
* 31 vCPU, 117 GB RAM
|
111 |
+
* runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
|
112 |
+
* On-Demand - Secure Cloud
|
113 |
+
* 60 GB Disk
|
114 |
+
* 60 GB Pod Volume
|
115 |
+
* ~16 hours
|
116 |
+
* $30
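
The run time and cost listed above imply a rough hourly rate for the pod. The sketch below only does that division. It assumes the "~16 hours" and "$30" figures cover the same billing period; both are approximate, and the constant and function names are illustrative, not taken from the README.

```python
# Figures copied from the setup list above; both are approximate.
TRAINING_HOURS = 16   # "~16 hours"
TOTAL_COST_USD = 30   # "$30"


def hourly_rate(total_cost: float, hours: float) -> float:
    """Return the average cost per hour for a run."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_cost / hours


rate = hourly_rate(TOTAL_COST_USD, TRAINING_HOURS)
print(f"Approximate pod cost: ${rate:.3f} per hour")
```

The result is an average over the whole run, so it may differ from the listed on-demand price if disk or volume storage was billed separately.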
|
117 |
|
118 |
<!-- #### Preprocessing [optional]
|
119 |
|
120 |
[More Information Needed]
|
121 |
-->
|
122 |
|
123 |
+
#### Training Hyperparameters
|
124 |
|
125 |
+
<!-- - **Training regime:** -->
|
126 |
<!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
127 |
+
* lora_config = LoraConfig(
|
128 |
+
r=64,
|
129 |
+
lora_alpha=64,
|
130 |
+
target_modules=target_modules,
|
131 |
+
lora_dropout=0.05,
|
132 |
+
bias="none",
|
133 |
+
task_type="CAUSAL_LM",
|
134 |
+
)
|
135 |
+
* sft_config = SFTConfig(
|
136 |
+
dataset_text_field=dataset_text_field,
|
137 |
+
per_device_train_batch_size=4,
|
138 |
+
gradient_accumulation_steps=8,
|
139 |
+
dataset_num_proc=16,
|
140 |
+
max_seq_length=1600,
|
141 |
+
logging_dir="./logs",
|
142 |
+
num_train_epochs=1,
|
143 |
+
learning_rate=2e-5,
|
144 |
+
save_steps=5,
|
145 |
+
save_total_limit=1,
|
146 |
+
logging_steps=5,
|
147 |
+
output_dir="outputs",
|
148 |
+
optim="paged_adamw_8bit",
|
149 |
+
save_strategy="steps",
|
150 |
+
)
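
Two derived quantities follow from the values listed above: the LoRA scaling factor and the effective batch size per optimizer step. The sketch below computes them with plain Python. It assumes the standard LoRA scaling `lora_alpha / r` (that is, rank-stabilized LoRA is not enabled) and a single GPU, as in the "1 x A100 PCIe" setup; the constant and function names are illustrative.

```python
# Values copied from the LoraConfig / SFTConfig listing above.
LORA_R = 64
LORA_ALPHA = 64
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
MAX_SEQ_LENGTH = 1600
NUM_GPUS = 1  # assumption: the single A100 listed in the setup


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor alpha / r (assumes rsLoRA is off)."""
    return alpha / r


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * grad_accum * num_gpus


scaling = lora_scaling(LORA_ALPHA, LORA_R)
batch = effective_batch_size(
    PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_GPUS
)
# Upper bound only: sequences shorter than max_seq_length use fewer tokens.
max_tokens_per_step = batch * MAX_SEQ_LENGTH

print(f"LoRA scaling factor: {scaling}")
print(f"Effective batch size: {batch}")
print(f"Max tokens per optimizer step: {max_tokens_per_step}")
```

With `r` equal to `lora_alpha`, the adapter update is applied unscaled, and each optimizer step with these settings aggregates gradients from 32 sequences.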
|
151 |
<!-- #### Speeds, Sizes, Times [optional] -->
|
152 |
|
153 |
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|