mgoNeo4j committed on
Commit 7bd3ab7
1 Parent(s): 5c2d033

Update README.md

Files changed (1):
  1. README.md +41 -7
README.md CHANGED
@@ -32,6 +32,7 @@ Please note, this is part of ongoing research and exploration, aimed at highligh
 
 An overview of the finetuned models and benchmarking results is shared at [Link](TODO Link to Blogposts)
 
+
 <!-- - **Developed by:** [More Information Needed]
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
@@ -92,28 +93,61 @@ Use the code below to get started with the model.
 
 [More Information Needed] -->
 
-<!-- ## Training Details
+## Training Details
 
-### Training Data -->
+<!-- ### Training Data -->
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-<!-- [More Information Needed]
+<!-- [More Information Needed] -->
 
-### Training Procedure -->
+### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+Used RunPod with the following setup:
+
+* 1 x A100 PCIe
+* 31 vCPU, 117 GB RAM
+* runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
+* On-Demand - Secure Cloud
+* 60 GB Disk
+* 60 GB Pod Volume
+* ~16 hours
+* $30
 
 <!-- #### Preprocessing [optional]
 
 [More Information Needed]
 -->
 
-<!-- #### Training Hyperparameters
+#### Training Hyperparameters
 
-- **Training regime:** [More Information Needed] -->
+<!-- - **Training regime:** -->
 <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
+```python
+lora_config = LoraConfig(
+    r=64,
+    lora_alpha=64,
+    target_modules=target_modules,
+    lora_dropout=0.05,
+    bias="none",
+    task_type="CAUSAL_LM",
+)
+
+sft_config = SFTConfig(
+    dataset_text_field=dataset_text_field,
+    per_device_train_batch_size=4,
+    gradient_accumulation_steps=8,
+    dataset_num_proc=16,
+    max_seq_length=1600,
+    logging_dir="./logs",
+    num_train_epochs=1,
+    learning_rate=2e-5,
+    save_steps=5,
+    save_total_limit=1,
+    logging_steps=5,
+    output_dir="outputs",
+    optim="paged_adamw_8bit",
+    save_strategy="steps",
+)
+```
 <!-- #### Speeds, Sizes, Times [optional] -->
 
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
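
As a minimal sketch of how the two configs added above are typically combined, the snippet below wires them into TRL's `SFTTrainer` with a PEFT LoRA adapter. The commit does not name the base model or dataset, and it leaves `target_modules` and `dataset_text_field` undefined, so those values are illustrative assumptions; only the `LoraConfig` and `SFTConfig` arguments mirror the README.

```python
# Sketch of the SFT + LoRA wiring; the config arguments mirror the README,
# everything else here is an illustrative assumption.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"         # assumption: base model not named in the diff
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]  # assumption: common LoRA target layers
dataset_text_field = "text"                                # assumption: column holding the training text

model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder dataset

lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=target_modules,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

sft_config = SFTConfig(
    dataset_text_field=dataset_text_field,
    per_device_train_batch_size=4,  # 4 x 8 accumulation steps = effective batch size 32
    gradient_accumulation_steps=8,
    max_seq_length=1600,
    num_train_epochs=1,
    learning_rate=2e-5,
    optim="paged_adamw_8bit",       # paged 8-bit AdamW (bitsandbytes) to reduce optimizer memory
    output_dir="outputs",
)

# SFTTrainer attaches the LoRA adapters via peft_config, so only the adapter
# weights are updated during the supervised finetuning run.
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=train_dataset,
    peft_config=lora_config,
)
trainer.train()
```

With `per_device_train_batch_size=4` and `gradient_accumulation_steps=8`, the effective batch size on the single A100 listed above is 32; the `paged_adamw_8bit` optimizer keeps optimizer state in 8-bit via bitsandbytes, reducing GPU memory pressure relative to standard AdamW.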