shreyasmeher committed
Commit 31fea59 · verified · 1 Parent(s): 104ef23

Update README.md

Files changed (1):
  1. README.md +1 -32
README.md CHANGED
@@ -38,6 +38,7 @@ inference:
   do_sample: true
 ---
 
+**This model is an alternative to my main ConflLlama model; the only difference is a more neutral chat template.**
 
 # ConflLlama: GTD-Finetuned LLaMA-3 8B
 - **Model Type:** GGUF quantized (q4_k_m and q8_0)
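For context on the added note: a chat template is the string the tokenizer uses to render a message list into the model's prompt format, so a "more neutral" template changes only that rendered text, not the weights. A minimal sketch with the transformers API; the repo id below is a placeholder, not confirmed by this diff:

```python
from transformers import AutoTokenizer

# Placeholder repo id — substitute the actual Hugging Face repo for this variant.
tokenizer = AutoTokenizer.from_pretrained("shreyasmeher/ConflLlama")

messages = [{"role": "user", "content": "Classify this attack description: ..."}]

# Renders the conversation with the model's chat template; swapping in a more
# neutral template changes only this rendered prompt string.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```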
@@ -95,20 +96,6 @@ inference:
   - 4-bit Quantization: Enabled
   - Max Sequence Length: 1024
 
-## Model Architecture
-The model uses a combination of efficient fine-tuning techniques and optimizations for handling conflict event classification:
-
-<p align="center">
-  <img src="images/model-arch.png" alt="Model Training Architecture" width="800"/>
-</p>
-
-### Data Processing Pipeline
-The preprocessing pipeline transforms raw GTD data into a format suitable for fine-tuning:
-
-<p align="center">
-  <img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>
-</p>
-
 ### Memory Optimizations
 - Used 4-bit quantization
 - Gradient accumulation steps: 8
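The context lines around this hunk (4-bit quantization, gradient accumulation steps of 8, max sequence length 1024) correspond to a setup along the following lines. This is a hedged sketch using bitsandbytes via transformers, not the author's actual training script; the base-model id and batch size are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization (bitsandbytes); mirrors "4-bit Quantization: Enabled".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # assumed base model
    quantization_config=bnb_config,
)

# Gradient accumulation: 8 micro-batches per optimizer step, so the effective
# batch size is per_device_train_batch_size * 8.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,  # illustrative value
    gradient_accumulation_steps=8,
    max_steps=1000,  # illustrative value
)
```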
@@ -136,24 +123,6 @@ This model is designed for:
 4. Results should be interpreted with appropriate context
 
 
-## Training Logs
-<p align="center">
-  <img src="images/training.png" alt="Training Logs" width="800"/>
-</p>
-
-The training logs show a successful training run with healthy convergence patterns:
-
-**Loss & Learning Rate:**
-- Loss decreases from 1.95 to ~0.90, with rapid initial improvement
-- Learning rate uses warmup/decay schedule, peaking at ~1.5x10^-4
-
-**Training Stability:**
-- Stable gradient norms (0.4-0.6 range)
-- Consistent GPU memory usage (~5800MB allocated, 7080MB reserved)
-- Steady training speed (~3.5s/step) with brief interruption at step 800
-
-The graphs indicate effective model training with good optimization dynamics and resource utilization. The loss vs. learning rate plot suggests optimal learning around 10^-4.
-
 ## Citation
 ```bibtex
 @misc{conflllama,
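The removed training-log notes describe a warmup/decay learning-rate schedule peaking near 1.5x10^-4. A toy sketch of that shape using transformers' linear scheduler; the schedule type and step counts are assumptions, not stated in the card:

```python
import torch
from transformers import get_linear_schedule_with_warmup

# Dummy parameter so the optimizer has something to manage.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=1.5e-4)  # peak LR from the log notes

total_steps = 1000  # assumption: actual step count not given
warmup_steps = 100  # assumption

scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

for step in range(total_steps):
    optimizer.step()
    scheduler.step()
    if step in (0, warmup_steps, total_steps - 1):
        # LR ramps to 1.5e-4 during warmup, then decays linearly toward 0.
        print(step, scheduler.get_last_lr()[0])
```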
 
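Since the card lists GGUF quantizations (q4_k_m and q8_0), here is a hedged loading sketch with llama-cpp-python; the filename is a placeholder for whichever GGUF file the repo actually ships:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder filename — use the actual q4_k_m or q8_0 GGUF file from the repo.
llm = Llama(model_path="conflllama-q4_k_m.gguf", n_ctx=1024)  # card's max seq len

out = llm("Classify the following attack description: ...", max_tokens=64)
print(out["choices"][0]["text"])
```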