Update README.md
README.md CHANGED

@@ -38,6 +38,7 @@ inference:
   do_sample: true
 ---
 
+**This model is an alternative to my main ConflLlama model, the only difference being a more neutral chat template.**
 
 # ConflLlama: GTD-Finetuned LLaMA-3 8B
 - **Model Type:** GGUF quantized (q4_k_m and q8_0)
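
The hunk above lists the released formats as GGUF builds in q4_k_m and q8_0 and notes the more neutral chat template. As a minimal usage sketch, assuming llama-cpp-python (the diff itself shows no usage code), the snippet below loads a q4_k_m file; the local filename, prompt wording, and sampling values are assumptions, and `n_ctx=1024` mirrors the max sequence length stated later in the diff.

```python
# Minimal sketch, not taken from the README: load one of the GGUF builds named above
# with llama-cpp-python. Filename, prompt format, and sampling values are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="conflllama-q4_k_m.gguf",  # hypothetical local path to the q4_k_m build
    n_ctx=1024,                           # mirrors the 1024-token max sequence length below
)

prompt = "Classify the attack type described in this GTD event summary:\n<event summary here>"
out = llm(prompt, max_tokens=64, temperature=0.7)  # sampled decoding, per `do_sample: true`
print(out["choices"][0]["text"])
```
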
@@ -95,20 +96,6 @@ inference:
 - 4-bit Quantization: Enabled
 - Max Sequence Length: 1024
 
-## Model Architecture
-The model uses a combination of efficient fine-tuning techniques and optimizations for handling conflict event classification:
-
-<p align="center">
-<img src="images/model-arch.png" alt="Model Training Architecture" width="800"/>
-</p>
-
-### Data Processing Pipeline
-The preprocessing pipeline transforms raw GTD data into a format suitable for fine-tuning:
-
-<p align="center">
-<img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>
-</p>
-
 ### Memory Optimizations
 - Used 4-bit quantization
 - Gradient accumulation steps: 8
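
The removed "Data Processing Pipeline" line above says only that raw GTD data is transformed into a format suitable for fine-tuning, and its diagram is dropped by this commit. Purely as an illustration of what such a step could look like, here is a sketch under stated assumptions: the GTD column names (`summary`, `attacktype1_txt`), the CSV input, and the instruction wording are guesses, not the author's pipeline.

```python
# Hedged sketch of a GTD-to-instruction-pair step; not the author's preprocessing code.
# Column names and prompt/response wording are assumptions.
import pandas as pd

def gtd_to_examples(gtd_csv: str) -> list[dict]:
    """Turn GTD event rows into instruction-tuning examples."""
    df = pd.read_csv(gtd_csv, usecols=["summary", "attacktype1_txt"]).dropna()
    examples = []
    for _, row in df.iterrows():
        examples.append({
            "instruction": "Classify the attack type described in this event summary.",
            "input": row["summary"],
            "output": row["attacktype1_txt"],
        })
    return examples
```
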
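The retained "Memory Optimizations" bullets above (4-bit quantization, gradient accumulation of 8) together with the earlier "Max Sequence Length: 1024" map onto a fairly standard QLoRA-style setup. The sketch below is one plausible reconstruction, not the author's training script: the stack (transformers + bitsandbytes + peft), base checkpoint, batch size, LoRA rank and targets, and warmup are assumptions, and the 1.5e-4 peak learning rate comes from the training-log notes removed further down.

```python
# Hedged sketch of the memory settings listed above; not the author's training code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # "4-bit Quantization: Enabled"
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",        # base checkpoint assumed from the README title
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(                       # adapter settings are assumptions
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

args = TrainingArguments(
    output_dir="conflllama-checkpoints", # hypothetical path
    per_device_train_batch_size=1,       # assumption; not stated in the diff
    gradient_accumulation_steps=8,       # "Gradient accumulation steps: 8"
    learning_rate=1.5e-4,                # peak LR reported in the removed training logs
    lr_scheduler_type="linear",          # warmup/decay shape described in those logs
    warmup_ratio=0.03,                   # assumption
    max_steps=1000,                      # assumption; logs mention at least ~800 steps
    bf16=True,
)
# The 1024-token max sequence length would be enforced at tokenization time,
# e.g. tokenizer(text, truncation=True, max_length=1024).
```
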
@@ -136,24 +123,6 @@ This model is designed for:
 4. Results should be interpreted with appropriate context
 
 
-## Training Logs
-<p align="center">
-<img src="images/training.png" alt="Training Logs" width="800"/>
-</p>
-
-The training logs show a successful training run with healthy convergence patterns:
-
-**Loss & Learning Rate:**
-- Loss decreases from 1.95 to ~0.90, with rapid initial improvement
-- Learning rate follows a warmup/decay schedule, peaking at ~1.5x10^-4
-
-**Training Stability:**
-- Stable gradient norms (0.4-0.6 range)
-- Consistent GPU memory usage (~5800MB allocated, 7080MB reserved)
-- Steady training speed (~3.5s/step) with a brief interruption at step 800
-
-The graphs indicate effective model training with good optimization dynamics and resource utilization. The loss vs. learning rate plot suggests optimal learning around 10^-4.
-
 ## Citation
 ```bibtex
 @misc{conflllama,