Update README.md
3. Not intended for operational security decisions
4. Results should be interpreted with appropriate context

## Training Logs

<p align="center">
  <img src="images/training-logs.png" alt="Training Logs" width="800"/>
</p>

The training logs show a successful training run with healthy convergence patterns:

**Loss & Learning Rate:**
- Loss decreases from 1.95 to ~0.90, with rapid initial improvement
- The learning rate follows a warmup/decay schedule, peaking at ~1.5x10^-4

**Training Stability:**
- Stable gradient norms (0.4-0.6 range)
- Consistent GPU memory usage (~5800MB allocated, 7080MB reserved)
- Steady training speed (~3.5s/step) with a brief interruption at step 800

The graphs indicate effective model training with good optimization dynamics and resource utilization. The loss vs. learning rate plot suggests optimal learning rates around 10^-4.
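The warmup/decay behavior described above can be sketched as a simple schedule function. This is a minimal illustration only: the peak of ~1.5x10^-4 comes from the logs, but the exact schedule shape, total step count, and warmup length used in training are assumptions here, not values stated in this README.

```python
import math

def lr_at_step(step, total_steps=1000, warmup_steps=100, peak_lr=1.5e-4):
    """Hypothetical warmup/decay schedule: linear warmup to peak_lr,
    then cosine decay to zero over the remaining steps."""
    if step < warmup_steps:
        # Linear warmup: ramp from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # Cosine decay: fraction of the post-warmup phase completed
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The schedule rises during warmup, peaks, then decays:
print(lr_at_step(50))    # mid-warmup, below the peak
print(lr_at_step(100))   # peak learning rate
print(lr_at_step(1000))  # fully decayed
```

A schedule of this general shape produces the rise-then-fall learning-rate curve visible in the training-logs plot.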
## Citation

```bibtex
@misc{conflllama,