Update README.md
README.md CHANGED

@@ -38,6 +38,7 @@ inference:
   do_sample: true
 ---
 
+**This model is an alternative to my main ConflLlama model, the only difference being a more neutral chat template.**
 
 # ConflLlama: GTD-Finetuned LLaMA-3 8B
 - **Model Type:** GGUF quantized (q4_k_m and q8_0)
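
The hunk above lists the released formats as GGUF builds in q4_k_m and q8_0 and notes the more neutral chat template. As a minimal usage sketch, assuming llama-cpp-python (the diff itself shows no usage code), the snippet below loads a q4_k_m file; the local filename, prompt wording, and sampling values are assumptions, and `n_ctx=1024` mirrors the max sequence length stated later in the diff.

```python
# Minimal sketch, not taken from the README: load one of the GGUF builds named above
# with llama-cpp-python. Filename, prompt format, and sampling values are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="conflllama-q4_k_m.gguf",  # hypothetical local path to the q4_k_m build
    n_ctx=1024,                           # mirrors the 1024-token max sequence length below
)

prompt = "Classify the attack type described in this GTD event summary:\n<event summary here>"
out = llm(prompt, max_tokens=64, temperature=0.7)  # sampled decoding, per `do_sample: true`
print(out["choices"][0]["text"])
```
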
@@ -95,20 +96,6 @@ inference:
 - 4-bit Quantization: Enabled
 - Max Sequence Length: 1024
 
-## Model Architecture
-The model uses a combination of efficient fine-tuning techniques and optimizations for handling conflict event classification:
-
-<p align="center">
-<img src="images/model-arch.png" alt="Model Training Architecture" width="800"/>
-</p>
-
-### Data Processing Pipeline
-The preprocessing pipeline transforms raw GTD data into a format suitable for fine-tuning:
-
-<p align="center">
-<img src="images/preprocessing.png" alt="Data Preprocessing Pipeline" width="800"/>
-</p>
-
 ### Memory Optimizations
 - Used 4-bit quantization
 - Gradient accumulation steps: 8
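
The removed "Data Processing Pipeline" line above says only that raw GTD data is transformed into a format suitable for fine-tuning, and its diagram is dropped by this commit. Purely as an illustration of what such a step could look like, here is a sketch under stated assumptions: the GTD column names (`summary`, `attacktype1_txt`), the CSV input, and the instruction wording are guesses, not the author's pipeline.

```python
# Hedged sketch of a GTD-to-instruction-pair step; not the author's preprocessing code.
# Column names and prompt/response wording are assumptions.
import pandas as pd

def gtd_to_examples(gtd_csv: str) -> list[dict]:
    """Turn GTD event rows into instruction-tuning examples."""
    df = pd.read_csv(gtd_csv, usecols=["summary", "attacktype1_txt"]).dropna()
    examples = []
    for _, row in df.iterrows():
        examples.append({
            "instruction": "Classify the attack type described in this event summary.",
            "input": row["summary"],
            "output": row["attacktype1_txt"],
        })
    return examples
```
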
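The retained "Memory Optimizations" bullets above (4-bit quantization, gradient accumulation of 8) together with the earlier "Max Sequence Length: 1024" map onto a fairly standard QLoRA-style setup. The sketch below is one plausible reconstruction, not the author's training script: the stack (transformers + bitsandbytes + peft), base checkpoint, batch size, LoRA rank and targets, and warmup are assumptions, and the 1.5e-4 peak learning rate comes from the training-log notes removed further down.

```python
# Hedged sketch of the memory settings listed above; not the author's training code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # "4-bit Quantization: Enabled"
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",        # base checkpoint assumed from the README title
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(                       # adapter settings are assumptions
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

args = TrainingArguments(
    output_dir="conflllama-checkpoints", # hypothetical path
    per_device_train_batch_size=1,       # assumption; not stated in the diff
    gradient_accumulation_steps=8,       # "Gradient accumulation steps: 8"
    learning_rate=1.5e-4,                # peak LR reported in the removed training logs
    lr_scheduler_type="linear",          # warmup/decay shape described in those logs
    warmup_ratio=0.03,                   # assumption
    max_steps=1000,                      # assumption; logs mention at least ~800 steps
    bf16=True,
)
# The 1024-token max sequence length would be enforced at tokenization time,
# e.g. tokenizer(text, truncation=True, max_length=1024).
```
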
@@ -136,24 +123,6 @@ This model is designed for:
 4. Results should be interpreted with appropriate context
 
 
-## Training Logs
-<p align="center">
-<img src="images/training.png" alt="Training Logs" width="800"/>
-</p>
-
-The training logs show a successful training run with healthy convergence patterns:
-
-**Loss & Learning Rate:**
-- Loss decreases from 1.95 to ~0.90, with rapid initial improvement
-- Learning rate follows a warmup/decay schedule, peaking at ~1.5x10^-4
-
-**Training Stability:**
-- Stable gradient norms (0.4-0.6 range)
-- Consistent GPU memory usage (~5800MB allocated, 7080MB reserved)
-- Steady training speed (~3.5s/step) with a brief interruption at step 800
-
-The graphs indicate effective model training with good optimization dynamics and resource utilization. The loss vs. learning rate plot suggests optimal learning around 10^-4.
-
 ## Citation
 ```bibtex
 @misc{conflllama,