tags:
- llama
- llama-3
license: llama3.2
---

# Evolution Learning Network (ELN) with QLoRA and Genetic Algorithms for LLMs

## Overview

This project implements an **Evolution Learning Network (ELN)** that fine-tunes transformer-based models such as LLaMA by combining **Quantized Low-Rank Adaptation (QLoRA)** with **Genetic Algorithms (GA)**. The primary objective is to evolve a population of models across multiple generations, optimizing for performance (fitness) and specialization while maintaining diversity.

### Key Features
- Efficient model fine-tuning using **QLoRA** with 4-bit quantization
- Evolutionary strategies with tournament selection and blended crossover
- Adaptive mutation rates based on generation progress
- Comprehensive experiment tracking with **WandB**
- Diversity maintenance through LoRA weight fingerprinting (illustrated below)
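
LoRA adapters are small enough that their weights can act as a compact "fingerprint" for measuring how different two population members are. A minimal sketch, assuming the adapter weights are available as a name-to-tensor dict (the function names here are illustrative, not the project's actual API):

```python
import torch

def lora_fingerprint(lora_state_dict):
    """Flatten and concatenate all LoRA tensors into a single vector."""
    return torch.cat([w.detach().float().flatten() for w in lora_state_dict.values()])

def diversity(fp_a, fp_b):
    """1 - cosine similarity: larger values indicate more diverse adapters."""
    return 1.0 - torch.nn.functional.cosine_similarity(fp_a, fp_b, dim=0).item()
```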

## Model Details

### Base Model
- **Name**: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- **Architecture**: Transformer-based causal language model

### Quantization Configuration
- **Type**: 4-bit quantization using `bitsandbytes`
- **Parameters**:
  - Compute dtype: `torch.float16`
  - Quantization type: `"nf4"` (4-bit NormalFloat)
  - Double (nested) quantization: Enabled
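
For reference, this corresponds to a standard `bitsandbytes` setup via `transformers`; a minimal sketch whose values mirror the list above:

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute dtype
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,        # double (nested) quantization
)
```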

### LoRA Configuration
- **Rank (r)**: 8
- **Alpha**: 16
- **Target Modules**: `q_proj`, `v_proj`
- **Dropout**: 0.05
- **Task Type**: `CAUSAL_LM`
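
Expressed with `peft`'s `LoraConfig` (a sketch mirroring the values above):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```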

### Training Configuration
- **Optimizer**: `paged_adamw_8bit`
- **Precision**: Mixed precision (`fp16`)
- **Batch Size Range**: 2-16 (genome-controlled)
- **Learning Rate Range**: 1e-6 to 1e-2 (genome-controlled)
- **Epochs Range**: 1-4 (genome-controlled)
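
A sketch of how one individual's genome might map onto `transformers.TrainingArguments`; the `genome` dict and `output_dir` are illustrative assumptions, not the project's actual code:

```python
from transformers import TrainingArguments

# One individual's genome, sampled from the ranges listed above
genome = {"learning_rate": 2e-4, "batch_size": 4, "epochs": 2}

training_args = TrainingArguments(
    output_dir="eln-individual-0",  # hypothetical output path
    per_device_train_batch_size=genome["batch_size"],
    num_train_epochs=genome["epochs"],
    learning_rate=genome["learning_rate"],
    fp16=True,                      # mixed precision
    optim="paged_adamw_8bit",
)
```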

## Dataset

### Source
- **Name**: WikiText-2 Raw
- **Configuration**: `wikitext-2-raw-v1`
- **Processing**:
  - Max length: 128 tokens
  - Padding: Fixed to max length
  - Splits: train, validation (general), test (specific)
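
This preprocessing corresponds to a standard `datasets`/`transformers` pipeline; a minimal sketch (the pad-token assignment is an assumption, since LLaMA tokenizers ship without a pad token):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default

def tokenize(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",  # pad every sample to the fixed length
        max_length=128,        # 128-token context
    )

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```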

## Evolution Process

### Population Management
1. **Initialization**:
   - Population size: 6 models
   - Initial random mutations (20% rate)
   - Randomized hyperparameter genomes

2. **Selection & Evolution** (see the sketch after this list):
   - Tournament selection (k=3)
   - Blended crossover of LoRA weights
   - Adaptive mutation rates (decrease as generations progress)
   - Hyperparameter mutation within controlled ranges
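
An illustrative sketch of the selection, crossover, and mutation operators over LoRA weight dictionaries; the function names and noise scale are hypothetical, not taken from the project's code:

```python
import copy
import random
import torch

def tournament_select(population, fitnesses, k=3):
    """Return the fittest of k randomly sampled individuals."""
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]

def blend_crossover(parent_a, parent_b, alpha=0.5):
    """Blend corresponding LoRA tensors of two parents."""
    child = copy.deepcopy(parent_a)
    for name in child:
        child[name] = alpha * parent_a[name] + (1 - alpha) * parent_b[name]
    return child

def mutate(lora_weights, generation, max_generations, base_rate=0.2):
    """Adaptive mutation: the mutation rate decays as generations progress."""
    rate = base_rate * (1 - generation / max_generations)
    for name, weight in lora_weights.items():
        if random.random() < rate:
            lora_weights[name] = weight + 0.01 * torch.randn_like(weight)
    return lora_weights
```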

## Experimental Results

### Evolution Progress

The evolutionary learning process was run for 8 generations with a population size of 6 models. The experiment tracked several key metrics across generations:

**Evolution Metrics**
<div style="display: flex;">
<img src="https://huggingface.co/diabolic6045/ELN-llama-1B-adapter/resolve/main/images/output.png" alt="Evolution Metrics" style="width: 50%;height: 50%;"/>
</div>

#### Fitness Progression
- **Initial Performance**: Best fitness started at ~0.480 (Generation 1)
- **Convergence**: Gradual decline to ~0.476 by Generation 8
- **Population Stability**: Average fitness closely tracked best fitness after Generation 2, indicating good convergence
- **Fitness Range**: Maintained between 0.476 and 0.480 throughout evolution

#### Specialization Trends
- **High Baseline**: Started at ~0.9975 specialization
- **Consistency**: Fluctuated minimally between 0.9975 and 0.9990
- **Peak Performance**: Reached ~0.9991 specialization in Generation 6
- **Population Average**: Maintained above 0.997 throughout evolution

### Comparison with Standard Training

![Training Comparison](https://huggingface.co/diabolic6045/ELN-llama-1B-adapter/resolve/main/images/comparison.webp)

The comparison reveals several key differences between ELN and standard training:

#### Fitness Metrics
- **ELN**: 0.4762 final fitness with stable progression
- **Standard**: 0.4779 final fitness with a steeper learning curve
- **Difference**: ~0.3% performance gap in favor of standard training

#### Training Characteristics
- **Loss Reduction**:
  - Standard: Sharp initial drop followed by gradual improvement
  - ELN: More controlled, stable descent
- **Specialization**:
  - Standard: More variable specialization scores
  - ELN: Consistently high specialization throughout

#### Key Advantages of ELN
1. More stable learning trajectory
2. Better maintenance of model diversity
3. Consistent specialization scores
4. Reduced risk of catastrophic forgetting

## Hardware & Framework Requirements

### Hardware
- Multi-GPU support via `DistributedDataParallel`
- Memory optimization through gradient accumulation
- Hardware monitoring (CPU/GPU usage)

### Dependencies
- transformers
- peft
- bitsandbytes
- accelerate
- wandb
- torch >= 2.0

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the evolved LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base_model, "diabolic6045/ELN-llama-1B-adapter")
```
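
Once the adapter is loaded, generation works like any other causal LM. A brief illustrative example (the prompt text is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
inputs = tokenizer("The theory of evolution states that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```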

## Framework Versions
- PEFT 0.14.0

## Future Work
- Explore larger population sizes and more generations
- Implement additional mutation strategies
- Test on diverse datasets and tasks
- Investigate multi-objective optimization

---

## Citation

If you use this work, please cite:

```bibtex
@misc{eln2024,
  title={Evolution Learning Network (ELN): Combining QLoRA and Genetic Algorithms for LLM Optimization},
  year={2024},
  howpublished={\url{https://github.com/diabolic6045/ELN-llama-1B-adapter}}
}
```

### Related Works

This project builds upon several key papers and techniques:

```bibtex
@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

@article{such2017deep,
  title={Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning},
  author={Such, Felipe Petroski and Madhavan, Vashisht and Conti, Edoardo and Lehman, Joel and Stanley, Kenneth O and Clune, Jeff},
  journal={arXiv preprint arXiv:1712.06567},
  year={2017}
}

@article{real2019regularized,
  title={Regularized Evolution for Image Classifier Architecture Search},
  author={Real, Esteban and Aggarwal, Alok and Huang, Yanping and Le, Quoc V},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={33},
  number={01},
  pages={4780--4789},
  year={2019}
}
```

These citations cover:
1. The QLoRA quantization and fine-tuning technique
2. The base LLaMA model architecture
3. Deep neuroevolution fundamentals
4. Regularized evolution in neural networks

The implementation also draws inspiration from recent advances in evolutionary algorithms and neural architecture search.

---