---
tags:
- merge
- mergekit
- lazymergekit
- abacaj/phi-2-super
base_model:
- abacaj/phi-2-super
---
|
# phi-2-DLEC |
|
|
|
The DLEC (Distributive Layer Expansion Curve) methodology offers a novel approach to improving neural network models by focusing on the strategic duplication of certain effective layers. |
|
Developed with the aim of enhancing model performance, DLEC carefully identifies and amplifies the impact of key layers within the model's architecture. |
|
|
|
Below is an overview of the method and its implementation, particularly how it integrates with the Hugging Face Transformers library and uses PyTorch and BitsAndBytes for efficient operation.
|
|
|
## Overview
|
**Setting Up:** First, the script ensures all necessary components are in place, from libraries to the model and dataset.
|
|
|
**Database for Activations:** A SQLite database is established to track layer activations, providing a clear view into how individual neurons react and which layers are most influential — these are our 'beneficial layers.'
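
As a rough sketch of what such logging could look like (the table name, column names, and the tiny stand-in layer stack below are illustrative assumptions, not the actual DLEC schema), per-layer activation statistics can be captured with PyTorch forward hooks and written to SQLite:

```python
import sqlite3

import torch
import torch.nn as nn

# In practice this would be a file on disk; in-memory keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS layer_activations ("
    "layer_index INTEGER, mean_abs_activation REAL)"
)

def make_hook(layer_index):
    # Forward hook: record the mean absolute activation of this layer's output.
    def hook(module, inputs, output):
        value = output[0] if isinstance(output, tuple) else output
        conn.execute(
            "INSERT INTO layer_activations VALUES (?, ?)",
            (layer_index, value.detach().abs().mean().item()),
        )
    return hook

# Tiny stand-in stack; in practice these would be the model's transformer blocks.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])
for i, layer in enumerate(layers):
    layer.register_forward_hook(make_hook(i))

x = torch.randn(2, 8)
for layer in layers:
    x = layer(x)
conn.commit()
```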
|
|
|
**Analyzing and Identifying:** From the recorded activation data, the script pinpoints which layers contribute most to the model's performance.
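
A minimal sketch of that selection step, assuming a hypothetical `layer_activations` table of recorded per-layer statistics and using average absolute activation as an illustrative "value" metric:

```python
import sqlite3

# Illustrative data: (layer_index, mean_abs_activation) rows as the logging
# step would have recorded them. Table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE layer_activations (layer_index INTEGER, mean_abs_activation REAL)"
)
conn.executemany(
    "INSERT INTO layer_activations VALUES (?, ?)",
    [(0, 0.12), (1, 0.45), (0, 0.10), (2, 0.30), (1, 0.50)],
)

# Rank layers by their average recorded activation and keep the top k.
k = 2
rows = conn.execute(
    """SELECT layer_index, AVG(mean_abs_activation) AS score
       FROM layer_activations
       GROUP BY layer_index
       ORDER BY score DESC
       LIMIT ?""",
    (k,),
).fetchall()
beneficial_layers = [layer for layer, _ in rows]
print(beneficial_layers)  # [1, 2]
```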
|
|
|
**Configuring DLEC:** A configuration is then created, guiding how the model should incorporate duplicates of these beneficial layers to boost effectiveness without unnecessarily increasing complexity.
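
One way such a configuration could be generated (the windowing rule here is an illustrative assumption, not the exact DLEC logic) is to start a new mergekit slice at each beneficial layer and overlap it with the previous slice by one layer, so each beneficial layer appears twice:

```python
# Hypothetical helper: build a mergekit passthrough config in which each
# "beneficial" layer index appears in two consecutive slices. layer_range is
# half-open ([start, end)), matching mergekit's convention.

def build_slices(num_layers, duplicated, model="abacaj/phi-2-super"):
    cuts = sorted(set(duplicated))
    slices, start = [], 0
    for d in cuts:
        # End this slice just past layer d, then restart the next one at d,
        # so layer d is emitted twice.
        slices.append({"sources": [{"model": model, "layer_range": [start, d + 1]}]})
        start = d
    slices.append({"sources": [{"model": model, "layer_range": [start, num_layers]}]})
    return {"dtype": "bfloat16", "merge_method": "passthrough", "slices": slices}

config = build_slices(8, [3, 5])
for s in config["slices"]:
    print(s["sources"][0]["layer_range"])
```

For an 8-layer model with beneficial layers 3 and 5, this yields slices `[0, 4]`, `[3, 6]`, `[5, 8]`, so layers 3 and 5 each appear twice in the expanded model.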
|
|
|
**Reconfiguring and Running the Model:** Finally, the model is adjusted according to DLEC's insights, focusing enhancement on the identified layers.
|
|
|
## Key Features
|
- **Selective Layer Duplication:** DLEC doesn't just add more layers; it doubles down on the ones that really matter. This methodical selection ensures we're making the most of the model's capabilities without wasteful expansion.
|
|
|
- **Smart Resource Management:** By homing in on specific areas for improvement, DLEC aims to make better use of computational and memory resources, promoting more efficient learning without adding undue complexity to the model.
|
|
|
This approach is about making informed, strategic enhancements to model architecture, prioritizing efficiency and effectiveness in utilizing neural network capabilities. |
|
|
|
> **Note:** This method is still in development. I do not expect game-changing results, nor will I oversell this method; it is purely done for fun. Please let me know how the model works for you.
|
|
|
## 🧩 Configuration |
|
|
|
```yaml
dtype: bfloat16
merge_method: passthrough
slices:
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [0, 3]   # layers 0-2
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [3, 8]   # layers 3-7
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [7, 12]  # layers 7-11; layer 7 now appears twice
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [11, 16] # layers 11-15; layer 11 now appears twice
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [15, 20] # layers 15-19; layer 15 now appears twice
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [19, 24] # layers 19-23; layer 19 now appears twice
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [23, 28] # layers 23-27; layer 23 now appears twice
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [27, 32] # layers 27-31; layer 27 now appears twice
```
|
|
|
## 💻 Usage |
|
|
|
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "TheSkullery/phi-2-DLEC"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```