jeffmeloy committed
Commit e757ba9 · verified · 1 Parent(s): 504a2dd

Update README.md

Files changed (1): README.md (+39 −31)
README.md CHANGED
```diff
@@ -14,41 +14,25 @@ library_name: transformers
 
 Model created by analyzing and selecting the optimal layers from other Qwen2.5-7B models based on their dimensional utilization efficiency, measured by the Normalized Effective Rank (NER). Computed like:
 
-Singular Value Decomposition:
-- Input: Weight matrix A ∈ R^(m×n) # m = number of output features, n = number of input features
+- Input: Weight matrix for each model layer
 - Compute singular values σᵢ where σᵢ ≥ 0 # σᵢ represents the importance of each dimension
-- Filter values above numerical threshold (>1e-12) # removes numerical noise from computation
-
-Distribution Normalization:
+- Filter values above numerical threshold (>1e-12)
 - Sum all singular values: S = Σσᵢ # S acts as normalization factor
 - Create probability distribution: pᵢ = σᵢ/S # converts singular values to probabilities summing to 1
-
-Entropy Calculation:
-- Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ)) # measures information content of distribution
-- Calculate maximum possible entropy: H_max = log₂(n) # n = number of singular values
-where n is the number of singular values # maximum entropy occurs when all dimensions contribute equally
-
-Normalization:
+- Compute Shannon entropy: H = -Σ(pᵢ * log₂(pᵢ)) # measures information content
+- Calculate maximum possible entropy: H_max = log₂(n)
 - Final NER score = H/H_max # normalizes score to [0,1] range
-- Results in value between 0 and 1 # 0 = single dimension dominance, 1 = perfect dimensional utilization
-- Higher scores indicate more uniform dimensional utilization
+- Results in value between 0 and 1 for each model layer
 
 ## Creating Composite Model
 
 Code here: https://huggingface.co/jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.0/blob/main/ner_merge.py
 
-Layer Analysis:
-- Download base and fine-tuned models from Hugging Face Hub
+Code functions:
+- Download selected models from Hugging Face Hub
 - Calculate Normalized Effective Rank (NER) for each layer within each model
-
-Layer Selection:
-- Identify common layer structures across models
 - Define model and layer name pairs that have highest NER for each layer based on their NER scores
-
-Model Composition:
-- Incrementally build a composite model using layer with highest NER from model pool.
-
-Output Generation:
+- Incrementally build a composite model using layer with highest NER from model pool
 - Save merge reports documenting layer sources
 - Copy config and tokenizer files from base model
 - Save the composite model with complete weights # model ready to use
```
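
To make the NER recipe in the hunk above concrete, here is a minimal sketch of the computation in PyTorch. It follows the bullets directly (SVD, thresholding at 1e-12, normalization to a probability distribution, Shannon entropy, division by log₂(n)); the function name and the use of `torch.linalg.svdvals` are illustrative assumptions, not code lifted from ner_merge.py.

```python
import torch

def normalized_effective_rank(weight: torch.Tensor) -> float:
    """NER = H / H_max for the singular value spectrum of a 2-D weight matrix."""
    sigma = torch.linalg.svdvals(weight.float())        # singular values, all >= 0
    sigma = sigma[sigma > 1e-12]                        # drop numerical noise
    if sigma.numel() <= 1:
        return 0.0                                      # single direction: no entropy
    p = sigma / sigma.sum()                             # p_i = sigma_i / S, sums to 1
    h = -(p * torch.log2(p)).sum()                      # Shannon entropy H
    h_max = torch.log2(torch.tensor(float(p.numel()))) # H_max = log2(n)
    return (h / h_max).item()                           # normalized to [0, 1]
```

Applied per layer, this yields the score the selection step ranks on: values near 1 indicate uniform dimensional utilization, values near 0 a few dominant directions.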
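
The merge loop in the second half of the hunk can be sketched the same way. This assumes every candidate shares the Qwen2.5-7B architecture, so state dict keys line up across models; `build_composite` and its dict-of-state-dicts input are hypothetical simplifications, reusing `normalized_effective_rank` from the sketch above.

```python
import torch

def build_composite(state_dicts: dict[str, dict[str, torch.Tensor]]):
    """For each layer, keep the tensor with the highest NER across the model pool."""
    first_model = next(iter(state_dicts))
    composite, report = {}, {}
    for layer_name in state_dicts[first_model]:
        best_model, best_ner = first_model, -1.0
        for model_name, sd in state_dicts.items():
            tensor = sd[layer_name]
            # NER is defined through SVD, so only 2-D weight matrices are
            # scored; biases and norm vectors stay with the first model.
            if tensor.ndim == 2:
                ner = normalized_effective_rank(tensor)
                if ner > best_ner:
                    best_model, best_ner = model_name, ner
        composite[layer_name] = state_dicts[best_model][layer_name]
        report[layer_name] = best_model  # merge report: layer -> source model
    return composite, report
```

Holding every full 7B state dict in memory at once is impractical, which is presumably why the README describes building the composite incrementally, one layer at a time.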
```diff
@@ -63,6 +47,8 @@ fine_tuned_models: # uncomment the models you want to merge
 
 #- "Qwen/Qwen2.5-7B-Instruct"
 
+#- "EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1"
+
 #- "FourOhFour/Vapor_v2_7B"
 
 #- "Goekdeniz-Guelmez/Josiefied-Qwen2.5-7B-Instruct-abliterated-v2"
```
```diff
@@ -77,21 +63,41 @@ fine_tuned_models: # uncomment the models you want to merge
 
 #- "Orion-zhen/Meissa-Qwen2.5-7B-Instruct"
 
+#- "jeffmeloy/Qwen2.5-7B-nerd-uncensored-v0.9"
+
 #- "jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.0"
 
+#- "jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.1"
+
+#- "jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.2"
+
+#- "AmberYifan/Qwen2.5-7B-dpo-2k"
+
+#- "sethuiyer/Qwen2.5-7B-Anvita"
+
 #- "rombodawg/Rombos-LLM-V2.5-Qwen-7b"
 
 #- "Cran-May/T.E-8.1"
 
-#- "thomas-yanxin/XinYuan-Qwen2.5-7B-0917"
-
 #- "beomi/Qwen2.5-7B-Instruct-kowiki-qa"
 
 #- "Orion-zhen/Qwen2.5-7B-Gutenberg-KTO"
 
-#- 'fblgit/cybertron-v4-qw7B-MGS'
+#- "fblgit/cybertron-v4-qw7B-MGS"
 
-#- 'nguyentd/FinancialAdvice-Qwen2.5-7B'
+#- "nguyentd/FinancialAdvice-Qwen2.5-7B"
+
+#- "WhiteRabbitNeo/WhiteRabbitNeo-2.5-Qwen-2.5-Coder-7B"
+
+#- "edgerunner-ai/EdgeRunner-Command-Nested"
+
+#- "katanemo/Arch-Function-7B"
+
+#- "DeepGlint-AI/llava-mlcd-qwen2.5-7b"
+
+#- "mergekit-community/mergekit-slerp-aflqaqy"
+
+#- "mergekit-community/mergekit-ties-inxwsfo"
 
 #- "Qwen/Qwen2.5-Coder-7B-Instruct"
 
```
```diff
@@ -101,11 +107,13 @@ fine_tuned_models: # uncomment the models you want to merge
 
 #- "Qwen/Qwen2.5-Math-7B"
 
-#- "WhiteRabbitNeo/WhiteRabbitNeo-2.5-Qwen-2.5-Coder-7B"
+#- "thomas-yanxin/XinYuan-Qwen2.5-7B-0917"
 
-#- "edgerunner-ai/EdgeRunner-Command-Nested"
+#- "jbjeong91/Qwen2.5_7B_IST_StoryGen_vanilla"
 
-#- "katanemo/Arch-Function-7B"
+#- "AmberYifan/Qwen2.5-7B-dpo-2k-hhrlhf"
+
+#- "jbjeong91/Qwen2.5_7B_IST_StoryGen_test2"
 
 models_dir: "./input_models/"
 
```
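The remaining hunks only edit the model pool in the YAML config (`fine_tuned_models` plus `models_dir`). As an illustration of how such a config might be consumed, the sketch below reads the two keys shown above and downloads the listed repos; the file name `merge_config.yaml` and the helper `download_pool` are invented for this example, not the ner_merge.py API.

```python
import yaml
from huggingface_hub import snapshot_download

def download_pool(config_path: str = "merge_config.yaml") -> list[str]:
    """Fetch every model named in the config into models_dir."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    paths = []
    for repo_id in cfg.get("fine_tuned_models", []):
        # Each entry is a Hub repo id such as "Qwen/Qwen2.5-7B-Instruct".
        # Lines commented out with # in the YAML never reach this list.
        paths.append(snapshot_download(repo_id=repo_id, cache_dir=cfg["models_dir"]))
    return paths
```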