mlabonne committed
Commit 2cb8c65 · verified · 1 Parent(s): a99a26a

Update README.md

Files changed (1)
  1. README.md +50 -11
README.md CHANGED
@@ -5,25 +5,35 @@ library_name: transformers
  tags:
  - mergekit
  - merge
-
  ---
- # merge
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the passthrough merge method.
-
- ### Models Merged
-
- The following models were included in the merge:
- * [meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
+
+ # 🦙⛰️ BigLlama-3.1-681B-Instruct
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/DcEFMtuBt8-Pl3chGqxoG.png)
+
+ <center>🦙✨ <i><a href="https://huggingface.co/mlabonne/BigLlama-3.1-1T-Instruct">mlabonne/BigLlama-3.1-1T-Instruct</a></i></center>
+
+ This is an experimental self-merge using [meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) and created with [mergekit](https://github.com/cg123/mergekit).
+
+ This is the direct successor of [Meta-Llama-3-120B-Instruct](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct), a self-merge of Llama 3 70B that produced a decent 120B model for tasks like creative writing.
+
+ I tweaked the range of duplicated layers to hopefully make a sensible model. Use it at your own risk!
+
+ ## 🔍 Applications
+
+ I recommend using this model for creative writing with the Llama 3 chat template.
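
For illustration, here is a minimal sketch (not part of the card) of formatting a creative-writing prompt with the Llama 3 chat template via `transformers`. The repository id is assumed from the card title, and actually generating text would require loading the ~681B-parameter weights across many GPUs or an inference server.

```python
from transformers import AutoTokenizer

# Repository id assumed from the card title
model_id = "mlabonne/BigLlama-3.1-681B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a creative writing assistant."},
    {"role": "user", "content": "Write the opening scene of a story set on a glacier."},
]

# Build the Llama 3 chat-formatted prompt (special tokens and role headers)
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```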
+
+ ## Quantization
+
+ TBD.
+
+ ## 🏆 Evaluation
+
+ TBD.
+
+ ## 🧩 Configuration
+
+ This model was merged using the passthrough merge method. The following YAML configuration was used to produce this model:
+
  ```yaml
  slices:
@@ -45,3 +55,32 @@ slices:
  merge_method: passthrough
  dtype: bfloat16
  ```
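
For reference, a configuration like the one above could be run programmatically. The sketch below is not from the card; it assumes mergekit's documented Python entry points (`MergeConfiguration`, `run_merge`, `MergeOptions`) and that the YAML has been saved as `config.yaml`.

```python
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the YAML configuration shown above (filename assumed for illustration)
with open("config.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./BigLlama-3.1-681B-Instruct",  # output directory, assumed
    options=MergeOptions(
        copy_tokenizer=True,  # copy the base model's tokenizer into the output
        lazy_unpickle=True,   # reduce peak memory while reading weight shards
    ),
)
```

Note that the merged bfloat16 weights alone come to roughly 1.4 TB (about 2 bytes per parameter), so substantial disk space is needed.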
+
+ Here is the code I used to generate the config and calculate the number of layers/parameters after the passthrough merge:
+
+ ```python
+ def generate_yaml_config(range_size, total_layers, nb_parameters):
+     # Overlapping slices of `range_size` layers (stepping by half a slice)
+     # duplicate the middle (total_layers - range_size) layers of the model
+     new_size = total_layers + total_layers - range_size
+     new_param = (nb_parameters / total_layers) * new_size
+     print(f"New size = {new_size} layers")
+     print(f"New parameters = {new_param:.2f}B")
+     yaml_str = "slices:\n"
+
+     for i in range(0, round(total_layers - range_size + 1), range_size // 2):
+         start = i
+         end = min(start + range_size, total_layers)
+         yaml_str += f"- sources:\n"
+         yaml_str += f"  - layer_range: [{start}, {end}]\n"
+         yaml_str += f"    model: meta-llama/Meta-Llama-3.1-405B-Instruct\n"
+
+     yaml_str += "merge_method: passthrough\n"
+     yaml_str += "dtype: bfloat16\n"
+
+     print(yaml_str)
+
+     return new_size, new_param
+
+ # Example usage
+ new_size, new_param = generate_yaml_config(42, 126, 410)
+ new_size, new_param = generate_yaml_config(105, new_size, new_param)
+ ```
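
For reference, the example calls chain together: `generate_yaml_config(42, 126, 410)` reports 126 + 126 - 42 = 210 layers and about 683B parameters (close to the 681B in this model's name), and feeding that result into `generate_yaml_config(105, ...)` reports 315 layers and roughly 1T parameters, which appears to correspond to the BigLlama-3.1-1T-Instruct model linked above.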