chargoddard committed
Commit edddb95 · verified · 1 Parent(s): 52ad111

Update README.md

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -12,3 +12,31 @@ Full weight fine tuned on two epochs of [SlimOrca](https://huggingface.co/datase
The base model for this came from a variation on Undi's [Mistral 11B recipe](https://huggingface.co/Undi95/Mistral-11B-v0.1). The `o_proj` and `down_proj` tensors were set to zero in the added layers, making the output exactly identical to Mistral 7B before training.
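
Since each added layer only writes back into the residual stream through `o_proj` (attention) and `down_proj` (MLP), zeroing those two projections turns the duplicated layers into identity blocks. A quick way to convince yourself is to compare logits between the merged base and plain Mistral 7B. This is only a sketch; the local path `./mistral-11b-base` is a placeholder for wherever the merge output was written, not an official artifact:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: wherever the mergekit output was written.
MERGED_PATH = "./mistral-11b-base"

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16)
merged = AutoModelForCausalLM.from_pretrained(MERGED_PATH, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    # The zeroed layers add exactly zero to the residual stream,
    # so the two models should agree on every logit.
    diff = (base(**inputs).logits - merged(**inputs).logits).abs().max()
print(f"max |logit difference| = {diff.item()}")
```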
 
Benchmarks look good locally but still evaluating actual usefulness.

### Reproducing

This [mergekit](https://github.com/cg123/mergekit) config was used to produce the base model:
```yml
slices:
- sources:
  - model: mistralai/Mistral-7B-v0.1
    layer_range: [0, 24]
- sources: # add middle layers with residuals scaled to zero
  - model: mistralai/Mistral-7B-v0.1
    layer_range: [8, 24]
    parameters:
      scale:
        - filter: o_proj
          value: 0.0
        - filter: down_proj
          value: 0.0
        - value: 1.0
- sources:
  - model: mistralai/Mistral-7B-v0.1
    layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
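
With mergekit installed, a config like this is normally passed to its `mergekit-yaml` entry point (roughly `mergekit-yaml config.yml ./mistral-11b-base`; check the mergekit README for the exact flags on your version). As a sanity check on the result, the 24 + 16 + 8 slices give 48 decoder layers, and the re-inserted copies (indices 24-39) should have all-zero `o_proj` and `down_proj` weights. The path below is again a placeholder for the merge output:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("./mistral-11b-base")  # placeholder merge output path

assert len(model.model.layers) == 48  # 24 + 16 duplicated + 8

# Layers 24-39 are the duplicated copies of layers 8-23 with residual writes scaled to zero.
for layer in model.model.layers[24:40]:
    assert torch.all(layer.self_attn.o_proj.weight == 0)
    assert torch.all(layer.mlp.down_proj.weight == 0)

print("duplicated layers are zeroed as expected")
```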

The axolotl config used for fine-tuning is available [here](https://huggingface.co/chargoddard/mistral-11b-slimorca/blob/main/axolotl_config.yaml).
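
Training itself goes through axolotl's own CLI with that config; see the axolotl docs for the launch command appropriate to your setup. For reference, a minimal generation smoke test against the released fine-tune follows the usual `transformers` pattern below; the prompt format the model actually expects is defined by that axolotl config and is not asserted here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "chargoddard/mistral-11b-slimorca"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Explain why the sky is blue in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```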