Full-weight fine-tuned on two epochs of [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca).
The base model for this came from a variation on Undi's [Mistral 11B recipe](https://huggingface.co/Undi95/Mistral-11B-v0.1). The `o_proj` and `down_proj` tensors in the added layers were set to zero, making the output exactly identical to Mistral 7B before training.
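
To see why zeroing those two tensors makes the duplicated layers inert: in a pre-norm transformer block, `o_proj` and `down_proj` are the final projections through which the attention and MLP sublayers write back into the residual stream, so zeroing them leaves the stream untouched. A minimal sketch with a toy pre-norm block (illustrative names, not the actual Mistral modules):

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Toy pre-norm transformer block; a stand-in for a real Mistral layer."""
    def __init__(self, dim=64):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mix = nn.Linear(dim, dim)            # stand-in for attention mixing
        self.o_proj = nn.Linear(dim, dim)         # attention's write into the residual stream
        self.norm2 = nn.LayerNorm(dim)
        self.up_proj = nn.Linear(dim, 4 * dim)
        self.down_proj = nn.Linear(4 * dim, dim)  # MLP's write into the residual stream

    def forward(self, x):
        x = x + self.o_proj(self.mix(self.norm1(x)))
        x = x + self.down_proj(torch.relu(self.up_proj(self.norm2(x))))
        return x

block = ToyBlock()
for proj in (block.o_proj, block.down_proj):
    nn.init.zeros_(proj.weight)
    nn.init.zeros_(proj.bias)

x = torch.randn(2, 8, 64)
assert torch.equal(block(x), x)  # the zeroed block is an exact identity
```

Fine-tuning can then move those projections away from zero, letting the added layers learn a useful contribution.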
Benchmarks look good locally, but actual usefulness is still being evaluated.
### Reproducing
This [mergekit](https://github.com/cg123/mergekit) config was used to produce the base model:
```yml
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 24]
  - sources: # add middle layers with residuals scaled to zero
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [8, 24]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
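
The three slices stack 24 + 16 + 8 = 48 layers against Mistral 7B's 32, which is roughly where the 11B parameter count comes from. Assuming a standard mergekit install, the merge itself would be run with something like `mergekit-yaml config.yml ./mistral-11b-base` (the config filename and output path here are placeholders; see the mergekit README for the exact invocation).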
The axolotl config for fine-tuning is available [here](https://huggingface.co/chargoddard/mistral-11b-slimorca/blob/main/axolotl_config.yaml).