Commit 95a55ce by macadeliccc (parent: 543c7da): add code example
README.md
CHANGED
This model is a medium-sized MoE implementation based on [cognitivecomputations/…]
A 2x7b configuration offers better performance than a standard 7b model, even when loaded in 4-bit.

When this 2x7b model is loaded in 4-bit, its hellaswag acc_norm score is .8270, which is higher than the base model achieves on its own in full precision.

## Prompt Format
```
…
Please give ideas and a detailed plan about how to assemble and train an army of …
<|im_start|>assistant
```
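The ChatML layout above can be reproduced in plain Python. This is only a sketch of the format, not the tokenizer's own template logic; in practice use `tokenizer.apply_chat_template` as shown in the code example below.

```python
def to_chatml(messages):
    """Render chat messages in the ChatML layout.

    Each turn is wrapped in <|im_start|>role ... <|im_end|>, and the prompt
    ends with an open assistant header for the model to complete.
    """
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"


print(to_chatml([{"role": "user", "content": "Hello, who are you?"}]))
```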
## Models Merged

+ teknium/OpenHermes-2.5-Mistral-7B
+ cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser
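Merges of this shape are typically produced with mergekit's MoE tooling. The config below is a hypothetical sketch of what such a merge could look like, not the config actually used for this model; in particular the `positive_prompts` values are placeholders.

```yaml
base_model: cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser
gate_mode: hidden   # route tokens by hidden-state similarity to the prompts
dtype: bfloat16
experts:
  - source_model: cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser
    positive_prompts:
      - "placeholder prompt for this expert"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "placeholder prompt for this expert"
```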
## Code Example

Switch to the commented model definition to load the model in 4-bit. It should run in about 9 GB of VRAM and still exceed the single 7B model by roughly 5-6 points.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/laser-dolphin-mixtral-2x7b-dpo")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/laser-dolphin-mixtral-2x7b-dpo")
# model = AutoModelForCausalLM.from_pretrained("macadeliccc/laser-dolphin-mixtral-2x7b-dpo", load_in_4bit=True)

# Define the chat messages
messages = [
    {"role": "system", "content": "You are Dolphin, an AI assistant"},
    {"role": "user", "content": "Hello, who are you?"}
]

# Apply the chat template to the input messages, leaving the assistant
# header open so the model continues from there
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a response (apply_chat_template returns a tensor of input ids,
# so it is passed directly rather than unpacked)
output = model.generate(gen_input, max_new_tokens=256)

# Decode the generated tokens to a string
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the response
print("Response:", response)
```
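For a rough sense of why about 9 GB suffices in 4-bit, here is a back-of-the-envelope estimate. The ~12.9B parameter count (a 2x7b Mistral MoE shares attention and embedding weights between experts, so it holds fewer than a full 14B parameters) and the 2 GB working-memory allowance are assumptions, not measurements.

```python
def estimate_vram_gb(n_params_billions, bits_per_param, overhead_gb=2.0):
    # Weights take bits/8 bytes per parameter; overhead_gb is a rough
    # allowance for the KV cache and activations.
    return n_params_billions * bits_per_param / 8 + overhead_gb

# Prints a figure around 8.5 GB, consistent with the ~9 GB claim above
print(f"{estimate_vram_gb(12.9, 4):.1f} GB")
```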
## Eval

**Full Precision**
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|----------|-------|------|-----:|--------|-----:|---|-----:|
|arc_easy |Yaml |none | 0|acc |0.8413|± |0.0075|
| … | | | | | | | |
| | |none | 0|acc_norm|0.8303|± |0.0088|
|winogrande|Yaml |none | 0|acc |0.7577|± |0.0120|
**4-bit (bnb)**
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|----------|-------|------|-----:|--------|-----:|---|-----:|
|boolq |Yaml |none | 0|acc |0.8700|± |0.0059|
|hellaswag |Yaml |none | 0|acc |0.6356|± |0.0048|
| | |none | 0|acc_norm|0.8270|± |0.0038|
|openbookqa|Yaml |none | 0|acc |0.3320|± |0.0211|
| | |none | 0|acc_norm|0.4620|± |0.0223|
|piqa |Yaml |none | 0|acc |0.8123|± |0.0091|
| | |none | 0|acc_norm|0.8259|± |0.0088|
|winogrande|Yaml |none | 0|acc |0.7490|± |0.0122|
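The two tables can be compared row by row; a small sketch using the winogrande accuracies copied from the tables above:

```python
# acc values taken from the eval tables above
full_precision = {"winogrande": 0.7577}
four_bit = {"winogrande": 0.7490}

for task, fp_acc in full_precision.items():
    delta = (fp_acc - four_bit[task]) * 100
    print(f"{task}: 4-bit costs {delta:.2f} points")  # winogrande: 4-bit costs 0.87 points
```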
Link to the evaluation [colab](https://colab.research.google.com/drive/1FpwgsGzCR4tORTxAwUxpN3PcP22En2xk?usp=sharing).

## Citations