Divyasreepat committed · Commit 8bc318a · verified · 1 Parent(s): 9b0ce6c

Update README.md with new model card content

Files changed (1): README.md (+206 lines, new file)

---
library_name: keras-hub
---

## Model Overview

Mistral is a family of large language models published by the Mistral AI team. The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts (MoE) model from that family. Both pretrained and instruction-tuned variants are available, each with 7 billion active parameters.

Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).
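
For a quick look at the architecture described above, the backbone can be loaded on its own. This is a minimal sketch, assuming the `mixtral_8_7b_en` preset listed under Presets below and the standard KerasHub `from_preset` API (note that loading downloads the full expert weights, which are far larger than the active parameter count):

```python
import keras_hub

# Load only the Transformer backbone, without the generation head or
# preprocessing (assumption: MixtralBackbone follows the usual KerasHub API).
backbone = keras_hub.models.MixtralBackbone.from_preset("mixtral_8_7b_en")

backbone.summary()              # layer-by-layer view of the MoE decoder stack
print(backbone.count_params())  # total parameters across all experts
```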

## Links

* [Mixtral Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/mixtral-quickstart-notebook)
* [Mixtral API Documentation](https://keras.io/keras_hub/api/models/mixtral/)
* [Mixtral Model Card](https://mistral.ai/news/mixtral-of-experts)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```
pip install -U -q keras-hub
pip install -U -q keras
```

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
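
Keras 3 runs on any of these backends; the active backend is selected with the `KERAS_BACKEND` environment variable, set before Keras is imported. A minimal sketch:

```python
import os

# Choose the backend before importing keras / keras_hub.
# Valid values are "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"

import keras
import keras_hub

print(keras.backend.backend())  # confirms the active backend
```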

## Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset name              | Parameters | Description                                                                                                      |
|--------------------------|------------|------------------------------------------------------------------------------------------------------------------|
| mixtral_8_7b_en          | 7B         | 32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer.                          |
| mixtral_8_instruct_7b_en | 7B         | Instruction fine-tuned 32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer.   |
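
Each preset bundles the tokenizer and preprocessing along with the weights. As a small illustration (assuming a `MixtralTokenizer` class is exported under `keras_hub.tokenizers`, following the usual KerasHub naming), the tokenizer can also be loaded on its own:

```python
import keras_hub

# Load just the tokenizer for a preset (assumption: MixtralTokenizer exists
# under keras_hub.tokenizers, as for other KerasHub model families).
tokenizer = keras_hub.tokenizers.MixtralTokenizer.from_preset("mixtral_8_instruct_7b_en")

token_ids = tokenizer("[INST] What is Keras? [/INST]")
print(token_ids)                        # integer token ids
print(tokenizer.detokenize(token_ids))  # round-trip back to text
```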

## Example Usage
```python
import keras
import keras_hub
import numpy as np

# Basic text generation
mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset("mixtral_8_instruct_7b_en")
mixtral_lm.generate("[INST] What is Keras? [/INST]", max_length=500)

# Generate with batched prompts
mixtral_lm.generate([
    "[INST] What is Keras? [/INST]",
    "[INST] Give me your best brownie recipe. [/INST]"
], max_length=500)

# Using different sampling strategies
mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset("mixtral_8_instruct_7b_en")
# Greedy sampling
mixtral_lm.compile(sampler="greedy")
mixtral_lm.generate("I want to say", max_length=30)

# Beam search. MoE routing (8 experts per layer, top-2 per token) is fixed by
# the preset architecture and is not a sampler argument.
mixtral_lm.compile(
    sampler=keras_hub.samplers.BeamSampler(num_beams=2)
)
mixtral_lm.generate("I want to say", max_length=30)

# Generate without preprocessing. generate() takes the prompt and decoding
# options such as max_length; expert routing is handled inside the model.
prompt = {
    "token_ids": np.array([[1, 315, 947, 298, 1315, 0, 0, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]] * 2),
}

mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
    "mixtral_8_instruct_7b_en",
    preprocessor=None,
    dtype="bfloat16"
)
mixtral_lm.generate(prompt)

# Training on a single batch. fit() accepts the standard Keras training
# arguments.
features = ["The quick brown fox jumped.", "I forgot my homework."]
mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
    "mixtral_8_instruct_7b_en",
    dtype="bfloat16"
)
mixtral_lm.fit(x=features, batch_size=2)

# Training without preprocessing
x = {
    "token_ids": np.array([[1, 315, 947, 298, 1315, 369, 315, 837, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
y = np.array([[315, 947, 298, 1315, 369, 315, 837, 0, 0, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0]] * 2)

mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
    "mixtral_8_instruct_7b_en",
    preprocessor=None,
    dtype="bfloat16"
)
mixtral_lm.fit(
    x=x,
    y=y,
    sample_weight=sw,
    batch_size=2
)
```
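
After fine-tuning, the resulting task can be saved as a local preset and shared, as described in the KerasHub Model Publishing Guide linked above. A minimal sketch (the upload URI below is a placeholder):

```python
# Save the fine-tuned task (weights, tokenizer, and config) to a local directory.
mixtral_lm.save_to_preset("./mixtral_finetuned")

# Upload the preset; replace the handle with your own Hugging Face or Kaggle URI.
keras_hub.upload_preset(
    "hf://my_username/mixtral_finetuned",  # placeholder URI
    "./mixtral_finetuned",
)
```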

## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np

# Basic text generation
mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset("hf://keras/mixtral_8_instruct_7b_en")
mixtral_lm.generate("[INST] What is Keras? [/INST]", max_length=500)

# Generate with batched prompts
mixtral_lm.generate([
    "[INST] What is Keras? [/INST]",
    "[INST] Give me your best brownie recipe. [/INST]"
], max_length=500)

# Using different sampling strategies
mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset("hf://keras/mixtral_8_instruct_7b_en")
# Greedy sampling
mixtral_lm.compile(sampler="greedy")
mixtral_lm.generate("I want to say", max_length=30)

# Beam search. MoE routing (8 experts per layer, top-2 per token) is fixed by
# the preset architecture and is not a sampler argument.
mixtral_lm.compile(
    sampler=keras_hub.samplers.BeamSampler(num_beams=2)
)
mixtral_lm.generate("I want to say", max_length=30)

# Generate without preprocessing. generate() takes the prompt and decoding
# options such as max_length; expert routing is handled inside the model.
prompt = {
    "token_ids": np.array([[1, 315, 947, 298, 1315, 0, 0, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]] * 2),
}

mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
    "hf://keras/mixtral_8_instruct_7b_en",
    preprocessor=None,
    dtype="bfloat16"
)
mixtral_lm.generate(prompt)

# Training on a single batch. fit() accepts the standard Keras training
# arguments.
features = ["The quick brown fox jumped.", "I forgot my homework."]
mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
    "hf://keras/mixtral_8_instruct_7b_en",
    dtype="bfloat16"
)
mixtral_lm.fit(x=features, batch_size=2)

# Training without preprocessing
x = {
    "token_ids": np.array([[1, 315, 947, 298, 1315, 369, 315, 837, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
y = np.array([[315, 947, 298, 1315, 369, 315, 837, 0, 0, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0]] * 2)

mixtral_lm = keras_hub.models.MixtralCausalLM.from_preset(
    "hf://keras/mixtral_8_instruct_7b_en",
    preprocessor=None,
    dtype="bfloat16"
)
mixtral_lm.fit(
    x=x,
    y=y,
    sample_weight=sw,
    batch_size=2
)
```