Update README.md
A few edits for improved clarity
README.md
CHANGED
@@ -45,7 +45,7 @@ This version of Cephalo, lamm-mit/Cephalo-Idefics2-3x8b-beta, is a Mixture-of-Ex

The model has 20b parameters (3 experts, 8b each, with 8b active parameters during inference).

-
+ ## Download Idefics-2 MoE Model and Sample inference code

```python
pip install transformers -U
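The body of this code block is elided by the diff context. For orientation, the sketch below shows one way the packaged MoE model can be downloaded and its size inspected; the exact from_pretrained keyword arguments and the count_parameters helper are assumptions written for illustration, not lines from the README.

```python
# Sketch only: loads the packaged MoE checkpoint; kwargs such as trust_remote_code
# are assumptions, since the full call is elided by the diff context.
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "lamm-mit/Cephalo-Idefics2-3x8b-beta"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
moe_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="cuda",
)

def count_parameters(model):
    # Illustrative helper: reports total vs. trainable parameter counts.
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total parameters: {total/1e9:.2f}B, trainable: {trainable/1e9:.2f}B")

count_parameters(moe_model)
```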
@@ -74,7 +74,7 @@ moe_model = AutoModelForCausalLM.from_pretrained(
count_parameters(moe_model)
```

- Now use downloaded model for inference:
+ Now use the downloaded MoE model for inference:

```python
from transformers.image_utils import load_image
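The rest of the inference block is likewise outside the diff context. As a rough sketch of a typical Idefics2-style inference call with the processor and moe_model loaded above (the image path, prompt text, and generation settings are placeholders, not values from the README):

```python
# Sketch only: image path, prompt, and generation settings are placeholders.
from transformers.image_utils import load_image

image = load_image("./test_image.png")  # placeholder path

messages = [
    {"role": "user",
     "content": [
         {"type": "image"},
         {"type": "text", "text": "What is shown in this image, and why is it important for materials design?"},
     ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}

generated_ids = moe_model.generate(**inputs, max_new_tokens=500)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```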
@@ -157,9 +157,12 @@ Download models that will form the experts, as well as the base model. As a simp
2) A chatty version: HuggingFaceM4/idefics2-8b-chatty (model_1) (model_2)
3) A basic variant: HuggingFaceM4/idefics2-8b (model_3)

+ One of these models (or another model) must be used as the base model; it supplies the vision model, connector, self-attention layers, and so on. From the models provided as experts, only the feed-forward layers are used, and each model becomes one expert.
+
```python
from transformers import AutoProcessor, Idefics2ForConditionalGeneration, AutoTokenizer
from transformers import BitsAndBytesConfig
+ from Idefics2_MoE.moe_idefics2 import *

DEVICE='cuda'
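The actual loading code for the experts is not part of this hunk. A minimal sketch, continuing from the imports above and assuming bf16 weights without quantization (the BitsAndBytesConfig import suggests 4-bit loading is also an option):

```python
# Sketch only: dtype and device placement are assumptions; quantization via
# BitsAndBytesConfig could be used instead of bf16.
import torch

chatty_id = "HuggingFaceM4/idefics2-8b-chatty"
base_id = "HuggingFaceM4/idefics2-8b"

processor = AutoProcessor.from_pretrained(chatty_id)

model_1 = Idefics2ForConditionalGeneration.from_pretrained(chatty_id, torch_dtype=torch.bfloat16)
model_2 = Idefics2ForConditionalGeneration.from_pretrained(chatty_id, torch_dtype=torch.bfloat16)
model_3 = Idefics2ForConditionalGeneration.from_pretrained(base_id, torch_dtype=torch.bfloat16)

model_1.to(DEVICE)
model_2.to(DEVICE)
model_3.to(DEVICE)
```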
@@ -210,6 +213,8 @@ model_3.to(DEVICE)

Here we show how a MoE is constructed from the set of expert models loaded earlier. We consider three models: model_1, model_2, and model_3.

+ First, we designate the base model (here, a deep copy of model_1) and the list of experts. We then create a config and, from it, the moe_model; the config is based on the Idefics2 config of model_1, loaded above.
+
```python
dtype = torch.bfloat16  # Desired dtype for new layers
base_model = copy.deepcopy(model_1)  # Your base model
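The construction itself happens in classes from Idefics2_MoE.moe_idefics2, whose API is not shown in this excerpt. Purely as an illustration of which pieces come from where, the snippet below collects the per-layer feed-forward (MLP) blocks of each expert while the deep-copied base model keeps the vision tower, connector, and attention; the attribute path follows the current transformers Idefics2 implementation and is an assumption here.

```python
# Illustration only: gathers each expert's per-layer MLP blocks; the real MoE
# wiring is done by Idefics2_MoE.moe_idefics2, not by this snippet.
import copy
import torch

dtype = torch.bfloat16                       # desired dtype for new layers
base_model = copy.deepcopy(model_1)          # supplies vision tower, connector, attention
expert_models = [model_1, model_2, model_3]  # each contributes its feed-forward layers

experts_per_layer = [
    [expert.model.text_model.layers[i].mlp for expert in expert_models]
    for i in range(len(base_model.model.text_model.layers))
]
print(f"{len(experts_per_layer)} layers x {len(experts_per_layer[0])} experts collected")
```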
@@ -264,14 +269,14 @@ print(generated_texts)
We train the gating layers by providing sample images/prompts for each of the three experts. Here is a simple example training set:

```python
- image_1 = Image.open("./
- image_1a =
+ image_1 = Image.open("./Image_1.jpg")
+ image_1a = Image.open("./Image_1b.jpg")

- image_2 = Image.open(
- image_2a =
+ image_2 = Image.open("./Image_2.jpg")
+ image_2a = Image.open("./Image_2b.jpg")

- image_3 = Image.open(
- image_3a =
+ image_3 = Image.open("./Image_3.jpg")
+ image_3a = Image.open("./Image_3b.jpg")

prompts_per_expert = [
    [{"text": "User:<image>What is shown in this image. Explain the importance for materials design.<end_of_utterance>Assistant: The image shows", "image": [image_1]},
@@ -282,21 +287,21 @@ prompts_per_expert = [
    {"text": "User:<image>What is shown in this image, and what does it mean in terms of human history? <end_of_utterance>Assistant: The image shows a historical image of human development.", "image": [image_2a]},
    ],

-    [{"text": "User:<image>What is shown in this image. Provide a brief answer. <end_of_utterance>Assistant: This is an apple.", "image": [image_3]},
+    [{"text": "User:<image>What is shown in this image. Provide a brief answer. <end_of_utterance>Assistant: This is an apple, a fruit with good flavor.", "image": [image_3]},
    {"text": "User:<image>What is shown in this image. Brief and concise answer. <end_of_utterance>Assistant: The image shows an apple.", "image": [image_3a]},
    ],
]

- gating_layer_params = moe_model.train_gating_layer_params_from_hidden_states(processor,
-
+ gating_layer_params = moe_model.train_gating_layer_params_from_hidden_states(processor,
+     prompts_per_expert,
+     epochs=1000, loss_steps=100, lr=5e-5, )

- # Set parameters for a specific layer
+ # Set parameters for a specific layer
moe_model.set_gating_layer_params(gating_layer_params)
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/mh4eFDuFsTBOYbjc38PYz.png)

-
Now that the MoE gating layers have been trained, we can run inference:

```python
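As background on what the trained gating layers do at inference time, here is a schematic illustration (not the implementation in Idefics2_MoE.moe_idefics2): a gating layer scores the hidden states, turns the scores into expert weights with a softmax, and mixes the outputs of the expert feed-forward blocks accordingly.

```python
# Schematic illustration only: a soft-gated mixture of expert feed-forward blocks.
# The real gating layers live inside the MoE classes from Idefics2_MoE.moe_idefics2.
import torch
import torch.nn as nn

class SoftMoEFeedForward(nn.Module):
    def __init__(self, hidden_size: int, expert_ffns: list[nn.Module]):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)
        self.gate = nn.Linear(hidden_size, len(expert_ffns), bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(hidden_states), dim=-1)                       # (B, S, E)
        expert_out = torch.stack([ffn(hidden_states) for ffn in self.experts], dim=-1)  # (B, S, H, E)
        return (expert_out * weights.unsqueeze(-2)).sum(dim=-1)                         # (B, S, H)
```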
@@ -324,7 +329,7 @@ inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
generated_ids = moe_model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

- print(generated_texts)
+ print(generated_texts[0])
```

### Push to hub and save locally
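Only the save-locally half of this section falls inside the diff. A minimal sketch of the push-to-hub half, using the standard transformers push_to_hub method and a placeholder value for moe_name:

```python
# Sketch only: moe_name is shown with a placeholder value; push_to_hub requires
# an authenticated Hugging Face Hub login.
moe_name = "Cephalo-Idefics2-3x8b-beta"  # placeholder repo name

processor.push_to_hub(moe_name, private=True)
moe_model.push_to_hub(moe_name, private=True)
```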
@@ -343,7 +348,6 @@ Save locally:
```python
processor.save_pretrained(moe_name, )
moe_model.save_pretrained(moe_name, )
-
```

Loading the model works as done above. Here included again for completeness:
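The reload code referred to here sits outside the diff context; a minimal sketch consistent with the download step shown earlier (the keyword arguments are assumptions, not lines from the README):

```python
# Sketch only: reloads the saved MoE model; kwargs mirror the earlier download
# sketch and are assumptions.
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained(moe_name, trust_remote_code=True)
moe_model = AutoModelForCausalLM.from_pretrained(
    moe_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="cuda",
)
```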