mjbuehler committed · Commit 13d864f · verified · 1 Parent(s): 919b160

Update README.md


A few edits for improved clarity

Files changed (1)
  1. README.md +19 -15
README.md CHANGED
@@ -45,7 +45,7 @@ This version of Cephalo, lamm-mit/Cephalo-Idefics2-3x8b-beta, is a Mixture-of-Ex
 
 The model has 20b parameters (3 experts, each 8b each, 8b active parameters during inference).
 
- ### Download Idefics-2 MoE Model and Sample inference code
+ ## Download Idefics-2 MoE Model and Sample inference code
 
 ```python
 pip install transformers -U
@@ -74,7 +74,7 @@ moe_model = AutoModelForCausalLM.from_pretrained(
 count_parameters(moe_model)
 ```
 
- Now use downloaded model for inference:
+ Now use the downloaded MoE model for inference:
 
 ```python
 from transformers.image_utils import load_image
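
The inference example that this block opens continues past the hunk boundary. For orientation only, here is a hedged, minimal sketch of an Idefics2-style generation call against the downloaded MoE model; the model ID comes from this card, while `trust_remote_code=True` and the placeholder image/prompt are assumptions rather than the README's own code:

```python
# Hedged sketch only -- the README's full inference block (truncated in this hunk)
# is the authoritative version; trust_remote_code and the sample image are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.image_utils import load_image

DEVICE = "cuda"
model_id = "lamm-mit/Cephalo-Idefics2-3x8b-beta"

moe_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(DEVICE)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder path; substitute a materials-science image of your own.
image = load_image("./example_image.jpg")

messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is shown in this image?"}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=prompt, images=[image], return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

generated_ids = moe_model.generate(**inputs, max_new_tokens=500)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```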
@@ -157,9 +157,12 @@ Download models that will form the experts, as well as the base model. As a simp
 2) A chatty version: HuggingFaceM4/idefics2-8b-chatty (model_1) (model_2)
 3) A basic variant: HuggingFaceM4/idefics2-8b (model_3)
 
+ One (or another model) must be used as base model, from which the vision model, connector, self-attention, etc. are used. From the list of models provided as experts, the feed forward layers are used. Each model will become one expert.
+
 ```python
 from transformers import AutoProcessor, Idefics2ForConditionalGeneration , AutoTokenizer
 from transformers import BitsAndBytesConfig
+ from Idefics2_MoE.moe_idefics2 import *
 
 DEVICE='cuda'
 
@@ -210,6 +213,8 @@ model_3.to(DEVICE)
 
 Here we show how a MoE is constructed from the set of expert models loaded earlier. We consider three models, model_1, model_2 and model_3.
 
+ First, we designate the base model (here we use a deep copy of model_1) and the list of experts. We first create a config, then the moe_model. The config is based on the Idefics2 config from model_1, loaded above.
+
 ```python
 dtype = torch.bfloat16 # Desired dtype for new layers
 base_model = copy.deepcopy(model_1) # Your base model
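
The rest of this code block, where the MoE config and `moe_model` are actually created, lies outside the hunk. A hedged sketch of how it might continue; the class names below are illustrative placeholders for whatever `Idefics2_MoE/moe_idefics2.py` (imported above with `import *`) actually exports, so check that file for the real API:

```python
# Hedged continuation sketch; Idefics2ForCausalLMMoEConfig / Idefics2ForCausalLMMoE are
# placeholder names standing in for the classes exported by Idefics2_MoE/moe_idefics2.py.
expert_models = [model_1, model_2, model_3]   # the feed-forward layers of each become one expert

# Build an MoE config from model_1's Idefics2 config, then assemble the MoE model
# from the base model plus the expert list (k = experts routed per token, an assumption).
config = Idefics2ForCausalLMMoEConfig(config=model_1.config, k=1,
                                      num_expert_models=len(expert_models))
moe_model = Idefics2ForCausalLMMoE(config, base_model, expert_models, layer_dtype=dtype)
moe_model.to(DEVICE)

count_parameters(moe_model)   # parameter-count helper defined earlier in the README
```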
@@ -264,14 +269,14 @@ print(generated_texts)
 We train the gating layers by providing sample images/prompts for each of the three experts. Here is a simple example training set:
 
 ```python
- image_1 = Image.open("./VALIDATION/Q15.jpg")
- image_1a = Image.open("./VALIDATION/Q31.jpg")
+ image_1 = Image.open("./Image_1.jpg")
+ image_1a =Image.open("./Image_1b.jpg")
 
- image_2 = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)
- image_2a = Image.open(requests.get("https://media.wired.com/photos/5aa32b912ba43111d1213e0c/master/w_2240,c_limit/akhacouple.jpg", stream=True).raw)
+ image_2 = Image.open("./Image_2.jpg")
+ image_2a =Image.open("./Image_2b.jpg")
 
- image_3 = Image.open(requests.get("https://i5.walmartimages.com/seo/Amazing-Andrea-Apple-Tree-Seeds-20-Seeds-Grow-Fresh-Apples_ff218043-bcd4-4437-8418-6631d8e97bb3.638ac0120ff05c8913e85ebb74f45f6c.jpeg?odnHeight=640&odnWidth=640&odnBg=FFFFFF", stream=True).raw)
- image_3a = Image.open(requests.get("https://i5.walmartimages.com/seo/Amazing-Andrea-Apple-Tree-Seeds-20-Seeds-Grow-Fresh-Apples_ff218043-bcd4-4437-8418-6631d8e97bb3.638ac0120ff05c8913e85ebb74f45f6c.jpeg?odnHeight=640&odnWidth=640&odnBg=FFFFFF", stream=True).raw)
+ image_3 = Image.open("./Image_3.jpg")
+ image_3a =Image.open("./Image_3b.jpg")
 
 prompts_per_expert = [
    [{"text": "User:<image>What is shown in this image. Explain the importance for materials design.<end_of_utterance>Assistant: The image shows", "image": [image_1]},
@@ -282,21 +287,21 @@ prompts_per_expert = [
    {"text": "User:<image>What is shown in this image, and what does it mean in terms of human history? <end_of_utterance>Assistant: The image shows a historical image of human development.", "image": [image_2a]},
    ],
 
-    [{"text": "User:<image>What is shown in this image. Provide a brief answer. <end_of_utterance>Assistant: This is an apple.", "image": [image_3]},
+    [{"text": "User:<image>What is shown in this image. Provide a brief answer. <end_of_utterance>Assistant: This is an apple, a fruit with good flavor.", "image": [image_3]},
    {"text": "User:<image>What is shown in this image. Brief and concise answer. <end_of_utterance>Assistant: The image shows an apple.", "image": [image_3a]},
    ],
 ]
 
- gating_layer_params = moe_model.train_gating_layer_params_from_hidden_states(processor, prompts_per_expert,
-                                      epochs=1000, loss_steps=100, lr=5e-5, layer_offset=0)
+ gating_layer_params = moe_model.train_gating_layer_params_from_hidden_states(processor,
+                                      prompts_per_expert,
+                                      epochs=1000, loss_steps=100, lr=5e-5, )
 
- # Set parameters for a specific layer
+ # Set parameters for a specific layer
 moe_model.set_gating_layer_params(gating_layer_params)
 ```
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/mh4eFDuFsTBOYbjc38PYz.png)
 
-
 Now that the MoE model has been trained, we can try inference. Inference after MoE gating layers are trained:
 
 ```python
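
For intuition about what the trained gating layers do at inference time, here is a generic, self-contained illustration of softmax gating over expert feed-forward blocks. It is not the implementation in `Idefics2_MoE/moe_idefics2.py`, only the underlying idea that a small, trainable gate decides how much each expert contributes:

```python
# Generic illustration of softmax gating over expert feed-forward outputs.
# This is NOT the repository's code; it only mirrors the idea of a trainable gate.
import torch
import torch.nn as nn

class ToyMoEFeedForward(nn.Module):
    def __init__(self, hidden_size: int, ffns: list):
        super().__init__()
        self.experts = nn.ModuleList(ffns)              # one feed-forward block per expert
        self.gate = nn.Linear(hidden_size, len(ffns))   # the trainable gating layer

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Per-token expert weights: [batch, seq, n_experts]
        weights = torch.softmax(self.gate(hidden_states), dim=-1)
        # Stack expert outputs: [batch, seq, hidden, n_experts]
        expert_out = torch.stack([e(hidden_states) for e in self.experts], dim=-1)
        # Weighted sum over experts
        return (expert_out * weights.unsqueeze(-2)).sum(dim=-1)

# Tiny smoke test with random feed-forward blocks
ffns = [nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16)) for _ in range(3)]
layer = ToyMoEFeedForward(16, ffns)
print(layer(torch.randn(2, 5, 16)).shape)  # torch.Size([2, 5, 16])
```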
@@ -324,7 +329,7 @@ inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
 generated_ids = moe_model.generate(**inputs, max_new_tokens=500)
 generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
 
- print(generated_texts)
+ print(generated_texts[0])
 ```
 
 ### Push to hub and save locally
@@ -343,7 +348,6 @@ Save locally:
 ```python
 processor.save_pretrained(moe_name, )
 moe_model.save_pretrained(moe_name, )
-
 ```
 
 Loading the model works as done above. Here included again for completeness:
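
The push-to-hub half of this section falls outside the hunk. As a hedged sketch using the standard `push_to_hub` API (the repo ID below is a placeholder; `moe_model`, `processor`, and `moe_name` come from the code above):

```python
# Hedged sketch: publishing the assembled MoE model and processor to the Hub.
# Requires being logged in (e.g. `huggingface-cli login`); the repo ID is a placeholder.
repo_id = "your-username/" + moe_name

moe_model.push_to_hub(repo_id)
processor.push_to_hub(repo_id)
```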
 