Script?

#1
by Vezora - opened

Could you please potentially share with me the edited script? I’m having trouble MOE’ing phi.

This is what I got.
Beware, this will only work with 2 Phi models; you might have to tinker with the naming for more layers.

  1. Modify mixtral_moe.py at /content/mergekit/mergekit/scripts/mixtral_moe.py, using the version from here

mixtral_moe.py

  2. Modify architecture.py at /content/mergekit/mergekit/architecture.py
    (you can take this from the link to the commit I have in the description, or from below)

architecture.py

After all the file modifications and the run, you need to replace config.json with the one from this repo.
After that, you need to add modeling_phi.py and configuration_phi.py from this repo to your repo.
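
The post-merge file steps above can be sketched as a small Python helper. This is only a sketch under assumptions: `copy_phi_files` and its two directory arguments are hypothetical names, and the default file list assumes the configuration file mentioned above is named `configuration_phi.py`. Adjust the list to whatever the source repo actually contains.

```python
import shutil
from pathlib import Path

# Assumed file names (not confirmed by the thread); edit to match the source repo.
PHI_FILES = ("config.json", "modeling_phi.py", "configuration_phi.py")


def copy_phi_files(source_repo, merged_model, files=PHI_FILES):
    """Copy each file from source_repo into merged_model, overwriting existing copies.

    Returns the list of file names that were copied.
    """
    source_repo, merged_model = Path(source_repo), Path(merged_model)
    copied = []
    for name in files:
        src = source_repo / name
        if not src.is_file():
            # Fail loudly rather than leave the merged model half-updated.
            raise FileNotFoundError(f"{src} not found")
        shutil.copy2(src, merged_model / name)
        copied.append(name)
    return copied
```

Call it with a local checkout of the repo that holds the Phi files and the output directory of the merge, for example `copy_phi_files("path/to/source_repo", "path/to/merged_model")` (both paths are placeholders).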

Thank you so much, you are the 🐐! I really appreciate it! Thank you!

@Vezora did you succeed? I am having trouble modifying mixtral_moe.py. What needs to be changed there?
@paulilioaica Should I use the mixtral_moe.py from here? https://github.com/paulilioaica/Phi-MOE/blob/main/mixtral_moe.py

@vhug just go ahead and replace the whole file with the modified one

@paulilioaica but in your mixtral_moe.py you are using PHI2_INFO, which leads to a key error when using the phi2 model: it expects layer names like "transformer.embd.wte.weight", while the phi2 model's layers go by names like "model.embed_tokens.weight".

Is your mixtral_moe.py for phi2 or for phi1?
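
A quick way to see which naming scheme a checkpoint uses is to look for one of the two embedding names quoted above. The helper below is a minimal sketch: `detect_phi_naming` and its return labels are made up for illustration, and it checks only those two tensor names, nothing else.

```python
def detect_phi_naming(tensor_names):
    """Guess the Phi naming scheme from an iterable of tensor names.

    Only the two embedding names quoted in this thread are checked;
    the returned labels are illustrative, not official terms.
    """
    names = set(tensor_names)
    if "transformer.embd.wte.weight" in names:
        return "custom-code"  # naming the PHI2_INFO-based script expects
    if "model.embed_tokens.weight" in names:
        return "transformers"  # naming that triggers the key error
    return "unknown"
```

With a model already loaded in PyTorch, the names can be taken from `model.state_dict().keys()` and passed straight in.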

I ran this MergeKitNotebook with the 2 files overwritten and it works fine for me. Are you sure you replaced both architecture.py and mixtral_moe.py?

This is for Phi2

@paulilioaica your script actually works for the models in your merge-config.yaml. I replaced the 2 files, ran mixtral_moe.py, and it works just fine. But now I am confused: I checked the layer names of "phi-2-orange" and of "phi-2" from Microsoft, and they differ. Why is that? Screenshot attached below: the left side is the print statement for the phi-2 model from Microsoft, and the right side is the print statement of the model layers for phi-2-orange.

image.png

Looks like they merged the official Phi2 code into the Transformers library and deleted the custom code:
https://huggingface.co/microsoft/phi-2/commit/da135b7268f02aaa1f591fdd29b6be896008c798

@paulilioaica what does that mean?
