model has duplicated tensor layers

#1
by lunahr - opened

Gemma 2 is ordinarily supposed to have 2.67B parameters; however, due to a defect in mergekit, some tensor layers have been duplicated, creating a 3.2B-parameter model instead.
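One way to spot this kind of discrepancy is to sum the element counts of every tensor in the checkpoint and compare against the expected size. A minimal sketch, using NumPy arrays as a stand-in for real checkpoint tensors (the names here are hypothetical, not from the actual model):

```python
import numpy as np

def param_count(state_dict):
    """Total number of parameters across all tensors in a state dict."""
    return sum(t.size for t in state_dict.values())

# Toy state dict standing in for a real checkpoint.
sd = {
    "layer.0.weight": np.zeros((4, 4)),  # 16 params
    "layer.0.bias": np.zeros(4),         # 4 params
}
print(param_count(sd))  # 20
```

If this total comes out ~0.5B higher than the architecture's documented size, duplicated layers are a likely culprit.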

This is weird.

Thanks for pointing this out. It seems to be a bug in the way mergekit handles the Gemma architecture; I've seen this on a few Gemma-based merges. I manually removed the extra parameters and updated the config files to reflect the change, so the model is now down to the expected size and runs normally!
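The manual cleanup described above can be sketched as deduplicating tensors with byte-identical content, keeping the first occurrence of each. This is only an illustrative sketch over an in-memory dict (the function name is my own, not mergekit's); note that some models legitimately share weights, e.g. tied embeddings, so a real fix should match on layer names rather than blindly dropping every content duplicate:

```python
import numpy as np

def drop_duplicate_tensors(state_dict):
    """Return a copy of state_dict with exact-duplicate tensors removed.

    Keeps the first tensor seen for each distinct (shape, dtype, bytes)
    triple. Caution: this would also drop intentionally tied weights.
    """
    seen = set()
    cleaned = {}
    for name, tensor in state_dict.items():
        key = (tensor.shape, tensor.dtype.str, tensor.tobytes())
        if key in seen:
            continue  # duplicate payload of an earlier tensor; skip it
        seen.add(key)
        cleaned[name] = tensor
    return cleaned

sd = {
    "model.layers.0.weight": np.ones((2, 2)),
    "model.layers.0_dup.weight": np.ones((2, 2)),  # erroneous duplicate
    "model.layers.1.weight": np.zeros((2, 2)),
}
print(sorted(drop_duplicate_tensors(sd)))
# ['model.layers.0.weight', 'model.layers.1.weight']
```

After pruning, the model's `config.json` layer count would also need updating to match, as mentioned above.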

https://github.com/arcee-ai/mergekit/issues/385
