How did you go from merging 8B models to a 4.65B model?

#1
by HenkPoley - opened

The output has far fewer weights and biases. 🤔

Owner

@HenkPoley Great question, I wondered the same thing when I saw the reduced parameter count. The drop to 4.65B parameters might be due to the linear merging method, which averages weights and can reduce the number of unique parameters when there is redundancy or overlap between the models. Additionally, the three models I merged may share architectural similarities, such as redundant layers or weights, which could collapse into a single set of parameters during the merge, lowering the overall count.
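For reference, here is a minimal sketch of what a plain linear merge does: an element-wise weighted average over the state dicts of models that share the same architecture. The function name and the use of plain floats are illustrative; in practice the values would be tensors, which support the same `*` and `+` operations.

```python
def linear_merge(state_dicts, weights=None):
    """Element-wise weighted average of several model state dicts.

    Assumes every state dict has identical keys and tensor shapes
    (i.e. the models share one architecture). Note the output has
    the same keys and shapes as each input; only the values change.
    """
    if weights is None:
        # Default to a uniform average over all models.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged
```

A quick usage example: averaging `{"w": 1.0}`, `{"w": 3.0}`, and `{"w": 5.0}` with uniform weights yields `{"w": 3.0}`.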
