One or two models during inference?
#3 by Venkman42 - opened
Hi there,
Does this model select only one of the models during inference, or does it use both?
In other words, is the inference speed comparable to a 7B model or a 13B model?
Just curious, since the 8x7B Mixtral models use two experts per token during inference, as far as I know.
I think this is decided by the "num_experts_per_tok" setting.
How do you set the "num_experts_per_tok" config?
@SamuelAzran It's in the model's config.json.
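For reference, here's a minimal sketch of the relevant fields in a Mixtral-style config.json (field names follow the Hugging Face transformers Mixtral config; the values shown are illustrative, not taken from this model):

```json
{
  "architectures": ["MixtralForCausalLM"],
  "num_local_experts": 2,
  "num_experts_per_tok": 2
}
```

If "num_experts_per_tok" equals the total number of experts (as in this 2-expert example), every token is routed through both experts, so compute per token is closer to the combined size than to a single 7B model.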