"head_dim": 80

#2 opened by rjmehta

Can "head_dim": 128 match the qwen3 head_dim?

I would really appreciate it if you could retrain the model with head_dim set to a power of two.

AngelSlim org

> I would really appreciate it if you could retrain the model with head_dim set to a power of two.

Hi, our head_dim selection in training is based on hidden_size / num_attention_heads, which aligns with the config settings of Qwen3.
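For concreteness, here is a minimal sketch of that calculation, assuming the transformers AutoConfig API and the published Qwen/Qwen3-32B config values; it is illustrative only, not AngelSlim's actual training code:

```python
from transformers import AutoConfig

# Minimal sketch: compare the head_dim implied by
# hidden_size / num_attention_heads with the value the config sets explicitly.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3-32B")

implied = cfg.hidden_size // cfg.num_attention_heads  # 5120 // 64 = 80
explicit = getattr(cfg, "head_dim", implied)          # 128, set explicitly in config.json

print(f"implied head_dim:  {implied}")
print(f"explicit head_dim: {explicit}")
```

If the published values hold, the two numbers disagree (80 vs. 128), which is exactly the discrepancy raised below.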

```
RuntimeError: Error in function 'VariableLengthMergeStates' at /opt/conda/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/cascade.cuh:692: Unsupported head_dim: 80
```
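Until a retrained checkpoint is available, a minimal guard sketch like the following could catch the problem before serving. The supported set here is an assumption (the traceback only confirms that 80 is unsupported), so verify it against your installed FlashInfer build:

```python
import json

# Assumed supported head dims for FlashInfer's prebuilt attention kernels;
# the traceback above only proves that 80 is NOT among them.
SUPPORTED_HEAD_DIMS = {64, 128, 256}

with open("config.json") as f:
    cfg = json.load(f)

# Fall back to hidden_size / num_attention_heads when head_dim is not set
# explicitly, mirroring the selection rule described above.
head_dim = cfg.get("head_dim", cfg["hidden_size"] // cfg["num_attention_heads"])

if head_dim not in SUPPORTED_HEAD_DIMS:
    raise ValueError(
        f"head_dim={head_dim} is not supported by this FlashInfer build; "
        f"expected one of {sorted(SUPPORTED_HEAD_DIMS)}"
    )
```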

In Qwen/Qwen3-32B's config.json, the head_dim is 128.

From the data you have shown, the speedup ratio for Qwen/Qwen3-32B is much lower than for Qwen3-14B and Qwen3-8B. Could the head_dim be the reason?
