"head_dim": 80
#2 by rjmehta - opened
Can "head_dim": 128 match the qwen3 head_dim?
We would really appreciate it if you could retrain the model with head_dim set to a power of two.
Hi, our head_dim selection in training is based on hidden_size / num_attention_heads, which aligns with the config settings of Qwen3.
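For reference, a minimal sketch (assuming the Hugging Face transformers package) of the derivation described above:

```python
# Minimal sketch (assumes the `transformers` package) of the derivation
# described above: head_dim computed as hidden_size / num_attention_heads.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen3-32B")
derived_head_dim = cfg.hidden_size // cfg.num_attention_heads
# For Qwen3-32B this derivation yields 80 (the value in this thread's
# title), assuming hidden_size=5120 and num_attention_heads=64.
print(derived_head_dim)
```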
```
RuntimeError: Error in function 'VariableLengthMergeStates' at /opt/conda/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/cascade.cuh:692: Unsupported head_dim: 80
```

In Qwen/Qwen3-32B's config.json, the head_dim is 128.
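Qwen3 configs set head_dim explicitly, so the explicit value can differ from the derived one. As a hedged sketch, one can compare the two and check the effective head_dim before enabling a FlashInfer-backed path; the supported set below is an assumption inferred from the error message above, not FlashInfer's official list:

```python
# Sketch: compare the explicit head_dim in config.json with the derived
# value, and fail early on a head_dim FlashInfer rejects. SUPPORTED is an
# assumption based on the "Unsupported head_dim: 80" error above; check
# your flashinfer version for the actual list.
from transformers import AutoConfig

SUPPORTED = {64, 128, 256}  # assumed power-of-two head dims

cfg = AutoConfig.from_pretrained("Qwen/Qwen3-32B")
derived = cfg.hidden_size // cfg.num_attention_heads
explicit = getattr(cfg, "head_dim", derived)  # fallback if head_dim is absent
print(f"derived={derived}, explicit={explicit}")

if explicit not in SUPPORTED:
    raise ValueError(f"head_dim={explicit} is not supported by FlashInfer")
```

Note that the getattr fallback only matters for models whose config.json omits head_dim; Qwen3 sets it explicitly.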
From the data you have shown, the speedup for Qwen/Qwen3-32B is much lower than for Qwen3-14B and Qwen3-8B. Could the head_dim be the reason?