Clarification about the config properties w.r.t. the paper

#5
by TomSchelsen - opened

In https://huggingface.co/answerdotai/ModernBERT-base/blob/main/config.json, we see "hidden_activation": "gelu" and "position_embedding_type": "absolute" (even though RoPE-related settings also appear in the config), whereas the paper says that GeGLU and RoPE are used, respectively. Is this expected (a quirk coming from the transformers library itself), or is it a misconfiguration/export error? Thanks

Answer.AI org

As we mention in the paper, GeGLU is GLU with GeLU instead of sigmoid. "hidden_activation": "gelu" is correct.

> We adopt GeGLU (Shazeer, 2020), a Gated-Linear Units (GLU)-based (Dauphin et al., 2017) activation function built on top of the original BERT’s GeLU.
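For illustration, here is a minimal PyTorch sketch of the idea (the class and layer names are ours, not the actual ModernBERT code; the fused two-way projection is also an illustrative choice):

```python
import torch
import torch.nn as nn

class GeGLU(nn.Module):
    """GLU variant that gates with GeLU instead of sigmoid (Shazeer, 2020)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # One projection producing both halves: the gate and the value.
        self.proj = nn.Linear(d_model, 2 * d_ff)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.proj(x).chunk(2, dim=-1)
        # Classic GLU would apply torch.sigmoid(gate) here;
        # GeGLU swaps in GeLU, which is why the config says "gelu".
        return nn.functional.gelu(gate) * value
```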

I believe position_embedding_type is a default config argument in transformers. ModernBERT doesn't use it; I'll have to check whether we can remove it from the config.
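As a quick sanity check, you can load the config and list the RoPE-related fields it actually contains (the field names printed come straight from the config file, we're not asserting any particular names here):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("answerdotai/ModernBERT-base")
print(config.hidden_activation)  # "gelu" -- the gate activation inside GeGLU

# Print whichever RoPE-related settings are present in the config file.
for name, value in config.to_dict().items():
    if "rope" in name:
        print(name, value)
```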
