Running into "Flash attention implementation does not support kwargs: prompt_length" when using the exact example from the Readme

#49
by HotSauce7 - opened

Hi folks,
thanks for the amazing work. Unfortunately, when I use the exact example from your README (model card), which is:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

task = "retrieval.query"
embeddings = model.encode(
    ["What is the weather like in Berlin today?"],
    task=task,
    prompt_name=task,
)

With both sentence-transformers==3.2.0 and sentence-transformers==3.1.0 I get the warning "Flash attention implementation does not support kwargs: prompt_length". If I remove prompt_name=task the warning goes away, but the resulting embeddings are completely different.
I have flash-attn==2.6.2 installed.
Any ideas what I am missing?
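
For context on why the embeddings differ: model.prompts is the standard sentence-transformers attribute that maps prompt names to the prefixes encode() prepends, so you can inspect what prompt_name actually adds. A quick sketch (the exact strings come from the model's own configuration):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# encode() prepends the prompt text registered under prompt_name to each
# input, so dropping prompt_name=task embeds a different string. That is
# why the embeddings change, independent of the warning.
print(model.prompts)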

Have you found a solution for this?

I am experiencing the same issue. Any advice would be greatly appreciated.

# https://huggingface.co/jinaai/xlm-roberta-flash-implementation/blob/main/modeling_xlm_roberta.py
# line 671
adapter_mask = kwargs.pop("adapter_mask", None)
if kwargs:
    for key, value in kwargs.items():
        if value is not None:
            logger.warning(
                "Flash attention implementation does not support kwargs: %s",
                key,
            )

The file can be found at ~/.cache/huggingface/modules/transformers_modules/jinaai/xlm-roberta-flash-implementation/12700ba4972d9e900313a85ae855f5a76fb9500e

Maybe we could decrease the logger level to debug to suppress the warning.
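
In the meantime, a user-side workaround is to raise the threshold of the logger that emits the warning. A sketch; the exact logger name is an assumption (remote code is loaded under the transformers_modules package), so it is matched here by substring:

import logging

# Sketch of a user-side workaround: silence the warning by raising the level
# of the module's logger. The logger name is an assumption, so we scan the
# registered loggers and match on the module name instead of hardcoding it.
for name in list(logging.root.manager.loggerDict):
    if "xlm_roberta" in name:
        logging.getLogger(name).setLevel(logging.ERROR)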

Hi, thanks for reporting the issue. The warning didn't actually affect the model outputs, but I've made a change to stop passing prompt_length to the model, so you shouldn't see it anymore.

As for the argument itself, it seems some models need prompt_length so they can exclude the prompt tokens during pooling; jina-embeddings-v3 doesn't do this, so it's not relevant for us.
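
To illustrate what prompt_length is for in models that do use it, here is a minimal sketch (not jina-embeddings-v3's implementation) of mean pooling that leaves the instruction prefix out of the average:

import torch

# Minimal sketch of prompt-excluding mean pooling: zero out the first
# prompt_length positions in the attention mask so the instruction prefix
# does not influence the final embedding.
def mean_pool_excluding_prompt(token_embeddings, attention_mask, prompt_length=0):
    # token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    mask = attention_mask.clone()
    mask[:, :prompt_length] = 0  # drop prompt positions from the average
    mask = mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts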

@jupyterjazz Thank you for the quick response. It was mostly a matter of confusion about whether using prompt_name=task was correct. The clarification and the fix make perfect sense. Closing the issue!

HotSauce7 changed discussion status to closed
