Flash attention?

#3
by edmond - opened

It says flash attention 2 is not available for this model.
But I was wondering whether I could use FA2 only for the Gemma part. Do you think this is possible?

Google org

@edmond you can check out TGI's flash PaliGemma implementation here. It's implemented for the vision head, but it doesn't have as much effect there as flash attention does in the decoder.
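The point above, that a flash-style kernel matters most in the causal decoder, can be illustrated with a toy sketch. This is not PaliGemma's or TGI's actual code; it only shows the idea of keeping a plain attention implementation (as a vision tower might) while routing the decoder path through PyTorch's fused `F.scaled_dot_product_attention`, which dispatches to a FlashAttention-style kernel when one is available for the device and dtype. All tensor shapes and the `naive_attention` helper are made up for illustration.

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v, causal=False):
    # Plain reference attention, as a simple vision tower might compute it.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    if causal:
        # Mask out future positions for decoder-style (causal) attention.
        mask = torch.triu(torch.ones(scores.shape[-2:], dtype=torch.bool), 1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim) -- arbitrary toy sizes.
q, k, v = (torch.randn(1, 8, 16, 64) for _ in range(3))

# Decoder path: fused kernel with causal masking (flash-style where supported).
fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)
ref = naive_attention(q, k, v, causal=True)

# Both compute the same math; the fused kernel just avoids materializing
# the full attention matrix, which is where the decoder-side speedup comes from.
print(torch.allclose(fused, ref, atol=1e-5))
```

The fused call is a drop-in replacement for the naive computation, so in principle only the decoder blocks of a composite model need to be switched over to benefit.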

edmond changed discussion status to closed