
Attention kernel from FasterTransformer

This CUDA extension wraps the single-query attention kernel from FasterTransformer v5.2.1 for benchmarking purposes.
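Single-query attention is the decoding-time case: one new query token per head attends over the full cached key/value history. As a rough illustration of what such a kernel computes (a NumPy sketch, not the CUDA implementation; the function name and shapes here are illustrative, not the extension's API):

```python
import numpy as np

def single_query_attention(q, k_cache, v_cache, scale=None):
    """Reference computation for one decoding step.

    q:       (num_heads, head_dim)          -- the single new query token
    k_cache: (num_heads, seq_len, head_dim) -- cached keys
    v_cache: (num_heads, seq_len, head_dim) -- cached values
    returns: (num_heads, head_dim)
    """
    if scale is None:
        scale = 1.0 / np.sqrt(q.shape[-1])
    # attention scores over the cached sequence: (num_heads, seq_len)
    scores = np.einsum("hd,hsd->hs", q, k_cache) * scale
    # numerically stable softmax along the sequence axis
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    # weighted sum of cached values: (num_heads, head_dim)
    return np.einsum("hs,hsd->hd", probs, v_cache)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 64))
k = rng.standard_normal((4, 128, 64))
v = rng.standard_normal((4, 128, 64))
out = single_query_attention(q, k, v)
print(out.shape)  # (4, 64)
```

The real kernel fuses these steps on the GPU and reads K/V directly from the cache layout, but the math is the same.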

cd csrc/ft_attention && pip install .

As of 2023-09-17, this extension is no longer used in the FlashAttention repo. FlashAttention now implements flash_attn_with_kvcache, which provides all the features of this ft_attention kernel (and more).