How to use sequential prefill with transformers?

#7 opened by Juodumas

As described in the blog post https://huggingface.co/blog/falconmamba, sequential prefill enables long contexts. Is there any example code showing how to use it?
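For concreteness, this is the kind of loop I imagine. It is an untested sketch based on my reading of the Mamba cache API in transformers (the `cache_params` / `use_cache` / `cache_position` kwargs of the FalconMamba forward pass), not official code from the blog, so any of these details may be off:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

prompt = "Summarize the following documents: ..."  # imagine a very long prompt here
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
prompt_len = input_ids.shape[1]

with torch.inference_mode():
    # --- Sequential prefill ---
    # Feed the first token without a cache; the model allocates the
    # fixed-size Mamba state (cache_params) for us.
    outputs = model(input_ids=input_ids[:, :1], use_cache=True)
    cache_params = outputs.cache_params

    # Feed the remaining prompt tokens one at a time, carrying the state
    # forward. Memory stays constant w.r.t. prompt length, at the cost
    # of a slower prefill.
    for i in range(1, prompt_len):
        outputs = model(
            input_ids=input_ids[:, i : i + 1],
            cache_params=cache_params,
            use_cache=True,
            cache_position=torch.tensor([i], device=model.device),
        )
        cache_params = outputs.cache_params

    # --- Greedy decoding from the prefilled state ---
    next_token = outputs.logits[:, -1:].argmax(dim=-1)
    generated = [next_token]
    for step in range(128):
        outputs = model(
            input_ids=next_token,
            cache_params=cache_params,
            use_cache=True,
            cache_position=torch.tensor([prompt_len + step], device=model.device),
        )
        cache_params = outputs.cache_params
        next_token = outputs.logits[:, -1:].argmax(dim=-1)
        generated.append(next_token)
        if next_token.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(torch.cat(generated, dim=1)[0], skip_special_tokens=True))
```

The idea, as I understand the blog, is that the SSM state has a fixed size, so feeding the prompt token by token keeps memory constant regardless of prompt length. An official recipe (for example, whether `generate` can consume a prefilled `cache_params` directly) would be very welcome.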

Ooooh, this question was asked 17 days ago.
I tried running falcon-mamba-7b-instruct on a 24 GB RunPod instance with a large context (40 documents) and no special parameters, and I got a GPU out-of-memory error.
Another ticket has been opened about this: https://huggingface.co/tiiuae/falcon-mamba-7b-instruct/discussions/5
I hope someone can help us. :)
