How to use sequential prefill with transformers?
#7 · opened by Juodumas
As described in the blog post https://huggingface.co/blog/falconmamba, sequential prefill enables long context. Is there any example code showing how to use it?
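Based on my reading of the blog, I imagine it looks something like the loop below: feeding the prompt one token at a time so only the model's fixed-size recurrent state is kept, instead of materializing activations for the whole prompt at once. This is only a sketch assuming the `cache_params` / `cache_position` interface of the transformers Mamba implementation; the prompt string is a placeholder, greedy decoding and the 128-token generation length are arbitrary choices, and I haven't verified the loop against the library, so please correct me if the API differs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "..."  # placeholder: your long context goes here
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Sequential prefill: one forward pass per prompt token, carrying the
# fixed-size SSM/conv state forward via cache_params, so memory use is
# independent of prompt length (at the cost of one pass per token).
cache_params = None
with torch.no_grad():
    for i in range(input_ids.shape[1]):
        out = model(
            input_ids[:, i : i + 1],
            cache_params=cache_params,
            use_cache=True,
            cache_position=torch.tensor([i], device=model.device),
        )
        cache_params = out.cache_params

# Greedy decoding from the prefilled state.
next_token = out.logits[:, -1:].argmax(dim=-1)
new_tokens = [next_token]
with torch.no_grad():
    for step in range(1, 128):  # max_new_tokens, arbitrary here
        out = model(
            next_token,
            cache_params=cache_params,
            use_cache=True,
            cache_position=torch.tensor(
                [input_ids.shape[1] + step - 1], device=model.device
            ),
        )
        cache_params = out.cache_params
        next_token = out.logits[:, -1:].argmax(dim=-1)
        new_tokens.append(next_token)

print(tokenizer.decode(torch.cat(new_tokens, dim=-1)[0], skip_special_tokens=True))
```

If this is roughly right, the trade-off is that prefill becomes much slower than the default parallel prefill, since each prompt token is a separate forward pass.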
Ooooh, the question was asked 17 days ago.
I tried running falcon-mamba-7b-instruct on a 24 GB RunPod instance with no special parameters and a large context (40 documents), and got a GPU out-of-memory error.
Another ticket has been opened: https://huggingface.co/tiiuae/falcon-mamba-7b-instruct/discussions/5
I hope someone can help us. :)