Sequential Prefilling
#5
by badgergy - opened
How can I enable support for arbitrary context lengths?
I've tried the official demo code, but I still run into CUDA OOM errors.
I've got the same error. With other models I used past_key_values to feed the context in token by token, but the falcon_mamba architecture doesn't have it. And I see no setting that enables sequential prefill, not even in tiiuae's demo code...