Context window
#1 opened by Danioken
Llama 3.1, but only an 8k context window? What is that due to, actual efficiency? Do you know how the model behaves above 8k? Do we need to use RoPE scaling above 8k?
My first tests at 8k are great; I really like your model.
I have only tested at 8k, but its context window goes as high as Llama 3.1 can go, in theory 128k (although most of the training data for this model is under 8k, so I am not sure how effective the higher values will be).
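If you want to check this yourself, here is a minimal sketch using the transformers library (the model id is a placeholder, not this repo's actual name). Llama 3.1 checkpoints already carry the "llama3" RoPE scaling in their config, so in principle nothing extra is needed above 8k:

```python
# Minimal sketch, assuming a Llama 3.1-based checkpoint on the Hub.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-llama-3.1-finetune"  # placeholder id, swap in the real repo

# Inspect the inherited long-context settings.
config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # 131072 (~128k) for Llama 3.1
print(config.rope_scaling)             # should include rope_type "llama3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs
)

# Long prompts run without manual RoPE changes; quality past the ~8k
# seen in fine-tuning is untested, as noted above.
prompt = "..."  # your long-context prompt here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```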
OK, thanks.
Danioken changed discussion status to closed