Context window

#1
by Danioken - opened

Llama 3.1, but only an 8k context window? What is the reason for that, actual efficiency? Do you know how the model behaves above 8k? Do we need to use RoPE scaling above 8k?

The first tests at 8k are great; I really like your model.

I have only tested at 8k, but its context window goes as high as L3.1 can go, in theory 128k (although most of the training data for this model is under 8k, so I am not sure how effective the higher values will be).
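A minimal sketch (not from this thread) of what that means in practice: a Llama 3.1 based checkpoint already ships its extended-context RoPE parameters in config.json, so nothing extra has to be configured to prompt past 8k tokens beyond having enough memory for the longer KV cache. The repo id below is a placeholder, not this model's actual id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-llama-3.1-finetune"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Llama 3.1 configs already carry the long-context RoPE scaling
# (e.g. {"rope_type": "llama3", "factor": 8.0, ...}); printing them
# confirms no manual RoPE changes are needed above 8k.
print(model.config.max_position_embeddings)  # 131072 for Llama 3.1
print(model.config.rope_scaling)

prompt = "..."  # a document longer than 8k tokens
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```

As noted above, the fine-tuning data was mostly under 8k, so while the model will accept longer prompts, quality beyond 8k is untested.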

ok, thanks.

Danioken changed discussion status to closed