Context window
#1 opened by Danioken
Llama 3.1, but only an 8k context window? What is that due to, actual efficiency? Do you know how the model behaves above 8k? Do we need to use RoPE scaling above 8k?
My first tests at 8k are great; I really like your model.
I have only tested at 8k, but its context window goes as high as Llama 3.1 can go, in theory 128k (although most of the training data for this model is under 8k, so I am not sure how effective the higher values will be).
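If you want to check this yourself, here is a minimal sketch using the transformers library (the model id is a placeholder, not this repo's actual name). Llama 3.1 checkpoints already carry the "llama3" RoPE scaling in their config, so in principle nothing extra is needed above 8k:

```python
# Minimal sketch, assuming a Llama 3.1-based checkpoint on the Hub.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-llama-3.1-finetune"  # placeholder id, swap in the real repo

# Inspect the inherited long-context settings.
config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # 131072 (~128k) for Llama 3.1
print(config.rope_scaling)             # should include rope_type "llama3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs
)

# Long prompts run without manual RoPE changes; quality past the ~8k
# seen in fine-tuning is untested, as noted above.
prompt = "..."  # your long-context prompt here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```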
OK, thanks.
Danioken changed discussion status to closed