Request: A GGML edition of extended Airophin v2+.

by SabinStargem

I recently upgraded my machine to 128GB of RAM and a CUDA card, so I have been playing a lot with the Ycros GGML versions of extended Airoboros v1.4.1. They are very good for Llama 1 models, but it seems that Vicuna v1.5 13b-16k is almost at the level of Airo 33b-16k.

This makes me interested in trying an L2 Airo with extended context. My personal preference would be an Airophin v2+ 16k. I am hoping for a v3 version, since the v2.1 Airoboros dataset has been released on Durbin's GitHub - it is supposed to address a variety of shortcomings in L2 models and Airo 2.0.

Anyhow, thanks for making the existing pytorches. The GGML versions have been fun to play with. :)

Yeah, the L2-13b models become quite competitive with the L1-33b models (aside from the intermittent repetition problems with L2). Is your preference for Airophin v2 over v1 because of observed performance, or because it employs PI (position interpolation) rather than PNTK (partial NTK scaling, which is more of a pain to get working)?
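For anyone skimming the thread, here is a rough sketch of the difference between the two schemes. The constants are illustrative only, and as I understand it, the "partial" NTK variant actually blends interpolation and NTK scaling across frequency bands, which this sketch omits:

```python
import numpy as np

def rope_inv_freq(dim: int, base: float = 10000.0) -> np.ndarray:
    # Standard RoPE inverse frequencies for one attention head of width `dim`.
    return base ** (-np.arange(0, dim, 2) / dim)

def pi_angles(positions: np.ndarray, dim: int, scale: float) -> np.ndarray:
    # Position Interpolation (PI): squash positions by the extension factor,
    # so every position in the 16k window maps inside the trained 4k range.
    return np.outer(positions / scale, rope_inv_freq(dim))

def ntk_angles(positions: np.ndarray, dim: int, scale: float) -> np.ndarray:
    # NTK-aware scaling: keep positions as-is and stretch the RoPE base,
    # which mostly slows down the low-frequency dimensions instead.
    ntk_base = 10000.0 * scale ** (dim / (dim - 2))
    return np.outer(positions, rope_inv_freq(dim, base=ntk_base))

positions = np.arange(16384)          # target 16k context
dim, scale = 128, 4.0                 # Llama 2 head dim; 16384 / 4096 native
print(pi_angles(positions, dim, scale).shape)   # (16384, 64)
print(ntk_angles(positions, dim, scale).shape)  # (16384, 64)
```

The practical upshot is that PI squashes every position uniformly, so it needs a fine-tune but is simple to support, while NTK-style scaling leaves the high-frequency (local) dimensions nearly intact.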

I tried to find the v2.1 dataset, but I don't see it in Durbin's HF datasets. (edit: Looks like it's there now!)

A 16k 13b PI model will require me to kick off a new pretraining run, unless you're aware of any new 13b dolphin/orca 16k models? I might just kick that off anyway, since the v2 results didn't really improve beyond what you'd expect from the smaller extension factor.
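(For the arithmetic there: Llama 2's native window is 4,096 tokens, so a 16k model is a 4× extension while an 8k model is only 2×, and with PI a smaller extension factor generally costs less perplexity.)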

I don't know how to use FP16 models, so I haven't been able to test any of the Airophins. I assumed that v2 might be better than v1 in terms of quality, since it was trained on Airoboros v2.0m. My assumption is that it costs a pretty penny to make these models, so I only wanted what I hoped was the better version. If I can get both quality and length, that would be terrific.

With GGUF now working in Kobold and in the wild, PNTK or other approaches might be easier to support.
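(If I understand the new format right, GGUF can embed the RoPE settings as metadata, and llama.cpp already exposes `--rope-freq-base` and `--rope-freq-scale` flags for custom scaling, so a PNTK model could ship its own numbers instead of making the user guess them.)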

EDIT: Huh. Meta released a 34b foundation model, Code Llama. The interesting thing is that it was apparently trained with at least 16k context. I am VERY interested in trying an Airoboros v2.1 L2-34b-16k.
