Amazing model

#5
by lightning-missile

After testing it for creative writing, this seems like Luminum and Mistral Large combined. This might become one of my daily drivers along with the original Luminum.
Thanks for this model!

If you ever find the space, try the 195B one. Yes, it's huge, but you're in for a treat. I run the exl2 4.0bpw on 3x A40 on RunPod, and today for the first time with XTC.

TheDrummer made a new finetune of Mistral Large... I'm making an exl2 quant right now.

Anyway, it's a pleasure.

By any chance could you make a 6.0bpw of the 195b one?

Will do when I'm done with my current merges and quants.

Thank you so much!

I'll see if I can try the 195B one; I have no idea how to run an exl2 model, I'm using Kobold. BTW, what are your XTC settings? How did it go?

For XTC I used min_p=0.02, DRY at 0.75, XTC threshold=0.1, and a probability I varied from 0.5 down to 0.1.
I like the feeling of new creativity, but for RP it makes the model stupid. It feels to me like the model is doing the opposite of what should be done, but I guess that's basically how XTC works. Sometimes it breaks coherence, since it forces the model to select less probable tokens. But I admit I haven't played much with different settings, as I am also testing my merges with TheDrummer/Behemoth-123B-v1.
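
XTC ("exclude top choices") roughly works like this: when several tokens clear the probability threshold, all of them except the least likely are removed, with the configured probability per step, which is exactly why it pushes the model off its safest continuation. A minimal sketch of the idea in Python (illustrative only, not the actual Kobold/exllamav2 implementation; function and variable names are made up):

```python
import numpy as np

def xtc_filter(probs: np.ndarray, threshold: float = 0.1,
               probability: float = 0.5) -> np.ndarray:
    """Sketch of XTC ("exclude top choices") sampling.

    With the given probability per step, zero out every token whose
    probability exceeds `threshold`, except the least likely of them.
    """
    if np.random.random() >= probability:
        return probs  # XTC skipped this step
    above = np.nonzero(probs > threshold)[0]
    if len(above) < 2:
        return probs  # need at least two "top choices" to exclude any
    keep = above[np.argmin(probs[above])]  # least probable top choice
    filtered = probs.copy()
    filtered[above] = 0.0
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()  # renormalize
```

With threshold=0.1 and probability=0.5, roughly every other step the model is barred from its most confident tokens, which matches the "new creativity, but sometimes stupid" behavior described above.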

Hi again, is this the 195B? https://huggingface.co/schnapper79/lumikabra-195B_v0.3 This is v0.3, though, not v0.4. Are there differences?

V0.4 with the same merge configuration was extremely stupid. Funny thing. So yeah, I only made v0.3 as a 195B model.

I made a 5.0bpw exl2 of the 195B model, hoping I could cram 32k context onto 3x A40s... so close. In my own experience testing exl2 quants, the loss really starts to ramp up below 5.0bpw, so the closer I can stay to that, the happier I am. But I'm spoiled now and want 32k of context :D Still, I enjoy the 4.0bpw 195B.

How much context did you get with 5.0bpw?

I loaded it at 16k fine, then tried 32k; it got to the end and ran out of VRAM. I don't use a quantized cache because I have found issues with output that only occur with it enabled.
I didn't try to see what I could cram in between 16k and 32k.
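
For a rough sense of why 32k tips it over: the FP16 KV cache grows linearly with context. A back-of-envelope estimate, assuming Mistral Large 2 dimensions (88 layers, 8 KV heads, head dim 128; the 195B merge has more layers, so its cache is proportionally larger):

```python
# Back-of-envelope KV-cache size, assuming Mistral Large 2 dimensions.
n_layers, n_kv_heads, head_dim = 88, 8, 128
bytes_per_elem = 2  # FP16, i.e. no quantized cache

def kv_cache_gib(context: int) -> float:
    # 2x for keys and values
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context / 1024**3

for ctx in (16_384, 24_576, 32_768):
    print(f"{ctx:>6} tokens: {kv_cache_gib(ctx):.1f} GiB")
# ~5.5 GiB at 16k, ~8.3 GiB at 24k, ~11.0 GiB at 32k
```

On top of ~122 GB of 5.0bpw weights, that extra ~5.5 GiB between 16k and 32k is plausibly exactly what pushes 144 GB of VRAM over the edge.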

I don't use exl2 quants, but is it possible to use a context size in between? Koboldcpp has a 24k context option.

Ugh, I want to try the 195B, but there's no GGUF available. Besides, I don't think I can pay for the required GPUs; I also use RunPod. I'm using 2x A100-SXM4, 160 GB in total. I think a Q8 GGUF of the 195B wouldn't fit at all.

Yes, you can type in any value you want. 24k quite possibly could have fit.
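
For anyone curious, context length with exl2 is just a load-time parameter, so any value between 16k and 32k works. A rough sketch with the exllamav2 Python API (the model path is illustrative; TabbyAPI and text-generation-webui expose the same knob as max_seq_len, and the exact calls may differ by version):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config("/path/to/lumikabra-195B-exl2-5.0bpw")  # illustrative path
config.max_seq_len = 24576  # anything between 16k and 32k

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)  # spreads layers across all visible GPUs
```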

I use 3x A40s, which is 144GB. I got the 5.0bpw to fit in VRAM only with 16k context. You might not get an 8-bit quant to fit on 160GB, but I'm sure you could get a lower-bit one to fit. A general observation I have made from testing many models between 34B and 400B: higher density at lower bits beats lower density at higher bits, at least for creative writing.
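
As a back-of-envelope check on what fits (weights only, ignoring KV cache and runtime overhead, using size ≈ params × bpw / 8):

```python
params_b = 195  # billions of parameters

def weights_gb(bpw: float) -> float:
    # params (billions) * bits per weight / 8 bits per byte ~= GB of weights
    return params_b * bpw / 8

for bpw in (8.0, 6.0, 5.0, 4.0):
    print(f"{bpw:.1f} bpw: ~{weights_gb(bpw):.0f} GB")
# 8.0 bpw: ~195 GB  (won't fit in 160 GB)
# 6.0 bpw: ~146 GB
# 5.0 bpw: ~122 GB
# 4.0 bpw: ~98 GB
```

So 8-bit weights alone already exceed 160 GB, while ~6.0bpw would squeak in for weights but leave very little room for context.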

4.75 bpw can fit 32k context

Hi @BigHuggyD, revisiting this thread because I'm downloading the 195B again. Regarding what you said about the amount of quality loss at 4.0bpw, I thought imatrix quants help with this? Does exl2 have imatrix quants?

I made a 5.0bpw exl2 of the 195B model, hoping I could cram 32k context on 3xA40s ... so close... my own personal experience with testing exl2 quants is the amount of loss when you get below 5.0bpw really starts to ramp up. So the closer I can stay to that the happier I am but I am spoiled now and want 32k of context :D Still, I enjoy the 4.0bpw 195B

Hi @BigHuggyD , revisiting this thread cause I'm downloading the 195b again. Regarding what you said about the amount of quality loss in 4.0 BPW, I thought imatrix quants helps with this? Does exl2 has imatrix quants?

Hey! Nope, no iMatrix for EXL2.

If you haven't seen this, it gives you an idea of the dropoff that starts to occur below 5.0bpw... It agrees with my own personal EXL2-only testing:

https://github.com/matt-c1/llama-3-quant-comparison?tab=readme-ov-file
