Amazing model

#5
by lightning-missile

After testing it for creative writing, this seems like Luminum and Mistral Large combined. This might become one of my daily drivers along with the original Luminum.
Thanks for this model!

If you ever find the space, try the 195B one. Yes, it's huge, but you're in for a treat. I run the exl2 4.0bpw on 3x A40 on RunPod, and today for the first time with XTC.

TheDrummer made a new finetune of Mistral Large... I'm making an exl2 quant right now.

Anyway, it's a pleasure.

By any chance could you make a 6.0bpw of the 195b one?

Will do when I'm done with my current merges and quants.

Thank you so much!

I'll see if I can try the 195B one; I have no idea how to run an exl2 model, I'm using Kobold. BTW, what are your XTC settings? How did it go?

For XTC I used min_p=0.02, DRY at 0.75, XTC threshold=0.1, and a probability I varied from 0.5 down to 0.1.
I like the feeling of new creativity, but for RP it makes the model stupid. It feels to me like the model is doing the opposite of what should be done, but I guess that's basically how XTC works. Sometimes it breaks coherence, since it forces the model to select less probable tokens. But I admit I haven't played much with different settings, as I am also testing my merges with TheDrummer/Behemoth-123B-v1.
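
XTC ("exclude top choices") roughly works like this: when several tokens clear the probability threshold, all of them except the least likely are removed, with the configured probability per step, which is exactly why it pushes the model off its safest continuation. A minimal sketch of the idea in Python (illustrative only, not the actual Kobold/exllamav2 implementation; function and variable names are made up):

```python
import numpy as np

def xtc_filter(probs: np.ndarray, threshold: float = 0.1,
               probability: float = 0.5) -> np.ndarray:
    """Sketch of XTC ("exclude top choices") sampling.

    With the given probability per step, zero out every token whose
    probability exceeds `threshold`, except the least likely of them.
    """
    if np.random.random() >= probability:
        return probs  # XTC skipped this step
    above = np.nonzero(probs > threshold)[0]
    if len(above) < 2:
        return probs  # need at least two "top choices" to exclude any
    keep = above[np.argmin(probs[above])]  # least probable top choice
    filtered = probs.copy()
    filtered[above] = 0.0
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()  # renormalize
```

With threshold=0.1 and probability=0.5, roughly every other step the model is barred from its most confident tokens, which matches the "new creativity, but sometimes stupid" behavior described above.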

Hi again, is this the 195B? https://huggingface.co/schnapper79/lumikabra-195B_v0.3 This is v0.3, though, not v0.4. Are there differences?

V0.4 with the same merge configuration was extremely stupid. Funny thing. So yeah, I only made v0.3 as a 195B model.

I made a 5.0bpw exl2 of the 195B model, hoping I could cram 32k context onto 3x A40s... so close. In my own experience testing exl2 quants, the loss really starts to ramp up below 5.0bpw, so the closer I can stay to that, the happier I am. But I'm spoiled now and want 32k of context :D Still, I enjoy the 4.0bpw 195B.

How much context did you get with 5.0bpw?

I loaded it at 16k fine, then tried 32k; it got to the end and ran out of VRAM. I don't use a quantized cache because I have found issues with output that only occur with it enabled.
I didn't try to see what I could cram in between 16k and 32k.
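
For a rough sense of why 32k tips it over: the FP16 KV cache grows linearly with context. A back-of-envelope estimate, assuming Mistral Large 2 dimensions (88 layers, 8 KV heads, head dim 128; the 195B merge has more layers, so its cache is proportionally larger):

```python
# Back-of-envelope KV-cache size, assuming Mistral Large 2 dimensions.
n_layers, n_kv_heads, head_dim = 88, 8, 128
bytes_per_elem = 2  # FP16, i.e. no quantized cache

def kv_cache_gib(context: int) -> float:
    # 2x for keys and values
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context / 1024**3

for ctx in (16_384, 24_576, 32_768):
    print(f"{ctx:>6} tokens: {kv_cache_gib(ctx):.1f} GiB")
# ~5.5 GiB at 16k, ~8.3 GiB at 24k, ~11.0 GiB at 32k
```

On top of ~122 GB of 5.0bpw weights, that extra ~5.5 GiB between 16k and 32k is plausibly exactly what pushes 144 GB of VRAM over the edge.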

I don't use exl2 quants, but is it possible to use a context size in between? Koboldcpp has a 24k context option.

Ugh, I want to try the 195B, but there's no GGUF available. Besides, I don't think I can pay for the required GPUs; I also use RunPod. I'm using 2x A100-SXM4, 160 GB in total. I think a Q8 GGUF of the 195B wouldn't fit at all.

Yes, you can type in any value you want. 24k quite possibly could have fit.
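
For anyone curious, context length with exl2 is just a load-time parameter, so any value between 16k and 32k works. A rough sketch with the exllamav2 Python API (the model path is illustrative; TabbyAPI and text-generation-webui expose the same knob as max_seq_len, and the exact calls may differ by version):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config("/path/to/lumikabra-195B-exl2-5.0bpw")  # illustrative path
config.max_seq_len = 24576  # anything between 16k and 32k

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=config.max_seq_len, lazy=True)
model.load_autosplit(cache)  # spreads layers across all visible GPUs
```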

I use 3x A40s, which is 144GB. I got the 5.0bpw to fit in VRAM only with 16k context. You might not get an 8-bit quant to fit on 160GB, but I'm sure you could get a lower-bit one to fit. A general observation I have made from testing many models between 34B and 400B: higher density at lower bits beats lower density at higher bits, at least for creative writing.
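
As a back-of-envelope check on what fits (weights only, ignoring KV cache and runtime overhead, using size ≈ params × bpw / 8):

```python
params_b = 195  # billions of parameters

def weights_gb(bpw: float) -> float:
    # params (billions) * bits per weight / 8 bits per byte ~= GB of weights
    return params_b * bpw / 8

for bpw in (8.0, 6.0, 5.0, 4.0):
    print(f"{bpw:.1f} bpw: ~{weights_gb(bpw):.0f} GB")
# 8.0 bpw: ~195 GB  (won't fit in 160 GB)
# 6.0 bpw: ~146 GB
# 5.0 bpw: ~122 GB
# 4.0 bpw: ~98 GB
```

So 8-bit weights alone already exceed 160 GB, while ~6.0bpw would squeak in for weights but leave very little room for context.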

4.75 bpw can fit 32k context

Hi @BigHuggyD, revisiting this thread because I'm downloading the 195B again. Regarding what you said about the amount of quality loss at 4.0bpw, I thought imatrix quants help with this? Does exl2 have imatrix quants?

I made a 5.0bpw exl2 of the 195B model, hoping I could cram 32k context on 3xA40s ... so close... my own personal experience with testing exl2 quants is the amount of loss when you get below 5.0bpw really starts to ramp up. So the closer I can stay to that the happier I am but I am spoiled now and want 32k of context :D Still, I enjoy the 4.0bpw 195B

Hi @BigHuggyD , revisiting this thread cause I'm downloading the 195b again. Regarding what you said about the amount of quality loss in 4.0 BPW, I thought imatrix quants helps with this? Does exl2 has imatrix quants?

Hey! Nope, no iMatrix for EXL2.

If you haven't seen this, it gives you an idea of the dropoff that starts to occur below 5.0bpw... It agrees with my own personal EXL2-only testing:

https://github.com/matt-c1/llama-3-quant-comparison?tab=readme-ov-file
