GGML v2 Request
Hi,
I appreciate that you've converted these models to GGML v3 and uploaded them for us, just as you did with v2.
However, llama.cpp is roughly 2 seconds per token slower on v3 for me, which I didn't expect.
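For context, here's a rough sketch of the kind of per-token timing I mean, using llama-cpp-python purely as an illustration (the prompt and token count are arbitrary placeholders, and a given llama.cpp build generally loads only one GGML file version, so each file has to be timed under a build that supports it):

```python
import time
from llama_cpp import Llama

def tokens_per_second(model_path: str, prompt: str = "Hello,", n: int = 64) -> float:
    """Generate up to n tokens and return a rough tokens/sec figure."""
    llm = Llama(model_path=model_path, verbose=False)
    start = time.time()
    out = llm(prompt, max_tokens=n)
    elapsed = time.time() - start
    # Use the actual number of generated tokens, since generation can stop early.
    return out["usage"]["completion_tokens"] / elapsed

print(tokens_per_second("Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin"))
```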
I'm requesting an upload of, or a link to, Wizard-Vicuna-7B-Uncensored.ggmlv2.q4_0.bin, if possible.
Thanks for your consideration.
Oh, that's very surprising. Which quant type?
I'll look into making GGML v2s this evening.
Thanks for your response.
I use q4_0 and found that the v3 file of the same type is slower. I was surprised too, so I made a post on the llama.cpp GitHub raising the issue.
(I'm guessing "quant type" means q4_0; please correct me if I'm not referring to the right thing.)
:)
Oh, I just realised which model you were writing about. GGML v2s are still available; check the previous_llama_ggmlv2 branch.
Yes, quant type means q4_0 etc. How odd; I thought v3 was meant to be quicker.
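If it helps, here's a minimal sketch of pulling the v2 file from that branch with huggingface_hub (the revision parameter selects the branch; the filename follows the one you asked about, but check the branch listing for the exact name):

```python
from huggingface_hub import hf_hub_download

# revision= selects the previous_llama_ggmlv2 branch rather than main.
path = hf_hub_download(
    repo_id="TheBloke/Wizard-Vicuna-7B-Uncensored-GGML",
    filename="Wizard-Vicuna-7B-Uncensored.ggmlv2.q4_0.bin",
    revision="previous_llama_ggmlv2",
)
print(path)
```

(A plain git clone of the repo with that branch checked out works too.)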
Oh, there they are! (https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/tree/previous_llama_ggmlv2) Thanks for pointing that out for me.
Yeah, I can see the RAM requirements for the v3 q4_0 model have decreased, but my timings with v2 are better.
I hope to get both benefits, y'know? :)
Thanks again!