
Good work! llama.cpp support, please.

#3 by wukongai

I'd also like to see it on the LMSYS Chatbot Arena.

In my testing environment, the GGUF model's performance falls significantly short of the results you've reported against LLaMA-3-70B, and is even worse than Qwen 1.5 7B. I used the third-party llama.cpp build linked from your official resources and downloaded the GGUF file from your official website. Could there be some mistake in how I've set this up?

```
cllama-yuan --color -c 128000 --temp 0.7 --repeat_penalty 1.1 -n -1 -ins -t 64 --log-append -m /raid5/models/del/Yuan2-moe_0526-000.gguf
```
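For what it's worth, one way to narrow this down is to run the same file with a stock upstream llama.cpp build at a modest context size, so the custom binary and the 128k context are each ruled out in turn. Here is a minimal sketch, assuming a recent upstream build (where the main binary is `llama-cli`) and the same model path; the 4096 context and the prompt are placeholders, not values from the original post:

```sh
# Sanity check with stock upstream llama.cpp (flags per upstream llama-cli;
# the Yuan fork may spell some differently, e.g. --repeat_penalty).
# Start with a small context: -c 128000 can quietly degrade quality if the
# conversion did not carry over the model's long-context RoPE settings.
./llama-cli \
  -m /raid5/models/del/Yuan2-moe_0526-000.gguf \
  -c 4096 --temp 0.7 --repeat-penalty 1.1 -n 256 \
  -p "Briefly explain what a mixture-of-experts model is."
```

If quality recovers at the smaller context, the long-context metadata in the conversion is the likely culprit; if output is still poor, I'd check the prompt/chat template next, since instruct-tuned models can score far below their reported numbers when run without the template they were trained on.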
