GPU offload can result in gibberish

#1
by slaw0mir - opened

I really like this model. One thing I noticed is that too much GPU offload seems to break it, somewhat similarly to this post:
https://huggingface.co/TheBloke/guanaco-7B-GGML/discussions/2

In my case I get endlessly repeating words like "GroupGroup..". Offloading 32 layers seems to be the sweet spot for me; above 64 it's completely broken, and around 48 the output starts losing formatting strings and random words come out like this: "dr 1 nc 4ed"
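For reference, the layer counts above correspond to llama.cpp's `--n-gpu-layers` (`-ngl`) flag. A minimal sketch of capping offload at the 32-layer setting reported here; the model filename, context size, and prompt are placeholders, not from this thread:

```shell
# Cap GPU offload at 32 layers; remaining layers run on the CPU.
# Model path, context size, and prompt below are illustrative placeholders.
./llama-cli -m ./model.Q4_K_M.gguf \
  --n-gpu-layers 32 \
  -c 4096 \
  -p "Write a short greeting."
```

If output degrades, lowering `--n-gpu-layers` step by step (as the post describes) is a quick way to find the highest stable offload for a given setup.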

Make sure you see this document, as this is a class 3/4 model:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

This doc outlines the exact settings to use to "tame" this type of model, including "baseline" settings and advanced ones.
By "tame" I mean fixing "errors" / "repeats" and other issues.
A lot of the time, just getting the basic parameters right (especially the penalties) is critical.

RE: Offloading;
There are math differences between GPU and CPU which can create "odd math issues"; these often show up as operational problems like the ones you describe.
