mradermacher
committed on
Update README.md
README.md
CHANGED
@@ -113,6 +113,13 @@ Through a combination of these ingenious tricks:
 The few evaluations I have suggest that this gives good quality, and my current set-up allows me to
 generate imatrix data for most models in fp16, 70B in Q8_0 and almost everything else in Q4_K_S.
 
+The trick to 3 is not actually having patience; the trick is to automate things to the point where you
+don't have to wait for things normally. For example, if all goes well, quantizing a model requires just
+a single command (or less) for static quants, and for imatrix quants I need to select the source gguf
+and then run another command which handles download/computation/upload. Most of the time, I only have
+to do stuff when things go wrong (which, with llama.cpp being so buggy and hard to use,
+is unfortunately very frequent).
+
 ## Why don't you use gguf-split?
 
 TL;DR: I don't have the hardware/resources for that.