jartine committed
Commit b4d83da
Parent: 5ad53be

Update README.md

Files changed (1): README.md (+2 −1)
@@ -114,7 +114,8 @@ speedups for llama.cpp's simplest quants: Q8\_0 and Q4\_0.
 This model is very large. Even at Q2 quantization, it's still well-over
 twice as large the highest tier NVIDIA gaming GPUs. llamafile supports
 splitting models over multiple GPUs (for NVIDIA only currently) if you
-have such a system.
+have such a system. The best way to get one, if you don't, is to pay a
+few bucks an hour to rent a 4x RTX 4090 rig off vast.ai.
 
 Mac Studio is a good option for running this model. An M2 Ultra desktop
 from Apple is affordable and has 128GB of unified RAM+VRAM. If you have
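On a rented 4x RTX 4090 box, a multi-GPU run might look like the sketch below. The llamafile name is hypothetical, and this assumes llamafile passes through llama.cpp's usual GPU flags (`-ngl` for layer offload, `--tensor-split` for dividing work across devices):

```shell
# Sketch only, not a tested invocation: the model filename is hypothetical,
# and the flags assume llamafile exposes llama.cpp's multi-GPU options.
# -ngl 999 offloads all layers to the GPUs; --tensor-split 1,1,1,1
# spreads those layers evenly across the four cards.
./model.Q2_K.llamafile -ngl 999 --tensor-split 1,1,1,1 -p "Why is the sky blue?"
```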