Update README.md
README.md CHANGED
@@ -67,6 +67,30 @@ computation speed (flops) so simpler quants help.
Note: BF16 is currently only supported on CPU.

## Hardware Choices

Any MacBook with 32GB of RAM should be able to run
Meta-Llama-3-70B-Instruct.Q2_K.llamafile, which I uploaded a few minutes
ago. It's smart enough to solve math riddles, but at this level of
quantization you should expect hallucinations.

If you want to run Q4_0, you'll probably be able to squeeze it onto a
$3,999 MacBook Pro M3 Max w/ 48GB of RAM.
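
Those RAM figures follow from the size of the quantized weights. Here's a
back-of-envelope sizing sketch (the bits-per-weight values are my rough
approximations of the effective llama.cpp quant sizes, not exact file
sizes from this repo):

```python
# Rough RAM check for a 70B model at different quantization levels.
# Bits-per-weight are approximate effective values (block scales and
# mixed-precision tensors included), so treat the output as estimates.
PARAMS = 70.6e9  # approximate Llama 3 70B parameter count

quants = {
    "Q2_K":   3.0,   # heavily quantized; some tensors kept at higher precision
    "Q4_0":   4.5,   # 4-bit weights plus per-block scales
    "Q5_K_M": 5.7,
    "Q8_0":   8.5,
    "F16":    16.0,
}

for name, bpw in quants.items():
    gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:7s} ~{gb:5.1f} GB of weights")
```

That works out to roughly 26GB for Q2_K, which leaves headroom on a 32GB
machine for the KV cache and the OS, and roughly 40GB for Q4_0, which is
why 48GB is a squeeze.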

If you want to run Q5_K_M or Q8_0, the best choice is probably a Mac
Studio. I have an Apple M2 Ultra w/ 24-core CPU, 60-core GPU, and 128GB
of RAM. It cost me $8,000 with the monitor. If I run
Meta-Llama-3-70B-Instruct.Q4_0.llamafile then I get 14 tok/sec (prompt
eval is 82 tok/sec) thanks to the Metal GPU.

You could alternatively go on vast.ai and rent a system with 4x RTX
4090s for a few bucks an hour. That'll run the 70B model. Or you could
build your own, but the graphics cards alone will cost $10k+.

An AMD Threadripper Pro 7995WX ($10k) does a good job too. I get 5.9
tok/sec of text generation with Q4_0 and 49 tok/sec of prompt eval. If I
use F16 weights then prompt eval goes up to 65 tok/sec.
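
Those text-generation speeds track memory bandwidth, since producing each
token has to stream the entire weight file through RAM. A rough ceiling
estimate (the bandwidth figures are approximate published specs I'm
assuming here, not measurements from these machines):

```python
# Upper bound on text-generation speed: every generated token reads each
# weight once, so tok/sec can't exceed memory bandwidth / weight size.
# Bandwidth values below are approximate spec-sheet numbers.
systems_gb_per_s = {
    "Apple M2 Ultra": 800,           # unified memory bandwidth (spec)
    "Threadripper Pro 7995WX": 330,  # 8-channel DDR5-5200, approx.
}
q4_0_weights_gb = 40  # approximate size of the 70B Q4_0 weights

for name, bw in systems_gb_per_s.items():
    print(f"{name}: at most {bw / q4_0_weights_gb:.0f} tok/sec")
```

That predicts ceilings of ~20 tok/sec for the M2 Ultra and ~8 tok/sec for
the 7995WX, consistent with the measured 14 and 5.9. Prompt eval, by
contrast, is bound by computation speed (flops) rather than bandwidth,
which is why F16 can process prompts faster than Q4_0 here even though it
generates tokens slower.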

---

## Model Details