---
base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
---

#### 🚀 Custom quantizations of DeepSeek-V2-Chat-0628, supercharged for CPU inference of what is currently the #7 model globally on LMSYS Arena hard prompts! 🖥️

>[!TIP]
>### 🚄 Just download this IQ4XM 131 GB version, it's the one I use myself:
>
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`
>

```bash
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png)

### 🧠 This IQ4XM version uses GGML type IQ4_XS (4-bit) in combination with Q8_0 (8-bit) for blazing fast performance with minimal loss, leveraging int8 optimizations on most newer server CPUs.

### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for nisten in LM Studio.

>[!TIP]
>
>📝 No need for file concatenation - just point llama-cli at the first file and watch the magic happen!
>
>💻 Ready to delve in, baby?
>Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):
>```bash
>./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt
>```

```verilog
//PERPLEXITY BENCHMARKS,
//deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
//the 4bit iq4xm gets best perplexity but it's likely just a rounding error

./llama-perplexity -m ~/r/deepseek_0628_cpu-iq4xm-00001-of-00002.gguf --chunks 12 -f ~/wiki.test.raw

deepseek-0628-bf16-00001-of-00011.gguf
Model size: 440 GiB
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967

deepseek_0628_cpu-iq1m-00001-of-00002.gguf
model size = 73.27 GiB (2.67 BPW)
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585

deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
model size = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 131 GB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
```

```bash
# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00001-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00002-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00002-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00003-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00003-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00004-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00004-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00005-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00005-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00006-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00006-of-00006.gguf
```

```bash
# 🧠 For the full-brain BF16 version
aria2c -x 8 -o deepseek-0628-bf16-00001-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00001-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00002-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00002-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00003-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00003-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00004-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00004-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00005-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00005-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00006-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00006-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00007-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00007-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00008-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00008-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00009-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00009-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00010-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00010-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
```

📜 Use of the DeepSeek-V2-Chat-0628 model is subject to the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL).
The DeepSeek-V2 series supports commercial use. It's a permissive license that only restricts use for military purposes, harming minors, or patent trolling.

### 🌟 Model Information

DeepSeek-V2-Chat-0628 is the latest and greatest in the DeepSeek family. This AI powerhouse has climbed the LMSYS Chatbot Arena Leaderboard faster than a rocket on steroids:

- 🏆 Overall Arena Ranking: #11 global
- 💻 Coding Arena Ranking: #3 global
- 🧠 Hard Prompts Arena Ranking: #7 global, better than Claude Opus even on English-only hard prompts

Want to seek deeper into this model's ocean of awesomeness? Swim over to the [original model card](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628) and prepare to have your mind blown! 🤯

Now go forth and accelerate 🚀💡

-Nisten
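
P.S. All the download commands above follow the same shard naming pattern (`<prefix>-0000N-of-0000M.gguf`), so if you'd rather not paste them one by one, a small loop can generate the URLs. This is only a rough sketch: `shard_urls` is a hypothetical helper name made up for illustration, and it assumes the repo keeps the file layout shown in the commands above.

```bash
#!/usr/bin/env bash
# Sketch: print the download URL of every shard for one quantization.
# shard_urls is a hypothetical helper, not part of aria2 or llama.cpp.
shard_urls() {
  prefix="$1"   # e.g. deepseek_0628_cpu_optimized_iq4xm
  total="$2"    # number of shards, e.g. 4
  base="https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main"
  i=1
  while [ "$i" -le "$total" ]; do
    # Shard indices are zero-padded to five digits, as in the file names above.
    printf '%s/%s-%05d-of-%05d.gguf\n' "$base" "$prefix" "$i" "$total"
    i=$((i + 1))
  done
}

# Usage (uncomment to actually download, same aria2c flags as above):
# shard_urls deepseek_0628_cpu_optimized_iq4xm 4 | while read -r url; do
#   aria2c -x 9 -o "${url##*/}" "$url"
# done
```

Swap in `deepseek-0628-q8_0 6` or `deepseek-0628-bf16 11` for the other versions.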