OPEA/DeepSeek-R1-int4-AutoRound-awq-asym

Model Details

This is an int4 quantization of deepseek-ai/DeepSeek-R1 with group_size 64 and asymmetric quantization, generated by the intel/auto-round algorithm.
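As a rough illustration of what "int4, group_size 64, asymmetric" means, the sketch below quantizes one group of 64 weights to 4-bit codes and restores them. It uses plain round-to-nearest rather than AutoRound's tuned rounding, and it stores the group minimum as a float offset instead of an integer zero point. The helper names, the sample weights, and the storage assumption in `bits_per_weight` (one fp16 scale and one 4-bit zero point per group) are illustrative assumptions, not taken from the auto-round code.

```python
# Toy illustration only: plain round-to-nearest, NOT AutoRound's tuned rounding.
GROUP_SIZE = 64        # group size from the model description
LEVELS = 2 ** 4 - 1    # int4 codes span 0..15


def quantize_group(weights):
    """Asymmetric (affine) quantization of one group to 4-bit codes."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS or 1.0  # guard against a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map 4-bit codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def bits_per_weight(group_size=GROUP_SIZE, scale_bits=16, zero_bits=4):
    # Assumed storage: one fp16 scale and one 4-bit zero point per group.
    return 4 + (scale_bits + zero_bits) / group_size


# Hypothetical weights with a lopsided range (-0.20 .. 0.43), where an
# asymmetric scheme uses the 16 levels better than a symmetric one would.
weights = [((i * 37) % 64 - 20) / 100 for i in range(GROUP_SIZE)]
codes, scale, lo = quantize_group(weights)
restored = dequantize_group(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.4f} max_error={max_error:.4f} bits/weight={bits_per_weight()}")
```

Smaller groups track the local weight range more closely at the cost of more per-group overhead; under the storage assumption above, group_size 64 adds about 0.31 bits per weight on top of the 4-bit codes.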

Please follow the license of the original model.

How To Use

INT4 vLLM Inference on CUDA (at least 8x 80GB)

To serve using vLLM with 8x 80GB GPUs, use the following command:

VLLM_WORKER_MULTIPROC_METHOD=spawn python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 12345 --max-model-len 65536 --max-num-batched-tokens 65536 --trust-remote-code --tensor-parallel-size 8 --gpu-memory-utilization 0.97 --dtype float16 --served-model-name deepseek-reasoner --model OPEA/DeepSeek-R1-int4-AutoRound-awq-asym

A prebuilt wheel from cognitivecomputations is available for PyTorch 2.6 and Python 3.12.

import requests

url = "http://localhost:12345/v1/chat/completions"
headers = {"Content-Type": "application/json"}
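# Prompt (in Chinese): "A Chinese character has a left-right structure, with 木 on the left and 乞 on the right. Which character is it? Reply with only that character."
prompt = "一个汉字具有左右结构，左边是木，右边是乞，这个字是什么字，只需要回答这个字即可。"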
data = {
    "model": "deepseek-reasoner",
    "messages": [
        {"role": "user", "content": prompt}
    ],
    "max_tokens": 4096,  # change this to align with official usage
    "top_p": 0.9,  # change this to align with official usage
    "temperature": 0.6  # change this to align with official usage
}

response = requests.post(url, json=data, headers=headers)
print(response.json()["choices"][0]["message"]["content"])

"""
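prompt: 一个汉字具有左右结构，左边是木，右边是乞，这个字是什么字，只需要回答这个字即可。
(English: "A Chinese character has a left-right structure, with 木 on the left and 乞 on the right. Which character is it? Reply with only that character.")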
<think>
嗯，用户问的是一个左右结构的汉字，左边是“木”，右边是“乞”。我需要先确定这个字到底是什么。首先，我应该回忆一下常见的左右结构的汉字，左边是木字旁的字有很多，比如“林”、“松”、“柏”
  等，但右边是“乞”的可能比较少见。

首先，我可以考虑“木”加“乞”组合起来的可能性。可能这个字不是常用字，或者是某个生僻字。我需要检查一下自己的记忆，有没有学过或者见过这样的字。比如，“杞”字是木字旁右边一个“己”，所
  右边是“己”而不是“乞”。而“札”字右边是“乚”，所以也不是。再比如“杚”，这个字可能比较少见，但它的右边是“乞”吗？

或者，可能用户写错了右边的部分，比如“气”而不是“乞”，但用户明确说是“乞”。这时候可能需要查证一下。如果右边确实是“乞”，那么可能的字是“杚”。这个字的读音是什么呢？我记得“杚”有两种
  音，一种是gài，另一种是gè。当读gài时，同“槩”，也就是“概”的异体字，意思是量米粟时刮平斗斛用的木板；而读gè时，可能与“柯”有关，但不太确定。

不过，也有可能用户记错了结构，或者有其他可能性。比如，是否有其他木字旁加乞的字？或者是否是繁体字或异体字？比如“櫂”是木字旁右边比较复杂，但右边不是“乞”。再比如“栔”，右边是“契”
  上半部分，但也不是“乞”。

再仔细想想，可能这个字是“杚”，也就是木字旁加乞，读音为gài或gè。不过，这个字在现代汉语中可能不常用，属于生僻字。需要确认一下是否正确。另外，用户可能是在某个特定的上下文或方言中
  到这个字，所以需要给出正确的答案。

总结一下，可能的答案是“杚”，但需要确认是否正确。如果用户确认结构无误，那么这个字就是“杚”，尽管它不常见。所以最终的答案应该是“杚”。
</think>

杚

--------------------------------------------------

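prompt: 请用我给你的4个数字，通过加、减、乘、除、括号，组成一个运算，使得结果为24。注意：数字需要全部使用我提供的数字，4468
(English: "Using the 4 digits I give you, combine addition, subtraction, multiplication, division, and parentheses into an expression that equals 24. Note: you must use all of the digits I provide: 4468")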

<think>
嗯，我现在需要解决一个数学问题，就是用给定的四个数字4、4、6、8，通过加减乘除和括号组合起来得到24。首先，我得仔细想清楚怎么安排这些数字和运算符号。可能的话，先回忆一下常见的24
点游戏解法，可能会有帮助。

首先，我需要确定这四个数字的顺序和组合方式。因为有重复的数字，比如有两个4，所以可能需要更多的组合尝试。先考虑如何用这四个数中的某些数相乘或相除得到较大的数值，然后再调整剩下的
数来达到24。

比如，8和6这两个比较大的数，可能相乘的话是48，这样的话，剩下的两个4可能需要调整到48的一半，即24。不过这里有两个4，所以可能需要用除法或者减法。比如，48减去（4+4）=40，这样就不
行。或者48除以（4/4）=48，也不对。或者用8*6=48，然后4-4=0，这样48+0=48，也不行。

再想另一个方向，比如用4和4相乘得到16，再加上6和8的话，16+6+8=30，超过了24。或者16*（8-6）=32，也不对。或者（4*4）+6+8= 16+14=30，还是太大。

或者考虑用减法，比如8*4=32，然后32减去（6+4）=32-10=22，也不够。或者32-6-4=22，同样不行。

可能需要用除法来调整数值。比如，8除以（6-4）=4，这样再乘以剩下的4和另一个数？不过这样的话，可能不够。比如8/(6-4)=4，然后4*4=16，再加上6的话就超过了，不过这里可能重复用了6？或
者可能没正确使用所有数字。

再考虑另一种组合，比如（8-6）=2，然后4*4=16，再16*2=32，还是不行。或者（4+8）*（6-4）=12*2=24，这样用了4、8、6、4四个数字吗？是的，4+8=12，6-4=2，12*2=24，这样的话，确实用到了
所有的四个数字：4、8、6、4。所以这个组合可能可行？

不过，让我再检查一遍：4+8=12，6-4=2，然后12乘2等于24。是的，这样的话，四个数字都用到了，对吗？第一个4，第二个8，第三个6，第四个4。是的，刚好是4、4、6、8。所以这个解法是可行的
。

不过可能还有其他解法。比如，用6乘以（8 - (4/4)）。计算一下：4/4=1，8-1=7，6*7=42，不对。或者（6*8)/(4/4)=48/1=48，也不对。

或者（4*6）*(8/4)=24*2=48，同样不行。或者（4*8）-（6-4）=32-2=30，还是不行。

再试试其他组合，比如4*(6+8)/4。这里，6+8=14，4*14=56，56/4=14，不对。

或者，8*(6 - (4/4))：4/4=1，6-1=5，8*5=40，也不对。

或者，4*4*(8-6)=16*2=32，还是不够。

不过之前的那个解法（4+8）*(6-4)=12*2=24，确实可行。因此，这可能是一个正确的答案。不过需要确认是否所有的数字都被使用了一次，并且每个数字都只用了一次。这里，4被用了两次，而题目
中给出的数字是两个4，所以没问题。是的，正确使用了所有四个数字：4、4、6、8，每个数字都用了一次，没有重复使用也没有遗漏。

所以最终的表达式是（4 + 8）× (6 - 4) = 24。
</think>

使用数字4、4、6、8，可以通过以下运算得到24：

**运算步骤**：
(4 + 8) × (6 - 4) = 12 × 2 = 24

**验证**：
1. 使用所有数字：4、4、6、8（两个4均被使用）。
2. 通过加法和减法组合，最终结果为24。
3. 符合题目要求，仅使用加、减、乘、除和括号。

**答案**：
(4 + 8) × (6 - 4) = 24

--------------------------------------------------

prompt: How many e in word deepseek?
<think>
Okay, let's see. The question is asking how many times the letter "e" appears in the word "deepseek." Alright, first, I need to make sure I spell the word correctly. D-E-E-P-S-E-E-K. Let me break it down letter by letter.

Starting with the first letter: D. That's a D, not an E. Second letter: E. That's one. Third letter: E again. That's two. Fourth letter: P. Then S. Next is E. That's the third E. Then another E. Wait, let me check again. D-E-E-P-S-E-E-K. So after S, there's E, E, K. So that's E as the fifth letter, sixth letter, and then K. Wait, let me count the positions again to be sure.

1: D
2: E
3: E
4: P
5: S
6: E
7: E
8: K

So positions 2, 3, 6, 7 are E's. That makes four E's in total. Let me verify by writing the word out and underlining each E: D E E P S E E K. Yes, that's four E's. Wait, but sometimes people might miscount, especially if letters are repeated in sequence. Let me check again. The word is "deepseek." So "deep" has two E's, then "seek" has two E's as well. So deep (D-E-E-P) and seek (S-E-E-K). So 2 + 2 = 4. That makes sense. I think that's right. No, wait, "seek" is S-E-E-K, so two E's. So total of four E's. Yeah, that seems correct. I don't think I missed any. Let me write it again: D, E, E, P, S, E, E, K. Yep, four E's. So the answer should be 4.
</think>

The word "deepseek" contains the letter "e" four times. Here's the breakdown:

1. **D** - Not an "e"
2. **E** - First "e"
3. **E** - Second "e"
4. **P** - Not an "e"
5. **S** - Not an "e"
6. **E** - Third "e"
7. **E** - Fourth "e"
8. **K** - Not an "e"

**Answer:** There are **4** instances of the letter "e" in "deepseek".
"""

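The sample outputs above wrap the model's reasoning in `<think>...</think>` before the final answer. If only the answer is needed, the response content can be split on that block. The following is a minimal sketch, assuming the response contains at most one such block; `split_reasoning` and the shortened `sample` string are hypothetical names and data used for illustration.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Return (reasoning, answer) from a response that may contain a <think> block."""
    match = THINK_BLOCK.search(content)
    if match is None:
        # No reasoning block: treat the whole response as the answer.
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = content[match.end():].strip()
    return reasoning, answer


# Hypothetical, shortened stand-in for a real response body.
sample = "<think>\nD-E-E-P-S-E-E-K has four E's.\n</think>\n\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print(answer)
```

In the client script above, the same helper could be applied to `response.json()["choices"][0]["message"]["content"]`.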
INT4 Inference on CPU

Requirements

pip install auto-round
pip uninstall intel-extension-for-pytorch
pip install intel-extension-for-transformers

CPU inference instructions will be added later.

Evaluate the model

pip3 install lm-eval==0.4.8

TORCH_DISTRIBUTED_DEBUG=INFO python -m lm_eval --model vllm --model_args "pretrained=OPEA/DeepSeek-R1-int4-AutoRound-awq-asym,tensor_parallel_size=8,dtype=bfloat16,max_model_len=65536,max_num_batched_tokens=65536,served_model_name=deepseek-reasoner" --batch_size 1 --device 'cuda' --trust_remote_code --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,boolq,arc_easy,arc_challenge,mmlu,openbookqa

Metric	FP8	INT4(BF16)
avg	0.6954	0.6963
mmlu	0.8514	0.8485
lambada_openai	0.7902	0.7809
hellaswag	0.6935	0.6883
winogrande	0.7932	0.8011
piqa	0.8308	0.8292
truthfulqa_mc1	0.4064	0.4051
openbookqa	0.3780	0.3940
boolq	0.8856	0.8813
arc_easy	0.8598	0.8594
arc_challenge	0.6212	0.6271
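To see where the INT4 model gains or loses against FP8, the per-task differences can be computed directly from the table. This is a small sketch using only the numbers listed above; the variable names are illustrative.

```python
# Scores copied from the table above: task -> (FP8, INT4).
scores = {
    "mmlu": (0.8514, 0.8485),
    "lambada_openai": (0.7902, 0.7809),
    "hellaswag": (0.6935, 0.6883),
    "winogrande": (0.7932, 0.8011),
    "piqa": (0.8308, 0.8292),
    "truthfulqa_mc1": (0.4064, 0.4051),
    "openbookqa": (0.3780, 0.3940),
    "boolq": (0.8856, 0.8813),
    "arc_easy": (0.8598, 0.8594),
    "arc_challenge": (0.6212, 0.6271),
}

# Positive delta means INT4 scored higher than FP8 on that task.
deltas = {task: round(int4 - fp8, 4) for task, (fp8, int4) in scores.items()}
largest_drop = min(deltas, key=deltas.get)
largest_gain = max(deltas, key=deltas.get)

for task, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{task:15s} {delta:+.4f}")
```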

Generate the model

1. Add metadata to the BF16 model: https://huggingface.co/opensourcerelease/DeepSeek-R1-bf16

from safetensors import safe_open
from safetensors.torch import save_file

NUM_SHARDS = 163

for i in range(1, NUM_SHARDS + 1):
    # Shard names are zero-padded, e.g. model-00001-of-000163.safetensors
    safetensors_path = f"model-{i:05d}-of-000163.safetensors"
    print(safetensors_path)
    # Load every tensor in the shard, then rewrite it in place with the metadata attached.
    tensors = {}
    with safe_open(safetensors_path, framework="pt") as f:
        for key in f.keys():
            tensors[key] = f.get_tensor(key)
    save_file(tensors, safetensors_path, metadata={"format": "pt"})
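After rewriting the shards, it may be worth confirming that the metadata was actually written. The sketch below lists the expected shard names and reads the header back from a single file; `shard_names` and `has_pt_metadata` are hypothetical helpers, and the check assumes the shards sit in the current directory.

```python
def shard_names(total=163):
    """File names for a checkpoint split into `total` zero-padded shards."""
    return [f"model-{i:05d}-of-{total:06d}.safetensors" for i in range(1, total + 1)]


def has_pt_metadata(path):
    """True if the shard's header carries {'format': 'pt'}."""
    # Imported lazily so the name helper works without safetensors installed.
    from safetensors import safe_open

    with safe_open(path, framework="pt") as f:
        metadata = f.metadata() or {}
    return metadata.get("format") == "pt"


names = shard_names()
print(names[0], names[-1])
```

With the shards on disk, `all(has_pt_metadata(name) for name in names)` would report whether every file was updated.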

2. Remove torch.no_grad in modeling_deepseek.py, because AutoRound needs gradients for its tuning.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers


#  https://github.com/huggingface/transformers/pull/35493
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
    dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules


transformers.modeling_utils.set_initialized_submodules = set_initialized_submodules

model_name = "opensourcerelease/DeepSeek-R1-bf16"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype="auto")

# Assign each routed expert to cuda:1-4 by expert index; all other layers stay on cuda:0.
block = model.model.layers
device_map = {}

for n, m in block.named_modules():
    if isinstance(m, (torch.nn.Linear, transformers.modeling_utils.Conv1D)):
        if "experts" in n and ("shared_experts" not in n) and int(n.split('.')[-2]) < 63:
            device = "cuda:1"
        elif "experts" in n and ("shared_experts" not in n) and int(n.split('.')[-2]) >= 63 and int(
                n.split('.')[-2]) < 128:
            device = "cuda:2"
        elif "experts" in n and ("shared_experts" not in n) and int(n.split('.')[-2]) >= 128 and int(
                n.split('.')[-2]) < 192:
            device = "cuda:3"
        elif "experts" in n and ("shared_experts" not in n) and int(
                n.split('.')[-2]) >= 192:
            device = "cuda:4"
        else:
            device = "cuda:0"
        n = n[2:]

        device_map.update({n: device})

from auto_round import AutoRound

autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map, nsamples=512,
                      batch_size=4, low_gpu_mem_usage=True, seqlen=2048, group_size=64, sym=False
                      )
autoround.quantize()
autoround.save_quantized(format="auto_awq", output_dir="tmp_autoround")
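The device-assignment loop in the script above can be read as a single function from module name to device. The sketch below restates the same thresholds (63, 128, 192) without needing the model loaded; `expert_device` and the example module names are illustrative assumptions about how the layer names look, not output from the real model.

```python
def expert_device(name):
    """Map a module name (relative to model.model.layers) to a CUDA device."""
    # Shared experts and all non-expert layers stay on the first GPU.
    if "experts" not in name or "shared_experts" in name:
        return "cuda:0"
    # For routed experts the second-to-last name component is the expert index.
    expert_idx = int(name.split(".")[-2])
    if expert_idx < 63:
        return "cuda:1"
    if expert_idx < 128:
        return "cuda:2"
    if expert_idx < 192:
        return "cuda:3"
    return "cuda:4"


# Hypothetical module names used only to exercise each branch.
examples = [
    "3.mlp.experts.5.gate_proj",
    "3.mlp.experts.63.up_proj",
    "3.mlp.experts.150.down_proj",
    "3.mlp.experts.200.down_proj",
    "3.mlp.shared_experts.gate_proj",
    "3.self_attn.q_proj",
]
assignments = {name: expert_device(name) for name in examples}
for name, device in assignments.items():
    print(f"{name:32s} -> {device}")
```

Splitting the routed experts by index spreads the bulk of the weights over four GPUs, while attention and shared-expert layers share the remaining one.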

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

