---
base_model:
- apple/DiffuCoder-7B-Base
tags:
- code
- text-diffusion-model
- diffusion large language model
license: unknown
---

### DiffuCoder-7B-Instruct

The DiffuCoder-7B-Instruct model builds on the DiffuCoder-7B-Base checkpoint with instruction tuning to better follow code-related prompts.

- Training recipe: using a newly introduced pad token, we train this model conditionally, at a fixed sequence length, on [OpenCoder-SFT](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2) data for 5 epochs.
- Benchmarks: demonstrates stronger instruction-following capabilities than the Base model.

#### More details and usage examples

- Paper: [DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation](https://arxiv.org/abs/2506.20639)
- GitHub: https://github.com/apple/ml-diffucoder

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "apple/DiffuCoder-7B-Instruct"
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()

query = "Write a function to find the shared elements from the given two lists."
# This follows the Qwen chat template; you can also use tokenizer.apply_chat_template.
prompt = f"""<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{query.strip()}
<|im_end|>
<|im_start|>assistant
"""

TOKEN_PER_STEP = 1  # diffusion timesteps * TOKEN_PER_STEP = total new tokens

inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs.input_ids.to(device="cuda")
attention_mask = inputs.attention_mask.to(device="cuda")

output = model.diffusion_generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=256,
    output_history=True,
    return_dict_in_generate=True,
    steps=256 // TOKEN_PER_STEP,
    temperature=0.3,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.,
)

# Strip the prompt tokens, then cut the completion at the first pad token.
generations = [
    tokenizer.decode(g[len(p):].tolist())
    for p, g in zip(input_ids, output.sequences)
]
print(generations[0].split('<|dlm_pad|>')[0])
```

#### Acknowledgement

To power this Hugging Face model release, we reuse [Dream](https://huggingface.co/Dream-org/Dream-v0-Base-7B)'s modeling architecture and generation utilities.
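#### Building the prompt with `apply_chat_template`

The inline comment in the example above notes that the tokenizer's chat template can replace the hand-written prompt string. Below is a minimal sketch, assuming the checkpoint ships a Qwen-style chat template (check the tokenizer config in this repository); it reuses `query` and `tokenizer` from the main example.

```python
# Sketch: same prompt as above, built from the tokenizer's chat template.
# Assumption: the checkpoint ships a Qwen-style chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": query.strip()},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant header the model completes
)
inputs = tokenizer(prompt, return_tensors="pt")
```

The resulting `inputs` can be moved to the GPU and passed to `model.diffusion_generate` exactly as in the main example.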
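#### Trading diffusion steps for speed

As the `TOKEN_PER_STEP` comment indicates, the number of diffusion steps times the tokens decoded per step equals the total number of new tokens. The sketch below shows a faster configuration that decodes two tokens per step; fewer denoising steps typically reduce latency, possibly at some cost in generation quality.

```python
# Sketch: halve the number of denoising steps by decoding 2 tokens per step.
# Faster than the 1-token-per-step setting above, possibly at some quality cost.
TOKEN_PER_STEP = 2
output = model.diffusion_generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=256,
    output_history=True,
    return_dict_in_generate=True,
    steps=256 // TOKEN_PER_STEP,  # 128 steps instead of 256
    temperature=0.3,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.,
)
```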