---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- cot
- r1
- deepseek
- text
---
# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit

This is a 4-bit quantized version of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, optimized for efficient inference with reduced memory usage. The quantization was performed using the `bitsandbytes` library.
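As a rough reproduction sketch (not the exact script used for this repository), a checkpoint like this can be produced by loading the base model with a `bitsandbytes` 4-bit configuration and saving the result. The NF4/double-quantization settings below mirror the loading example further down and are assumptions; serializing 4-bit weights also requires a reasonably recent `transformers`/`bitsandbytes` release.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Assumed quantization settings (NF4, double quantization, bfloat16 compute);
# the exact configuration used for this repository is not documented here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the original model with on-the-fly 4-bit quantization...
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    quantization_config=bnb_config,
    device_map="auto",
)

# ...and save the quantized weights (requires a transformers/bitsandbytes
# version that supports serializing 4-bit checkpoints).
model.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-4bit")
```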

## Model Details

### Model Description

- **Model type:** Transformer-based Language Model
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`


## Uses

### Direct Use

This model is intended for research and practical applications where memory efficiency is critical. It can be used for:

- Text generation
- Language understanding tasks
- Chatbots and conversational AI
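
For the conversational use cases above, a minimal chat-style example might look like the sketch below, assuming the tokenizer carries over the base model's chat template:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit")
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
)

# Build a single-turn chat prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what 4-bit quantization trades off."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```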

### Downstream Use

This model can be fine-tuned for specific tasks such as:

- Sentiment analysis
- Text classification
- Summarization
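
Since the weights are already 4-bit, a QLoRA-style setup is the natural starting point for such fine-tuning. The sketch below assumes the `peft` library and the usual Qwen2 projection-module names; neither is confirmed by this repository.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training and attach LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Qwen2 module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, train with your preferred trainer (e.g. transformers Trainer or TRL's SFTTrainer).
```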

### Out-of-Scope Use

This model is not suitable for:

- High-precision tasks requiring full 16-bit or 32-bit precision
- Applications requiring extremely low latency

## Bias, Risks, and Limitations

The model may inherit biases present in the training data. Users should be cautious when deploying the model in sensitive applications.

### Recommendations

Users should evaluate the model's performance on their specific tasks and datasets before deployment. Consider fine-tuning the model for better alignment with your use case.

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
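
# Note on the configuration above: NF4 with double quantization keeps the weight
# footprint small, while the bfloat16 compute dtype keeps matrix multiplications
# in higher precision; these are common bitsandbytes settings for 4-bit inference.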