---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- cot
- r1
- deepseek
- text
---
|
# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit |
|
|
|
|
|
|
This is a 4-bit quantized version of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, optimized for efficient inference with reduced memory usage. The quantization was performed using the `bitsandbytes` library. |
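The exact export script is not included here, but the sketch below shows one way such a 4-bit checkpoint can be produced with `bitsandbytes`; the settings (NF4, double quantization, bfloat16 compute dtype) mirror the loading configuration in the getting-started example further down, and saving 4-bit weights assumes reasonably recent `transformers`/`bitsandbytes` releases.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# 4-bit NF4 quantization settings (assumed to match this checkpoint)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the full-precision base model directly into 4-bit
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Save the quantized weights locally (push_to_hub would publish them instead)
model.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-4bit")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-4bit")
```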
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Model type:** Transformer-based Language Model

- **Language(s) (NLP):** English

- **License:** MIT

- **Finetuned from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`

- **Quantization:** 4-bit NF4 via `bitsandbytes` (double quantization, bfloat16 compute dtype)
|
|
|
|
|
### Direct Use |
|
|
|
This model is intended for research and practical applications where memory efficiency is critical. It can be used for: |
|
|
|
- Text generation |
|
- Language understanding tasks |
|
- Chatbots and conversational AI (see the chat-style generation sketch below)
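For the conversational use case, a minimal sketch using the `transformers` text-generation pipeline is shown below; the prompt is illustrative, and passing chat messages directly to the pipeline assumes a recent `transformers` release that applies the tokenizer's chat template automatically.

```python
from transformers import pipeline

# Sketch: chat-style generation (illustrative prompt; assumes a recent
# transformers release that applies the chat template to message lists).
pipe = pipeline(
    "text-generation",
    model="emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Explain the difference between a list and a tuple in Python."},
]

# The returned conversation includes the model's reply as the last message
result = pipe(messages, max_new_tokens=256)
print(result[0]["generated_text"])
```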
|
|
|
### Downstream Use |
|
|
|
This model can be fine-tuned for specific tasks such as the following (a minimal LoRA fine-tuning sketch is shown after the list):
|
|
|
- Sentiment analysis |
|
- Text classification |
|
- Summarization |
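As a starting point, the sketch below attaches LoRA adapters to the 4-bit model with `peft` (QLoRA-style); the hyperparameters and target module names are assumptions based on the Qwen2 attention layer naming, not settings validated for this checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto", trust_remote_code=True
)

# Make the quantized model trainable with LoRA adapters (QLoRA-style)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed Qwen2 layer names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then train with your preferred trainer (e.g. transformers Trainer or trl SFTTrainer)
```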
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not suitable for: |
|
|
|
- High-precision tasks requiring full 16-bit or 32-bit precision |
|
- Applications requiring extremely low latency |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
The model may inherit biases present in the training data. Users should be cautious when deploying the model in sensitive applications. |
|
|
|
### Recommendations |
|
|
|
Users should evaluate the model's performance on their specific tasks and datasets before deployment. Consider fine-tuning the model for better alignment with your use case. |
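One simple sanity check is to compute the model's loss and perplexity on text that is representative of your task and compare it against the unquantized base model; the sketch below illustrates this with a placeholder sample string.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Sketch: quick loss/perplexity check on representative text (placeholder sample).
model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

sample = "Replace this with text that is representative of your target task."
inputs = tokenizer(sample, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Using the input ids as labels yields the average causal LM loss
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"loss: {loss.item():.3f}  perplexity: {torch.exp(loss).item():.2f}")
```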
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model: |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization configuration (4-bit NF4 with double quantization)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```