---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- cot
- r1
- deepseek
- text
---
# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit
This is a 4-bit quantized version of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, optimized for efficient inference with reduced memory usage. The quantization was performed using the `bitsandbytes` library.
## Model Details
### Model Description
- **Model type:** Transformer-based Language Model
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
## Uses
### Direct Use
This model is intended for research and practical applications where memory efficiency is critical. It can be used for:
- Text generation
- Language understanding tasks
- Chatbots and conversational AI (see the chat example below)
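For conversational use, one option is the tokenizer's chat template. The snippet below is a minimal sketch: it assumes the tokenizer ships a chat template (as the DeepSeek-R1 distills do) and that the repo stores its 4-bit settings in `config.json`; otherwise load the model with an explicit `BitsAndBytesConfig` as shown in "How to Get Started" below.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repo embeds its 4-bit quantization config, so a plain
# from_pretrained call picks it up automatically.
model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```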
### Downstream Use
This model can be fine-tuned for specific tasks such as the following (a parameter-efficient tuning sketch follows the list):
- Sentiment analysis
- Text classification
- Summarization
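Because the weights are stored in 4-bit, full fine-tuning is not practical; a common route is parameter-efficient tuning with LoRA adapters. The sketch below assumes the `peft` library is installed; the target module names match the Qwen2 architecture, and the hyperparameters are illustrative only.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)  # enable gradient checkpointing, cast norms to fp32

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen2 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trained
# From here, train with the Trainer or TRL's SFTTrainer on your task-specific dataset.
```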
### Out-of-Scope Use
This model is not suitable for:
- High-precision tasks requiring full 16-bit or 32-bit precision
- Applications requiring extremely low latency
## Bias, Risks, and Limitations
The model may inherit biases present in the training data. Users should be cautious when deploying the model in sensitive applications.
### Recommendations
Users should evaluate the model's performance on their specific tasks and datasets before deployment. Consider fine-tuning the model for better alignment with your use case.
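As a quick sanity check before deployment, you might compare perplexity on a sample of your own data against the unquantized base model. The sketch below uses a placeholder string; swap in representative text from your dataset.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

# Placeholder text; use a representative sample from your own dataset.
text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"Perplexity: {torch.exp(loss).item():.2f}")
```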
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
# Quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
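After loading, you can roughly confirm the memory savings from 4-bit weights; exact numbers vary with hardware and library versions.

```python
# Rough check of the loaded model's memory footprint.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```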