---
license: apache-2.0
datasets:
- amphora/QwQ-LongCoT-130K
language:
- en
base_model:
- prithivMLmods/QwQ-LCoT-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- QwQ
- Adapter
- safetensors
- Qwen2.5
- text-generation-inference
---
<pre align="center">
________ ________ _____ ___.
\_____ \ __ _ __\_____ \ / | | \_ |__
/ / \ \ \ \/ \/ / / / \ \ ______ / | |_ | __ \
/ \_/. \ \ / / \_/. \ /_____/ / ^ / | \_\ \
\_____\ \_/ \/\_/ \_____\ \_/ \____ | |___ /
\__> \__> |__| \/
</pre>
**QwQ-4B-Instruct** is a lightweight, efficient language model fine-tuned for instruction following and reasoning. It is based on a quantized version of the **Qwen2.5-7B** model and is optimized for inference speed and reduced memory consumption while retaining robust capabilities for complex tasks.
With its robust natural language processing capabilities, **QwQ-4B-Instruct** excels in generating step-by-step solutions, creative content, and logical analyses. Its architecture integrates advanced understanding of both structured and unstructured data, ensuring precise text generation aligned with user inputs.
- Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. It is also **more resilient to the diversity of system prompts**, which enhances role-play implementation and condition-setting for chatbots.
- **Long-context support** for up to 128K tokens, with generation of up to 8K tokens.
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
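The context and generation limits listed above can be turned into a simple budget check before calling `generate`. The following is a minimal sketch, assuming that "128K" means 131,072 tokens and "8K" means 8,192 tokens; the function name `clamp_max_new_tokens` and both constants are illustrative and are not part of the model's API.

```python
# Assumed limits: the model card states "128K" context and "8K" generation.
# The exact values below (131072 and 8192) are an assumption.
CONTEXT_LIMIT = 131_072
GENERATION_LIMIT = 8_192


def clamp_max_new_tokens(prompt_tokens: int, requested: int) -> int:
    """Return the largest safe max_new_tokens for a prompt of the given length."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    # Tokens still available in the context window after the prompt.
    remaining = max(CONTEXT_LIMIT - prompt_tokens, 0)
    return min(requested, GENERATION_LIMIT, remaining)
```

The prompt length can be obtained from the tokenizer output (for example, the length of the encoded `input_ids`), and the returned value passed as `max_new_tokens`.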
# **Demo Start**
The following snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate text.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-4B-Instruct"

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation into the model's chat format as a plain string.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
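The list comprehension near the end of the snippet is needed because `generate` returns the prompt tokens followed by the new tokens. A minimal sketch of that step, using plain Python lists instead of tensors (the helper name `strip_prompt_tokens` and the token IDs are made up for illustration):

```python
from typing import List, Sequence


def strip_prompt_tokens(
    input_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Keep only the tokens generated after each prompt."""
    return [
        list(out[len(inp):])  # skip the first len(inp) tokens, which echo the prompt
        for inp, out in zip(input_ids, output_ids)
    ]


# Illustrative IDs: two prompts and the corresponding full outputs.
prompts = [[101, 7, 8], [101, 9]]
outputs = [[101, 7, 8, 42, 43], [101, 9, 50]]
new_tokens = strip_prompt_tokens(prompts, outputs)
print(new_tokens)  # [[42, 43], [50]]
```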
# **Run with Ollama [Ollama Run]**
Ollama makes running machine learning models simple and efficient. Follow these steps to set up and run your GGUF models quickly.
## Quick Start: Step-by-Step Guide
| Step | Description | Command / Instructions |
|------|-------------|------------------------|
| 1 | **Install Ollama 🦙** | Download Ollama from [https://ollama.com/download](https://ollama.com/download) and install it on your system. |
| 2 | **Create Your Model File** | Create a file named after your model, e.g., `metallama`. Add the line `FROM Llama-3.2-1B.F16.gguf` to specify the base model, and ensure the base model file is in the same directory. |
| 3 | **Create and Patch the Model** | Run `ollama create metallama -f ./metallama` to create the model, then `ollama list` to verify it. |
| 4 | **Run the Model** | Start the model with `ollama run metallama`. |
| 5 | **Interact with the Model** | Once the model is running, type a prompt such as `>>> Tell me about Space X.` The model replies, for example: "Space X, the private aerospace company founded by Elon Musk, is revolutionizing space exploration..." |
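
Steps 2 to 4 can also be scripted. The following is a minimal sketch, assuming the `ollama` CLI is installed and the GGUF file named in the table is available; the helpers `write_modelfile` and `build_commands` are hypothetical names, and the commands are only assembled here, not executed.

```python
from pathlib import Path
from typing import List


def write_modelfile(directory: Path, model_name: str, gguf_file: str) -> Path:
    """Write a one-line model file containing the FROM directive."""
    path = directory / model_name
    path.write_text(f"FROM {gguf_file}\n", encoding="utf-8")
    return path


def build_commands(model_name: str, modelfile: Path) -> List[List[str]]:
    """Return the create, list, and run commands from the table as argument lists."""
    return [
        ["ollama", "create", model_name, "-f", str(modelfile)],
        ["ollama", "list"],
        ["ollama", "run", model_name],
    ]
```

Each argument list can then be passed to `subprocess.run(cmd, check=True)` on a machine where Ollama is installed.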
## Conclusion
With Ollama, running and interacting with models is seamless. Start experimenting today!