prithivMLmods
committed on
Update README.md
README.md
CHANGED
@@ -17,75 +17,91 @@ tags:
- Qwen2.5
- text-generation-inference
---
The **QwQ-4B-Instruct** is a lightweight and efficient fine-tuned language model for instruction-following tasks and reasoning. It is based on a quantized version of the **Qwen2.5-7B** model, optimized for inference speed and reduced memory consumption, while retaining robust capabilities for complex tasks.

| File Name                          | Size      | Description                                        | Status         |
|------------------------------------|-----------|----------------------------------------------------|----------------|
| `.gitattributes`                   | 1.57 kB   | Tracks files stored with Git LFS.                  | Uploaded       |
| `README.md`                        | 271 Bytes | Basic project documentation.                       | Updated        |
| `added_tokens.json`                | 657 Bytes | Specifies additional tokens for the tokenizer.     | Uploaded       |
| `config.json`                      | 1.26 kB   | Detailed model configuration file.                 | Uploaded       |
| `generation_config.json`           | 281 Bytes | Configuration for text generation settings.        | Uploaded       |
| `merges.txt`                       | 1.82 MB   | Byte-pair encoding (BPE) merge rules for the tokenizer. | Uploaded  |
| `model-00001-of-00002.safetensors` | 4.46 GB   | Part 1 of the model weights in safetensors format. | Uploaded (LFS) |
| `model-00002-of-00002.safetensors` | 1.09 GB   | Part 2 of the model weights in safetensors format. | Uploaded (LFS) |
| `model.safetensors.index.json`     | 124 kB    | Index file for safetensors model sharding.         | Uploaded       |
| `special_tokens_map.json`          | 644 Bytes | Mapping of special tokens (e.g., `<pad>`, `<eos>`).| Uploaded       |
| `tokenizer.json`                   | 11.4 MB   | Complete tokenizer configuration.                  | Uploaded (LFS) |
| `tokenizer_config.json`            | 7.73 kB   | Settings for tokenizer integration.                | Uploaded       |
| `vocab.json`                       | 2.78 MB   | Vocabulary file containing token-to-ID mappings.   | Uploaded       |
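
As a quick sanity check, this listing can be reproduced programmatically with `huggingface_hub` (a minimal sketch; the repo id is the one used in the demo snippet later in this card):

```python
from huggingface_hub import list_repo_files

# Enumerate the files hosted in the model repository.
for path in list_repo_files("prithivMLmods/QwQ-4B-Instruct"):
    print(path)
```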
- **4.46B parameters.**
- Available in multiple tensor types:
  - **FP16**
  - **F32**
  - **U8 (Quantized)**
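
A minimal sketch of picking one of these tensor types at load time (assumes the repo id from the demo below; the U8 variant would normally be consumed through a quantized export or a quantization config rather than `torch_dtype`):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint in FP16; pass torch.float32 for the F32 variant.
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/QwQ-4B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
```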

**Model sharding:**
- `model-00001-of-00002.safetensors` (4.46 GB)
- `model-00002-of-00002.safetensors` (1.09 GB)
- Indexed with `model.safetensors.index.json`.
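
`from_pretrained` resolves both shards through the index file automatically; to see which shard stores a given tensor, the index can be inspected directly (a sketch, assuming the repository files have been downloaded locally):

```python
import json

# The index maps every parameter name to the shard file that stores it.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]
print(len(weight_map), "tensors across", len(set(weight_map.values())), "shard files")
```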

**Tokenizer:**
- Uses Byte-Pair Encoding (BPE).
- Includes:
  - `vocab.json` (2.78 MB)
  - `merges.txt` (1.82 MB)
  - `tokenizer.json` (11.4 MB, pre-trained configuration).
- Special tokens mapped in `special_tokens_map.json` (e.g., `<pad>`, `<eos>`).
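
These files are consumed together by `AutoTokenizer`; a short sketch for loading the tokenizer and inspecting the vocabulary and special tokens:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/QwQ-4B-Instruct")
print(tokenizer.vocab_size)          # size of the BPE vocabulary from vocab.json
print(tokenizer.special_tokens_map)  # entries defined in special_tokens_map.json
```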

**Use Cases:**
1. **Instruction-Following:**
   - Excels in handling concise and multi-step instructions.
2. **Reasoning:**
   - Well-suited for tasks requiring logical deductions and detailed explanations.
3. **Text Generation:**
   - Generates coherent and contextually aware responses across various domains.

<pre align="center">
________            ________       _____  ___.    
\_____  \  __  _  __\_____  \     /  |  | \_ |__  
 /  / \  \ \ \/ \/ / /  / \  \   /   |  |_ | __ \ 
/   \_/.  \ \     / /   \_/.  \ /    ^   / | \_\ \
\_____\ \_/  \/\_/  \_____\ \_/ \____   |  |___  /
       \__>                \__>      |__|      \/ 
</pre>
With its robust natural language processing capabilities, **QwQ-4B-Instruct** excels in generating step-by-step solutions, creative content, and logical analyses. Its architecture integrates advanced understanding of both structured and unstructured data, ensuring precise text generation aligned with user inputs.
- Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to specialized expert models in these domains.
- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. It is also **more resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- **Long-context support** of up to 128K tokens, with generation of up to 8K tokens.
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
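
For instance, the structured-output behavior can be exercised by asking for JSON directly. The sketch below reuses the loading pattern from the demo that follows; the prompt and schema are purely illustrative:

```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-4B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Steer the model toward a machine-readable reply.
messages = [
    {"role": "system", "content": "Answer with valid JSON only."},
    {"role": "user", "content": 'Give the capital and population of France as {"capital": ..., "population": ...}.'},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
reply = tokenizer.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(json.loads(reply))  # raises json.JSONDecodeError if the reply drifts from JSON
```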

# **Demo Start**

The following code snippet uses `apply_chat_template` to show how to load the tokenizer and model and generate content:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-4B-Instruct"

# Load the model weights and tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the chat prompt with the model's chat template.
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a completion, then strip the prompt tokens from the output.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
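
The decoded `response` contains only the newly generated text (the prompt tokens are sliced off above), so it can simply be printed:

```python
print(response)
```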

# **Run with Ollama [Ollama Run]**

Ollama makes running machine learning models simple and efficient. Follow these steps to set up and run your GGUF models quickly.

## Quick Start: Step-by-Step Guide

| Step | Description | Command / Instructions |
|------|-------------|-------------------------|
| 1 | **Install Ollama 🦙** | Download Ollama from [https://ollama.com/download](https://ollama.com/download) and install it on your system. |
| 2 | **Create Your Model File** | Create a file named after your model, e.g., `metallama`, and add the line `FROM Llama-3.2-1B.F16.gguf` to specify the base model. Ensure the base model file is in the same directory. |
| 3 | **Create and Patch the Model** | Create and verify your model: `ollama create metallama -f ./metallama`, then `ollama list`. |
| 4 | **Run the Model** | Start your model: `ollama run metallama`. |
| 5 | **Interact with the Model** | Once the model is running, interact with it, e.g., `>>> Tell me about Space X.` → "Space X, the private aerospace company founded by Elon Musk, is revolutionizing space exploration..." |
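
Note: the `metallama` model file and the `Llama-3.2-1B.F16.gguf` base model above are placeholder names from a generic walkthrough. For this model, the `FROM` line would presumably point at a GGUF export of **QwQ-4B-Instruct** instead (e.g., a hypothetical `QwQ-4B-Instruct.F16.gguf`), created and run with `ollama create qwq-4b-instruct -f ./Modelfile` and `ollama run qwq-4b-instruct`.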

## Conclusion

With Ollama, running and interacting with models is seamless. Start experimenting today!