prithivMLmods
committed on
Update README.md
README.md
CHANGED
@@ -17,75 +17,91 @@ tags:
- Qwen2.5
- text-generation-inference
---
The **QwQ-4B-Instruct** is a lightweight and efficient fine-tuned language model for instruction-following tasks and reasoning. It is based on a quantized version of the **Qwen2.5-7B** model, optimized for inference speed and reduced memory consumption, while retaining robust capabilities for complex tasks.

| File Name                          | Size      | Description                                        | Status         |
|------------------------------------|-----------|----------------------------------------------------|----------------|
| `.gitattributes`                   | 1.57 kB   | Tracks files stored with Git LFS.                  | Uploaded       |
| `README.md`                        | 271 Bytes | Basic project documentation.                       | Updated        |
| `added_tokens.json`                | 657 Bytes | Specifies additional tokens for the tokenizer.     | Uploaded       |
| `config.json`                      | 1.26 kB   | Detailed model configuration file.                 | Uploaded       |
| `generation_config.json`           | 281 Bytes | Configuration for text generation settings.        | Uploaded       |
| `merges.txt`                       | 1.82 MB   | Byte-pair encoding (BPE) merge rules for the tokenizer. | Uploaded  |
| `model-00001-of-00002.safetensors` | 4.46 GB   | Part 1 of the model weights in safetensors format. | Uploaded (LFS) |
| `model-00002-of-00002.safetensors` | 1.09 GB   | Part 2 of the model weights in safetensors format. | Uploaded (LFS) |
| `model.safetensors.index.json`     | 124 kB    | Index file for safetensors model sharding.         | Uploaded       |
| `special_tokens_map.json`          | 644 Bytes | Mapping of special tokens (e.g., `<pad>`, `<eos>`).| Uploaded       |
| `tokenizer.json`                   | 11.4 MB   | Complete tokenizer configuration.                  | Uploaded (LFS) |
| `tokenizer_config.json`            | 7.73 kB   | Settings for tokenizer integration.                | Uploaded       |
| `vocab.json`                       | 2.78 MB   | Vocabulary file containing token-to-ID mappings.   | Uploaded       |
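
As a quick sanity check, this listing can be reproduced programmatically with `huggingface_hub` (a minimal sketch; the repo id is the one used in the demo snippet later in this card):

```python
from huggingface_hub import list_repo_files

# Enumerate the files hosted in the model repository.
for path in list_repo_files("prithivMLmods/QwQ-4B-Instruct"):
    print(path)
```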
- **4.46B parameters.**
- Available in multiple tensor types:
  - **FP16**
  - **F32**
  - **U8 (Quantized)**
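
A minimal sketch of picking one of these tensor types at load time (assumes the repo id from the demo below; the U8 variant would normally be consumed through a quantized export or a quantization config rather than `torch_dtype`):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint in FP16; pass torch.float32 for the F32 variant.
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/QwQ-4B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
```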

**Model sharding:**
- `model-00001-of-00002.safetensors` (4.46 GB)
- `model-00002-of-00002.safetensors` (1.09 GB)
- Indexed with `model.safetensors.index.json`.
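
`from_pretrained` resolves both shards through the index file automatically; to see which shard stores a given tensor, the index can be inspected directly (a sketch, assuming the repository files have been downloaded locally):

```python
import json

# The index maps every parameter name to the shard file that stores it.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]
print(len(weight_map), "tensors across", len(set(weight_map.values())), "shard files")
```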

**Tokenizer:**
- Uses Byte-Pair Encoding (BPE).
- Includes:
  - `vocab.json` (2.78 MB)
  - `merges.txt` (1.82 MB)
  - `tokenizer.json` (11.4 MB, pre-trained configuration).
- Special tokens mapped in `special_tokens_map.json` (e.g., `<pad>`, `<eos>`).
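
These files are consumed together by `AutoTokenizer`; a short sketch for loading the tokenizer and inspecting the vocabulary and special tokens:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/QwQ-4B-Instruct")
print(tokenizer.vocab_size)          # size of the BPE vocabulary from vocab.json
print(tokenizer.special_tokens_map)  # entries defined in special_tokens_map.json
```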

**Use Cases:**
1. **Instruction-Following:**
   - Excels in handling concise and multi-step instructions.
2. **Reasoning:**
   - Well-suited for tasks requiring logical deductions and detailed explanations.
3. **Text Generation:**
   - Generates coherent and contextually aware responses across various domains.

<pre align="center">
________            ________       _____  ___.    
\_____  \  __  _  __\_____  \     /  |  | \_ |__  
 /  / \  \ \ \/ \/ / /  / \  \   /   |  |_ | __ \ 
/   \_/.  \ \     / /   \_/.  \ /    ^   / | \_\ \
\_____\ \_/  \/\_/  \_____\ \_/ \____   |  |___  /
       \__>                \__>      |__|      \/ 
</pre>
With its robust natural language processing capabilities, **QwQ-4B-Instruct** excels in generating step-by-step solutions, creative content, and logical analyses. Its architecture integrates advanced understanding of both structured and unstructured data, ensuring precise text generation aligned with user inputs.
- Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to specialized expert models in these domains.
- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. It is also **more resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- **Long-context support** of up to 128K tokens, with generation of up to 8K tokens.
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
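
For instance, the structured-output behavior can be exercised by asking for JSON directly. The sketch below reuses the loading pattern from the demo that follows; the prompt and schema are purely illustrative:

```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-4B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Steer the model toward a machine-readable reply.
messages = [
    {"role": "system", "content": "Answer with valid JSON only."},
    {"role": "user", "content": 'Give the capital and population of France as {"capital": ..., "population": ...}.'},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
reply = tokenizer.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(json.loads(reply))  # raises json.JSONDecodeError if the reply drifts from JSON
```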

# **Demo Start**

The following code snippet uses `apply_chat_template` to show how to load the tokenizer and model and generate content:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-4B-Instruct"

# Load the model weights and tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the chat prompt with the model's chat template.
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a completion, then strip the prompt tokens from the output.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
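
The decoded `response` contains only the newly generated text (the prompt tokens are sliced off above), so it can simply be printed:

```python
print(response)
```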

# **Run with Ollama [Ollama Run]**

Ollama makes running machine learning models simple and efficient. Follow these steps to set up and run your GGUF models quickly.

## Quick Start: Step-by-Step Guide

| Step | Description | Command / Instructions |
|------|-------------|-------------------------|
| 1 | **Install Ollama 🦙** | Download Ollama from [https://ollama.com/download](https://ollama.com/download) and install it on your system. |
| 2 | **Create Your Model File** | Create a file named after your model, e.g., `metallama`, and add the line `FROM Llama-3.2-1B.F16.gguf` to specify the base model. Ensure the base model file is in the same directory. |
| 3 | **Create and Patch the Model** | Create and verify your model: `ollama create metallama -f ./metallama`, then `ollama list`. |
| 4 | **Run the Model** | Start your model: `ollama run metallama`. |
| 5 | **Interact with the Model** | Once the model is running, interact with it, e.g., `>>> Tell me about Space X.` → "Space X, the private aerospace company founded by Elon Musk, is revolutionizing space exploration..." |
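
Note: the `metallama` model file and the `Llama-3.2-1B.F16.gguf` base model above are placeholder names from a generic walkthrough. For this model, the `FROM` line would presumably point at a GGUF export of **QwQ-4B-Instruct** instead (e.g., a hypothetical `QwQ-4B-Instruct.F16.gguf`), created and run with `ollama create qwq-4b-instruct -f ./Modelfile` and `ollama run qwq-4b-instruct`.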

## Conclusion

With Ollama, running and interacting with models is seamless. Start experimenting today!