ZeroXClem committed
Commit 30dab3c · verified · 1 Parent(s): bd9007e

Update README.md

Files changed (1): README.md (+107 −16)
README.md CHANGED
@@ -9,33 +9,51 @@ library_name: transformers
  tags:
  - mergekit
  - merge
-
  ---
- # merge

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [Qwen/Qwen2.5-7B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M) as a base.

- ### Models Merged

- The following models were included in the merge:
- * [Sakalti/SJT-7B-1M](https://huggingface.co/Sakalti/SJT-7B-1M)
- * [Triangle104/Q2.5-Instruct-1M_Harmony](https://huggingface.co/Triangle104/Q2.5-Instruct-1M_Harmony)
- * [bunnycore/Qwen2.5-7B-RRP-1M](https://huggingface.co/bunnycore/Qwen2.5-7B-RRP-1M)
- * [huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated](https://huggingface.co/huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated)

- ### Configuration

- The following YAML configuration was used to produce this model:

- ```yaml
- # Merge configuration for ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M using Model Stock
  base_model: Qwen/Qwen2.5-7B-Instruct-1M
  dtype: bfloat16
  merge_method: model_stock
@@ -46,6 +64,79 @@ models:
  - model: bunnycore/Qwen2.5-7B-RRP-1M
  - model: huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
  tokenizer_source: Qwen/Qwen2.5-7B-Instruct-1M
  ```

tags:
- mergekit
- merge
license: mit
---
# ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M

**ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M** is a custom merged language model based on **Qwen2.5-7B**, with enhanced reasoning, roleplaying, and long-context capabilities. It supports context lengths of up to **1 million tokens**, making it well suited to ultra-long text processing, deep reasoning tasks, and immersive roleplay interactions.

---

## 🔧 **Model Details**
- **Base Model**: `Qwen/Qwen2.5-7B-Instruct-1M`
- **Models Used in Merge**:
  - `Sakalti/SJT-7B-1M`
  - `Triangle104/Q2.5-Instruct-1M_Harmony`
  - `bunnycore/Qwen2.5-7B-RRP-1M`
  - `huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated`
- **Merge Method**: `model_stock` (layer-wise weight averaging anchored to the base model)

---

## 📖 **Overview**
**Qwen2.5-7B-CelestialHarmony-1M** extends the **Qwen2.5-7B series** with a balance of roleplaying dynamics, structured reasoning, and long-context memory. The model is particularly well suited for:
- **Roleplaying** 🧝‍♂️: Immersive character-based storytelling with deep contextual awareness.
- **Reasoning & Thought Processing** 🧠: Structured logical thinking, especially when prompted with `<think>` tags (see the prompt sketch after this list).
- **Ultra-Long Context Handling** 📜: Efficient processing of sequences up to **1,010,000 tokens** using optimized sparse attention.
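
The `<think>` behavior can be requested through the chat template. The sketch below shows one way to ask for it; the system wording is illustrative, not an official prompt shipped with this model:

```python
# Illustrative prompt pattern only; the exact system wording is an assumption,
# not an official template for this model.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant. Reason step by step inside "
            "<think>...</think> tags, then give your final answer."
        ),
    },
    {"role": "user", "content": "A caravan travels 42 km per day. How far does it travel in 9 days?"},
]
# Pass `messages` to tokenizer.apply_chat_template(...) exactly as in the Quickstart below.
```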

---

## ⚙️ **Technical Specifications**
| Specification | Value |
|--------------|-------|
| **Model Type** | Causal Language Model |
| **Parameters** | 7.61B |
| **Non-Embedding Parameters** | 6.53B |
| **Layers** | 28 |
| **Attention Heads (GQA)** | 28 (Q), 4 (KV) |
| **Max Context Length** | 1,010,000 tokens |
| **Max Generation Length** | 8,192 tokens |
| **Merge Method** | Model Stock |
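
The layer, attention-head, and context-length figures can be cross-checked against the published checkpoint by reading its configuration; the field names below follow the Qwen2 configuration class in `transformers`:

```python
from transformers import AutoConfig

# Downloads only the config file, not the weights.
config = AutoConfig.from_pretrained("ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M")

print("Hidden layers:      ", config.num_hidden_layers)        # expected 28
print("Query heads:        ", config.num_attention_heads)      # expected 28
print("KV heads (GQA):     ", config.num_key_value_heads)      # expected 4
print("Max context length: ", config.max_position_embeddings)  # as configured for the 1M variant
```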

---

## 🔬 **Merging Details**
This model was merged using the **Model Stock** method, which averages the weights of several fine-tuned models, using the base model as an anchor, to produce a single balanced and performant checkpoint.

### **Merge YAML Configuration**
```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
dtype: bfloat16
merge_method: model_stock
models:
  - model: Sakalti/SJT-7B-1M
  - model: Triangle104/Q2.5-Instruct-1M_Harmony
  - model: bunnycore/Qwen2.5-7B-RRP-1M
  - model: huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
tokenizer_source: Qwen/Qwen2.5-7B-Instruct-1M
```
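
For intuition about what `model_stock` does, the toy sketch below merges checkpoints by layer-wise weight averaging with the base model as an anchor. It is a simplification for illustration only; mergekit's actual Model Stock implementation derives the interpolation ratio per layer from the geometry of the fine-tuned weights relative to the base, and the real merge is produced from the YAML above (typically via mergekit's `mergekit-yaml` entry point).

```python
import torch

def toy_model_stock_merge(base_state, finetuned_states, alpha=0.5):
    """Toy merge: average the fine-tuned weights per tensor, then interpolate
    toward the base model with a fixed ratio `alpha`. Real Model Stock chooses
    this ratio per layer instead of fixing it globally."""
    merged = {}
    for name, base_weight in base_state.items():
        stacked = torch.stack([sd[name].float() for sd in finetuned_states])
        finetuned_mean = stacked.mean(dim=0)   # centroid of the fine-tuned checkpoints
        merged[name] = alpha * base_weight.float() + (1.0 - alpha) * finetuned_mean
    return merged
```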

---

## 🚀 **Quickstart**
### **Install Required Packages**
Ensure you have the latest `transformers` library installed:
```bash
pip install transformers torch accelerate
```

### **Load and Use the Model**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Tell me a short story about an ancient celestial warrior."
messages = [
    {"role": "system", "content": "You are a wise celestial storyteller."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```
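
If a reply contains a `<think>...</think>` section, the reasoning trace can be separated from the visible answer. A small helper sketch (it assumes at most one such block and that the tags appear literally in the decoded text):

```python
import re

def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer). If no <think> block is found,
    the reasoning part is empty and the full text is the answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_think(response)  # `response` comes from the snippet above
print(answer)
```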

---

## ⚡ **Optimized Deployment with vLLM**
For long-context inference, use **vLLM** (Qwen's fork with dual chunk attention support):
```bash
git clone -b dev/dual-chunk-attn git@github.com:QwenLM/vllm.git
cd vllm
pip install -e . -v
```
Run the model:
```bash
vllm serve ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M \
  --tensor-parallel-size 4 \
  --max-model-len 1010000 \
  --enable-chunked-prefill --max-num-batched-tokens 131072 \
  --enforce-eager \
  --max-num-seqs 1
```
Adjust `--tensor-parallel-size` and `--max-model-len` to match your available GPU memory; the full 1,010,000-token window requires multiple high-memory GPUs.
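
Once the server is running it exposes an OpenAI-compatible API, by default at `http://localhost:8000/v1`, so it can be queried with the standard `openai` client. The base URL and placeholder key below assume vLLM's defaults:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M",
    messages=[
        {"role": "system", "content": "You are a wise celestial storyteller."},
        {"role": "user", "content": "In two sentences, tell of an ancient celestial warrior."},
    ],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```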

---

## 🎯 **Model Capabilities**
✅ **Roleplay & Storytelling** – Designed for engaging, character-driven interactions.
✅ **Long-Context Awareness** – Handles texts up to **1M tokens**.
✅ **Logical Thinking & Reasoning** – Supports `<think>` tags to encourage structured reasoning.
✅ **Optimized Merge Strategy** – Uses `Model Stock` for robust generalization.

---

## 📜 **Acknowledgments**
This model is built on top of **Qwen2.5-7B**, with contributions from **bunnycore, Triangle104, Sakalti, and huihui-ai**, merged using the **Model Stock** methodology.

For further details, see:
- 📄 [Qwen2.5-1M Technical Report](https://arxiv.org/abs/2501.15383)
- 📖 [MergeKit Documentation](https://github.com/arcee-ai/mergekit)
- 🚀 [vLLM for Long-Context Inference](https://github.com/QwenLM/vllm)

---