---
license: mit
library_name: transformers
tags:
- mergekit
- merge
base_model:
- Qwen/Qwen2.5-7B-Instruct-1M
- Sakalti/SJT-7B-1M
- Triangle104/Q2.5-Instruct-1M_Harmony
- bunnycore/Qwen2.5-7B-RRP-1M
- huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
model-index:
- name: Qwen2.5-7B-CelestialHarmony-1M
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 59.44
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.51
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 33.01
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.17
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.74
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.63
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
---
# ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M

**ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M** is a custom merged language model based on **Qwen2.5-7B** with enhanced reasoning, roleplaying, and long-context capabilities. This model supports up to **1 million token** context lengths, making it ideal for ultra-long text processing, deep reasoning tasks, and immersive roleplay interactions.

Quants are available in GGUF format, provided by [mradermacher](https://huggingface.co/mradermacher); a llama.cpp usage sketch follows the links.
1. [GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-GGUF)
2. [imatrix GGUF](https://huggingface.co/mradermacher/Qwen2.5-7B-CelestialHarmony-1M-i1-GGUF)
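
For local inference with llama.cpp, a quantized file can be loaded directly. A minimal sketch, assuming a `Q4_K_M` quant has been downloaded (the exact file name depends on the quant you pick from the repos above):
```bash
# File name is illustrative; substitute the quant you actually downloaded.
llama-cli -m Qwen2.5-7B-CelestialHarmony-1M.Q4_K_M.gguf \
  -p "Tell me a short story about an ancient celestial warrior." \
  -n 256
```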
---

## πŸ”§ **Model Details**
- **Base Model**: `Qwen/Qwen2.5-7B-Instruct-1M`
- **Models Used in Merge**:
  - `Qwen/Qwen2.5-7B-Instruct-1M`
  - `bunnycore/Qwen2.5-7B-RRP-1M`
  - `Triangle104/Q2.5-Instruct-1M_Harmony`
  - `Sakalti/SJT-7B-1M`
  - `huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated`
- **Merge Method**: `model_stock` (layer-wise weight averaging)

---

## πŸ“– **Overview**
**Qwen2.5-7B-CelestialHarmony-1M** enhances the **Qwen2.5-7B series** with a fine-tuned balance of roleplaying dynamics, structured reasoning, and long-context memory. The model is particularly well-suited for:
- **Roleplaying** πŸ§β€β™‚οΈ: Immersive character-based storytelling with deep contextual awareness.
- **Reasoning & Thought Processing** 🧠: Capable of structured logical thinking, especially when prompted with `<think>` tags (see the sample prompt after this list).
- **Ultra-Long Context Handling** πŸ“œ: Efficient processing of sequences up to **1,010,000 tokens** using optimized sparse attention.
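
As a quick illustration of `<think>`-style prompting, a user message like the following nudges the model to emit its reasoning inside the tags before the final answer (the wording is a hypothetical sketch, not a fixed template):
```
Wrap your reasoning in <think> ... </think> tags, then state only the final answer.
Question: A train departs at 9:40 and the journey takes 95 minutes. When does it arrive?
```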

---

## βš™οΈ **Technical Specifications**
| Specification  | Value |
|--------------|---------|
| **Model Type** | Causal Language Model |
| **Parameters** | 7.61B |
| **Non-Embedding Parameters** | 6.53B |
| **Layers** | 28 |
| **Attention Heads (GQA)** | 28 (Q), 4 (KV) |
| **Max Context Length** | 1,010,000 tokens |
| **Max Generation Length** | 8,192 tokens |
| **Merge Method** | Model Stock |

---

## πŸ”¬ **Merging Details**
This model was merged using the **Model Stock** method, which averages the weights of several fine-tuned checkpoints around a shared base model to produce a single balanced, well-generalizing model.

### **Merge YAML Configuration**
```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
dtype: bfloat16
merge_method: model_stock
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Triangle104/Q2.5-Instruct-1M_Harmony
  - model: Sakalti/SJT-7B-1M
  - model: bunnycore/Qwen2.5-7B-RRP-1M
  - model: huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
tokenizer_source: Qwen/Qwen2.5-7B-Instruct-1M
```
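
To reproduce the merge, this configuration can be fed to mergekit's `mergekit-yaml` entry point. A minimal sketch, assuming the YAML above is saved as `celestialharmony.yaml` (the file name is illustrative):
```bash
pip install mergekit
# --cuda runs the tensor arithmetic on GPU; omit it to merge on CPU.
mergekit-yaml celestialharmony.yaml ./Qwen2.5-7B-CelestialHarmony-1M --cuda
```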

---

## πŸš€ **Quickstart**
### **Install Required Packages**
Ensure you have the latest `transformers` library installed:
```bash
pip install transformers torch accelerate
```

### **Load and Use the Model**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M"

# Load in the checkpoint's native dtype and spread layers across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Tell me a short story about an ancient celestial warrior."
messages = [
    {"role": "system", "content": "You are a wise celestial storyteller."},
    {"role": "user", "content": prompt}
]
# Render the chat template, then tokenize.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)
```

---

## ⚑ **Optimized Deployment with vLLM**
For long-context inference, use **vLLM**:
```bash
git clone -b dev/dual-chunk-attn git@github.com:QwenLM/vllm.git
cd vllm
pip install -e . -v
```
Run the model:
```bash
vllm serve ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M \
  --tensor-parallel-size 4 \
  --max-model-len 1010000 \
  --enable-chunked-prefill --max-num-batched-tokens 131072 \
  --enforce-eager \
  --max-num-seqs 1
```
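
Once the server is running it exposes an OpenAI-compatible API (on port 8000 by default), so any OpenAI client can talk to it. A minimal sketch using the `openai` Python package (the base URL and placeholder API key assume vLLM's defaults):
```python
from openai import OpenAI

# Point the client at the local vLLM server; vLLM ignores the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M",
    messages=[
        {"role": "system", "content": "You are a wise celestial storyteller."},
        {"role": "user", "content": "Tell me a short story about an ancient celestial warrior."},
    ],
    max_tokens=512,
)
print(completion.choices[0].message.content)
```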

---

## 🎯 **Model Capabilities**
βœ… **Roleplay & Storytelling** – Designed for engaging interactions.  
βœ… **Long-Context Awareness** – Handles texts up to **1M tokens**.  
βœ… **Logical Thinking & Reasoning** – Supports the `<think>` tag for structured, step-by-step reasoning.  
βœ… **Optimized Merge Strategy** – Uses `Model Stock` for superior generalization.  

---

## πŸ“œ **Acknowledgments**
This model is built on top of **Qwen2.5-7B**, with contributions from **bunnycore, Triangle104, Sakalti, and huihui-ai**, leveraging the **Model Stock** merging methodology.

For further details, see:
- πŸ“„ [Qwen2.5-1M Technical Report](https://arxiv.org/abs/2501.15383)
- πŸ“– [MergeKit Documentation](https://github.com/arcee-ai/mergekit)
- πŸš€ [vLLM for Long-Context Inference](https://github.com/QwenLM/vllm)

---
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/ZeroXClem__Qwen2.5-7B-CelestialHarmony-1M-details)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |31.75|
|IFEval (0-Shot)    |59.44|
|BBH (3-Shot)       |34.51|
|MATH Lvl 5 (4-Shot)|33.01|
|GPQA (0-shot)      | 9.17|
|MuSR (0-shot)      |16.74|
|MMLU-PRO (5-shot)  |37.63|