---
license: apache-2.0
base_model:
- Qwen/Qwen3-32B
- Qwen/Qwen2.5-72B-Instruct
tags:
- merge
- frankenmerge
- qwen
---
# Qwen3-72B-Synthesis
**Note:** This still doesn't work; I'm trying to fix it.
A Qwen3-Architecture 72B Model Forged from `Qwen3-32B` and `Qwen2.5-72B-Instruct`.
## Model Description
**Qwen3-72B-Synthesis** is an experimental, 80-layer, 72-billion-parameter large language model. It represents a novel approach to model creation, designed to produce a model with the pure, modern **Qwen3 architecture** while inheriting the vast, high-quality knowledge of the 72B-scale **Qwen2.5-Instruct** model.
This was not a simple merge. It was a multi-phase surgical procedure involving dimensional up-scaling, architectural alignment, and a strategic "knowledge transplant" using `MergeKit`. The result is a unique checkpoint that serves as an ideal starting point for further fine-tuning.
The core philosophy was to use `Qwen/Qwen3-32B` as the architectural "foundation" and `Qwen/Qwen2.5-72B-Instruct` as the "knowledge donor."
## Model Details
* **Architecture:** Qwen3 (RMSNorm, SwiGLU, no biases, includes `q_norm` and `k_norm`)
* **Parameters:** ~72 Billion
* **Layers:** 80
* **Foundation:** `Qwen/Qwen3-32B`
* **Donor:** `Qwen/Qwen2.5-72B-Instruct`
* **Tokenizer:** `Qwen/Qwen3-32B` Tokenizer (`vocab_size: 151936`)
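
If the checkpoint is available, these details can be sanity-checked from the model config. A minimal sketch; the expected values in the comments simply restate the list above:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("cognitivecomputations/Qwen3-72B-Synthesis")
print(cfg.num_hidden_layers)  # expected: 80
print(cfg.vocab_size)         # expected: 151936 (Qwen3 tokenizer)
```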
## Model Creation Process
The creation of this model was a deliberate, three-phase process designed to overcome significant architectural incompatibilities.
### Phase 1: Foundation Upscaling
First, the `Qwen/Qwen3-32B` model (64 layers, hidden size 5120) was up-scaled to match the target 72B dimensions. This was done with a **self-interpolation** script: new dimensions were created by averaging different slices of the existing weights, rather than by simple tiling. The result, `Qwen3-32B-Upscaled`, is a 64-layer model with the correct 72B tensor shapes and the Qwen3 architecture.
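
The actual up-scaling script is not included here, but the core idea can be sketched. The function below is a hypothetical illustration, not the script that was used: it widens one weight matrix by linearly interpolating between neighboring slices, and the 8192 target width is assumed to match the Qwen2.5-72B hidden size.

```python
import torch

def interpolate_dim(weight: torch.Tensor, new_dim: int) -> torch.Tensor:
    """Widen `weight` along dim 0 by averaging neighboring rows
    (interpolation between slices, not tiling). Hypothetical sketch."""
    old_dim = weight.shape[0]
    # Map each new row to a fractional position among the old rows,
    # then blend the two nearest old rows.
    positions = torch.linspace(0, old_dim - 1, new_dim)
    lo = positions.floor().long()
    hi = positions.ceil().long()
    frac = (positions - lo.float()).unsqueeze(-1).to(weight.dtype)
    return weight[lo] * (1 - frac) + weight[hi] * frac

# Illustration: widen a square 32B projection (hidden 5120) to 72B width (8192),
# first along the rows, then along the columns via a transpose.
w = torch.randn(5120, 5120, dtype=torch.bfloat16)
w_up = interpolate_dim(interpolate_dim(w, 8192).T, 8192).T.contiguous()
print(w_up.shape)  # torch.Size([8192, 8192])
```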
### Phase 2: Donor Alignment
The `Qwen/Qwen2.5-72B-Instruct` model was architecturally incompatible with the Qwen3 target. To solve this, a new donor model, `Qwen2.5-72B-Instruct-Aligned`, was created. The process involved the following steps (sketched in code below):
1. Creating an empty 80-layer model shell with the pure Qwen3 architecture.
2. Surgically removing all `.bias` tensors from the Qwen2.5 weights.
3. Truncating the Qwen2.5 embedding and language model head layers from a vocabulary of 152064 to match Qwen3's 151936.
4. Loading the modified Qwen2.5 weights into the pure Qwen3 shell, resulting in a perfectly compatible donor model.
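
A minimal sketch of steps 2 and 3, assuming safetensors shards and the standard Qwen tensor names (`embed_tokens.weight`, `lm_head.weight`); the helper and file names are hypothetical. Qwen3's extra `q_norm`/`k_norm` tensors are presumably left at the shell's initialization, since Qwen2.5 has no counterpart.

```python
from safetensors.torch import load_file, save_file

QWEN3_VOCAB = 151936  # Qwen2.5 checkpoints use 152064; drop the trailing 128 rows

def align_shard(path_in: str, path_out: str) -> None:
    state = load_file(path_in)
    aligned = {}
    for name, tensor in state.items():
        if name.endswith(".bias"):
            continue  # the Qwen3 architecture carries no bias terms
        if name.endswith(("embed_tokens.weight", "lm_head.weight")):
            tensor = tensor[:QWEN3_VOCAB]  # truncate the vocabulary dimension
        aligned[name] = tensor
    save_file(aligned, path_out)

align_shard("qwen2.5-shard-00001.safetensors", "aligned-shard-00001.safetensors")
```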
### Phase 3: Knowledge Transplant via MergeKit
With two architecturally compatible models, the final merge was performed using `MergeKit`. A "Knowledge Bridge" strategy was employed to transplant a stable reasoning core from the donor while blending the rest.
The following `MergeKit` configuration was used:
```yaml
merge_method: linear
base_model: ./Qwen3-32B-Upscaled
dtype: bfloat16
slices:
  # Slice 1: Blend the bottom 32 layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [0, 32]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [0, 32]
        parameters:
          weight: 0.5
  # Slice 2: The "Knowledge Bridge" - transplant a pure block from the donor
  - merge_method: passthrough
    sources:
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [32, 48]
  # Slice 3: Blend the top layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [32, 64]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [48, 80]
        parameters:
          weight: 0.5
tokenizer_source: ./Qwen3-32B-Upscaled
```
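With `MergeKit` installed, merge configs are typically run with its `mergekit-yaml` command-line tool, e.g. `mergekit-yaml config.yaml ./output-dir`, which writes the merged checkpoint into the output directory.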
## How to Use
This model uses the standard Qwen ChatML prompt format.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/Qwen3-72B-Synthesis"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the importance of the LLaMA paper in one paragraph."},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
)
# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## Intended Use and Limitations
**This is an experimental model and should be considered a high-quality checkpoint, not a finished product.**
* **Fine-tuning is highly recommended.** While it inherits knowledge from a powerful instruction model, the merging process can create slight incoherence between layers. A round of fine-tuning on a high-quality instruction dataset is necessary to harmonize the weights and unlock its full potential.
* The model may exhibit unexpected behaviors, including repetitiveness or nonsensical outputs, prior to fine-tuning.
* This model has not been aligned for safety and may produce problematic, biased, or otherwise undesirable content. The user assumes all responsibility for the output generated.
## Acknowledgements
This model would not have been possible without the foundational work of Alibaba Cloud on the Qwen models, and the powerful, flexible `MergeKit` toolkit created by Charles Goddard and Arcee.ai.