Add library name and pipeline tag
Adds a link to the paper, and sets the library name to transformers.
This PR also sets the pipeline tag to `image-to-image`, enabling people to find your model at https://huggingface.co/models?pipeline_tag=image-to-image.
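As a rough sketch of how the new metadata could be consumed, the snippet below reads the front matter that this PR produces and rebuilds the filter URL mentioned above. The helper `read_front_matter` is hypothetical and not part of any Hub tooling; it only handles flat `key: value` pairs and simple `- item` lists, so a real consumer would more likely use a proper YAML parser.

```python
from urllib.parse import urlencode

# Front matter as it appears in README.md after this PR.
README = """---
license: apache-2.0
library_name: transformers
pipeline_tag: image-to-image
frameworks:
- Pytorch
tasks:
- any-to-any
---

## What is the Nexus-Gen
"""


def read_front_matter(readme_text):
    """Hypothetical helper: parse a leading `---` block into a dict.

    Assumes only flat `key: value` pairs and `- item` lists.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    current_list_key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        if line.startswith("- ") and current_list_key is not None:
            metadata[current_list_key].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key/value pairs
        key, value = key.strip(), value.strip()
        if value:
            metadata[key] = value
            current_list_key = None
        else:
            metadata[key] = []  # a list is expected on the following lines
            current_list_key = key
    return {}  # no closing delimiter, so treat as no front matter


meta = read_front_matter(README)
search_url = "https://huggingface.co/models?" + urlencode(
    {"pipeline_tag": meta["pipeline_tag"]}
)
print(meta["library_name"], search_url)
```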
README.md
CHANGED
@@ -1,13 +1,15 @@
 ---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-to-image
 frameworks:
 - Pytorch
-license: apache-2.0
 tasks:
 - any-to-any
 ---
 
 ## What is the Nexus-Gen
-Nexus-Gen is a unified model that synergizes the language reasoning capabilities of LLMs with the image synthesis power of diffusion models. To align the embedding space of the LLM and diffusion model, we conduct a dual-phase alignment training process. (1) The autoregressive LLM learns to predict image embeddings conditioned on multimodal inputs, while (2) the vision decoder is trained to reconstruct high-fidelity images from these embeddings. During training the LLM, we identified a critical discrepancy between the autoregressive paradigm's training and inference phases, where error accumulation in continuous embedding space severely degrades generation quality. To avoid this issue, we introduce a prefilled autoregression strategy that prefills input sequence with position-embedded special tokens instead of continuous embeddings. Through dual-phase training, Nexus-Gen has developed the integrated capability to comprehensively address the image understanding, generation and editing tasks as follows.
+[Nexus-Gen](https://huggingface.co/papers/2504.21356) is a unified model that synergizes the language reasoning capabilities of LLMs with the image synthesis power of diffusion models. To align the embedding space of the LLM and diffusion model, we conduct a dual-phase alignment training process. (1) The autoregressive LLM learns to predict image embeddings conditioned on multimodal inputs, while (2) the vision decoder is trained to reconstruct high-fidelity images from these embeddings. During training the LLM, we identified a critical discrepancy between the autoregressive paradigm's training and inference phases, where error accumulation in continuous embedding space severely degrades generation quality. To avoid this issue, we introduce a prefilled autoregression strategy that prefills input sequence with position-embedded special tokens instead of continuous embeddings. Through dual-phase training, Nexus-Gen has developed the integrated capability to comprehensively address the image understanding, generation and editing tasks as follows.
 
 More information please refer to our repo: https://github.com/modelscope/Nexus-Gen.git
 
@@ -55,4 +57,4 @@ python image_editing.py
 ```
 
 ### Training Codes
-Nexus-Gen is trained base on [ms-swift](https://github.com/modelscope/ms-swift.git) and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio.git). You can find the training scripts in `train/scripts/train_decoder.sh` and `train_llm.sh`.
+Nexus-Gen is trained base on [ms-swift](https://github.com/modelscope/ms-swift.git) and [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio.git). You can find the training scripts in `train/scripts/train_decoder.sh` and `train_llm.sh`.