Update README.md
## 🚀 Model Structure


[Lumos](https://arxiv.org/pdf/2412.07767) consists of transformer blocks for latent diffusion and can be applied to various visual generative tasks such as text-to-image, image-to-3D, and image-to-video generation.
Source code is available at https://github.com/xiaomabufei/lumos.
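The backbone described above is a stack of transformer blocks operating on latents. As a rough, hypothetical sketch only (not the repository's actual code), one PixArt-style block with adaptive layer-norm timestep conditioning could look like this in PyTorch; the class name, dimensions, and layer layout are illustrative assumptions:

```python
# Hypothetical sketch of one diffusion-transformer block: self-attention,
# cross-attention to image-prompt tokens, and an MLP, with the diffusion
# timestep injected via adaptive layer norm (adaLN-style scale/shift/gate).
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    def __init__(self, dim: int = 1152, heads: int = 16):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm3 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Timestep embedding -> per-block scale/shift/gate parameters.
        self.ada = nn.Linear(dim, 6 * dim)

    def forward(self, x, cond, t_emb):
        # x:     (B, N, dim) noisy latent tokens
        # cond:  (B, M, dim) image-prompt tokens (e.g. frozen DINO features)
        # t_emb: (B, dim)    diffusion-timestep embedding
        shift1, scale1, gate1, shift2, scale2, gate2 = (
            self.ada(t_emb).unsqueeze(1).chunk(6, dim=-1)
        )
        h = self.norm1(x) * (1 + scale1) + shift1
        x = x + gate1 * self.self_attn(h, h, h, need_weights=False)[0]
        x = x + self.cross_attn(self.norm2(x), cond, cond, need_weights=False)[0]
        h = self.norm3(x) * (1 + scale2) + shift2
        return x + gate2 * self.mlp(h)
```

The adaLN-style gating is one common way DiT-family models inject the timestep; the actual Lumos block may differ in details.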
### Model Description
- **Developed by:** Lumos-I2I
- **Model type:** Diffusion-Transformer-based generative model
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
- **Model Description:** This is a model that can be used to generate and modify images based on an image prompt. It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained vision encoder ([DINO](https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth)) and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)); a sketch of how these components fit together follows this list.
- **Resources for more information:** Check out our [GitHub Repository](https://github.com/xiaomabufei/lumos) and the [Lumos report on arXiv](https://arxiv.org/pdf/2412.07767).
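Putting the listed components together, a minimal, hypothetical image-to-image sampling loop might look like the following. Here `dino`, `vae`, `transformer`, and `scheduler` are placeholders for the real models, and their call signatures are assumptions rather than the repository's API:

```python
# Hypothetical sketch of image-prompt-conditioned generation:
# DINO encodes the prompt image, the transformer denoises in the
# VAE's latent space, and the VAE decoder maps latents to pixels.
import torch

@torch.no_grad()
def generate(prompt_image, dino, vae, transformer, scheduler, steps=20):
    cond = dino(prompt_image)                      # image-prompt tokens from the frozen DINO encoder
    latents = torch.randn(1, 4, 64, 64)            # start from Gaussian noise in latent space
    for t in scheduler.timesteps(steps):           # iterative denoising
        eps = transformer(latents, cond, t)        # predict noise, conditioned on the image prompt
        latents = scheduler.step(eps, t, latents)  # one reverse-diffusion update
    return vae.decode(latents)                     # decode latents back to an image
```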