Xiaomabufei/lumos

image to image

text to image

novel view synthesis

image to video

Xiaomabufei committed on Dec 11, 2024

Commit 0bfca29 (verified), 1 parent: 8b6be37

Update README.md

Files changed (1)

README.md +3 -10

@@ -42,17 +42,10 @@ Source code is available at https://github.com/xiaomabufei/lumos.
 - **Developed by:** Lumos
 - **Model type:** Diffusion-Transformer-based generative model
 - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
-- **Model Description:** Lumos-I2I is a model that can be used to generate and modify images based on image prompt.
+- **Model Description:** **Lumos-I2I** is a model that can be used to generate and modify images based on the image prompt.
 It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained vision encoders ([DINO](
-https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth))
-and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)).
-Lumos-T2I is a model that can be used to generate and modify images based on image prompt.
-It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained vision encoders ([DINO](
-https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth))
-and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)).
-Lumos-T2I is a model that can be used to generate and modify images based on text prompts.
+https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth)). **Lumos-T2I** is a model that can be used to generate and modify images based on text prompts.
 It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained text encoders ([T5](
-https://huggingface.co/DeepFloyd/t5-v1_1-xxl))
-and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)).
+https://huggingface.co/DeepFloyd/t5-v1_1-xxl)).
 - **Resources for more information:** Check out our [GitHub Repository](https://github.com/xiaomabufei/lumos) and the [Lumos report on arXiv](https://arxiv.org/pdf/2412.07767).