Xiaomabufei committed on
Commit 8b6be37 · verified · 1 parent: 8eb353f

Update README.md

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -39,12 +39,20 @@ Source code is available at https://github.com/xiaomabufei/lumos.
 
  ### Model Description
 
- - **Developed by:** Lumos-I2I
+ - **Developed by:** Lumos
  - **Model type:** Diffusion-Transformer-based generative model
  - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
- - **Model Description:** This is a model that can be used to generate and modify images based on image prompt.
+ - **Model Description:** Lumos-I2I is a model that can be used to generate and modify images based on image prompt.
  It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained vision encoders ([DINO](
  https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth))
  and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)).
+ Lumos-T2I is a model that can be used to generate and modify images based on image prompt.
+ It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained vision encoders ([DINO](
+ https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth))
+ and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)).
+ Lumos-T2I is a model that can be used to generate and modify images based on text prompts.
+ It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained text encoders ([T5](
+ https://huggingface.co/DeepFloyd/t5-v1_1-xxl))
+ and one latent feature encoder ([VAE](https://arxiv.org/abs/2112.10752)).
  - **Resources for more information:** Check out our [GitHub Repository](https://github.com/xiaomabufei/lumos) and the [Lumos report on arXiv](https://arxiv.org/pdf/2412.07767).
 