lehduong committed on
Commit
4be061f
1 Parent(s): 8014d74

Delete PROMPT_GUIDE.md

Files changed (1)
  1. PROMPT_GUIDE.md +0 -91
PROMPT_GUIDE.md DELETED
@@ -1,91 +0,0 @@
1
- # Prompt Guide
2
-
3
- All examples are generated with a CFG scale of $4.2$ and $50$ steps, and are not cherry-picked unless otherwise stated. The negative prompt is set to:
4
- ```
5
- monochrome, greyscale, low-res, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation
6
- ```
7
-
8
- ## 1. Text-to-Image
9
-
10
- ### 1.1 Long and detailed prompts give (much) better results.
11
-
12
- Because the model was trained on long and detailed prompts, it is more likely to generate better images when given detailed prompts.
13
-
14
-
15
- The model shows good text adherence with long and complex prompts, as in the images below. We use the first $20$ prompts from [simoryu's examples](https://cloneofsimo.github.io/compare_aura_sd3/). For the full prompts and the results of other models, refer to the link above.
16
-
17
- <p align="center">
18
- <img src="assets/promptguide_complex.jpg" alt="Text-to-Image results" width="800">
19
- </p>
20
-
21
-
22
- ### 1.2 Resolution
23
-
24
- For text-to-image, the model generally works well with height and width in the range $[768; 1280]$ (both must be divisible by $16$). For other tasks, it performs best at a resolution of around $512$.
25
-
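The resolution constraint above can be sketched as a small helper. This is only an illustration, not part of the model's code: the function name `snap_resolution` and its parameters are hypothetical, and the defaults assume the text-to-image range $[768; 1280]$ with both bounds already multiples of $16$.

```python
def snap_resolution(height, width, low=768, high=1280, multiple=16):
    """Clamp each side to [low, high] and round it to the nearest multiple.

    Hypothetical helper: assumes `low` and `high` are themselves multiples
    of `multiple`, which holds for the defaults (768 and 1280 with 16).
    """
    def snap(side):
        # Keep the side inside the recommended range first.
        clamped = max(low, min(high, side))
        # Integer arithmetic rounds to the nearest multiple (ties round up).
        return (clamped + multiple // 2) // multiple * multiple

    return snap(height), snap(width)


if __name__ == "__main__":
    print(snap_resolution(1000, 700))
```

A requested size such as 1000 x 700 would be adjusted so that both sides fall inside the range and are divisible by 16 before being passed to whatever generation call is in use.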
26
- ## 2. ID Customization & Subject-driven generation
27
-
28
- - The expected length of source captions is $30$ to $75$ words. Empirically, we find that longer prompts help preserve the ID better, but they may hinder text adherence to the target caption.
29
-
30
- - We find it better to add some descriptions (e.g., from the source caption) to the target caption to preserve the identity, especially for complex subjects with delicate details.
31
-
32
- <p align="center">
33
- <img src="assets/promptguide_idtask.jpg" alt="ablation id task" width="800">
34
- </p>
35
-
36
- ## 3. Multiview generation
37
-
38
- We recommend not using captions that describe facial features (e.g., looking at the camera) to mitigate multi-face (Janus) problems.
39
-
40
- ## 4. Image editing
41
-
42
- We find it is generally better to set the guidance scale to a lower value, e.g., in $[3; 3.5]$, to avoid over-saturated results.
43
-
44
- ## 5. Special tokens and available colors
45
-
46
- ### 5.1 Task Tokens
47
-
48
- | Task | Token | Additional Tokens |
49
- |:---------------------|:---------------------------|:------------------|
50
- | Text to Image | `[[text2image]]` | |
51
- | Deblurring | `[[deblurring]]` | |
52
- | Inpainting | `[[image_inpainting]]` | |
53
- | Canny-edge and Image | `[[canny2image]]` | |
54
- | Depth and Image | `[[depth2image]]` | |
55
- | Hed and Image | `[[hed2img]]` | |
56
- | Pose and Image | `[[pose2image]]` | |
57
- | Image editing with Instruction | `[[image_editing]]` | |
58
- | Semantic map and Image | `[[semanticmap2image]]` | `<#00FFFF cyan mask: object/to/segment>` |
59
- | Boundingbox and Image | `[[boundingbox2image]]` | `<#00FFFF cyan boundingbox: object/to/detect>` |
60
- | ID customization | `[[faceid]]` | `[[img0]] target/caption [[img1]] caption/of/source/image_1 [[img2]] caption/of/source/image_2 [[img3]] caption/of/source/image_3` |
61
- | Multiview | `[[multiview]]` | |
62
- | Subject-Driven | `[[subject_driven]]` | `<item: name/of/subject> [[img0]] target/caption/goes/here [[img1]] insert/source/caption` |
63
-
64
-
65
- Note that you can replace the cyan color above with any color from the table below, and use multiple additional tokens to detect/segment multiple classes.
66
-
67
- ### 5.2 Available colors
68
-
69
-
70
- | Hex Code | Color Name |
71
- |:---------|:-----------|
72
- | #FF0000 | <span style="color: #FF0000">red</span> |
73
- | #00FF00 | <span style="color: #00FF00">lime</span> |
74
- | #0000FF | <span style="color: #0000FF">blue</span> |
75
- | #FFFF00 | <span style="color: #FFFF00">yellow</span> |
76
- | #FF00FF | <span style="color: #FF00FF">magenta</span> |
77
- | #00FFFF | <span style="color: #00FFFF">cyan</span> |
78
- | #FFA500 | <span style="color: #FFA500">orange</span> |
79
- | #800080 | <span style="color: #800080">purple</span> |
80
- | #A52A2A | <span style="color: #A52A2A">brown</span> |
81
- | #008000 | <span style="color: #008000">green</span> |
82
- | #FFC0CB | <span style="color: #FFC0CB">pink</span> |
83
- | #008080 | <span style="color: #008080">teal</span> |
84
- | #FF8C00 | <span style="color: #FF8C00">darkorange</span> |
85
- | #8A2BE2 | <span style="color: #8A2BE2">blueviolet</span> |
86
- | #006400 | <span style="color: #006400">darkgreen</span> |
87
- | #FF4500 | <span style="color: #FF4500">orangered</span> |
88
- | #000080 | <span style="color: #000080">navy</span> |
89
- | #FFD700 | <span style="color: #FFD700">gold</span> |
90
- | #40E0D0 | <span style="color: #40E0D0">turquoise</span> |
91
- | #DA70D6 | <span style="color: #DA70D6">orchid</span> |
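As an illustration of how the task tokens and colors in the deleted guide fit together, here is a minimal sketch of prompt assembly. It rests on assumptions the guide does not confirm: that the task token comes first and that tokens are separated by single spaces. The function names and the `COLORS` dictionary are hypothetical; only the token strings and hex codes are taken from the tables.

```python
# A few rows copied from the "Available colors" table (hex code by color name).
COLORS = {
    "red": "#FF0000",
    "lime": "#00FF00",
    "blue": "#0000FF",
    "cyan": "#00FFFF",
}


def class_token(color, label, kind="mask"):
    """Format one additional token, e.g. <#00FFFF cyan mask: dog>.

    `kind` is "mask" for semantic maps and "boundingbox" for bounding boxes.
    """
    return f"<{COLORS[color]} {color} {kind}: {label}>"


def semantic_map_prompt(classes):
    """Build a semantic-map prompt from a {color name: class label} mapping.

    Assumption: the task token is followed by the space-separated class tokens.
    """
    tokens = " ".join(class_token(c, label) for c, label in classes.items())
    return f"[[semanticmap2image]] {tokens}"


def face_id_prompt(target_caption, source_captions):
    """Build an ID-customization prompt: [[img0]] target, then [[img1]].. sources."""
    parts = [f"[[img0]] {target_caption}"]
    parts += [
        f"[[img{i}]] {caption}"
        for i, caption in enumerate(source_captions, start=1)
    ]
    return "[[faceid]] " + " ".join(parts)


if __name__ == "__main__":
    print(semantic_map_prompt({"cyan": "dog", "red": "cat"}))
    print(face_id_prompt("a man at the beach", ["photo of a man", "a man smiling"]))
```

Using one color per class mirrors the note that multiple additional tokens can be combined to segment or detect several classes; the guide's table lists up to three source images for ID customization, which this sketch does not enforce.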