Psycheswings Flux1 LoRA repo update. Contains runtime samples, the config.yaml file, and the optimizer.pt file. I also added the ai-toolkit notebooks directory for later use and as backups of the trainers (Dev and Schnell).
- config.yaml +99 -0
- notebooks/FLUX_1_dev_LoRA_Training.ipynb +291 -0
- notebooks/FLUX_1_schnell_LoRA_Training.ipynb +296 -0
- notebooks/SliderTraining.ipynb +339 -0
- optimizer.pt +3 -0
- samples/1733773241950__000000000_0.jpg +3 -0
- samples/1733773260524__000000000_1.jpg +3 -0
- samples/1733773279091__000000000_2.jpg +3 -0
- samples/1733773297647__000000000_3.jpg +3 -0
- samples/1733773316204__000000000_4.jpg +3 -0
- samples/1733773334761__000000000_5.jpg +3 -0
- samples/1733773854236__000000250_0.jpg +3 -0
- samples/1733773872815__000000250_1.jpg +3 -0
- samples/1733773891384__000000250_2.jpg +3 -0
- samples/1733773909950__000000250_3.jpg +3 -0
- samples/1733773928516__000000250_4.jpg +3 -0
- samples/1733773947082__000000250_5.jpg +3 -0
- samples/1733774468334__000000500_0.jpg +3 -0
- samples/1733774486925__000000500_1.jpg +3 -0
- samples/1733774505504__000000500_2.jpg +3 -0
- samples/1733774524087__000000500_3.jpg +3 -0
- samples/1733774542663__000000500_4.jpg +3 -0
- samples/1733774561257__000000500_5.jpg +3 -0
- samples/1733775084642__000000750_0.jpg +3 -0
- samples/1733775103231__000000750_1.jpg +3 -0
- samples/1733775121813__000000750_2.jpg +3 -0
- samples/1733775140394__000000750_3.jpg +3 -0
- samples/1733775158977__000000750_4.jpg +3 -0
- samples/1733775177573__000000750_5.jpg +3 -0
- samples/1733775697559__000001000_0.jpg +3 -0
- samples/1733775716149__000001000_1.jpg +3 -0
- samples/1733775734740__000001000_2.jpg +3 -0
- samples/1733775753330__000001000_3.jpg +3 -0
- samples/1733775771915__000001000_4.jpg +3 -0
- samples/1733775790516__000001000_5.jpg +3 -0
- samples/1733776314928__000001250_0.jpg +3 -0
- samples/1733776333525__000001250_1.jpg +3 -0
- samples/1733776352107__000001250_2.jpg +3 -0
- samples/1733776370700__000001250_3.jpg +3 -0
- samples/1733776389287__000001250_4.jpg +3 -0
- samples/1733776407876__000001250_5.jpg +3 -0
- samples/1733776931659__000001500_0.jpg +3 -0
- samples/1733776950250__000001500_1.jpg +3 -0
- samples/1733776968841__000001500_2.jpg +3 -0
- samples/1733776987424__000001500_3.jpg +3 -0
- samples/1733777006010__000001500_4.jpg +3 -0
- samples/1733777024585__000001500_5.jpg +3 -0
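
For reference, a minimal sketch of pulling individual files from this commit with the huggingface_hub client. The repo id below is a placeholder (this page does not state the full repo path), so substitute the actual Psycheswings-Flux1 repo id.

# Hedged sketch: download the training config and one runtime sample from the Hub.
# "<user>/Psycheswings-Flux1" is a hypothetical repo id; replace it with the real one.
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(repo_id="<user>/Psycheswings-Flux1", filename="config.yaml")
sample_path = hf_hub_download(
    repo_id="<user>/Psycheswings-Flux1",
    filename="samples/1733773241950__000000000_0.jpg",
)
print(config_path, sample_path)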
config.yaml
ADDED
@@ -0,0 +1,99 @@
job: extension
config:
  name: Psycheswings-Flux1
  process:
  - type: sd_trainer
    training_folder: /content/output
    performance_log_every: 100
    device: cuda:0
    network:
      type: lora
      linear: 16
      linear_alpha: 16
    save:
      dtype: float16
      save_every: 250
      max_step_saves_to_keep: 10
    datasets:
    - folder_path: /content/dataset
      caption_ext: txt
      caption_dropout_rate: 0.05
      shuffle_tokens: false
      cache_latents_to_disk: true
      resolution:
      - 512
      - 768
      - 1024
    train:
      batch_size: 1
      steps: 4000
      gradient_accumulation_steps: 1
      train_unet: true
      train_text_encoder: false
      content_or_style: balanced
      gradient_checkpointing: true
      noise_scheduler: flowmatch
      optimizer: adamw8bit
      lr: 0.0004
      ema_config:
        use_ema: true
        ema_decay: 0.99
      dtype: bf16
    model:
      name_or_path: black-forest-labs/FLUX.1-dev
      is_flux: true
      quantize: true
    sample:
      sampler: flowmatch
      sample_every: 250
      width: 1024
      height: 1024
      prompts:
      - psyche\(Person\), Psycheswings \(Person\), @psychedwings, Photo of a young woman with curly blonde hair, smiling at the camera. She is wearing a red bikini top that accentuates her large breasts and cleavage. The background is a serene beach scene with gentle waves crashing onto the shore. The lighting is soft and natural, casting gentle shadows on her face and body. The overall mood is cheerful and inviting.
      - psyche\(Person\), Psycheswings \(Person\), @psychedwings, A young woman with curly blonde hair, wearing a blue and white striped shirt, smiling at the camera. She is standing in the middle of the image, with a bright blue sky and a few people in the background. The building behind her has arches and intricate carvings, and the image is taken from a low angle, giving a clear view of her upper body. The lighting is bright and natural, highlighting her curly hair and the blue sky.
      - psyche\(Person\), Psycheswings \(Person\), @psychedwings, Photo of a young woman with curly blonde hair, wearing a pink ribbed crop top and blue jeans, standing in a modern living room with a beige carpet, white walls, and shelves with various items. She is smiling at the camera, looking directly at the viewer. The lighting is bright and natural, coming from the left side of the image. The woman has fair skin and a slender physique. The image has a high-quality, professional feel.
      - psyche\(Person\), Psycheswings \(Person\), @psychedwings, Photo of a young woman with curly blonde hair, wearing a green dress with white daisies, sitting on a grassy field at sunset. She has a fair complexion and is looking directly at the camera with a slight smile. The background features a row of houses and a tree. The lighting is warm and golden, casting gentle shadows on her face and body. The overall mood is peaceful and serene.
      - psyche\(Person\), Psycheswings \(Person\), @psychedwings, Photo of a young woman with curly blonde hair, wearing a blue sweater vest over a white shirt and a red plaid skirt, standing in a classroom setting with a desk and chair in the background. She has a neutral expression and is looking directly at the camera. The lighting is soft and natural, casting gentle shadows on her face. The image has a high-quality, professional feel.
      - psyche\(Person\), Psycheswings \(Person\), @psychedwings, Photo of a young woman with curly blonde hair, wearing a yellow raincoat over a grey t-shirt, standing on a rainy street in a city. She has a neutral expression and is looking directly at the camera. She is holding a sign that says "Psyche". The background features tall buildings and a cloudy sky. The image is taken from a low angle, focusing on the woman's face and upper body. The lighting is soft and natural, highlighting her features. The overall mood is moody and rainy.
      neg: ''
      seed: 79200
      walk_seed: true
      guidance_scale: 4
      sample_steps: 20
meta:
  name: Psycheswings-Flux1
  version: '1.0'
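
The config above is the ai-toolkit job definition used for this run (FLUX.1-dev base, rank-16 LoRA, adamw8bit at lr 4e-4, 4000 steps, checkpoints and samples every 250 steps). Assuming ai-toolkit's usual entry point, where run.py takes a config path, reproducing the run outside the notebooks would look roughly like the sketch below; the destination filename under config/ is illustrative.

# Hedged sketch: rerun this job with ai-toolkit from a shell or Colab cell.
!git clone https://github.com/ostris/ai-toolkit
!cd ai-toolkit && git submodule update --init --recursive && pip install -r requirements.txt
# Copy this repo's config.yaml into ai-toolkit's config folder (name is illustrative)
!cp config.yaml ai-toolkit/config/psycheswings_flux1.yaml
!cd ai-toolkit && python run.py config/psycheswings_flux1.yaml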
notebooks/FLUX_1_dev_LoRA_Training.ipynb
ADDED
@@ -0,0 +1,291 @@
# (Notebook metadata: Colab GPU runtime, gpuType A100, high-memory machine shape; Python 3 kernel; nbformat 4.)

# %% [markdown]
# # AI Toolkit by Ostris
# ## FLUX.1-dev Training

# %%
!nvidia-smi

# %%
!git clone https://github.com/ostris/ai-toolkit
!mkdir -p /content/dataset

# %% [markdown]
# Put your image dataset in the `/content/dataset` folder

# %%
!cd ai-toolkit && git submodule update --init --recursive && pip install -r requirements.txt

# %% [markdown]
# ## Model License
# Training currently only works with FLUX.1-dev, which means anything you train will inherit the
# non-commercial license. It is also a gated model, so you need to accept the license on HF before
# using it; otherwise, this will fail. Here are the required steps to set up access.
#
# Sign into HF and accept the model access here: [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
#
# [Get a READ key from huggingface](https://huggingface.co/settings/tokens/new?) and place it in the next cell after running it.

# %%
import getpass
import os

# Prompt for the token
hf_token = getpass.getpass('Enter your HF access token and press enter: ')

# Set the environment variable
os.environ['HF_TOKEN'] = hf_token

print("HF_TOKEN environment variable has been set.")

# %%
import os
import sys
sys.path.append('/content/ai-toolkit')
from toolkit.job import run_job
from collections import OrderedDict
from PIL import Image
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# %% [markdown]
# ## Setup
#
# This is your config. It is documented pretty well. Normally you would do this as a yaml file, but
# for colab, this will work. This will run as is without modification, but feel free to edit it as you want.

# %%
from collections import OrderedDict

job_to_run = OrderedDict([
    ('job', 'extension'),
    ('config', OrderedDict([
        # this name will be the folder and filename name
        ('name', 'my_first_flux_lora_v1'),
        ('process', [
            OrderedDict([
                ('type', 'sd_trainer'),
                # root folder to save training sessions/samples/weights
                ('training_folder', '/content/output'),
                # uncomment to see performance stats in the terminal every N steps
                # ('performance_log_every', 1000),
                ('device', 'cuda:0'),
                # if a trigger word is specified, it will be added to captions of training data if it does not already exist
                # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
                # ('trigger_word', 'image'),
                ('network', OrderedDict([
                    ('type', 'lora'),
                    ('linear', 16),
                    ('linear_alpha', 16)
                ])),
                ('save', OrderedDict([
                    ('dtype', 'float16'),  # precision to save
                    ('save_every', 250),  # save every this many steps
                    ('max_step_saves_to_keep', 4)  # how many intermittent saves to keep
                ])),
                ('datasets', [
                    # datasets are a folder of images. captions need to be txt files with the same name as the image,
                    # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently.
                    # images will automatically be resized and bucketed into the resolution specified
                    OrderedDict([
                        ('folder_path', '/content/dataset'),
                        ('caption_ext', 'txt'),
                        ('caption_dropout_rate', 0.05),  # will drop out the caption 5% of the time
                        ('shuffle_tokens', False),  # shuffle caption order, split by commas
                        ('cache_latents_to_disk', True),  # leave this true unless you know what you're doing
                        ('resolution', [512, 768, 1024])  # flux enjoys multiple resolutions
                    ])
                ]),
                ('train', OrderedDict([
                    ('batch_size', 1),
                    ('steps', 2000),  # total number of steps to train; 500 - 4000 is a good range
                    ('gradient_accumulation_steps', 1),
                    ('train_unet', True),
                    ('train_text_encoder', False),  # probably won't work with flux
                    ('content_or_style', 'balanced'),  # content, style, balanced
                    ('gradient_checkpointing', True),  # need this on unless you have a ton of vram
                    ('noise_scheduler', 'flowmatch'),  # for training only
                    ('optimizer', 'adamw8bit'),
                    ('lr', 1e-4),

                    # uncomment this to skip the pre-training sample
                    # ('skip_first_sample', True),

                    # uncomment to completely disable sampling
                    # ('disable_sampling', True),

                    # uncomment to use new bell-curved weighting. Experimental, but may produce better results
                    # ('linear_timesteps', True),

                    # ema will smooth out learning, but could slow it down. Recommended to leave on.
                    ('ema_config', OrderedDict([
                        ('use_ema', True),
                        ('ema_decay', 0.99)
                    ])),

                    # will probably need this if gpu supports it for flux, other dtypes may not work correctly
                    ('dtype', 'bf16')
                ])),
                ('model', OrderedDict([
                    # huggingface model name or path
                    ('name_or_path', 'black-forest-labs/FLUX.1-dev'),
                    ('is_flux', True),
                    ('quantize', True),  # run 8bit mixed precision
                    # ('low_vram', True),  # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
                ])),
                ('sample', OrderedDict([
                    ('sampler', 'flowmatch'),  # must match train.noise_scheduler
                    ('sample_every', 250),  # sample every this many steps
                    ('width', 1024),
                    ('height', 1024),
                    ('prompts', [
                        # you can add [trigger] to the prompts here and it will be replaced with the trigger word
                        # '[trigger] holding a sign that says \'I LOVE PROMPTS!\'',
                        'woman with red hair, playing chess at the park, bomb going off in the background',
                        'a woman holding a coffee cup, in a beanie, sitting at a cafe',
                        'a horse is a DJ at a night club, fish eye lens, smoke machine, lazer lights, holding a martini',
                        'a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background',
                        'a bear building a log cabin in the snow covered mountains',
                        'woman playing the guitar, on stage, singing a song, laser lights, punk rocker',
                        'hipster man with a beard, building a chair, in a wood shop',
                        'photo of a man, white background, medium shot, modeling clothing, studio lighting, white backdrop',
                        'a man holding a sign that says, \'this is a sign\'',
                        'a bulldog, in a post apocalyptic world, with a shotgun, in a leather jacket, in a desert, with a motorcycle'
                    ]),
                    ('neg', ''),  # not used on flux
                    ('seed', 42),
                    ('walk_seed', True),
                    ('guidance_scale', 4),
                    ('sample_steps', 20)
                ]))
            ])
        ])
    ])),
    # you can add any additional meta info here. [name] is replaced with the config name at the top
    ('meta', OrderedDict([
        ('name', '[name]'),
        ('version', '1.0')
    ]))
])

# %% [markdown]
# ## Run it
#
# Below does all the magic. Check your folders to the left. Items will be in output/LoRA/your_name_v1.
# In the samples folder there are periodic samples; this doesn't work great with colab. They will be in /content/output.

# %%
run_job(job_to_run)

# %% [markdown]
# ## Done
#
# Check your output dir and get your LoRA.
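
The dev notebook above trains against black-forest-labs/FLUX.1-dev, which is also the base model in config.yaml. Below is a hedged sketch of loading a LoRA produced by this setup for inference with diffusers; the safetensors filename is a placeholder for whichever checkpoint you pull from this repo, and the prompt mirrors the sample prompts in config.yaml.

# Hedged sketch: quick inference check of the trained LoRA with diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
# Placeholder filename; use the actual LoRA checkpoint from this repo.
pipe.load_lora_weights("Psycheswings-Flux1.safetensors")
image = pipe(
    "psyche(Person), Psycheswings (Person), @psychedwings, photo of a young woman with curly blonde hair, smiling at the camera",
    guidance_scale=4.0,       # matches sample.guidance_scale in config.yaml
    num_inference_steps=20,   # matches sample.sample_steps in config.yaml
).images[0]
image.save("psycheswings_test.jpg")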
notebooks/FLUX_1_schnell_LoRA_Training.ipynb
ADDED
@@ -0,0 +1,296 @@
# (Notebook metadata: Colab GPU runtime, gpuType A100, high-memory machine shape; Python 3 kernel; nbformat 4.)

# %% [markdown]
# # AI Toolkit by Ostris
# ## FLUX.1-schnell Training

# %%
!nvidia-smi

# %%
!git clone https://github.com/ostris/ai-toolkit
!mkdir -p /content/dataset

# %% [markdown]
# Put your image dataset in the `/content/dataset` folder

# %%
!cd ai-toolkit && git submodule update --init --recursive && pip install -r requirements.txt

# %% [markdown]
# ## Model License
# Training currently only works with FLUX.1-dev, which means anything you train will inherit the
# non-commercial license. It is also a gated model, so you need to accept the license on HF before
# using it; otherwise, this will fail. Here are the required steps to set up access.
#
# Sign into HF and accept the model access here: [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
#
# [Get a READ key from huggingface](https://huggingface.co/settings/tokens/new?) and place it in the next cell after running it.

# %%
import getpass
import os

# Prompt for the token
hf_token = getpass.getpass('Enter your HF access token and press enter: ')

# Set the environment variable
os.environ['HF_TOKEN'] = hf_token

print("HF_TOKEN environment variable has been set.")

# %%
import os
import sys
sys.path.append('/content/ai-toolkit')
from toolkit.job import run_job
from collections import OrderedDict
from PIL import Image
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# %% [markdown]
# ## Setup
#
# This is your config. It is documented pretty well. Normally you would do this as a yaml file, but
# for colab, this will work. This will run as is without modification, but feel free to edit it as you want.

# %%
from collections import OrderedDict

job_to_run = OrderedDict([
    ('job', 'extension'),
    ('config', OrderedDict([
        # this name will be the folder and filename name
        ('name', 'my_first_flux_lora_v1'),
        ('process', [
            OrderedDict([
                ('type', 'sd_trainer'),
                # root folder to save training sessions/samples/weights
                ('training_folder', '/content/output'),
                # uncomment to see performance stats in the terminal every N steps
                # ('performance_log_every', 1000),
                ('device', 'cuda:0'),
                # if a trigger word is specified, it will be added to captions of training data if it does not already exist
                # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
                # ('trigger_word', 'image'),
                ('network', OrderedDict([
                    ('type', 'lora'),
                    ('linear', 16),
                    ('linear_alpha', 16)
                ])),
                ('save', OrderedDict([
                    ('dtype', 'float16'),  # precision to save
                    ('save_every', 250),  # save every this many steps
                    ('max_step_saves_to_keep', 4)  # how many intermittent saves to keep
                ])),
                ('datasets', [
                    # datasets are a folder of images. captions need to be txt files with the same name as the image,
                    # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently.
                    # images will automatically be resized and bucketed into the resolution specified
                    OrderedDict([
                        ('folder_path', '/content/dataset'),
                        ('caption_ext', 'txt'),
                        ('caption_dropout_rate', 0.05),  # will drop out the caption 5% of the time
                        ('shuffle_tokens', False),  # shuffle caption order, split by commas
                        ('cache_latents_to_disk', True),  # leave this true unless you know what you're doing
                        ('resolution', [512, 768, 1024])  # flux enjoys multiple resolutions
                    ])
                ]),
                ('train', OrderedDict([
                    ('batch_size', 1),
                    ('steps', 2000),  # total number of steps to train; 500 - 4000 is a good range
                    ('gradient_accumulation_steps', 1),
                    ('train_unet', True),
                    ('train_text_encoder', False),  # probably won't work with flux
                    ('gradient_checkpointing', True),  # need this on unless you have a ton of vram
                    ('noise_scheduler', 'flowmatch'),  # for training only
                    ('optimizer', 'adamw8bit'),
                    ('lr', 1e-4),

                    # uncomment this to skip the pre-training sample
                    # ('skip_first_sample', True),

                    # uncomment to completely disable sampling
                    # ('disable_sampling', True),

                    # uncomment to use new bell-curved weighting. Experimental, but may produce better results
                    # ('linear_timesteps', True),

                    # ema will smooth out learning, but could slow it down. Recommended to leave on.
                    ('ema_config', OrderedDict([
                        ('use_ema', True),
                        ('ema_decay', 0.99)
                    ])),

                    # will probably need this if gpu supports it for flux, other dtypes may not work correctly
                    ('dtype', 'bf16')
                ])),
                ('model', OrderedDict([
                    # huggingface model name or path
                    ('name_or_path', 'black-forest-labs/FLUX.1-schnell'),
                    ('assistant_lora_path', 'ostris/FLUX.1-schnell-training-adapter'),  # Required for flux schnell training
                    ('is_flux', True),
                    ('quantize', True),  # run 8bit mixed precision
                    # low_vram is painfully slow to fuse in the adapter; avoid it unless absolutely necessary
                    # ('low_vram', True),  # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
                ])),
                ('sample', OrderedDict([
                    ('sampler', 'flowmatch'),  # must match train.noise_scheduler
                    ('sample_every', 250),  # sample every this many steps
                    ('width', 1024),
                    ('height', 1024),
                    ('prompts', [
                        # you can add [trigger] to the prompts here and it will be replaced with the trigger word
                        # '[trigger] holding a sign that says \'I LOVE PROMPTS!\'',
                        'woman with red hair, playing chess at the park, bomb going off in the background',
                        'a woman holding a coffee cup, in a beanie, sitting at a cafe',
                        'a horse is a DJ at a night club, fish eye lens, smoke machine, lazer lights, holding a martini',
                        'a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background',
                        'a bear building a log cabin in the snow covered mountains',
                        'woman playing the guitar, on stage, singing a song, laser lights, punk rocker',
                        'hipster man with a beard, building a chair, in a wood shop',
                        'photo of a man, white background, medium shot, modeling clothing, studio lighting, white backdrop',
                        'a man holding a sign that says, \'this is a sign\'',
                        'a bulldog, in a post apocalyptic world, with a shotgun, in a leather jacket, in a desert, with a motorcycle'
                    ]),
                    ('neg', ''),  # not used on flux
                    ('seed', 42),
                    ('walk_seed', True),
                    ('guidance_scale', 1),  # schnell does not do guidance
                    ('sample_steps', 4)  # 1 - 4 works well
                ]))
            ])
        ])
    ])),
    # you can add any additional meta info here. [name] is replaced with the config name at the top
    ('meta', OrderedDict([
        ('name', '[name]'),
        ('version', '1.0')
    ]))
])

# %% [markdown]
# ## Run it
#
# Below does all the magic. Check your folders to the left. Items will be in output/LoRA/your_name_v1.
# In the samples folder there are periodic samples; this doesn't work great with colab. They will be in /content/output.

# %%
run_job(job_to_run)

# %% [markdown]
# ## Done
#
# Check your output dir and get your LoRA.
notebooks/SliderTraining.ipynb
ADDED
@@ -0,0 +1,339 @@
# (Notebook metadata: Colab GPU runtime, gpuType V100, high-memory machine shape; Python 3 kernel; nbformat 4.)

# %% [markdown]
# # AI Toolkit by Ostris
# ## Slider Training
#
# This is a quick colab demo for training sliders like the ones found on my CivitAI profile
# https://civitai.com/user/Ostris/models . I will work on making it more user friendly, but for now,
# it will get you started.

# %%
!git clone https://github.com/ostris/ai-toolkit

# %%
!cd ai-toolkit && git submodule update --init --recursive && pip install -r requirements.txt

# %%
import os
import sys
sys.path.append('/content/ai-toolkit')
from toolkit.job import run_job
from collections import OrderedDict
from PIL import Image

# %% [markdown]
# ## Setup
#
# This is your config. It is documented pretty well. Normally you would do this as a yaml file, but
# for colab, this will work. This will run as is without modification, but feel free to edit it as you want.

# %%
from collections import OrderedDict

job_to_run = OrderedDict({
    # This is the config I use on my sliders. It is solid and tested
    'job': 'train',
    'config': {
        # the name will be used to create a folder in the output folder
        # it will also replace any [name] token in the rest of this config
        'name': 'detail_slider_v1',
        # folder will be created with name above in folder below
        # it can be relative to the project root or absolute
        'training_folder': "output/LoRA",
        'device': 'cuda',  # cpu, cuda:0, etc
        # for tensorboard logging, we will make a subfolder for this job
        'log_dir': "output/.tensorboard",
        # you can stack processes for other jobs, but it is not tested with sliders,
        # so just use one for now
        'process': [
            {
                'type': 'slider',  # tells runner to run the slider process
                # network is the LoRA network for a slider, I recommend to leave this be
                'network': {
                    'type': "lora",
                    # rank / dim of the network. Bigger is not always better, especially for sliders. 8 is good
                    'linear': 8,  # "rank" or "dim"
                    'linear_alpha': 4,  # Do about half of rank "alpha"
                    # 'conv': 4,  # for convolutional layers "locon"
                    # 'conv_alpha': 4,  # Do about half of conv "alpha"
                },
                # training config
                'train': {
                    # this is also used in sampling. Stick with ddpm unless you know what you are doing
                    'noise_scheduler': "ddpm",  # or "ddpm", "lms", "euler_a"
                    # how many steps to train. More is not always better. I rarely go over 1000
                    'steps': 100,
                    # I have had good results with 4e-4 to 1e-4 at 500 steps
                    'lr': 2e-4,
                    # enables gradient checkpointing, saves vram, leave it on
                    'gradient_checkpointing': True,
                    # train the unet. I recommend leaving this true
                    'train_unet': True,
                    # train the text encoder. I don't recommend this unless you have a special use case;
                    # for sliders we are adjusting the representation of the concept (unet),
                    # not the description of it (text encoder)
                    'train_text_encoder': False,

                    # just leave unless you know what you are doing.
                    # also supports "dadaptation" (set lr to 1 if you use that),
                    # but it learns too fast and I don't recommend it
                    'optimizer': "adamw",
                    # only constant for now
                    'lr_scheduler': "constant",
                    # we randomly denoise a random number of steps from 1 to this number
                    # while training. Just leave it
                    'max_denoising_steps': 40,
                    # works great at 1. I do 1 even with my 4090.
                    # higher may not work right with newer single batch stacking code anyway
                    'batch_size': 1,
                    # bf16 works best if your GPU supports it (modern)
                    'dtype': 'bf16',  # fp32, bf16, fp16
                    # I don't recommend using this unless you are trying to make a darker lora. Then do 0.1 MAX.
                    # although, the way we train sliders is comparative, so it probably won't work anyway
                    'noise_offset': 0.0,
                },

                # the model to train the LoRA network on
                'model': {
                    # name_or_path can be a hugging face name, local path or url to a model
                    # on civitai, with or without modelVersionId. They will be cached in /model folder
                    # epicRealism v5
                    'name_or_path': "https://civitai.com/models/25694?modelVersionId=134065",
                    'is_v2': False,  # for v2 models
                    'is_v_pred': False,  # for v-prediction models (most v2 models)
                    # has some issues with the dual text encoder and the way we train sliders;
                    # it works, but weights probably need to be higher to see it.
                    'is_xl': False,  # for SDXL models
                },

                # saving config
                'save': {
                    'dtype': 'float16',  # precision to save. I recommend float16
                    'save_every': 50,  # save every this many steps
                    # this will remove step counts more than this number;
                    # allows you to save more often in case of a crash without filling up your drive
                    'max_step_saves_to_keep': 2,
                },

                # sampling config
                'sample': {
                    # must match train.noise_scheduler; this is not used here
                    # but may be in future and in other processes
                    'sampler': "ddpm",
                    # sample every this many steps
                    'sample_every': 20,
                    # image size
                    'width': 512,
                    'height': 512,
                    # prompts to use for sampling. Do as many as you want, but it slows down training.
                    # pick ones that will best represent the concept you are trying to adjust.
                    # allows some flags after the prompt:
                    # --m [number]  # network multiplier. LoRA weight. -3 for the negative slide, 3 for the positive
                    #               # slide are good tests. will inherit sample.network_multiplier if not set
                    # --n [string]  # negative prompt, will inherit sample.neg if not set
                    # Only 75 tokens allowed currently.
                    # I like to do a wide positive and negative spread so I can see a good range and stop
                    # early if the network is breaking down
                    'prompts': [
                        "a woman in a coffee shop, black hat, blonde hair, blue jacket --m -5",
                        "a woman in a coffee shop, black hat, blonde hair, blue jacket --m -3",
                        "a woman in a coffee shop, black hat, blonde hair, blue jacket --m 3",
                        "a woman in a coffee shop, black hat, blonde hair, blue jacket --m 5",
                        "a golden retriever sitting on a leather couch, --m -5",
                        "a golden retriever sitting on a leather couch --m -3",
                        "a golden retriever sitting on a leather couch --m 3",
                        "a golden retriever sitting on a leather couch --m 5",
                        "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m -5",
                        "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m -3",
                        "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m 3",
                        "a man with a beard and red flannel shirt, wearing vr goggles, walking into traffic --m 5",
                    ],
                    # negative prompt used on all prompts above as default if they don't have one
                    'neg': "cartoon, fake, drawing, illustration, cgi, animated, anime, monochrome",
                    # seed for sampling. 42 is the answer for everything
                    'seed': 42,
                    # walks the seed so s1 is 42, s2 is 43, s3 is 44, etc.
                    # will start over on next sample_every so s1 is always seed.
                    # works well if you use the same prompt but want different results
                    'walk_seed': False,
                    # cfg scale (4 to 10 is good)
                    'guidance_scale': 7,
                    # sampler steps (20 to 30 is good)
                    'sample_steps': 20,
                    # default network multiplier for all prompts;
                    # since we are training a slider, I recommend overriding this with --m [number]
                    # in the prompts above to get both sides of the slider
                    'network_multiplier': 1.0,
                },

                # logging information
                'logging': {
                    'log_every': 10,  # log every this many steps
                    'use_wandb': False,  # not supported yet
                    'verbose': False,  # probably don't need unless you are debugging
                },

                # slider training config, best for last
                'slider': {
                    # resolutions to train on. [ width, height ]. This is less important for sliders,
                    # as we are not teaching the model anything it doesn't already know,
                    # but must be a size it understands: [ 512, 512 ] for sd_v1.5, [ 768, 768 ] for sd_v2.1,
                    # and [ 1024, 1024 ] for sd_xl.
                    # you can do as many as you want here
                    'resolutions': [
                        [512, 512],
                        # [ 512, 768 ]
                        # [ 768, 768 ]
                    ],
                    # slider training uses 4 combined steps for a single round. This will do it in one gradient
                    # step. It is highly optimized and shouldn't take any more vram than doing without it,
                    # since we break down batches for gradient accumulation now. so just leave it on.
                    'batch_full_slide': True,
                    # These are the concepts to train on. You can do as many as you want here,
                    # but they can conflict and outweigh each other. Other than experimenting, I recommend
                    # just doing one for good results
                    'targets': [
                        # target_class is the base concept we are adjusting the representation of.
                        # for example, if we are adjusting the representation of a person, we would use "person";
                        # if we are adjusting the representation of a cat, we would use "cat". It is not
                        # a keyword necessarily but what the model understands the concept to represent.
                        # "person" will affect men, women, children, etc but will not affect cats, dogs, etc.
                        # it is the model's base general understanding of the concept and everything it represents.
                        # you can leave it blank to affect everything. In this example, we are adjusting
                        # detail, so we will leave it blank to affect everything
                        {
                            'target_class': "",
                            # positive is the prompt for the positive side of the slider.
                            # It is the concept that will be excited and amplified in the model when we slide the slider
                            # to the positive side and forgotten / inverted when we slide
                            # the slider to the negative side. It is generally best to include the target_class in
                            # the prompt. You want it to be the extreme of what you want to train on. For example,
                            # if you want to train on fat people, you would use "an extremely fat, morbidly obese person"
                            # as the prompt, not just "fat person".
                            # max 75 tokens for now
                            'positive': "high detail, 8k, intricate, detailed, high resolution, high res, high quality",
                            # negative is the prompt for the negative side of the slider and works the same as positive.
                            # it does not necessarily work the same as a negative prompt when generating images.
                            # these need to be polar opposites.
                            # max 76 tokens for now
                            'negative': "blurry, boring, fuzzy, low detail, low resolution, low res, low quality",
                            # the loss for this target is multiplied by this number.
                            # if you are doing more than one target it may be good to set less important ones
                            # to a lower number like 0.1 so they don't outweigh the primary target
                            'weight': 1.0,
                        },
                    ],
                },
            },
        ]
    },

    # You can put any information you want here, and it will be saved in the model.
    # The below is an example, but you can put your grocery list in it if you want.
    # It is saved in the model so be aware of that. The software will include this
    # plus some other information for you automatically
    'meta': {
        # [name] gets replaced with the name above
        'name': "[name]",
        'version': '1.0',
        # 'creator': {
        #     'name': 'your name',
        #     'email': '[email protected]',
        #     'website': 'https://your.website'
        # }
    }
})

# %% [markdown]
# ## Run it
#
# Below does all the magic. Check your folders to the left. Items will be in output/LoRA/your_name_v1.
# In the samples folder there are periodic samples; this doesn't work great with colab. I'll update soon.

# %%
run_job(job_to_run)

# %% [markdown]
# ## Done
#
# Check your output dir and get your slider.
optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f2a6938c50ea1e94ab65f006c34141c55a64d5f2de742ba8a2366f54170065b
size 175676612
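
optimizer.pt is the adamw8bit optimizer state saved by ai-toolkit so the 4000-step run can be resumed. Below is a hedged sketch for inspecting it locally; the exact layout of the saved object depends on how ai-toolkit serialized it, so the key listing is only a sanity check.

# Hedged sketch: inspect the saved optimizer state.
import torch

# On newer PyTorch versions you may need weights_only=False to unpickle optimizer objects.
state = torch.load("optimizer.pt", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    print(list(state.keys()))  # often something like ['state', 'param_groups']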
samples/1733773241950__000000000_0.jpg ADDED (Git LFS image)
samples/1733773260524__000000000_1.jpg ADDED (Git LFS image)
samples/1733773279091__000000000_2.jpg ADDED (Git LFS image)
samples/1733773297647__000000000_3.jpg ADDED (Git LFS image)
samples/1733773316204__000000000_4.jpg ADDED (Git LFS image)
samples/1733773334761__000000000_5.jpg ADDED (Git LFS image)
samples/1733773854236__000000250_0.jpg ADDED (Git LFS image)
samples/1733773872815__000000250_1.jpg ADDED (Git LFS image)
samples/1733773891384__000000250_2.jpg ADDED (Git LFS image)
samples/1733773909950__000000250_3.jpg ADDED (Git LFS image)
samples/1733773928516__000000250_4.jpg ADDED (Git LFS image)
samples/1733773947082__000000250_5.jpg ADDED (Git LFS image)
samples/1733774468334__000000500_0.jpg ADDED (Git LFS image)
samples/1733774486925__000000500_1.jpg ADDED (Git LFS image)
samples/1733774505504__000000500_2.jpg ADDED (Git LFS image)
samples/1733774524087__000000500_3.jpg ADDED (Git LFS image)
samples/1733774542663__000000500_4.jpg ADDED (Git LFS image)
samples/1733774561257__000000500_5.jpg ADDED (Git LFS image)
samples/1733775084642__000000750_0.jpg ADDED (Git LFS image)
samples/1733775103231__000000750_1.jpg ADDED (Git LFS image)
samples/1733775121813__000000750_2.jpg ADDED (Git LFS image)
samples/1733775140394__000000750_3.jpg ADDED (Git LFS image)
samples/1733775158977__000000750_4.jpg ADDED (Git LFS image)
samples/1733775177573__000000750_5.jpg ADDED (Git LFS image)
samples/1733775697559__000001000_0.jpg ADDED (Git LFS image)
samples/1733775716149__000001000_1.jpg ADDED (Git LFS image)
samples/1733775734740__000001000_2.jpg ADDED (Git LFS image)
samples/1733775753330__000001000_3.jpg ADDED (Git LFS image)
samples/1733775771915__000001000_4.jpg ADDED (Git LFS image)
samples/1733775790516__000001000_5.jpg ADDED (Git LFS image)
samples/1733776314928__000001250_0.jpg ADDED (Git LFS image)
samples/1733776333525__000001250_1.jpg ADDED (Git LFS image)
samples/1733776352107__000001250_2.jpg ADDED (Git LFS image)
samples/1733776370700__000001250_3.jpg ADDED (Git LFS image)
samples/1733776389287__000001250_4.jpg ADDED (Git LFS image)
samples/1733776407876__000001250_5.jpg ADDED (Git LFS image)
samples/1733776931659__000001500_0.jpg ADDED (Git LFS image)
samples/1733776950250__000001500_1.jpg ADDED (Git LFS image)
samples/1733776968841__000001500_2.jpg ADDED (Git LFS image)
samples/1733776987424__000001500_3.jpg ADDED (Git LFS image)
samples/1733777006010__000001500_4.jpg ADDED (Git LFS image)
samples/1733777024585__000001500_5.jpg ADDED (Git LFS image)
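
The sample filenames follow a <unix-ms-timestamp>__<zero-padded-step>_<prompt-index>.jpg pattern, inferred from the names above: six prompts rendered every 250 steps from step 0 through 1500. A small sketch for grouping downloaded samples by training step:

# Group sample images by the training step embedded in their filenames.
import re
from collections import defaultdict
from pathlib import Path

# Pattern inferred from this commit's filenames, e.g. 1733773854236__000000250_3.jpg
pattern = re.compile(r"^(?P<ts>\d+)__(?P<step>\d{9})_(?P<idx>\d+)\.jpg$")

by_step = defaultdict(list)
for path in sorted(Path("samples").glob("*.jpg")):
    match = pattern.match(path.name)
    if match:
        by_step[int(match.group("step"))].append(path.name)

for step, files in sorted(by_step.items()):
    print(f"step {step}: {len(files)} samples")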