---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
widget:
- text: woodcut illustration of a fireman saving a cat stuck in a tree
output:
url: images/sample_1_2639_74a325bbe0dd0554b640.png
base_model: ashen0209/Flux-Dev2Pro
instance_prompt: woodcut illustration
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
datasets:
- gigant/oldbookillustrations
language:
- en
- fr
---
# woodcut illustration
<Gallery />
## Model description
Trained on 336 illustrations from _Nouveau dictionnaire encyclopédique universel illustré,_ A.K.A. the Trousset encyclopedia.
FLUX.1 [dev] has decent performance with “woodcut illustration” already,
but sometimes it uses mid-tones instead of proper black & white shading.
This LoRA is kinda scuffed but demonstrates some progress in influencing that style,
or maybe just biasing the scene toward what is in the encyclopedia. It seems to like foliage.
Its effect may be more visible on something like [Dedistilled-Mix](https://huggingface.co/wikeeyang/Flux.1-Dedistilled-Mix-Tuned-fp8)
or [Fusion](https://huggingface.co/Anibaaal/Flux-Fusion-V2-4step-merge-gguf-nf4) than stock FLUX.1 [dev].
## Trigger words
Training captions began with the phrase “woodcut illustration.”
They also included “1880s, 19th century”; I haven't tested prompting with those.
## Download model
Weights for this model are available in Safetensors format.
[Download](/keturn/woodcut-illustrations-Trousset-LoRA/tree/main) them in the Files & versions tab.
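They can also be loaded straight from the Hub with diffusers. A minimal sketch (the sampler settings below are placeholder guesses, not tested values, and `load_lora_weights` is assumed to pick up the repo's default safetensors file):

```python
import torch
from diffusers import FluxPipeline

# Base model this LoRA was trained against.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Load the LoRA weights directly from this repo.
pipe.load_lora_weights("keturn/woodcut-illustrations-Trousset-LoRA")
pipe.enable_model_cpu_offload()  # helps fit on smaller GPUs

# Prompts should start with the trigger phrase from the training captions.
image = pipe(
    "woodcut illustration of a fireman saving a cat stuck in a tree",
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("woodcut_fireman.png")
```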
## Methodology
I chose to use illustrations from _le Trousset_ because it could provide hundreds of images in a relatively consistent style.
Source images were 1600px on the long side; I excluded those with more extreme aspect ratios (beyond 16:10),
then resized the rest to 1024px for RAM's sake.
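For illustration, a rough PIL-based sketch of that filter-and-resize pass (not the script actually used; the 16:10 cutoff and 1024px long side come from the description above, the rest is assumed):

```python
from pathlib import Path
from PIL import Image

MAX_ASPECT = 16 / 10   # exclude images with more extreme aspect ratios
TARGET_LONG = 1024     # resize the long side down from ~1600px

def prepare(src_dir: str, dst_dir: str) -> None:
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.png"):
        img = Image.open(path)
        w, h = img.size
        if max(w, h) / min(w, h) > MAX_ASPECT:
            continue  # aspect ratio beyond 16:10, skip
        scale = TARGET_LONG / max(w, h)
        if scale < 1:
            img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
        img.save(out / path.name)
```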
Trained with [kohya-ss/sd-scripts](https://github.com/kohya-ss/sd-scripts/tree/e89653975ddf429cdf0c0fd268da0a5a3e8dba1f) (2024-12-15).
Some perhaps relevant settings:
<dl>
<dt>--optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False"
--lr_scheduler constant_with_warmup --learning_rate 8e-4 --model_prediction_type raw --guidance_scale 1 --loss_type l2
</dt>
<dd>fluxgym defaults</dd>
<dt>--network_dim 16 --network_alpha 16</dt>
<dd>Raising network_dim from fluxgym's default up to 12 or 16 definitely helped.
I read somewhere that alpha should be set to the same value as network_dim, and raising it from the default 1.0 does seem to have helped.</dd>
<dt>--network_args "train_blocks=single" "train_single_block_indices=18-37" "verbose=True"</dt>
<dd>In hopes that focusing on the higher half of the blocks would help prioritize small-scale detail over large structure.</dd>
<dt>--timestep_sampling shift --discrete_flow_shift 1.0</dt>
<dd>fluxgym's default shift is 3.1. Unverified hunch that we want to bring this back down to focus on lower (less noisy) timesteps for small-scale detail; see the sketch after this list.</dd>
</dl>
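As I read sd-scripts' shift sampling, timesteps are drawn as a sigmoid of a normal sample and then remapped by `t_shifted = shift * t / (1 + (shift - 1) * t)`, so shift 1.0 is the identity while 3.1 pushes timesteps toward the high-noise end. A small sketch to compare the two (my reading of the source, not the training code itself):

```python
import torch

def shifted_timesteps(shift: float, n: int = 100_000) -> torch.Tensor:
    """Approximate sd-scripts' `--timestep_sampling shift`:
    sigmoid of a normal draw, then the discrete-flow-shift transform."""
    t = torch.sigmoid(torch.randn(n))
    return (t * shift) / (1 + (shift - 1) * t)

for s in (1.0, 3.1):
    t = shifted_timesteps(s)
    print(f"shift={s}: mean t={t.mean():.2f}, "
          f"fraction below t=0.5 (low-noise, fine-detail steps): {(t < 0.5).float().mean():.2f}")
```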
To accommodate a 12 GB VRAM budget, I used bfloat16 precision with the u-net at float8. The T5 encoder was not trained.
### Potential
I can think of lots of things one _could_ do to perhaps improve the result, though I don't have a good sense for which are most effective.
- More dataset curation. The images were _relatively_ consistent, but the botanical illustrations are better than the mammals,
the engineering illustrations are different than the cityscapes, etc.
- [`trousset-bugs-and-botany-sources.txt`](./trousset-bugs-and-botany-sources.txt): mostly flowers, close-up detail illustrations with no backgrounds.
- [`trousset-landscape-sources.txt`](./trousset-landscape-sources.txt): cityscapes and landscapes and a few other scenes with a similar mostly-full-frame style.
- Dataset augmentation. (I haven't yet turned to any of the tricks like mirroring the images, but I'm not sure if more is better at this point.)
- Multi-resolution training. Flux is usually pretty good over a range of sizes,
but this LoRA seems to suffer more when going below the megapixel size it was trained at.
I think some care is needed in how images are cropped or downsized to avoid losing the character of the lines in the shading.
- More detailed captions? Less detailed captions?
- Include T5 encoder in training. (Requires a bigger server.)
- Include the autoencoder in training. (Seems relevant for styles with narrow high-contrast lines.)
- Esoteric hyperparameter stuff. Different layers/blocks/rank/alpha?