---
license: mit
datasets:
- isek-ai/danbooru-tags-2016-2023
language:
- en
library_name: transformers
---

# SDPrompt-RetNet-v2-beta

This is a RetNet model trained from scratch with [syncdoth/RetNet](https://github.com/syncdoth/RetNet).

It achieves the following results on the evaluation set:
- Loss: 0.5923

## Usage

```bash
pip install transformers safetensors
```

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

MODEL_NAME = "isek-ai/SDPrompt-RetNet-v2-beta"
DEVICE = "cuda"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16, # or torch.bfloat16
    trust_remote_code=True,
).to(DEVICE)
model.eval()
streamer = TextStreamer(tokenizer)

prompt = "1girl"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
_ = model.generate(
    inputs["input_ids"],
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    top_k=20,
    temperature=0.9,
    streamer=streamer,
)
# Example output (sampling is stochastic, so your output will differ):
# 1girl, :<, bag, black hair, blurry, bokeh, cloud, depth of field, from side, long sleeves, night, outdoors, pleated skirt, power lines, purple eyes, road, scenery, shoes, shoulder bag,gasm, sidelocks, sign, skirt,let's drawsaurus, skylight smile, sneakers, standing, star (sky), sweater, town, traffic cone, utility pole, vending machine, wide-eyed, window, wooden box, yellow skirt,ization, zettai ryouiki, zoom layer, white footwear, zipper, zipper pull tab, zipperland sheet, zombie pose, ladder, leaning back, leg up, looking to the side,let, miniskirt, motion blur, musical note, open mouth, part
```
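The example above streams one comma-separated string. If you want to tidy it before passing it to an image generation model, a minimal sketch like the following may help. `split_tags` and `join_tags` are hypothetical helpers, not part of this model or of `transformers`. They only trim whitespace, skip empty fragments, and drop duplicate tags while keeping order; they do not repair malformed fragments such as `gasm` in the example output (that would need filtering against a known tag vocabulary). Dropping the last tag is a heuristic, assuming generation stopped at `max_new_tokens` and may have cut the final tag short, as `part` at the end of the example output suggests.

```python
def split_tags(text: str) -> list[str]:
    """Hypothetical helper: split a comma-separated prompt into unique, trimmed tags."""
    seen: set[str] = set()
    tags: list[str] = []
    for raw in text.split(","):
        tag = raw.strip()
        # Skip empty fragments (e.g. from ", ," or a trailing comma) and repeats
        if not tag or tag in seen:
            continue
        seen.add(tag)
        tags.append(tag)
    return tags


def join_tags(tags: list[str]) -> str:
    """Hypothetical helper: re-join tags with the ', ' separator seen in the output."""
    return ", ".join(tags)


if __name__ == "__main__":
    sample = "1girl, :<, bag, black hair, bag,  , open mouth, part"
    tags = split_tags(sample)
    # Heuristic: the last tag may be truncated when max_new_tokens is reached
    print(join_tags(tags[:-1]))
```

Keeping the first occurrence of each tag preserves the model's ordering, which is usually what you want when the prompt is fed to a tag-based image model.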


## Model description

This model is trained with **only Danbooru tags** to generate prompts for image generation models.

## Training data

- [isek-ai/danbooru-tags-2016-2023](https://huggingface.co/datasets/isek-ai/danbooru-tags-2016-2023)

### Dataset filtering

TODO

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1
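Two of these settings combine in ways worth spelling out. The total batch size is the per-device batch size times the gradient accumulation steps (times the number of devices), and 32 × 2 = 64 matches the reported total if a single device was used, which is an assumption. The `linear` scheduler with warmup ramps the learning rate from 0 to `learning_rate` over the warmup steps, then decays it linearly to 0 at the last step. The sketch below reproduces that shape in plain Python; it mirrors the formula behind `transformers`' `get_linear_schedule_with_warmup` but is not the training code used for this model. `TOTAL_STEPS = 7650` is a hypothetical value roughly estimated from the training-results table (step 7500 at epoch 0.98); the exact number of steps is not stated.

```python
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1  # assumption: single device, consistent with the reported total of 64
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

BASE_LR = 1e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 7650  # hypothetical: estimated from the results table, not stated


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    for step in (0, 250, 500, 4000, TOTAL_STEPS):
        lr = linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
        print(f"step {step:>5}: lr = {lr:.2e}")
```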

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.975         | 0.07  | 500  | 1.0005          |
| 0.7549        | 0.13  | 1000 | 0.7604          |
| 0.6923        | 0.2   | 1500 | 0.7090          |
| 0.6753        | 0.26  | 2000 | 0.6778          |
| 0.6591        | 0.33  | 2500 | 0.6568          |
| 0.6337        | 0.39  | 3000 | 0.6429          |
| 0.6288        | 0.46  | 3500 | 0.6319          |
| 0.624         | 0.53  | 4000 | 0.6218          |
| 0.62          | 0.59  | 4500 | 0.6172          |
| 0.603         | 0.66  | 5000 | 0.6090          |
| 0.5931        | 0.72  | 5500 | 0.6032          |
| 0.5957        | 0.79  | 6000 | 0.5986          |
| 0.5972        | 0.85  | 6500 | 0.5948          |
| 0.5928        | 0.92  | 7000 | 0.5926          |
| 0.5904        | 0.98  | 7500 | 0.5923          |
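If, as is standard for causal language models in `transformers`, the reported loss is the mean per-token cross-entropy in nats (an assumption; the card does not say so explicitly), it can be read as a per-token perplexity via `exp(loss)`. The short sketch below converts a few validation losses from the table above.

```python
import math

# (step, validation loss) pairs copied from the training-results table above
VAL_LOSSES = [(500, 1.0005), (4000, 0.6218), (7500, 0.5923)]


def perplexity(loss: float) -> float:
    """Per-token perplexity, assuming `loss` is mean cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    for step, loss in VAL_LOSSES:
        print(f"step {step:>4}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, the final validation loss of 0.5923 corresponds to a per-token perplexity of about 1.81.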


### Framework versions

- Transformers 4.36.1
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0