codeShare committed on
Commit 16db48a · verified · 1 Parent(s): e754fde

Upload sd_token_similarity_calculator.ipynb

Files changed (1)
  1. sd_token_similarity_calculator.ipynb +332 -26
sd_token_similarity_calculator.ipynb CHANGED
@@ -17,12 +17,23 @@
17
  {
18
  "cell_type": "markdown",
19
  "source": [
20
- "This Notebook is a Stable-diffusion tool which allows you to find similiar tokens from the SD 1.5 vocab.json that you can use for text-to-image generation. Try this Free online SD 1.5 generator with the results: https://perchance.org/fusion-ai-image-generator"
 
 
21
  ],
22
  "metadata": {
23
  "id": "L7JTcbOdBPfh"
24
  }
25
  },
26
  {
27
  "cell_type": "code",
28
  "source": [
@@ -88,6 +99,144 @@
88
  "execution_count": null,
89
  "outputs": []
90
  },
91
  {
92
  "cell_type": "code",
93
  "source": [
@@ -107,22 +256,10 @@
107
  "#You can leave the 'prompt' field empty to get a random value tensor. Since the tensor is random value, it will not correspond to any tensor in the vocab.json list , and this it will have no ID."
108
  ],
109
  "metadata": {
110
- "id": "RPdkYzT2_X85",
111
- "colab": {
112
- "base_uri": "https://localhost:8080/"
113
- },
114
- "outputId": "86f2f01e-6a04-4292-cee7-70fd8398e07f"
115
  },
116
  "execution_count": null,
117
- "outputs": [
118
- {
119
- "output_type": "stream",
120
- "name": "stdout",
121
- "text": [
122
- "[49406, 8922, 49407]\n"
123
- ]
124
- }
125
- ]
126
  },
127
  {
128
  "cell_type": "code",
@@ -353,21 +490,20 @@
353
  "source": [
354
  "\n",
355
  "\n",
356
- "This is how the notebook works:\n",
357
  "\n",
358
  "Similiar vectors = similiar output in the SD 1.5 / SDXL / FLUX model\n",
359
  "\n",
360
- "CLIP converts the prompt text to vectors (“tensors”) , with float32 values usually ranging from -1 to 1\n",
361
  "\n",
362
- "Dimensions are [ 1x768 ] tensors for SD 1.5 , and a [ 1x768 , 1x1024 ] tensor for SDXL and FLUX.\n",
363
  "\n",
364
  "The SD models and FLUX converts these vectors to an image.\n",
365
  "\n",
366
- "This notebook takes an input string , tokenizes it and matches the first token against the 49407 token vectors in the vocab.json : https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer\n",
367
  "\n",
368
  "It finds the “most similiar tokens” in the list. Similarity is the theta angle between the token vectors.\n",
369
  "\n",
370
- "\n",
371
  "<div>\n",
372
  "<img src=\"https://huggingface.co/datasets/codeShare/sd_tokens/resolve/main/cosine.jpeg\" width=\"300\"/>\n",
373
  "</div>\n",
@@ -376,19 +512,189 @@
376
  "\n",
377
  "Negative similarity is also possible.\n",
378
  "\n",
379
- "So if you are bored of prompting “girl” and want something similiar you can run this notebook and use the “chick</w>” token at 21.88% similarity , for example\n",
380
  "\n",
381
- "You can also run a mixed search , like “cute+girl”/2 , where for examplekpop</w>” has a 16.71% similarity\n",
382
  "\n",
383
- "Sidenote: Prompt weights like (banana:1.2) will scale the magnitude of the corresponding 1x768 tensor(s) by 1.2 .\n",
384
  "\n",
385
- "Source: https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts*\n",
386
  "\n",
387
  "So TLDR; vector direction = “what to generate” , vector magnitude = “prompt weights”\n",
388
  "\n",
389
- "/---/\n",
390
  "\n",
391
- "Read more about CLIP here: https://huggingface.co/docs/transformers/model_doc/clip"
392
  ],
393
  "metadata": {
394
  "id": "njeJx_nSSA8H"
 
17
  {
18
  "cell_type": "markdown",
19
  "source": [
20
+ "This Notebook is a Stable-diffusion tool which allows you to find similiar tokens from the SD 1.5 vocab.json that you can use for text-to-image generation. Try this Free online SD 1.5 generator with the results: https://perchance.org/fusion-ai-image-generator\n",
21
+ "\n",
22
+ "Scroll to the bottom of the notebook to see the guide for how this works."
23
  ],
24
  "metadata": {
25
  "id": "L7JTcbOdBPfh"
26
  }
27
  },
28
+ {
29
+ "cell_type": "code",
30
+ "source": [],
31
+ "metadata": {
32
+ "id": "PBwVIuAjEdHA"
33
+ },
34
+ "execution_count": null,
35
+ "outputs": []
36
+ },
37
  {
38
  "cell_type": "code",
39
  "source": [
 
99
  "execution_count": null,
100
  "outputs": []
101
  },
102
+ {
103
+ "cell_type": "code",
104
+ "source": [
105
+ "# @title ⚡ Get similiar tokens\n",
106
+ "from transformers import AutoTokenizer\n",
107
+ "tokenizer = AutoTokenizer.from_pretrained(\"openai/clip-vit-large-patch14\", clean_up_tokenization_spaces = False)\n",
108
+ "\n",
109
+ "prompt= \"banana\" # @param {type:'string'}\n",
110
+ "\n",
111
+ "tokenizer_output = tokenizer(text = prompt)\n",
112
+ "input_ids = tokenizer_output['input_ids']\n",
113
+ "print(input_ids)\n",
114
+ "\n",
115
+ "\n",
116
+ "#The prompt will be enclosed with the <|start-of-text|> and <|end-of-text|> tokens, which is why output will be [49406, ... , 49407].\n",
117
+ "\n",
118
+ "#You can leave the 'prompt' field empty to get a random value tensor. Since the tensor is random value, it will not correspond to any tensor in the vocab.json list , and this it will have no ID.\n",
119
+ "\n",
120
+ "id_A = input_ids[1]\n",
121
+ "A = token[id_A]\n",
122
+ "_A = LA.vector_norm(A, ord=2)\n",
123
+ "\n",
124
+ "#if no imput exists we just randomize the entire thing\n",
125
+ "if (prompt == \"\"):\n",
126
+ " id_A = -1\n",
127
+ " print(\"Tokenized prompt tensor A is a random valued tensor with no ID\")\n",
128
+ " R = torch.rand(768)\n",
129
+ " _R = LA.vector_norm(R, ord=2)\n",
130
+ " A = R*(_A/_R)\n",
131
+ "\n",
132
+ "\n",
133
+ "mix_with = \"\" # @param {\"type\":\"string\",\"placeholder\":\"(optional) write something else\"}\n",
134
+ "mix_method = \"None\" # @param [\"None\" , \"Average\", \"Subtract\"] {allow-input: true}\n",
135
+ "w = 0.5 # @param {type:\"slider\", min:0, max:1, step:0.01}\n",
136
+ "\n",
137
+ "tokenizer_output = tokenizer(text = mix_with)\n",
138
+ "input_ids = tokenizer_output['input_ids']\n",
139
+ "id_C = input_ids[1]\n",
140
+ "C = token[id_C]\n",
141
+ "_C = LA.vector_norm(C, ord=2)\n",
142
+ "\n",
143
+ "#if no imput exists we just randomize the entire thing\n",
144
+ "if (mix_with == \"\"):\n",
145
+ " id_C = -1\n",
146
+ " print(\"Tokenized prompt 'mix_with' tensor C is a random valued tensor with no ID\")\n",
147
+ " R = torch.rand(768)\n",
148
+ " _R = LA.vector_norm(R, ord=2)\n",
149
+ " C = R*(_C/_R)\n",
150
+ "\n",
151
+ "if (mix_method == \"None\"):\n",
152
+ " print(\"No operation\")\n",
153
+ "\n",
154
+ "if (mix_method == \"Average\"):\n",
155
+ " A = w*A + (1-w)*C\n",
156
+ " _A = LA.vector_norm(A, ord=2)\n",
157
+ " print(\"Tokenized prompt tensor A has been recalculated as A = w*A + (1-w)*C , where C is the tokenized prompt 'mix_with' tensor C\")\n",
158
+ "\n",
159
+ "if (mix_method == \"Subtract\"):\n",
160
+ " tmp = (A/_A) - (C/_C)\n",
161
+ " _tmp = LA.vector_norm(tmp, ord=2)\n",
162
+ " A = tmp*((w*_A + (1-w)*_C)/_tmp)\n",
163
+ " _A = LA.vector_norm(A, ord=2)\n",
164
+ " print(\"Tokenized prompt tensor A has been recalculated as A = (w*_A + (1-w)*_C) * norm(w*A - (1-w)*C) , where C is the tokenized prompt 'mix_with' tensor C\")\n",
165
+ "\n",
166
+ "#OPTIONAL : Add/subtract + normalize above result with another token. Leave field empty to get a random value tensor\n",
167
+ "\n",
168
+ "dots = torch.zeros(NUM_TOKENS)\n",
169
+ "for index in range(NUM_TOKENS):\n",
170
+ " id_B = index\n",
171
+ " B = token[id_B]\n",
172
+ " _B = LA.vector_norm(B, ord=2)\n",
173
+ " result = torch.dot(A,B)/(_A*_B)\n",
174
+ " #result = absolute_value(result.item())\n",
175
+ " result = result.item()\n",
176
+ " dots[index] = result\n",
177
+ "\n",
178
+ "name_A = \"A of random type\"\n",
179
+ "if (id_A>-1):\n",
180
+ " name_A = vocab[id_A]\n",
181
+ "\n",
182
+ "name_C = \"token C of random type\"\n",
183
+ "if (id_C>-1):\n",
184
+ " name_C = vocab[id_C]\n",
185
+ "\n",
186
+ "\n",
187
+ "sorted, indices = torch.sort(dots,dim=0 , descending=True)\n",
188
+ "#----#\n",
189
+ "if (mix_method == \"Average\"):\n",
190
+ " print(f'Calculated all cosine-similarities between the average of token {name_A} and {name_C} with Id_A = {id_A} and mixed Id_C = {id_C} as a 1x{sorted.shape[0]} tensor')\n",
191
+ "if (mix_method == \"Subtract\"):\n",
192
+ " print(f'Calculated all cosine-similarities between the subtract of token {name_A} and {name_C} with Id_A = {id_A} and mixed Id_C = {id_C} as a 1x{sorted.shape[0]} tensor')\n",
193
+ "if (mix_method == \"None\"):\n",
194
+ " print(f'Calculated all cosine-similarities between the token {name_A} with Id_A = {id_A} with the the rest of the {NUM_TOKENS} tokens as a 1x{sorted.shape[0]} tensor')\n",
195
+ "\n",
196
+ "#Produce a list id IDs that are most similiar to the prompt ID at positiion 1 based on above result\n",
197
+ "\n",
198
+ "list_size = 100 # @param {type:'number'}\n",
199
+ "print_ID = False # @param {type:\"boolean\"}\n",
200
+ "print_Similarity = True # @param {type:\"boolean\"}\n",
201
+ "print_Name = True # @param {type:\"boolean\"}\n",
202
+ "print_Divider = True # @param {type:\"boolean\"}\n",
203
+ "\n",
204
+ "\n",
205
+ "if (print_Divider):\n",
206
+ " print('//---//') # % value\n",
207
+ "\n",
208
+ "print('') # % value\n",
209
+ "print('Here is the result : ') # % value\n",
210
+ "print('') # % value\n",
211
+ "\n",
212
+ "for index in range(list_size):\n",
213
+ " id = indices[index].item()\n",
214
+ " if (print_Name):\n",
215
+ " print(f'{vocab[id]}') # vocab item\n",
216
+ " if (print_ID):\n",
217
+ " print(f'ID = {id}') # IDs\n",
218
+ " if (print_Similarity):\n",
219
+ " print(f'similiarity = {round(sorted[index].item()*100,2)} %') # % value\n",
220
+ " if (print_Divider):\n",
221
+ " print('--------')\n",
222
+ "\n",
223
+ "#Print the sorted list from above result"
224
+ ],
225
+ "metadata": {
226
+ "id": "iWeFnT1gAx6A"
227
+ },
228
+ "execution_count": null,
229
+ "outputs": []
230
+ },
231
+ {
232
+ "cell_type": "markdown",
233
+ "source": [
234
+ "# ↓ Sub modules (use these to build your own projects) ↓"
235
+ ],
236
+ "metadata": {
237
+ "id": "_d8WtPgtAymM"
238
+ }
239
+ },
240
  {
241
  "cell_type": "code",
242
  "source": [
 
256
  "#You can leave the 'prompt' field empty to get a random value tensor. Since the tensor is random value, it will not correspond to any tensor in the vocab.json list , and this it will have no ID."
257
  ],
258
  "metadata": {
259
+ "id": "RPdkYzT2_X85"
260
  },
261
  "execution_count": null,
262
+ "outputs": []
263
  },
264
  {
265
  "cell_type": "code",
 
490
  "source": [
491
  "\n",
492
  "\n",
493
+ "# How does this notebook work?\n",
494
  "\n",
495
  "Similiar vectors = similiar output in the SD 1.5 / SDXL / FLUX model\n",
496
  "\n",
497
+ "CLIP converts the prompt text to vectors (“tensors”) , with float32 values usually ranging from -1 to 1.\n",
498
  "\n",
499
+ "Dimensions are \\[ 1x768 ] tensors for SD 1.5 , and a \\[ 1x768 , 1x1024 ] tensor for SDXL and FLUX.\n",
500
  "\n",
501
  "The SD models and FLUX converts these vectors to an image.\n",
502
  "\n",
503
+ "This notebook takes an input string , tokenizes it and matches the first token against the 49407 token vectors in the vocab.json : [https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer](https://www.google.com/url?q=https%3A%2F%2Fhuggingface.co%2Fblack-forest-labs%2FFLUX.1-dev%2Ftree%2Fmain%2Ftokenizer)\n",
504
  "\n",
505
  "It finds the “most similiar tokens” in the list. Similarity is the theta angle between the token vectors.\n",
506
  "\n",
 
507
  "<div>\n",
508
  "<img src=\"https://huggingface.co/datasets/codeShare/sd_tokens/resolve/main/cosine.jpeg\" width=\"300\"/>\n",
509
  "</div>\n",
 
512
  "\n",
513
  "Negative similarity is also possible.\n",
514
  "\n",
515
+ "# How can I use it?\n",
516
  "\n",
517
+ "If you are bored of prompting “girl” and want something similiar you can run this notebook and use the chick” token at 21.88% similarity , for example\n",
518
  "\n",
519
+ "You can also run a mixed search , like “cute+girl”/2 , where for example “kpop” has a 16.71% similarity\n",
520
  "\n",
521
+ "There are some strange tokens further down the list you go. Example: tokens similiar to the token \"pewdiepie</w>\" (yes this is an actual token that exists in CLIP)\n",
522
+ "\n",
523
+ "<div>\n",
524
+ "<img src=\"https://lemmy.world/pictrs/image/a1cd284e-3341-4284-9949-5f8b58d3bd1f.jpeg\" width=\"300\"/>\n",
525
+ "</div>\n",
526
+ "\n",
527
+ "Each of these correspond to a unique 1x768 token vector.\n",
528
+ "\n",
529
+ "The higher the ID value , the less often the token appeared in the CLIP training data.\n",
530
+ "\n",
531
+ "To reiterate; this is the CLIP model training data , not the SD-model training data.\n",
532
+ "\n",
533
+ "So for certain models , tokens with high ID can give very consistent results , if the SD model is trained to handle them.\n",
534
+ "\n",
535
+ "Example of this can be anime models , where japanese artist names can affect the output greatly. \n",
536
+ "\n",
537
+ "Tokens with high ID will often give the \"fun\" output when used in very short prompts.\n",
538
+ "\n",
539
+ "# What about token vector length?\n",
540
+ "\n",
541
+ "If you are wondering about token magnitude,\n",
542
+ "Prompt weights like (banana:1.2) will scale the magnitude of the corresponding 1x768 tensor(s) by 1.2 . So thats how prompt token magnitude works.\n",
543
+ "\n",
544
+ "Source: [https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted\\_prompts](https://www.google.com/url?q=https%3A%2F%2Fhuggingface.co%2Fdocs%2Fdiffusers%2Fmain%2Fen%2Fusing-diffusers%2Fweighted_prompts)\\*\n",
545
  "\n",
546
  "So TLDR; vector direction = “what to generate” , vector magnitude = “prompt weights”\n",
547
  "\n",
548
+ "# How prompting works (technical summary)\n",
549
+ "\n",
550
+ " 1. There is no correct way to prompt.\n",
551
+ "\n",
552
+ "2. Stable diffusion reads your prompt left to right, one token at a time, finding association _from_ the previous token _to_ the current token _and to_ the image generated thus far (Cross Attention Rule)\n",
553
+ "\n",
554
+ "3. Stable Diffusion is an optimization problem that seeks to maximize similarity to prompt and minimize similarity to negatives (Optimization Rule)\n",
555
+ "\n",
556
+ "Reference material (covers entire SD , so not good source material really, but the info is there) : https://youtu.be/sFztPP9qPRc?si=ge2Ty7wnpPGmB0gi\n",
557
+ "\n",
558
+ "# The SD pipeline\n",
559
+ "\n",
560
+ "For every step (20 in total by default) for SD1.5 :\n",
561
+ "\n",
562
+ "1. Prompt text => (tokenizer)\n",
563
+ "2. => Nx768 token vectors =>(CLIP model) =>\n",
564
+ "3. 1x768 encoding => ( the SD model / Unet ) =>\n",
565
+ "4. => _Desired_ image per Rule 3 => ( sampler)\n",
566
+ "5. => Paint a section of the image => (image)\n",
567
+ "\n",
568
+ "# Disclaimer /Trivia\n",
569
+ "\n",
570
+ "This notebook should be seen as a \"dictionary search tool\" for the vocab.json , which is the same for SD1.5 , SDXL and FLUX. Feel free to verify this by checking the 'tokenizer' folder under each model.\n",
571
+ "\n",
572
+ "vocab.json in the FLUX model , for example (1 of 2 copies) : https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer\n",
573
+ "\n",
574
+ "I'm using Clip-vit-large-patch14 , which is used in SD 1.5 , and is one among the two tokenizers for SDXL and FLUX : https://huggingface.co/openai/clip-vit-large-patch14/blob/main/README.md\n",
575
+ "\n",
576
+ "This set of tokens has dimension 1x768. \n",
577
+ "\n",
578
+ "SDXL and FLUX uses an additional set of tokens of dimension 1x1024.\n",
579
+ "\n",
580
+ "These are not included in this notebook. Feel free to include them yourselves (I would appreciate that).\n",
581
+ "\n",
582
+ "To do so, you will have to download a FLUX and/or SDXL model\n",
583
+ "\n",
584
+ ", and copy the 49407x1024 tensor list that is stored within the model and then save it as a .pt file.\n",
585
+ "\n",
586
+ "//---//\n",
587
+ "\n",
588
+ "I am aware it is actually the 1x768 text_encoding being processed into an image for the SD models + FLUX.\n",
589
+ "\n",
590
+ "As such , I've included text_encoding comparison at the bottom of the Notebook.\n",
591
+ "\n",
592
+ "I am also aware thar SDXL and FLUX uses additional encodings , which are not included in this notebook.\n",
593
+ "\n",
594
+ "* Clip-vit-bigG for SDXL: https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k/blob/main/README.md\n",
595
+ "\n",
596
+ "* And the T5 text encoder for FLUX. I have 0% understanding of FLUX T5 text_encoder.\n",
597
+ "\n",
598
+ "//---//\n",
599
+ "\n",
600
+ "If you want them , feel free to include them yourself and share the results (cuz I probably won't) :)!\n",
601
+ "\n",
602
+ "That being said , being an encoding , I reckon the CLIP Nx768 => 1x768 should be \"linear\" (or whatever one might call it)\n",
603
+ "\n",
604
+ "So exchange a few tokens in the Nx768 for something similiar , and the resulting 1x768 ought to be kinda similar to 1x768 we had earlier. Hopefully.\n",
605
+ "\n",
606
+ "I feel its important to mention this , in case some wonder why the token-token similarity don't match the text-encoding to text-encoding similarity.\n",
607
+ "\n",
608
+ "# Note regarding CLIP text encoding vs. token\n",
609
+ "\n",
610
+ "*To make this disclaimer clear; Token-to-token similarity is not the same as text_encoding similarity.*\n",
611
+ "\n",
612
+ "I have to say this , since it will otherwise get (even more) confusing , as both the individual tokens , and the text_encoding have dimensions 1x768.\n",
613
+ "\n",
614
+ "They are separate things. Separate results. etc.\n",
615
+ "\n",
616
+ "As such , you will not get anything useful if you start comparing similarity between a token , and a text-encoding. So don't do that :)!\n",
617
+ "\n",
618
+ "# What about the CLIP image encoding?\n",
619
+ "\n",
620
+ "The CLIP model can also do an image_encoding of an image, where the output will be a 1x768 tensor. These _can_ be compared with the text_encoding.\n",
621
+ "\n",
622
+ "Comparing CLIP image_encoding with the CLIP text_encoding for a bunch of random prompts until you find the \"highest similarity\" , is a method used in the CLIP interrogator : https://huggingface.co/spaces/pharmapsychotic/CLIP-Interrogator\n",
623
+ "\n",
624
+ "List of random prompts for CLIP interrogator can be found here, for reference : https://github.com/pharmapsychotic/clip-interrogator/tree/main/clip_interrogator/data\n",
625
+ "\n",
626
+ "The CLIP image_encoding is not included in this Notebook.\n",
627
+ "\n",
628
+ "If you spot errors / ideas for improvememts; feel free to fix the code in your own notebook and post the results.\n",
629
+ "\n",
630
+ "I'd appreciate that over people saying \"your math is wrong you n00b!\" with no constructive feedback.\n",
631
+ "\n",
632
+ "//---//\n",
633
+ "\n",
634
+ "Regarding output\n",
635
+ "\n",
636
+ "# What are the </w> symbols?\n",
637
+ "\n",
638
+ "The whitespace symbol indicate if the tokenized item ends with whitespace ( the suffix \"banana</w>\" => \"banana \" ) or not (the prefix \"post\" in \"post-apocalyptic \")\n",
639
+ "\n",
640
+ "For ease of reference , I call them prefix-tokens and suffix-tokens.\n",
641
+ "\n",
642
+ "Sidenote:\n",
643
+ "\n",
644
+ "Prefix tokens have the unique property in that they \"mutate\" suffix tokens\n",
645
+ "\n",
646
+ "Example: \"photo of a #prefix#-banana\"\n",
647
+ "\n",
648
+ "where #prefix# is a randomly selected prefix-token from the vocab.json\n",
649
+ "\n",
650
+ "The hyphen \"-\" exists to guarantee the tokenized text splits into the written #prefix# and #suffix# token respectively. The \"-\" hypen symbol can be replaced by any other special character of your choosing.\n",
651
+ "\n",
652
+ " Capital letters work too , e.g \"photo of a #prefix#Abanana\" since the capital letters A-Z are only listed once in the entire vocab.json.\n",
653
+ "\n",
654
+ "You can also choose to omit any separator and just rawdog it with the prompt \"photo of a #prefix#banana\" , however know that this may , on occasion , be tokenized as completely different tokens of lower ID:s.\n",
655
+ "\n",
656
+ "Curiously , common NSFW terms found online have in the CLIP model have been purposefully fragmented into separate #prefix# and #suffix# counterparts in the vocab.json. Likely for PR-reasons.\n",
657
+ "\n",
658
+ "You can verify the results using this online tokenizer: https://sd-tokenizer.rocker.boo/\n",
659
+ "\n",
660
+ "<div>\n",
661
+ "<img src=\"https://lemmy.world/pictrs/image/43467d75-7406-4a13-93ca-cdc469f944fc.jpeg\" width=\"300\"/>\n",
662
+ "<img src=\"https://lemmy.world/pictrs/image/c0411565-0cb3-47b1-a788-b368924d6f17.jpeg\" width=\"300\"/>\n",
663
+ "<img src=\"https://lemmy.world/pictrs/image/c27c6550-a88b-4543-9bd7-067dff016be2.jpeg\" width=\"300\"/>\n",
664
+ "</div>\n",
665
+ "\n",
666
+ "# What is that gibberish tokens that show up?\n",
667
+ "\n",
668
+ "The gibberish tokens like \"ðŁĺħ\\</w>\" are actually emojis!\n",
669
+ "\n",
670
+ "Try writing some emojis in this online tokenizer to see the results: https://sd-tokenizer.rocker.boo/\n",
671
+ "\n",
672
+ "It is a bit borked as it can't process capital letters properly.\n",
673
+ "\n",
674
+ "Also note that this is not reversible.\n",
675
+ "\n",
676
+ "If tokenization \"😅\" => ðŁĺħ</w>\n",
677
+ "\n",
678
+ "Then you can't prompt \"ðŁĺħ\" and expect to get the same result as the tokenized original emoji , \"😅\".\n",
679
+ "\n",
680
+ "SD 1.5 models actually have training for Emojis.\n",
681
+ "\n",
682
+ "But you have to set CLIP skip to 1 for this to work is intended.\n",
683
+ "\n",
684
+ "For example, this is the result from \"photo of a 🧔🏻‍♂️\"\n",
685
+ "\n",
686
+ "\n",
687
+ "<div>\n",
688
+ "<img src=\"https://lemmy.world/pictrs/image/e2b51aea-6960-4ad0-867e-8ce85f2bd51e.jpeg\" width=\"300\"/>\n",
689
+ "</div>\n",
690
+ "\n",
691
+ "A tutorial on stuff you can do with the vocab.list concluded.\n",
692
+ "\n",
693
+ "Anyways, have fun with the notebook.\n",
694
+ "\n",
695
+ "There might be some updates in the future with features not mentioned here.\n",
696
  "\n",
697
+ "//---//"
698
  ],
699
  "metadata": {
700
  "id": "njeJx_nSSA8H"