{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tokenizer\n",
    "\n",
    "The tokenizer is a completely separate module, independent of the LLM. It has its own training dataset of text, on which its vocabulary is trained using the BPE (byte pair encoding) algorithm. It then translates back and forth between raw text and a sequence of integers (tokens). The LLM only ever deals with tokens and never directly with the text.\n",
    "\n",
    "![image.png](../public/tokenizer.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# ord() returns the Unicode code point of a character\n",
    "ord('a')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[3118,\n",
       " 3136,\n",
       " 3120,\n",
       " 3137,\n",
       " 32,\n",
       " 3086,\n",
       " 3122,\n",
       " 3134,\n",
       " 32,\n",
       " 3081,\n",
       " 3112,\n",
       " 3149,\n",
       " 3112,\n",
       " 3134,\n",
       " 3120,\n",
       " 3137,\n",
       " 63,\n",
       " 32,\n",
       " 40,\n",
       " 72,\n",
       " 111,\n",
       " 119,\n",
       " 32,\n",
       " 97,\n",
       " 114,\n",
       " 101,\n",
       " 32,\n",
       " 121,\n",
       " 111,\n",
       " 117,\n",
       " 63,\n",
       " 41]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokens = [ord(c) for c in \"మీరు ఎలా ఉన్నారు? (How are you?)\"]\n",
    "tokens"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, using one token per character makes token sequences long, which increases the compute cost of both training the model and generating text with it. To shorten the sequences, the [GPT-2 paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) builds its vocabulary with the byte pair encoding (BPE) algorithm."
   ]
  },
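  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the trade-off concrete, here is a small sketch that turns the same string into integers in two simple ways: Unicode code points (as in the cell above) and UTF-8 bytes. The variable names are only illustrative. Code points keep the sequence short but need a vocabulary that covers every Unicode code point, while bytes need only 256 possible values at the cost of a longer sequence. BPE starts from a small alphabet and merges frequent pairs to shorten the sequence again.\n",
    "\n",
    "```python\n",
    "text = 'మీరు ఎలా ఉన్నారు? (How are you?)'\n",
    "\n",
    "code_points = [ord(c) for c in text]      # one integer per character\n",
    "utf8_bytes = list(text.encode('utf-8'))   # one integer (0-255) per byte\n",
    "\n",
    "print(len(code_points), max(code_points))\n",
    "print(len(utf8_bytes), max(utf8_bytes))\n",
    "\n",
    "# Both representations are lossless: each decodes back to the same text.\n",
    "assert ''.join(chr(cp) for cp in code_points) == text\n",
    "assert bytes(utf8_bytes).decode('utf-8') == text\n",
    "```"
   ]
  },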
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Byte pair encoding algorithm\n",
    "\n",
    "Consider the string:\n",
    "\n",
    "`aaabdaaabac`\n",
    "\n",
    "The byte pair \"aa\" occurs most often in the string, so we replace it with a new byte that is not yet used in the `vocab`, say \"Z\".\n",
    "The string then becomes:\n",
    "\n",
    "```\n",
    "ZabdZabac\n",
    "Z = aa\n",
    "```\n",
    "\n",
    "This process is repeated: at each step the most frequent remaining byte pair is replaced with a new byte, until no pair occurs more than once and the string/data cannot be compressed further.\n",
    "\n",
    "\n",
    "Next, the process is repeated with the byte pair \"ab\", replacing it with \"Y\":\n",
    "```\n",
    "ZYdZYac\n",
    "Y=ab\n",
    "Z=aa\n",
    "```\n",
    "Finally, the pair \"ZY\" is replaced with \"X\":\n",
    "```\n",
    "XdXac\n",
    "X=ZY\n",
    "Y=ab\n",
    "Z=aa\n",
    "```\n",
    "\n",
    "\n"
   ]
  },
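  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The merge steps above can be sketched in a few lines of Python. This is a minimal illustration on the toy string, not the tokenizer of any real model. The helper names `pick_pair`, `merge` and `expand` are made up for this sketch, and ties between equally frequent pairs are broken by comparing the pairs themselves, an arbitrary choice made only so that the sketch reproduces the merges listed above.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def pick_pair(seq):\n",
    "    # Count adjacent pairs and return the most frequent one with its count.\n",
    "    # Tie-break (an assumption of this sketch): compare the pairs themselves.\n",
    "    counts = Counter(zip(seq, seq[1:]))\n",
    "    pair = max(counts, key=lambda p: (counts[p], p))\n",
    "    return pair, counts[pair]\n",
    "\n",
    "\n",
    "def merge(seq, pair, new_symbol):\n",
    "    # Replace every non-overlapping occurrence of pair, scanning left to right.\n",
    "    out, i = [], 0\n",
    "    while i < len(seq):\n",
    "        if tuple(seq[i:i + 2]) == pair:\n",
    "            out.append(new_symbol)\n",
    "            i += 2\n",
    "        else:\n",
    "            out.append(seq[i])\n",
    "            i += 1\n",
    "    return out\n",
    "\n",
    "\n",
    "def expand(seq, merges):\n",
    "    # Undo the merges, newest first, to recover the original text.\n",
    "    text = ''.join(seq)\n",
    "    for symbol in reversed(list(merges)):\n",
    "        text = text.replace(symbol, merges[symbol])\n",
    "    return text\n",
    "\n",
    "\n",
    "seq = list('aaabdaaabac')\n",
    "merges = {}\n",
    "for new_symbol in 'ZYX':  # unused symbols, as in the example above\n",
    "    pair, count = pick_pair(seq)\n",
    "    if count < 2:  # nothing repeats, so no further compression\n",
    "        break\n",
    "    seq = merge(seq, pair, new_symbol)\n",
    "    merges[new_symbol] = ''.join(pair)\n",
    "    print(''.join(seq), merges)\n",
    "\n",
    "print(expand(seq, merges))\n",
    "```\n",
    "\n",
    "Expanding the merges in reverse order recovers the original string, which is what lets a tokenizer translate in both directions."
   ]
  },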
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----\n",
      "Autogen enables the next-gen LLM applications with a generic [multi-agent conversation](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans.\n",
      "By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.\n",
      "\n",
      "Features of this use case include:\n",
      "\n",
      "- **Multi-agent conversations**: AutoGen agents can communicate with each other to solve tasks. This allows for more complex and sophisticated applications than would be possible with a single LLM.\n",
      "- **Customization**: AutoGen agents can be customized to meet the specific needs of an application. This includes the ability to choose the LLMs to use, the types of human input to allow, and the tools to employ.\n",
      "- **Human participation**: AutoGen seamlessly allows human participation. This means that humans can provide input and feedback to the agents as needed.\n",
      "\n",
      "For [example](https://github.com/microsoft/autogen/blob/main/test/twoagent.py),\n",
      "\n",
      "```python\n",
      "from autogen import AssistantAgent, UserProxyAgent, config_list_from_json\n",
      "# Load LLM inference endpoints from an env variable or a file\n",
      "# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints\n",
      "# and OAI_CONFIG_LIST_sample\n",
      "config_list = config_list_from_json(env_or_file=\"OAI_CONFIG_LIST\")\n",
      "# You can also set config_list directly as a list, for example, config_list = [{'model': 'gpt-4', 'api_key': '<your OpenAI API key here>'},]\n",
      "assistant = AssistantAgent(\"assistant\", llm_config={\"config_list\": config_list})\n",
      "user_proxy = UserProxyAgent(\"user_proxy\", code_execution_config={\"work_dir\": \"coding\", \"use_docker\": False}) # IMPORTANT: set to True to run code in docker, recommended\n",
      "user_proxy.initiate_chat(assistant, message=\"Plot a chart of NVDA and TESLA stock price change YTD.\")\n",
      "# This initiates an automated chat between the two agents to solve the task\n",
      "```\n",
      "\n",
      "more python code:\n",
      "\n",
      "```python\n",
      "    def create(\n",
      "        self,\n",
      "        *,\n",
      "        messages: Iterable[ChatCompletionMessageParam],\n",
      "        model: Union[str, ChatModel],\n",
      "        frequency_penalty: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        function_call: completion_create_params.FunctionCall | NotGiven = NOT_GIVEN,\n",
      "        functions: Iterable[completion_create_params.Function] | NotGiven = NOT_GIVEN,\n",
      "        logit_bias: Optional[Dict[str, int]] | NotGiven = NOT_GIVEN,\n",
      "        logprobs: Optional[bool] | NotGiven = NOT_GIVEN,\n",
      "        max_tokens: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        n: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        presence_penalty: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        response_format: completion_create_params.ResponseFormat | NotGiven = NOT_GIVEN,\n",
      "        seed: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        stop: Union[Optional[str], List[str]] | NotGiven = NOT_GIVEN,\n",
      "        stream: Optional[Literal[False]] | Literal[True] | NotGiven = NOT_GIVEN,\n",
      "        stream_options: Optional[ChatCompletionStreamOptionsParam] | NotGiven = NOT_GIVEN,\n",
      "        temperature: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        tool_choice: ChatCompletionToolChoiceOptionParam | NotGiven = NOT_GIVEN,\n",
      "        tools: Iterable[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,\n",
      "        top_logprobs: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        top_p: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        user: str | NotGiven = NOT_GIVEN,\n",
      "        # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.\n",
      "        # The extra values given here take precedence over values defined on the client or passed to this method.\n",
      "        extra_headers: Headers | None = None,\n",
      "        extra_query: Query | None = None,\n",
      "        extra_body: Body | None = None,\n",
      "        timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,\n",
      "    ) -> ChatCompletion | Stream[ChatCompletionChunk]:\n",
      "        return self._post(\n",
      "            \"/chat/completions\",\n",
      "            body=maybe_transform(\n",
      "                {\n",
      "                    \"messages\": messages,\n",
      "                    \"model\": model,\n",
      "                    \"frequency_penalty\": frequency_penalty,\n",
      "                    \"function_call\": function_call,\n",
      "                    \"functions\": functions,\n",
      "                    \"logit_bias\": logit_bias,\n",
      "                    \"logprobs\": logprobs,\n",
      "                    \"max_tokens\": max_tokens,\n",
      "                    \"n\": n,\n",
      "                    \"presence_penalty\": presence_penalty,\n",
      "                    \"response_format\": response_format,\n",
      "                    \"seed\": seed,\n",
      "                    \"stop\": stop,\n",
      "                    \"stream\": stream,\n",
      "                    \"stream_options\": stream_options,\n",
      "                    \"temperature\": temperature,\n",
      "                    \"tool_choice\": tool_choice,\n",
      "                    \"tools\": tools,\n",
      "                    \"top_logprobs\": top_logprobs,\n",
      "                    \"top_p\": top_p,\n",
      "                    \"user\": user,\n",
      "                },\n",
      "                completion_create_params.CompletionCreateParams,\n",
      "            ),\n",
      "            options=make_request_options(\n",
      "                extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout\n",
      "            ),\n",
      "            cast_to=ChatCompletion,\n",
      "            stream=stream or False,\n",
      "            stream_cls=Stream[ChatCompletionChunk],\n",
      "        )\n",
      "```\n",
      "\n",
      "length: 5397\n",
      "----\n",
      "[65, 117, 116, 111, 103, 101, 110, 32, 101, 110, 97, 98, 108, 101, 115, 32, 116, 104, 101, 32, 110, 101, 120, 116, 45, 103, 101, 110, 32, 76, 76, 77, 32, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 115, 32, 119, 105, 116, 104, 32, 97, 32, 103, 101, 110, 101, 114, 105, 99, 32, 91, 109, 117, 108, 116, 105, 45, 97, 103, 101, 110, 116, 32, 99, 111, 110, 118, 101, 114, 115, 97, 116, 105, 111, 110, 93, 40, 104, 116, 116, 112, 115, 58, 47, 47, 109, 105, 99, 114, 111, 115, 111, 102, 116, 46, 103, 105, 116, 104, 117, 98, 46, 105, 111, 47, 97, 117, 116, 111, 103, 101, 110, 47, 100, 111, 99, 115, 47, 85, 115, 101, 45, 67, 97, 115, 101, 115, 47, 97, 103, 101, 110, 116, 95, 99, 104, 97, 116, 41, 32, 102, 114, 97, 109, 101, 119, 111, 114, 107, 46, 32, 73, 116, 32, 111, 102, 102, 101, 114, 115, 32, 99, 117, 115, 116, 111, 109, 105, 122, 97, 98, 108, 101, 32, 97, 110, 100, 32, 99, 111, 110, 118, 101, 114, 115, 97, 98, 108, 101, 32, 97, 103, 101, 110, 116, 115, 32, 116, 104, 97, 116, 32, 105, 110, 116, 101, 103, 114, 97, 116, 101, 32, 76, 76, 77, 115, 44, 32, 116, 111, 111, 108, 115, 44, 32, 97, 110, 100, 32, 104, 117, 109, 97, 110, 115, 46, 10, 66, 121, 32, 97, 117, 116, 111, 109, 97, 116, 105, 110, 103, 32, 99, 104, 97, 116, 32, 97, 109, 111, 110, 103, 32, 109, 117, 108, 116, 105, 112, 108, 101, 32, 99, 97, 112, 97, 98, 108, 101, 32, 97, 103, 101, 110, 116, 115, 44, 32, 111, 110, 101, 32, 99, 97, 110, 32, 101, 97, 115, 105, 108, 121, 32, 109, 97, 107, 101, 32, 116, 104, 101, 109, 32, 99, 111, 108, 108, 101, 99, 116, 105, 118, 101, 108, 121, 32, 112, 101, 114, 102, 111, 114, 109, 32, 116, 97, 115, 107, 115, 32, 97, 117, 116, 111, 110, 111, 109, 111, 117, 115, 108, 121, 32, 111, 114, 32, 119, 105, 116, 104, 32, 104, 117, 109, 97, 110, 32, 102, 101, 101, 100, 98, 97, 99, 107, 44, 32, 105, 110, 99, 108, 117, 100, 105, 110, 103, 32, 116, 97, 115, 107, 115, 32, 116, 104, 97, 116, 32, 114, 101, 113, 117, 105, 114, 101, 32, 117, 115, 105, 110, 103, 32, 116, 111, 111, 108, 
115, 32, 118, 105, 97, 32, 99, 111, 100, 101, 46, 10, 10, 70, 101, 97, 116, 117, 114, 101, 115, 32, 111, 102, 32, 116, 104, 105, 115, 32, 117, 115, 101, 32, 99, 97, 115, 101, 32, 105, 110, 99, 108, 117, 100, 101, 58, 10, 10, 45, 32, 42, 42, 77, 117, 108, 116, 105, 45, 97, 103, 101, 110, 116, 32, 99, 111, 110, 118, 101, 114, 115, 97, 116, 105, 111, 110, 115, 42, 42, 58, 32, 65, 117, 116, 111, 71, 101, 110, 32, 97, 103, 101, 110, 116, 115, 32, 99, 97, 110, 32, 99, 111, 109, 109, 117, 110, 105, 99, 97, 116, 101, 32, 119, 105, 116, 104, 32, 101, 97, 99, 104, 32, 111, 116, 104, 101, 114, 32, 116, 111, 32, 115, 111, 108, 118, 101, 32, 116, 97, 115, 107, 115, 46, 32, 84, 104, 105, 115, 32, 97, 108, 108, 111, 119, 115, 32, 102, 111, 114, 32, 109, 111, 114, 101, 32, 99, 111, 109, 112, 108, 101, 120, 32, 97, 110, 100, 32, 115, 111, 112, 104, 105, 115, 116, 105, 99, 97, 116, 101, 100, 32, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 115, 32, 116, 104, 97, 110, 32, 119, 111, 117, 108, 100, 32, 98, 101, 32, 112, 111, 115, 115, 105, 98, 108, 101, 32, 119, 105, 116, 104, 32, 97, 32, 115, 105, 110, 103, 108, 101, 32, 76, 76, 77, 46, 10, 45, 32, 42, 42, 67, 117, 115, 116, 111, 109, 105, 122, 97, 116, 105, 111, 110, 42, 42, 58, 32, 65, 117, 116, 111, 71, 101, 110, 32, 97, 103, 101, 110, 116, 115, 32, 99, 97, 110, 32, 98, 101, 32, 99, 117, 115, 116, 111, 109, 105, 122, 101, 100, 32, 116, 111, 32, 109, 101, 101, 116, 32, 116, 104, 101, 32, 115, 112, 101, 99, 105, 102, 105, 99, 32, 110, 101, 101, 100, 115, 32, 111, 102, 32, 97, 110, 32, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 46, 32, 84, 104, 105, 115, 32, 105, 110, 99, 108, 117, 100, 101, 115, 32, 116, 104, 101, 32, 97, 98, 105, 108, 105, 116, 121, 32, 116, 111, 32, 99, 104, 111, 111, 115, 101, 32, 116, 104, 101, 32, 76, 76, 77, 115, 32, 116, 111, 32, 117, 115, 101, 44, 32, 116, 104, 101, 32, 116, 121, 112, 101, 115, 32, 111, 102, 32, 104, 117, 109, 97, 110, 32, 105, 110, 112, 117, 116, 32, 116, 111, 32, 97, 108, 
108, 111, 119, 44, 32, 97, 110, 100, 32, 116, 104, 101, 32, 116, 111, 111, 108, 115, 32, 116, 111, 32, 101, 109, 112, 108, 111, 121, 46, 10, 45, 32, 42, 42, 72, 117, 109, 97, 110, 32, 112, 97, 114, 116, 105, 99, 105, 112, 97, 116, 105, 111, 110, 42, 42, 58, 32, 65, 117, 116, 111, 71, 101, 110, 32, 115, 101, 97, 109, 108, 101, 115, 115, 108, 121, 32, 97, 108, 108, 111, 119, 115, 32, 104, 117, 109, 97, 110, 32, 112, 97, 114, 116, 105, 99, 105, 112, 97, 116, 105, 111, 110, 46, 32, 84, 104, 105, 115, 32, 109, 101, 97, 110, 115, 32, 116, 104, 97, 116, 32, 104, 117, 109, 97, 110, 115, 32, 99, 97, 110, 32, 112, 114, 111, 118, 105, 100, 101, 32, 105, 110, 112, 117, 116, 32, 97, 110, 100, 32, 102, 101, 101, 100, 98, 97, 99, 107, 32, 116, 111, 32, 116, 104, 101, 32, 97, 103, 101, 110, 116, 115, 32, 97, 115, 32, 110, 101, 101, 100, 101, 100, 46, 10, 10, 70, 111, 114, 32, 91, 101, 120, 97, 109, 112, 108, 101, 93, 40, 104, 116, 116, 112, 115, 58, 47, 47, 103, 105, 116, 104, 117, 98, 46, 99, 111, 109, 47, 109, 105, 99, 114, 111, 115, 111, 102, 116, 47, 97, 117, 116, 111, 103, 101, 110, 47, 98, 108, 111, 98, 47, 109, 97, 105, 110, 47, 116, 101, 115, 116, 47, 116, 119, 111, 97, 103, 101, 110, 116, 46, 112, 121, 41, 44, 10, 10, 96, 96, 96, 112, 121, 116, 104, 111, 110, 10, 102, 114, 111, 109, 32, 97, 117, 116, 111, 103, 101, 110, 32, 105, 109, 112, 111, 114, 116, 32, 65, 115, 115, 105, 115, 116, 97, 110, 116, 65, 103, 101, 110, 116, 44, 32, 85, 115, 101, 114, 80, 114, 111, 120, 121, 65, 103, 101, 110, 116, 44, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 95, 102, 114, 111, 109, 95, 106, 115, 111, 110, 10, 35, 32, 76, 111, 97, 100, 32, 76, 76, 77, 32, 105, 110, 102, 101, 114, 101, 110, 99, 101, 32, 101, 110, 100, 112, 111, 105, 110, 116, 115, 32, 102, 114, 111, 109, 32, 97, 110, 32, 101, 110, 118, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32, 111, 114, 32, 97, 32, 102, 105, 108, 101, 10, 35, 32, 83, 101, 101, 32, 104, 116, 116, 112, 115, 58, 47, 47, 109, 105, 99, 114, 
111, 115, 111, 102, 116, 46, 103, 105, 116, 104, 117, 98, 46, 105, 111, 47, 97, 117, 116, 111, 103, 101, 110, 47, 100, 111, 99, 115, 47, 70, 65, 81, 35, 115, 101, 116, 45, 121, 111, 117, 114, 45, 97, 112, 105, 45, 101, 110, 100, 112, 111, 105, 110, 116, 115, 10, 35, 32, 97, 110, 100, 32, 79, 65, 73, 95, 67, 79, 78, 70, 73, 71, 95, 76, 73, 83, 84, 95, 115, 97, 109, 112, 108, 101, 10, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 32, 61, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 95, 102, 114, 111, 109, 95, 106, 115, 111, 110, 40, 101, 110, 118, 95, 111, 114, 95, 102, 105, 108, 101, 61, 34, 79, 65, 73, 95, 67, 79, 78, 70, 73, 71, 95, 76, 73, 83, 84, 34, 41, 10, 35, 32, 89, 111, 117, 32, 99, 97, 110, 32, 97, 108, 115, 111, 32, 115, 101, 116, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 32, 100, 105, 114, 101, 99, 116, 108, 121, 32, 97, 115, 32, 97, 32, 108, 105, 115, 116, 44, 32, 102, 111, 114, 32, 101, 120, 97, 109, 112, 108, 101, 44, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 32, 61, 32, 91, 123, 39, 109, 111, 100, 101, 108, 39, 58, 32, 39, 103, 112, 116, 45, 52, 39, 44, 32, 39, 97, 112, 105, 95, 107, 101, 121, 39, 58, 32, 39, 60, 121, 111, 117, 114, 32, 79, 112, 101, 110, 65, 73, 32, 65, 80, 73, 32, 107, 101, 121, 32, 104, 101, 114, 101, 62, 39, 125, 44, 93, 10, 97, 115, 115, 105, 115, 116, 97, 110, 116, 32, 61, 32, 65, 115, 115, 105, 115, 116, 97, 110, 116, 65, 103, 101, 110, 116, 40, 34, 97, 115, 115, 105, 115, 116, 97, 110, 116, 34, 44, 32, 108, 108, 109, 95, 99, 111, 110, 102, 105, 103, 61, 123, 34, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 34, 58, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 125, 41, 10, 117, 115, 101, 114, 95, 112, 114, 111, 120, 121, 32, 61, 32, 85, 115, 101, 114, 80, 114, 111, 120, 121, 65, 103, 101, 110, 116, 40, 34, 117, 115, 101, 114, 95, 112, 114, 111, 120, 121, 34, 44, 32, 99, 111, 100, 101, 95, 101, 120, 101, 99, 117, 116, 105, 111, 110, 95, 99, 111, 110, 102, 105, 
103, 61, 123, 34, 119, 111, 114, 107, 95, 100, 105, 114, 34, 58, 32, 34, 99, 111, 100, 105, 110, 103, 34, 44, 32, 34, 117, 115, 101, 95, 100, 111, 99, 107, 101, 114, 34, 58, 32, 70, 97, 108, 115, 101, 125, 41, 32, 35, 32, 73, 77, 80, 79, 82, 84, 65, 78, 84, 58, 32, 115, 101, 116, 32, 116, 111, 32, 84, 114, 117, 101, 32, 116, 111, 32, 114, 117, 110, 32, 99, 111, 100, 101, 32, 105, 110, 32, 100, 111, 99, 107, 101, 114, 44, 32, 114, 101, 99, 111, 109, 109, 101, 110, 100, 101, 100, 10, 117, 115, 101, 114, 95, 112, 114, 111, 120, 121, 46, 105, 110, 105, 116, 105, 97, 116, 101, 95, 99, 104, 97, 116, 40, 97, 115, 115, 105, 115, 116, 97, 110, 116, 44, 32, 109, 101, 115, 115, 97, 103, 101, 61, 34, 80, 108, 111, 116, 32, 97, 32, 99, 104, 97, 114, 116, 32, 111, 102, 32, 78, 86, 68, 65, 32, 97, 110, 100, 32, 84, 69, 83, 76, 65, 32, 115, 116, 111, 99, 107, 32, 112, 114, 105, 99, 101, 32, 99, 104, 97, 110, 103, 101, 32, 89, 84, 68, 46, 34, 41, 10, 35, 32, 84, 104, 105, 115, 32, 105, 110, 105, 116, 105, 97, 116, 101, 115, 32, 97, 110, 32, 97, 117, 116, 111, 109, 97, 116, 101, 100, 32, 99, 104, 97, 116, 32, 98, 101, 116, 119, 101, 101, 110, 32, 116, 104, 101, 32, 116, 119, 111, 32, 97, 103, 101, 110, 116, 115, 32, 116, 111, 32, 115, 111, 108, 118, 101, 32, 116, 104, 101, 32, 116, 97, 115, 107, 10, 96, 96, 96, 10, 10, 109, 111, 114, 101, 32, 112, 121, 116, 104, 111, 110, 32, 99, 111, 100, 101, 58, 10, 10, 96, 96, 96, 112, 121, 116, 104, 111, 110, 10, 32, 32, 32, 32, 100, 101, 102, 32, 99, 114, 101, 97, 116, 101, 40, 10, 32, 32, 32, 32, 32, 32, 32, 32, 115, 101, 108, 102, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 42, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 109, 101, 115, 115, 97, 103, 101, 115, 58, 32, 73, 116, 101, 114, 97, 98, 108, 101, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 77, 101, 115, 115, 97, 103, 101, 80, 97, 114, 97, 109, 93, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 109, 111, 100, 101, 108, 58, 32, 85, 110, 105, 111, 110, 91, 115, 116, 114, 44, 
32, 67, 104, 97, 116, 77, 111, 100, 101, 108, 93, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 102, 114, 101, 113, 117, 101, 110, 99, 121, 95, 112, 101, 110, 97, 108, 116, 121, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 102, 117, 110, 99, 116, 105, 111, 110, 95, 99, 97, 108, 108, 58, 32, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 70, 117, 110, 99, 116, 105, 111, 110, 67, 97, 108, 108, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 102, 117, 110, 99, 116, 105, 111, 110, 115, 58, 32, 73, 116, 101, 114, 97, 98, 108, 101, 91, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 70, 117, 110, 99, 116, 105, 111, 110, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 108, 111, 103, 105, 116, 95, 98, 105, 97, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 68, 105, 99, 116, 91, 115, 116, 114, 44, 32, 105, 110, 116, 93, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 108, 111, 103, 112, 114, 111, 98, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 98, 111, 111, 108, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 109, 97, 120, 95, 116, 111, 107, 101, 110, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 110, 58, 32, 79, 112, 116, 
105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 112, 114, 101, 115, 101, 110, 99, 101, 95, 112, 101, 110, 97, 108, 116, 121, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 114, 101, 115, 112, 111, 110, 115, 101, 95, 102, 111, 114, 109, 97, 116, 58, 32, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 82, 101, 115, 112, 111, 110, 115, 101, 70, 111, 114, 109, 97, 116, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 115, 101, 101, 100, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 115, 116, 111, 112, 58, 32, 85, 110, 105, 111, 110, 91, 79, 112, 116, 105, 111, 110, 97, 108, 91, 115, 116, 114, 93, 44, 32, 76, 105, 115, 116, 91, 115, 116, 114, 93, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 115, 116, 114, 101, 97, 109, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 76, 105, 116, 101, 114, 97, 108, 91, 70, 97, 108, 115, 101, 93, 93, 32, 124, 32, 76, 105, 116, 101, 114, 97, 108, 91, 84, 114, 117, 101, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 115, 116, 114, 101, 97, 109, 95, 111, 112, 116, 105, 111, 110, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 83, 116, 114, 101, 97, 109, 79, 
112, 116, 105, 111, 110, 115, 80, 97, 114, 97, 109, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 116, 101, 109, 112, 101, 114, 97, 116, 117, 114, 101, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 116, 111, 111, 108, 95, 99, 104, 111, 105, 99, 101, 58, 32, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 84, 111, 111, 108, 67, 104, 111, 105, 99, 101, 79, 112, 116, 105, 111, 110, 80, 97, 114, 97, 109, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 116, 111, 111, 108, 115, 58, 32, 73, 116, 101, 114, 97, 98, 108, 101, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 84, 111, 111, 108, 80, 97, 114, 97, 109, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 116, 111, 112, 95, 108, 111, 103, 112, 114, 111, 98, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 116, 111, 112, 95, 112, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 117, 115, 101, 114, 58, 32, 115, 116, 114, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 35, 32, 85, 115, 101, 32, 116, 104, 101, 32, 102, 111, 108, 108, 111, 119, 105, 110, 103, 32, 97, 114, 103, 117, 109, 101, 110, 116, 115, 32, 105, 102, 32, 
121, 111, 117, 32, 110, 101, 101, 100, 32, 116, 111, 32, 112, 97, 115, 115, 32, 97, 100, 100, 105, 116, 105, 111, 110, 97, 108, 32, 112, 97, 114, 97, 109, 101, 116, 101, 114, 115, 32, 116, 111, 32, 116, 104, 101, 32, 65, 80, 73, 32, 116, 104, 97, 116, 32, 97, 114, 101, 110, 39, 116, 32, 97, 118, 97, 105, 108, 97, 98, 108, 101, 32, 118, 105, 97, 32, 107, 119, 97, 114, 103, 115, 46, 10, 32, 32, 32, 32, 32, 32, 32, 32, 35, 32, 84, 104, 101, 32, 101, 120, 116, 114, 97, 32, 118, 97, 108, 117, 101, 115, 32, 103, 105, 118, 101, 110, 32, 104, 101, 114, 101, 32, 116, 97, 107, 101, 32, 112, 114, 101, 99, 101, 100, 101, 110, 99, 101, 32, 111, 118, 101, 114, 32, 118, 97, 108, 117, 101, 115, 32, 100, 101, 102, 105, 110, 101, 100, 32, 111, 110, 32, 116, 104, 101, 32, 99, 108, 105, 101, 110, 116, 32, 111, 114, 32, 112, 97, 115, 115, 101, 100, 32, 116, 111, 32, 116, 104, 105, 115, 32, 109, 101, 116, 104, 111, 100, 46, 10, 32, 32, 32, 32, 32, 32, 32, 32, 101, 120, 116, 114, 97, 95, 104, 101, 97, 100, 101, 114, 115, 58, 32, 72, 101, 97, 100, 101, 114, 115, 32, 124, 32, 78, 111, 110, 101, 32, 61, 32, 78, 111, 110, 101, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 101, 120, 116, 114, 97, 95, 113, 117, 101, 114, 121, 58, 32, 81, 117, 101, 114, 121, 32, 124, 32, 78, 111, 110, 101, 32, 61, 32, 78, 111, 110, 101, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 101, 120, 116, 114, 97, 95, 98, 111, 100, 121, 58, 32, 66, 111, 100, 121, 32, 124, 32, 78, 111, 110, 101, 32, 61, 32, 78, 111, 110, 101, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 116, 105, 109, 101, 111, 117, 116, 58, 32, 102, 108, 111, 97, 116, 32, 124, 32, 104, 116, 116, 112, 120, 46, 84, 105, 109, 101, 111, 117, 116, 32, 124, 32, 78, 111, 110, 101, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 32, 32, 32, 32, 41, 32, 45, 62, 32, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 32, 124, 32, 83, 116, 114, 101, 97, 109, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 
116, 105, 111, 110, 67, 104, 117, 110, 107, 93, 58, 10, 32, 32, 32, 32, 32, 32, 32, 32, 114, 101, 116, 117, 114, 110, 32, 115, 101, 108, 102, 46, 95, 112, 111, 115, 116, 40, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 47, 99, 104, 97, 116, 47, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 115, 34, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 98, 111, 100, 121, 61, 109, 97, 121, 98, 101, 95, 116, 114, 97, 110, 115, 102, 111, 114, 109, 40, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 123, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 109, 101, 115, 115, 97, 103, 101, 115, 34, 58, 32, 109, 101, 115, 115, 97, 103, 101, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 109, 111, 100, 101, 108, 34, 58, 32, 109, 111, 100, 101, 108, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 102, 114, 101, 113, 117, 101, 110, 99, 121, 95, 112, 101, 110, 97, 108, 116, 121, 34, 58, 32, 102, 114, 101, 113, 117, 101, 110, 99, 121, 95, 112, 101, 110, 97, 108, 116, 121, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 102, 117, 110, 99, 116, 105, 111, 110, 95, 99, 97, 108, 108, 34, 58, 32, 102, 117, 110, 99, 116, 105, 111, 110, 95, 99, 97, 108, 108, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 102, 117, 110, 99, 116, 105, 111, 110, 115, 34, 58, 32, 102, 117, 110, 99, 116, 105, 111, 110, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 108, 111, 103, 105, 116, 95, 98, 105, 97, 115, 34, 58, 32, 108, 111, 103, 105, 116, 95, 98, 105, 97, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 108, 111, 103, 112, 114, 111, 98, 115, 34, 58, 32, 108, 111, 103, 112, 114, 111, 98, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 
32, 32, 32, 32, 32, 32, 32, 34, 109, 97, 120, 95, 116, 111, 107, 101, 110, 115, 34, 58, 32, 109, 97, 120, 95, 116, 111, 107, 101, 110, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 110, 34, 58, 32, 110, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 112, 114, 101, 115, 101, 110, 99, 101, 95, 112, 101, 110, 97, 108, 116, 121, 34, 58, 32, 112, 114, 101, 115, 101, 110, 99, 101, 95, 112, 101, 110, 97, 108, 116, 121, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 114, 101, 115, 112, 111, 110, 115, 101, 95, 102, 111, 114, 109, 97, 116, 34, 58, 32, 114, 101, 115, 112, 111, 110, 115, 101, 95, 102, 111, 114, 109, 97, 116, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 115, 101, 101, 100, 34, 58, 32, 115, 101, 101, 100, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 115, 116, 111, 112, 34, 58, 32, 115, 116, 111, 112, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 115, 116, 114, 101, 97, 109, 34, 58, 32, 115, 116, 114, 101, 97, 109, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 115, 116, 114, 101, 97, 109, 95, 111, 112, 116, 105, 111, 110, 115, 34, 58, 32, 115, 116, 114, 101, 97, 109, 95, 111, 112, 116, 105, 111, 110, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 116, 101, 109, 112, 101, 114, 97, 116, 117, 114, 101, 34, 58, 32, 116, 101, 109, 112, 101, 114, 97, 116, 117, 114, 101, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 116, 111, 111, 108, 95, 99, 104, 111, 105, 99, 101, 34, 58, 32, 116, 111, 111, 108, 95, 99, 104, 111, 105, 99, 101, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 116, 111, 111, 108, 115, 34, 58, 
32, 116, 111, 111, 108, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 116, 111, 112, 95, 108, 111, 103, 112, 114, 111, 98, 115, 34, 58, 32, 116, 111, 112, 95, 108, 111, 103, 112, 114, 111, 98, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 116, 111, 112, 95, 112, 34, 58, 32, 116, 111, 112, 95, 112, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 34, 117, 115, 101, 114, 34, 58, 32, 117, 115, 101, 114, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 125, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 67, 114, 101, 97, 116, 101, 80, 97, 114, 97, 109, 115, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 41, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 111, 112, 116, 105, 111, 110, 115, 61, 109, 97, 107, 101, 95, 114, 101, 113, 117, 101, 115, 116, 95, 111, 112, 116, 105, 111, 110, 115, 40, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 101, 120, 116, 114, 97, 95, 104, 101, 97, 100, 101, 114, 115, 61, 101, 120, 116, 114, 97, 95, 104, 101, 97, 100, 101, 114, 115, 44, 32, 101, 120, 116, 114, 97, 95, 113, 117, 101, 114, 121, 61, 101, 120, 116, 114, 97, 95, 113, 117, 101, 114, 121, 44, 32, 101, 120, 116, 114, 97, 95, 98, 111, 100, 121, 61, 101, 120, 116, 114, 97, 95, 98, 111, 100, 121, 44, 32, 116, 105, 109, 101, 111, 117, 116, 61, 116, 105, 109, 101, 111, 117, 116, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 41, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 99, 97, 115, 116, 95, 116, 111, 61, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 115, 116, 114, 101, 97, 109, 61, 115, 116, 114, 101, 
97, 109, 32, 111, 114, 32, 70, 97, 108, 115, 101, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 115, 116, 114, 101, 97, 109, 95, 99, 108, 115, 61, 83, 116, 114, 101, 97, 109, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 67, 104, 117, 110, 107, 93, 44, 10, 32, 32, 32, 32, 32, 32, 32, 32, 41, 10, 96, 96, 96, 10]\n",
      "length: 5397\n"
     ]
    }
   ],
   "source": [
    "text = \"\"\"Autogen enables the next-gen LLM applications with a generic [multi-agent conversation](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans.\n",
    "By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.\n",
    "\n",
    "Features of this use case include:\n",
    "\n",
    "- **Multi-agent conversations**: AutoGen agents can communicate with each other to solve tasks. This allows for more complex and sophisticated applications than would be possible with a single LLM.\n",
    "- **Customization**: AutoGen agents can be customized to meet the specific needs of an application. This includes the ability to choose the LLMs to use, the types of human input to allow, and the tools to employ.\n",
    "- **Human participation**: AutoGen seamlessly allows human participation. This means that humans can provide input and feedback to the agents as needed.\n",
    "\n",
    "For [example](https://github.com/microsoft/autogen/blob/main/test/twoagent.py),\n",
    "\n",
    "```python\n",
    "from autogen import AssistantAgent, UserProxyAgent, config_list_from_json\n",
    "# Load LLM inference endpoints from an env variable or a file\n",
    "# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints\n",
    "# and OAI_CONFIG_LIST_sample\n",
    "config_list = config_list_from_json(env_or_file=\"OAI_CONFIG_LIST\")\n",
    "# You can also set config_list directly as a list, for example, config_list = [{'model': 'gpt-4', 'api_key': '<your OpenAI API key here>'},]\n",
    "assistant = AssistantAgent(\"assistant\", llm_config={\"config_list\": config_list})\n",
    "user_proxy = UserProxyAgent(\"user_proxy\", code_execution_config={\"work_dir\": \"coding\", \"use_docker\": False}) # IMPORTANT: set to True to run code in docker, recommended\n",
    "user_proxy.initiate_chat(assistant, message=\"Plot a chart of NVDA and TESLA stock price change YTD.\")\n",
    "# This initiates an automated chat between the two agents to solve the task\n",
    "```\n",
    "\n",
    "more python code:\n",
    "\n",
    "```python\n",
    "    def create(\n",
    "        self,\n",
    "        *,\n",
    "        messages: Iterable[ChatCompletionMessageParam],\n",
    "        model: Union[str, ChatModel],\n",
    "        frequency_penalty: Optional[float] | NotGiven = NOT_GIVEN,\n",
    "        function_call: completion_create_params.FunctionCall | NotGiven = NOT_GIVEN,\n",
    "        functions: Iterable[completion_create_params.Function] | NotGiven = NOT_GIVEN,\n",
    "        logit_bias: Optional[Dict[str, int]] | NotGiven = NOT_GIVEN,\n",
    "        logprobs: Optional[bool] | NotGiven = NOT_GIVEN,\n",
    "        max_tokens: Optional[int] | NotGiven = NOT_GIVEN,\n",
    "        n: Optional[int] | NotGiven = NOT_GIVEN,\n",
    "        presence_penalty: Optional[float] | NotGiven = NOT_GIVEN,\n",
    "        response_format: completion_create_params.ResponseFormat | NotGiven = NOT_GIVEN,\n",
    "        seed: Optional[int] | NotGiven = NOT_GIVEN,\n",
    "        stop: Union[Optional[str], List[str]] | NotGiven = NOT_GIVEN,\n",
    "        stream: Optional[Literal[False]] | Literal[True] | NotGiven = NOT_GIVEN,\n",
    "        stream_options: Optional[ChatCompletionStreamOptionsParam] | NotGiven = NOT_GIVEN,\n",
    "        temperature: Optional[float] | NotGiven = NOT_GIVEN,\n",
    "        tool_choice: ChatCompletionToolChoiceOptionParam | NotGiven = NOT_GIVEN,\n",
    "        tools: Iterable[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,\n",
    "        top_logprobs: Optional[int] | NotGiven = NOT_GIVEN,\n",
    "        top_p: Optional[float] | NotGiven = NOT_GIVEN,\n",
    "        user: str | NotGiven = NOT_GIVEN,\n",
    "        # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.\n",
    "        # The extra values given here take precedence over values defined on the client or passed to this method.\n",
    "        extra_headers: Headers | None = None,\n",
    "        extra_query: Query | None = None,\n",
    "        extra_body: Body | None = None,\n",
    "        timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,\n",
    "    ) -> ChatCompletion | Stream[ChatCompletionChunk]:\n",
    "        return self._post(\n",
    "            \"/chat/completions\",\n",
    "            body=maybe_transform(\n",
    "                {\n",
    "                    \"messages\": messages,\n",
    "                    \"model\": model,\n",
    "                    \"frequency_penalty\": frequency_penalty,\n",
    "                    \"function_call\": function_call,\n",
    "                    \"functions\": functions,\n",
    "                    \"logit_bias\": logit_bias,\n",
    "                    \"logprobs\": logprobs,\n",
    "                    \"max_tokens\": max_tokens,\n",
    "                    \"n\": n,\n",
    "                    \"presence_penalty\": presence_penalty,\n",
    "                    \"response_format\": response_format,\n",
    "                    \"seed\": seed,\n",
    "                    \"stop\": stop,\n",
    "                    \"stream\": stream,\n",
    "                    \"stream_options\": stream_options,\n",
    "                    \"temperature\": temperature,\n",
    "                    \"tool_choice\": tool_choice,\n",
    "                    \"tools\": tools,\n",
    "                    \"top_logprobs\": top_logprobs,\n",
    "                    \"top_p\": top_p,\n",
    "                    \"user\": user,\n",
    "                },\n",
    "                completion_create_params.CompletionCreateParams,\n",
    "            ),\n",
    "            options=make_request_options(\n",
    "                extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout\n",
    "            ),\n",
    "            cast_to=ChatCompletion,\n",
    "            stream=stream or False,\n",
    "            stream_cls=Stream[ChatCompletionChunk],\n",
    "        )\n",
    "```\n",
    "\"\"\"\n",
    "tokens = text.encode('utf-8')  # encode the text as raw UTF-8 bytes\n",
    "tokens = list(map(int, tokens))  # convert the bytes to a list of integers in the range 0..255\n",
    "print('----')\n",
    "print(text)\n",
    "print('length:', len(text))\n",
    "print('----')\n",
    "print(tokens)\n",
    "print('length:', len(tokens))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(770, (32, 32)), (86, (111, 110)), (73, (101, 110)), (66, (10, 32)), (65, (116, 105)), (57, (44, 10)), (56, (105, 111)), (55, (58, 32)), (55, (32, 116)), (52, (97, 116)), (50, (116, 111)), (50, (101, 32)), (48, (32, 78)), (47, (110, 32)), (44, (114, 101)), (40, (115, 116)), (40, (32, 97)), (38, (115, 32)), (38, (101, 114)), (36, (115, 101)), (35, (97, 108)), (35, (32, 99)), (34, (108, 101)), (32, (116, 104)), (31, (114, 97)), (30, (97, 110)), (29, (110, 116)), (28, (118, 101)), (28, (116, 114)), (28, (111, 109)), (28, (97, 109)), (27, (124, 32)), (27, (103, 101)), (27, (101, 97)), (27, (99, 111)), (27, (78, 111)), (27, (61, 32)), (27, (32, 124)), (27, (32, 61)), (26, (116, 32)), (26, (101, 115)), (25, (105, 110)), (24, (110, 115)), (24, (34, 58)), (24, (32, 34)), (23, (116, 101)), (23, (112, 108)), (23, (109, 112)), (22, (111, 116)), (22, (105, 118)), (22, (101, 116)), (22, (44, 32)), (22, (32, 115)), (21, (112, 116)), (21, (110, 97)), (21, (108, 111)), (21, (105, 115)), (21, (104, 97)), (21, (100, 101)), (21, (84, 95)), (20, (116, 71)), (20, (104, 101)), (20, (95, 71)), (20, (86, 69)), (20, (79, 84)), (20, (78, 79)), (20, (78, 44)), (20, (73, 86)), (20, (71, 105)), (20, (71, 73)), (20, (69, 78)), (19, (114, 111)), (19, (111, 114)), (19, (110, 99)), (19, (109, 97)), (18, (117, 116)), (18, (105, 99)), (18, (97, 115)), (17, (97, 114)), (17, (95, 112)), (16, (111, 108)), (16, (111, 100)), (16, (105, 116)), (16, (101, 100)), (16, (99, 97)), (16, (93, 32)), (16, (79, 112)), (15, (112, 114)), (15, (111, 112)), (15, (111, 32)), (15, (109, 101)), (15, (108, 91)), (15, (101, 120)), (15, (101, 95)), (15, (100, 32)), (15, (97, 103)), (15, (95, 99)), (15, (32, 102)), (14, (117, 115)), (14, (115, 115)), (14, (111, 103)), (14, (110, 101)), (14, (32, 111)), (14, (32, 101)), (14, (32, 79)), (13, (115, 44)), (13, (112, 101)), (13, (111, 111)), (13, (108, 105)), (13, (102, 105)), (13, (32, 112)), (13, (32, 109)), (12, (117, 110)), (12, (117, 101)), (12, (115, 58)), (12, 
(112, 97)), (12, (99, 104)), (12, (67, 104)), (12, (32, 105)), (11, (121, 32)), (11, (120, 116)), (11, (114, 32)), (11, (108, 115)), (11, (101, 101)), (11, (99, 116)), (11, (99, 101)), (11, (98, 108)), (10, (116, 97)), (10, (111, 117)), (10, (110, 102)), (10, (110, 100)), (10, (108, 108)), (10, (107, 101)), (10, (104, 117)), (10, (97, 98)), (10, (95, 108)), (9, (116, 115)), (9, (116, 93)), (9, (115, 111)), (9, (115, 105)), (9, (115, 97)), (9, (115, 34)), (9, (114, 115)), (9, (112, 111)), (9, (108, 116)), (9, (105, 103)), (9, (104, 111)), (9, (97, 95)), (9, (67, 111)), (9, (32, 104)), (8, (116, 121)), (8, (116, 95)), (8, (116, 67)), (8, (113, 117)), (8, (111, 102)), (8, (110, 103)), (8, (110, 95)), (8, (109, 111)), (8, (105, 97)), (8, (102, 114)), (8, (102, 111)), (8, (101, 108)), (8, (101, 44)), (8, (99, 114)), (8, (97, 32)), (8, (96, 96)), (8, (35, 32)), (8, (32, 76)), (7, (117, 114)), (7, (117, 109)), (7, (115, 46)), (7, (111, 98)), (7, (111, 97)), (7, (109, 95)), (7, (104, 105)), (7, (103, 112)), (7, (103, 105)), (7, (103, 95)), (7, (97, 117)), (7, (46, 10)), (7, (32, 84)), (7, (32, 65)), (6, (114, 109)), (6, (112, 95)), (6, (111, 115)), (6, (111, 105)), (6, (109, 105)), (6, (109, 32)), (6, (102, 117)), (6, (102, 32)), (6, (101, 99)), (6, (98, 115)), (6, (97, 112)), (6, (97, 100)), (6, (95, 102)), (6, (95, 98)), (6, (42, 42)), (6, (32, 114)), (6, (32, 110)), (6, (32, 108)), (6, (10, 10)), (5, (120, 121)), (5, (119, 111)), (5, (119, 105)), (5, (116, 117)), (5, (116, 44)), (5, (115, 112)), (5, (111, 120)), (5, (111, 99)), (5, (110, 118)), (5, (110, 105)), (5, (109, 115)), (5, (108, 121)), (5, (108, 117)), (5, (105, 109)), (5, (105, 108)), (5, (104, 32)), (5, (103, 32)), (5, (102, 108)), (5, (101, 113)), (5, (101, 109)), (5, (100, 121)), (5, (100, 105)), (5, (99, 108)), (5, (99, 107)), (5, (98, 111)), (5, (95, 116)), (5, (95, 111)), (5, (91, 67)), (5, (84, 104)), (5, (80, 97)), (5, (76, 77)), (5, (76, 76)), (5, (34, 116)), (5, (32, 119)), (5, (32, 118)), (5, (32, 
117)), (5, (32, 85)), (5, (32, 73)), (5, (10, 35)), (4, (121, 58)), (4, (121, 44)), (4, (118, 97)), (4, (117, 108)), (4, (116, 116)), (4, (116, 112)), (4, (116, 40)), (4, (115, 107)), (4, (114, 121)), (4, (114, 116)), (4, (114, 95)), (4, (114, 44)), (4, (112, 121)), (4, (111, 119)), (4, (110, 67)), (4, (110, 47)), (4, (104, 116)), (4, (102, 101)), (4, (101, 111)), (4, (101, 58)), (4, (100, 111)), (4, (98, 105)), (4, (98, 101)), (4, (93, 44)), (4, (91, 115)), (4, (91, 105)), (4, (91, 102)), (4, (85, 115)), (4, (73, 116)), (4, (65, 117)), (4, (65, 103)), (4, (47, 109)), (4, (47, 97)), (4, (46, 32)), (4, (41, 10)), (4, (40, 10)), (4, (34, 115)), (4, (34, 44)), (4, (32, 100)), (4, (32, 98)), (4, (32, 42)), (4, (32, 41)), (4, (10, 96)), (3, (121, 116)), (3, (121, 111)), (3, (121, 95)), (3, (121, 61)), (3, (121, 34)), (3, (120, 95)), (3, (118, 105)), (3, (117, 100)), (3, (117, 98)), (3, (116, 119)), (3, (116, 47)), (3, (116, 46)), (3, (116, 45)), (3, (116, 34)), (3, (115, 61)), (3, (115, 47)), (3, (114, 117)), (3, (114, 105)), (3, (114, 34)), (3, (112, 115)), (3, (112, 112)), (3, (111, 107)), (3, (111, 71)), (3, (110, 10)), (3, (109, 117)), (3, (109, 93)), (3, (108, 95)), (3, (107, 115)), (3, (105, 122)), (3, (105, 114)), (3, (105, 112)), (3, (105, 45)), (3, (102, 116)), (3, (101, 93)), (3, (101, 91)), (3, (99, 121)), (3, (99, 117)), (3, (99, 105)), (3, (98, 46)), (3, (97, 120)), (3, (97, 107)), (3, (97, 99)), (3, (95, 113)), (3, (95, 104)), (3, (93, 93)), (3, (83, 116)), (3, (76, 105)), (3, (73, 32)), (3, (71, 101)), (3, (70, 97)), (3, (65, 73)), (3, (61, 101)), (3, (58, 47)), (3, (58, 10)), (3, (47, 47)), (3, (46, 105)), (3, (45, 97)), (3, (45, 32)), (3, (42, 58)), (3, (41, 44)), (3, (41, 32)), (3, (34, 117)), (3, (34, 109)), (3, (34, 102)), (3, (32, 91)), (3, (32, 67)), (3, (32, 39)), (3, (32, 35)), (3, (10, 45)), (2, (125, 44)), (2, (125, 41)), (2, (123, 34)), (2, (122, 97)), (2, (121, 65)), (2, (121, 46)), (2, (120, 97)), (2, (119, 115)), (2, (117, 32)), (2, (116, 
91)), (2, (116, 65)), (2, (116, 58)), (2, (115, 108)), (2, (114, 107)), (2, (114, 103)), (2, (114, 93)), (2, (114, 80)), (2, (112, 117)), (2, (112, 105)), (2, (112, 58)), (2, (112, 44)), (2, (112, 34)), (2, (111, 118)), (2, (111, 47)), (2, (110, 112)), (2, (110, 107)), (2, (110, 93)), (2, (110, 91)), (2, (110, 84)), (2, (110, 46)), (2, (110, 44)), (2, (110, 42)), (2, (109, 109)), (2, (109, 91)), (2, (108, 118)), (2, (108, 102)), (2, (108, 93)), (2, (108, 58)), (2, (108, 44)), (2, (108, 34)), (2, (108, 32)), (2, (107, 93)), (2, (107, 32)), (2, (106, 115)), (2, (105, 102)), (2, (103, 61)), (2, (101, 121)), (2, (101, 102)), (2, (101, 80)), (2, (101, 61)), (2, (101, 34)), (2, (101, 10)), (2, (100, 112)), (2, (100, 98)), (2, (100, 46)), (2, (99, 115)), (2, (99, 32)), (2, (98, 97)), (2, (97, 105)), (2, (96, 112)), (2, (96, 10)), (2, (95, 106)), (2, (95, 100)), (2, (95, 76)), (2, (95, 67)), (2, (93, 40)), (2, (85, 110)), (2, (84, 114)), (2, (84, 111)), (2, (83, 84)), (2, (80, 114)), (2, (80, 73)), (2, (79, 78)), (2, (79, 65)), (2, (78, 70)), (2, (77, 115)), (2, (77, 32)), (2, (76, 73)), (2, (73, 95)), (2, (73, 83)), (2, (73, 71)), (2, (71, 95)), (2, (70, 117)), (2, (70, 111)), (2, (70, 73)), (2, (67, 97)), (2, (67, 79)), (2, (65, 115)), (2, (65, 80)), (2, (65, 32)), (2, (61, 123)), (2, (61, 109)), (2, (61, 34)), (2, (47, 116)), (2, (47, 100)), (2, (47, 99)), (2, (46, 103)), (2, (46, 70)), (2, (40, 104)), (2, (40, 34)), (2, (39, 58)), (2, (34, 108)), (2, (34, 99)), (2, (34, 41)), (2, (32, 107)), (2, (32, 103)), (2, (32, 89)), (2, (32, 83)), (2, (32, 70)), (2, (10, 117)), (2, (10, 70)), (1, (123, 39)), (1, (123, 10)), (1, (122, 101)), (1, (121, 112)), (1, (121, 98)), (1, (121, 41)), (1, (121, 39)), (1, (120, 101)), (1, (120, 46)), (1, (120, 32)), (1, (119, 101)), (1, (119, 97)), (1, (119, 44)), (1, (118, 95)), (1, (118, 32)), (1, (117, 105)), (1, (116, 125)), (1, (116, 108)), (1, (116, 77)), (1, (116, 61)), (1, (116, 41)), (1, (116, 10)), (1, (115, 102)), (1, (115, 80)), 
(1, (115, 42)), (1, (115, 40)), (1, (115, 10)), (1, (114, 110)), (1, (114, 102)), (1, (114, 58)), (1, (114, 45)), (1, (112, 120)), (1, (112, 104)), (1, (111, 121)), (1, (111, 61)), (1, (110, 111)), (1, (110, 83)), (1, (110, 80)), (1, (110, 77)), (1, (110, 65)), (1, (110, 58)), (1, (110, 40)), (1, (110, 39)), (1, (110, 34)), (1, (109, 108)), (1, (109, 79)), (1, (109, 61)), (1, (109, 58)), (1, (109, 47)), (1, (109, 44)), (1, (109, 40)), (1, (109, 34)), (1, (108, 109)), (1, (108, 100)), (1, (108, 97)), (1, (108, 80)), (1, (108, 67)), (1, (108, 39)), (1, (107, 119)), (1, (107, 95)), (1, (107, 46)), (1, (107, 44)), (1, (107, 10)), (1, (105, 101)), (1, (105, 100)), (1, (105, 98)), (1, (105, 95)), (1, (103, 117)), (1, (103, 115)), (1, (103, 114)), (1, (103, 108)), (1, (103, 34)), (1, (102, 102)), (1, (102, 46)), (1, (102, 44)), (1, (101, 125)), (1, (101, 119)), (1, (101, 103)), (1, (101, 79)), (1, (101, 70)), (1, (101, 62)), (1, (101, 46)), (1, (101, 45)), (1, (101, 40)), (1, (100, 115)), (1, (100, 100)), (1, (100, 58)), (1, (100, 44)), (1, (100, 34)), (1, (100, 10)), (1, (98, 47)), (1, (97, 121)), (1, (97, 118)), (1, (95, 115)), (1, (95, 114)), (1, (95, 107)), (1, (95, 101)), (1, (93, 58)), (1, (93, 10)), (1, (91, 123)), (1, (91, 109)), (1, (91, 101)), (1, (91, 99)), (1, (91, 98)), (1, (91, 84)), (1, (91, 79)), (1, (91, 76)), (1, (91, 70)), (1, (91, 68)), (1, (89, 111)), (1, (89, 84)), (1, (86, 68)), (1, (84, 105)), (1, (84, 69)), (1, (84, 68)), (1, (84, 65)), (1, (84, 58)), (1, (84, 34)), (1, (83, 101)), (1, (83, 76)), (1, (82, 101)), (1, (82, 84)), (1, (81, 117)), (1, (81, 35)), (1, (80, 108)), (1, (80, 79)), (1, (79, 82)), (1, (78, 86)), (1, (78, 84)), (1, (77, 117)), (1, (77, 111)), (1, (77, 101)), (1, (77, 80)), (1, (77, 46)), (1, (76, 111)), (1, (76, 65)), (1, (73, 77)), (1, (72, 117)), (1, (72, 101)), (1, (70, 101)), (1, (70, 65)), (1, (69, 83)), (1, (68, 105)), (1, (68, 65)), (1, (68, 46)), (1, (67, 117)), (1, (67, 114)), (1, (66, 121)), (1, (66, 111)), (1, (65, 
81)), (1, (65, 78)), (1, (62, 39)), (1, (62, 32)), (1, (61, 116)), (1, (61, 115)), (1, (61, 83)), (1, (61, 67)), (1, (60, 121)), (1, (52, 39)), (1, (47, 103)), (1, (47, 98)), (1, (47, 85)), (1, (47, 70)), (1, (46, 112)), (1, (46, 99)), (1, (46, 95)), (1, (46, 84)), (1, (46, 82)), (1, (46, 67)), (1, (46, 34)), (1, (45, 121)), (1, (45, 103)), (1, (45, 101)), (1, (45, 67)), (1, (45, 62)), (1, (45, 52)), (1, (44, 93)), (1, (42, 77)), (1, (42, 72)), (1, (42, 67)), (1, (42, 44)), (1, (40, 101)), (1, (40, 97)), (1, (39, 125)), (1, (39, 116)), (1, (39, 109)), (1, (39, 103)), (1, (39, 97)), (1, (39, 60)), (1, (39, 44)), (1, (35, 115)), (1, (34, 119)), (1, (34, 114)), (1, (34, 112)), (1, (34, 110)), (1, (34, 97)), (1, (34, 80)), (1, (34, 79)), (1, (34, 47)), (1, (32, 125)), (1, (32, 123)), (1, (32, 121)), (1, (32, 81)), (1, (32, 72)), (1, (32, 66)), (1, (32, 45)), (1, (10, 109)), (1, (10, 102)), (1, (10, 99)), (1, (10, 97)), (1, (10, 66))]\n"
     ]
    }
   ],
   "source": [
    "def get_stats(ids):\n",
    "    \"\"\"\n",
    "    Count how often each pair of adjacent token ids occurs in ids.\n",
    "    Returns a dict mapping each (id, next_id) pair to its count.\n",
    "    \"\"\"\n",
    "    counts = {}\n",
    "    for pair in zip(ids, ids[1:]):\n",
    "        counts[pair] = counts.get(pair, 0) + 1\n",
    "    return counts\n",
    "\n",
    "stats = get_stats(tokens)\n",
    "# print(stats)\n",
    "print(sorted(((v,k) for k,v in stats.items()), reverse=True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(' ', ' ')"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chr(32), chr(32)  # the most common pair, (32, 32), is two consecutive spaces"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[65, 117, 116, 111, 103, 101, 110, 32, 101, 110, 97, 98, 108, 101, 115, 32, 116, 104, 101, 32, 110, 101, 120, 116, 45, 103, 101, 110, 32, 76, 76, 77, 32, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 115, 32, 119, 105, 116, 104, 32, 97, 32, 103, 101, 110, 101, 114, 105, 99, 32, 91, 109, 117, 108, 116, 105, 45, 97, 103, 101, 110, 116, 32, 99, 111, 110, 118, 101, 114, 115, 97, 116, 105, 111, 110, 93, 40, 104, 116, 116, 112, 115, 58, 47, 47, 109, 105, 99, 114, 111, 115, 111, 102, 116, 46, 103, 105, 116, 104, 117, 98, 46, 105, 111, 47, 97, 117, 116, 111, 103, 101, 110, 47, 100, 111, 99, 115, 47, 85, 115, 101, 45, 67, 97, 115, 101, 115, 47, 97, 103, 101, 110, 116, 95, 99, 104, 97, 116, 41, 32, 102, 114, 97, 109, 101, 119, 111, 114, 107, 46, 32, 73, 116, 32, 111, 102, 102, 101, 114, 115, 32, 99, 117, 115, 116, 111, 109, 105, 122, 97, 98, 108, 101, 32, 97, 110, 100, 32, 99, 111, 110, 118, 101, 114, 115, 97, 98, 108, 101, 32, 97, 103, 101, 110, 116, 115, 32, 116, 104, 97, 116, 32, 105, 110, 116, 101, 103, 114, 97, 116, 101, 32, 76, 76, 77, 115, 44, 32, 116, 111, 111, 108, 115, 44, 32, 97, 110, 100, 32, 104, 117, 109, 97, 110, 115, 46, 10, 66, 121, 32, 97, 117, 116, 111, 109, 97, 116, 105, 110, 103, 32, 99, 104, 97, 116, 32, 97, 109, 111, 110, 103, 32, 109, 117, 108, 116, 105, 112, 108, 101, 32, 99, 97, 112, 97, 98, 108, 101, 32, 97, 103, 101, 110, 116, 115, 44, 32, 111, 110, 101, 32, 99, 97, 110, 32, 101, 97, 115, 105, 108, 121, 32, 109, 97, 107, 101, 32, 116, 104, 101, 109, 32, 99, 111, 108, 108, 101, 99, 116, 105, 118, 101, 108, 121, 32, 112, 101, 114, 102, 111, 114, 109, 32, 116, 97, 115, 107, 115, 32, 97, 117, 116, 111, 110, 111, 109, 111, 117, 115, 108, 121, 32, 111, 114, 32, 119, 105, 116, 104, 32, 104, 117, 109, 97, 110, 32, 102, 101, 101, 100, 98, 97, 99, 107, 44, 32, 105, 110, 99, 108, 117, 100, 105, 110, 103, 32, 116, 97, 115, 107, 115, 32, 116, 104, 97, 116, 32, 114, 101, 113, 117, 105, 114, 101, 32, 117, 115, 105, 110, 103, 32, 116, 111, 111, 108, 
115, 32, 118, 105, 97, 32, 99, 111, 100, 101, 46, 10, 10, 70, 101, 97, 116, 117, 114, 101, 115, 32, 111, 102, 32, 116, 104, 105, 115, 32, 117, 115, 101, 32, 99, 97, 115, 101, 32, 105, 110, 99, 108, 117, 100, 101, 58, 10, 10, 45, 32, 42, 42, 77, 117, 108, 116, 105, 45, 97, 103, 101, 110, 116, 32, 99, 111, 110, 118, 101, 114, 115, 97, 116, 105, 111, 110, 115, 42, 42, 58, 32, 65, 117, 116, 111, 71, 101, 110, 32, 97, 103, 101, 110, 116, 115, 32, 99, 97, 110, 32, 99, 111, 109, 109, 117, 110, 105, 99, 97, 116, 101, 32, 119, 105, 116, 104, 32, 101, 97, 99, 104, 32, 111, 116, 104, 101, 114, 32, 116, 111, 32, 115, 111, 108, 118, 101, 32, 116, 97, 115, 107, 115, 46, 32, 84, 104, 105, 115, 32, 97, 108, 108, 111, 119, 115, 32, 102, 111, 114, 32, 109, 111, 114, 101, 32, 99, 111, 109, 112, 108, 101, 120, 32, 97, 110, 100, 32, 115, 111, 112, 104, 105, 115, 116, 105, 99, 97, 116, 101, 100, 32, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 115, 32, 116, 104, 97, 110, 32, 119, 111, 117, 108, 100, 32, 98, 101, 32, 112, 111, 115, 115, 105, 98, 108, 101, 32, 119, 105, 116, 104, 32, 97, 32, 115, 105, 110, 103, 108, 101, 32, 76, 76, 77, 46, 10, 45, 32, 42, 42, 67, 117, 115, 116, 111, 109, 105, 122, 97, 116, 105, 111, 110, 42, 42, 58, 32, 65, 117, 116, 111, 71, 101, 110, 32, 97, 103, 101, 110, 116, 115, 32, 99, 97, 110, 32, 98, 101, 32, 99, 117, 115, 116, 111, 109, 105, 122, 101, 100, 32, 116, 111, 32, 109, 101, 101, 116, 32, 116, 104, 101, 32, 115, 112, 101, 99, 105, 102, 105, 99, 32, 110, 101, 101, 100, 115, 32, 111, 102, 32, 97, 110, 32, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 46, 32, 84, 104, 105, 115, 32, 105, 110, 99, 108, 117, 100, 101, 115, 32, 116, 104, 101, 32, 97, 98, 105, 108, 105, 116, 121, 32, 116, 111, 32, 99, 104, 111, 111, 115, 101, 32, 116, 104, 101, 32, 76, 76, 77, 115, 32, 116, 111, 32, 117, 115, 101, 44, 32, 116, 104, 101, 32, 116, 121, 112, 101, 115, 32, 111, 102, 32, 104, 117, 109, 97, 110, 32, 105, 110, 112, 117, 116, 32, 116, 111, 32, 97, 108, 
108, 111, 119, 44, 32, 97, 110, 100, 32, 116, 104, 101, 32, 116, 111, 111, 108, 115, 32, 116, 111, 32, 101, 109, 112, 108, 111, 121, 46, 10, 45, 32, 42, 42, 72, 117, 109, 97, 110, 32, 112, 97, 114, 116, 105, 99, 105, 112, 97, 116, 105, 111, 110, 42, 42, 58, 32, 65, 117, 116, 111, 71, 101, 110, 32, 115, 101, 97, 109, 108, 101, 115, 115, 108, 121, 32, 97, 108, 108, 111, 119, 115, 32, 104, 117, 109, 97, 110, 32, 112, 97, 114, 116, 105, 99, 105, 112, 97, 116, 105, 111, 110, 46, 32, 84, 104, 105, 115, 32, 109, 101, 97, 110, 115, 32, 116, 104, 97, 116, 32, 104, 117, 109, 97, 110, 115, 32, 99, 97, 110, 32, 112, 114, 111, 118, 105, 100, 101, 32, 105, 110, 112, 117, 116, 32, 97, 110, 100, 32, 102, 101, 101, 100, 98, 97, 99, 107, 32, 116, 111, 32, 116, 104, 101, 32, 97, 103, 101, 110, 116, 115, 32, 97, 115, 32, 110, 101, 101, 100, 101, 100, 46, 10, 10, 70, 111, 114, 32, 91, 101, 120, 97, 109, 112, 108, 101, 93, 40, 104, 116, 116, 112, 115, 58, 47, 47, 103, 105, 116, 104, 117, 98, 46, 99, 111, 109, 47, 109, 105, 99, 114, 111, 115, 111, 102, 116, 47, 97, 117, 116, 111, 103, 101, 110, 47, 98, 108, 111, 98, 47, 109, 97, 105, 110, 47, 116, 101, 115, 116, 47, 116, 119, 111, 97, 103, 101, 110, 116, 46, 112, 121, 41, 44, 10, 10, 96, 96, 96, 112, 121, 116, 104, 111, 110, 10, 102, 114, 111, 109, 32, 97, 117, 116, 111, 103, 101, 110, 32, 105, 109, 112, 111, 114, 116, 32, 65, 115, 115, 105, 115, 116, 97, 110, 116, 65, 103, 101, 110, 116, 44, 32, 85, 115, 101, 114, 80, 114, 111, 120, 121, 65, 103, 101, 110, 116, 44, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 95, 102, 114, 111, 109, 95, 106, 115, 111, 110, 10, 35, 32, 76, 111, 97, 100, 32, 76, 76, 77, 32, 105, 110, 102, 101, 114, 101, 110, 99, 101, 32, 101, 110, 100, 112, 111, 105, 110, 116, 115, 32, 102, 114, 111, 109, 32, 97, 110, 32, 101, 110, 118, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32, 111, 114, 32, 97, 32, 102, 105, 108, 101, 10, 35, 32, 83, 101, 101, 32, 104, 116, 116, 112, 115, 58, 47, 47, 109, 105, 99, 114, 
111, 115, 111, 102, 116, 46, 103, 105, 116, 104, 117, 98, 46, 105, 111, 47, 97, 117, 116, 111, 103, 101, 110, 47, 100, 111, 99, 115, 47, 70, 65, 81, 35, 115, 101, 116, 45, 121, 111, 117, 114, 45, 97, 112, 105, 45, 101, 110, 100, 112, 111, 105, 110, 116, 115, 10, 35, 32, 97, 110, 100, 32, 79, 65, 73, 95, 67, 79, 78, 70, 73, 71, 95, 76, 73, 83, 84, 95, 115, 97, 109, 112, 108, 101, 10, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 32, 61, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 95, 102, 114, 111, 109, 95, 106, 115, 111, 110, 40, 101, 110, 118, 95, 111, 114, 95, 102, 105, 108, 101, 61, 34, 79, 65, 73, 95, 67, 79, 78, 70, 73, 71, 95, 76, 73, 83, 84, 34, 41, 10, 35, 32, 89, 111, 117, 32, 99, 97, 110, 32, 97, 108, 115, 111, 32, 115, 101, 116, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 32, 100, 105, 114, 101, 99, 116, 108, 121, 32, 97, 115, 32, 97, 32, 108, 105, 115, 116, 44, 32, 102, 111, 114, 32, 101, 120, 97, 109, 112, 108, 101, 44, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 32, 61, 32, 91, 123, 39, 109, 111, 100, 101, 108, 39, 58, 32, 39, 103, 112, 116, 45, 52, 39, 44, 32, 39, 97, 112, 105, 95, 107, 101, 121, 39, 58, 32, 39, 60, 121, 111, 117, 114, 32, 79, 112, 101, 110, 65, 73, 32, 65, 80, 73, 32, 107, 101, 121, 32, 104, 101, 114, 101, 62, 39, 125, 44, 93, 10, 97, 115, 115, 105, 115, 116, 97, 110, 116, 32, 61, 32, 65, 115, 115, 105, 115, 116, 97, 110, 116, 65, 103, 101, 110, 116, 40, 34, 97, 115, 115, 105, 115, 116, 97, 110, 116, 34, 44, 32, 108, 108, 109, 95, 99, 111, 110, 102, 105, 103, 61, 123, 34, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 34, 58, 32, 99, 111, 110, 102, 105, 103, 95, 108, 105, 115, 116, 125, 41, 10, 117, 115, 101, 114, 95, 112, 114, 111, 120, 121, 32, 61, 32, 85, 115, 101, 114, 80, 114, 111, 120, 121, 65, 103, 101, 110, 116, 40, 34, 117, 115, 101, 114, 95, 112, 114, 111, 120, 121, 34, 44, 32, 99, 111, 100, 101, 95, 101, 120, 101, 99, 117, 116, 105, 111, 110, 95, 99, 111, 110, 102, 105, 
103, 61, 123, 34, 119, 111, 114, 107, 95, 100, 105, 114, 34, 58, 32, 34, 99, 111, 100, 105, 110, 103, 34, 44, 32, 34, 117, 115, 101, 95, 100, 111, 99, 107, 101, 114, 34, 58, 32, 70, 97, 108, 115, 101, 125, 41, 32, 35, 32, 73, 77, 80, 79, 82, 84, 65, 78, 84, 58, 32, 115, 101, 116, 32, 116, 111, 32, 84, 114, 117, 101, 32, 116, 111, 32, 114, 117, 110, 32, 99, 111, 100, 101, 32, 105, 110, 32, 100, 111, 99, 107, 101, 114, 44, 32, 114, 101, 99, 111, 109, 109, 101, 110, 100, 101, 100, 10, 117, 115, 101, 114, 95, 112, 114, 111, 120, 121, 46, 105, 110, 105, 116, 105, 97, 116, 101, 95, 99, 104, 97, 116, 40, 97, 115, 115, 105, 115, 116, 97, 110, 116, 44, 32, 109, 101, 115, 115, 97, 103, 101, 61, 34, 80, 108, 111, 116, 32, 97, 32, 99, 104, 97, 114, 116, 32, 111, 102, 32, 78, 86, 68, 65, 32, 97, 110, 100, 32, 84, 69, 83, 76, 65, 32, 115, 116, 111, 99, 107, 32, 112, 114, 105, 99, 101, 32, 99, 104, 97, 110, 103, 101, 32, 89, 84, 68, 46, 34, 41, 10, 35, 32, 84, 104, 105, 115, 32, 105, 110, 105, 116, 105, 97, 116, 101, 115, 32, 97, 110, 32, 97, 117, 116, 111, 109, 97, 116, 101, 100, 32, 99, 104, 97, 116, 32, 98, 101, 116, 119, 101, 101, 110, 32, 116, 104, 101, 32, 116, 119, 111, 32, 97, 103, 101, 110, 116, 115, 32, 116, 111, 32, 115, 111, 108, 118, 101, 32, 116, 104, 101, 32, 116, 97, 115, 107, 10, 96, 96, 96, 10, 10, 109, 111, 114, 101, 32, 112, 121, 116, 104, 111, 110, 32, 99, 111, 100, 101, 58, 10, 10, 96, 96, 96, 112, 121, 116, 104, 111, 110, 10, 1000, 1000, 100, 101, 102, 32, 99, 114, 101, 97, 116, 101, 40, 10, 1000, 1000, 1000, 1000, 115, 101, 108, 102, 44, 10, 1000, 1000, 1000, 1000, 42, 44, 10, 1000, 1000, 1000, 1000, 109, 101, 115, 115, 97, 103, 101, 115, 58, 32, 73, 116, 101, 114, 97, 98, 108, 101, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 77, 101, 115, 115, 97, 103, 101, 80, 97, 114, 97, 109, 93, 44, 10, 1000, 1000, 1000, 1000, 109, 111, 100, 101, 108, 58, 32, 85, 110, 105, 111, 110, 91, 115, 116, 114, 44, 32, 67, 104, 97, 116, 77, 111, 100, 
101, 108, 93, 44, 10, 1000, 1000, 1000, 1000, 102, 114, 101, 113, 117, 101, 110, 99, 121, 95, 112, 101, 110, 97, 108, 116, 121, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 102, 117, 110, 99, 116, 105, 111, 110, 95, 99, 97, 108, 108, 58, 32, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 70, 117, 110, 99, 116, 105, 111, 110, 67, 97, 108, 108, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 102, 117, 110, 99, 116, 105, 111, 110, 115, 58, 32, 73, 116, 101, 114, 97, 98, 108, 101, 91, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 70, 117, 110, 99, 116, 105, 111, 110, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 108, 111, 103, 105, 116, 95, 98, 105, 97, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 68, 105, 99, 116, 91, 115, 116, 114, 44, 32, 105, 110, 116, 93, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 108, 111, 103, 112, 114, 111, 98, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 98, 111, 111, 108, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 109, 97, 120, 95, 116, 111, 107, 101, 110, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 110, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 
110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 112, 114, 101, 115, 101, 110, 99, 101, 95, 112, 101, 110, 97, 108, 116, 121, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 114, 101, 115, 112, 111, 110, 115, 101, 95, 102, 111, 114, 109, 97, 116, 58, 32, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 82, 101, 115, 112, 111, 110, 115, 101, 70, 111, 114, 109, 97, 116, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 115, 101, 101, 100, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 115, 116, 111, 112, 58, 32, 85, 110, 105, 111, 110, 91, 79, 112, 116, 105, 111, 110, 97, 108, 91, 115, 116, 114, 93, 44, 32, 76, 105, 115, 116, 91, 115, 116, 114, 93, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 115, 116, 114, 101, 97, 109, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 76, 105, 116, 101, 114, 97, 108, 91, 70, 97, 108, 115, 101, 93, 93, 32, 124, 32, 76, 105, 116, 101, 114, 97, 108, 91, 84, 114, 117, 101, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 115, 116, 114, 101, 97, 109, 95, 111, 112, 116, 105, 111, 110, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 83, 116, 114, 101, 97, 109, 79, 112, 116, 105, 111, 110, 115, 80, 97, 114, 97, 109, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 
86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 116, 101, 109, 112, 101, 114, 97, 116, 117, 114, 101, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 116, 111, 111, 108, 95, 99, 104, 111, 105, 99, 101, 58, 32, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 84, 111, 111, 108, 67, 104, 111, 105, 99, 101, 79, 112, 116, 105, 111, 110, 80, 97, 114, 97, 109, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 116, 111, 111, 108, 115, 58, 32, 73, 116, 101, 114, 97, 98, 108, 101, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 84, 111, 111, 108, 80, 97, 114, 97, 109, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 116, 111, 112, 95, 108, 111, 103, 112, 114, 111, 98, 115, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 105, 110, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 116, 111, 112, 95, 112, 58, 32, 79, 112, 116, 105, 111, 110, 97, 108, 91, 102, 108, 111, 97, 116, 93, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 117, 115, 101, 114, 58, 32, 115, 116, 114, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 1000, 1000, 35, 32, 85, 115, 101, 32, 116, 104, 101, 32, 102, 111, 108, 108, 111, 119, 105, 110, 103, 32, 97, 114, 103, 117, 109, 101, 110, 116, 115, 32, 105, 102, 32, 121, 111, 117, 32, 110, 101, 101, 100, 32, 116, 111, 32, 112, 97, 115, 115, 32, 97, 100, 100, 105, 116, 105, 111, 110, 97, 108, 32, 112, 97, 114, 97, 109, 101, 116, 101, 114, 115, 32, 116, 111, 32, 
116, 104, 101, 32, 65, 80, 73, 32, 116, 104, 97, 116, 32, 97, 114, 101, 110, 39, 116, 32, 97, 118, 97, 105, 108, 97, 98, 108, 101, 32, 118, 105, 97, 32, 107, 119, 97, 114, 103, 115, 46, 10, 1000, 1000, 1000, 1000, 35, 32, 84, 104, 101, 32, 101, 120, 116, 114, 97, 32, 118, 97, 108, 117, 101, 115, 32, 103, 105, 118, 101, 110, 32, 104, 101, 114, 101, 32, 116, 97, 107, 101, 32, 112, 114, 101, 99, 101, 100, 101, 110, 99, 101, 32, 111, 118, 101, 114, 32, 118, 97, 108, 117, 101, 115, 32, 100, 101, 102, 105, 110, 101, 100, 32, 111, 110, 32, 116, 104, 101, 32, 99, 108, 105, 101, 110, 116, 32, 111, 114, 32, 112, 97, 115, 115, 101, 100, 32, 116, 111, 32, 116, 104, 105, 115, 32, 109, 101, 116, 104, 111, 100, 46, 10, 1000, 1000, 1000, 1000, 101, 120, 116, 114, 97, 95, 104, 101, 97, 100, 101, 114, 115, 58, 32, 72, 101, 97, 100, 101, 114, 115, 32, 124, 32, 78, 111, 110, 101, 32, 61, 32, 78, 111, 110, 101, 44, 10, 1000, 1000, 1000, 1000, 101, 120, 116, 114, 97, 95, 113, 117, 101, 114, 121, 58, 32, 81, 117, 101, 114, 121, 32, 124, 32, 78, 111, 110, 101, 32, 61, 32, 78, 111, 110, 101, 44, 10, 1000, 1000, 1000, 1000, 101, 120, 116, 114, 97, 95, 98, 111, 100, 121, 58, 32, 66, 111, 100, 121, 32, 124, 32, 78, 111, 110, 101, 32, 61, 32, 78, 111, 110, 101, 44, 10, 1000, 1000, 1000, 1000, 116, 105, 109, 101, 111, 117, 116, 58, 32, 102, 108, 111, 97, 116, 32, 124, 32, 104, 116, 116, 112, 120, 46, 84, 105, 109, 101, 111, 117, 116, 32, 124, 32, 78, 111, 110, 101, 32, 124, 32, 78, 111, 116, 71, 105, 118, 101, 110, 32, 61, 32, 78, 79, 84, 95, 71, 73, 86, 69, 78, 44, 10, 1000, 1000, 41, 32, 45, 62, 32, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 32, 124, 32, 83, 116, 114, 101, 97, 109, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 67, 104, 117, 110, 107, 93, 58, 10, 1000, 1000, 1000, 1000, 114, 101, 116, 117, 114, 110, 32, 115, 101, 108, 102, 46, 95, 112, 111, 115, 116, 40, 10, 1000, 1000, 1000, 1000, 1000, 1000, 34, 47, 99, 104, 97, 116, 47, 99, 
111, 109, 112, 108, 101, 116, 105, 111, 110, 115, 34, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 98, 111, 100, 121, 61, 109, 97, 121, 98, 101, 95, 116, 114, 97, 110, 115, 102, 111, 114, 109, 40, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 123, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 109, 101, 115, 115, 97, 103, 101, 115, 34, 58, 32, 109, 101, 115, 115, 97, 103, 101, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 109, 111, 100, 101, 108, 34, 58, 32, 109, 111, 100, 101, 108, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 102, 114, 101, 113, 117, 101, 110, 99, 121, 95, 112, 101, 110, 97, 108, 116, 121, 34, 58, 32, 102, 114, 101, 113, 117, 101, 110, 99, 121, 95, 112, 101, 110, 97, 108, 116, 121, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 102, 117, 110, 99, 116, 105, 111, 110, 95, 99, 97, 108, 108, 34, 58, 32, 102, 117, 110, 99, 116, 105, 111, 110, 95, 99, 97, 108, 108, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 102, 117, 110, 99, 116, 105, 111, 110, 115, 34, 58, 32, 102, 117, 110, 99, 116, 105, 111, 110, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 108, 111, 103, 105, 116, 95, 98, 105, 97, 115, 34, 58, 32, 108, 111, 103, 105, 116, 95, 98, 105, 97, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 108, 111, 103, 112, 114, 111, 98, 115, 34, 58, 32, 108, 111, 103, 112, 114, 111, 98, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 109, 97, 120, 95, 116, 111, 107, 101, 110, 115, 34, 58, 32, 109, 97, 120, 95, 116, 111, 107, 101, 110, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 110, 34, 58, 32, 110, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 112, 114, 101, 115, 101, 110, 99, 101, 95, 112, 101, 110, 97, 108, 116, 121, 34, 58, 32, 112, 114, 101, 115, 101, 110, 99, 101, 95, 112, 101, 
110, 97, 108, 116, 121, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 114, 101, 115, 112, 111, 110, 115, 101, 95, 102, 111, 114, 109, 97, 116, 34, 58, 32, 114, 101, 115, 112, 111, 110, 115, 101, 95, 102, 111, 114, 109, 97, 116, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 115, 101, 101, 100, 34, 58, 32, 115, 101, 101, 100, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 115, 116, 111, 112, 34, 58, 32, 115, 116, 111, 112, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 115, 116, 114, 101, 97, 109, 34, 58, 32, 115, 116, 114, 101, 97, 109, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 115, 116, 114, 101, 97, 109, 95, 111, 112, 116, 105, 111, 110, 115, 34, 58, 32, 115, 116, 114, 101, 97, 109, 95, 111, 112, 116, 105, 111, 110, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 116, 101, 109, 112, 101, 114, 97, 116, 117, 114, 101, 34, 58, 32, 116, 101, 109, 112, 101, 114, 97, 116, 117, 114, 101, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 116, 111, 111, 108, 95, 99, 104, 111, 105, 99, 101, 34, 58, 32, 116, 111, 111, 108, 95, 99, 104, 111, 105, 99, 101, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 116, 111, 111, 108, 115, 34, 58, 32, 116, 111, 111, 108, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 116, 111, 112, 95, 108, 111, 103, 112, 114, 111, 98, 115, 34, 58, 32, 116, 111, 112, 95, 108, 111, 103, 112, 114, 111, 98, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 116, 111, 112, 95, 112, 34, 58, 32, 116, 111, 112, 95, 112, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 34, 117, 115, 101, 114, 34, 58, 32, 117, 115, 101, 114, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 125, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 99, 111, 109, 112, 108, 101, 116, 105, 111, 110, 
95, 99, 114, 101, 97, 116, 101, 95, 112, 97, 114, 97, 109, 115, 46, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 67, 114, 101, 97, 116, 101, 80, 97, 114, 97, 109, 115, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 41, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 111, 112, 116, 105, 111, 110, 115, 61, 109, 97, 107, 101, 95, 114, 101, 113, 117, 101, 115, 116, 95, 111, 112, 116, 105, 111, 110, 115, 40, 10, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 101, 120, 116, 114, 97, 95, 104, 101, 97, 100, 101, 114, 115, 61, 101, 120, 116, 114, 97, 95, 104, 101, 97, 100, 101, 114, 115, 44, 32, 101, 120, 116, 114, 97, 95, 113, 117, 101, 114, 121, 61, 101, 120, 116, 114, 97, 95, 113, 117, 101, 114, 121, 44, 32, 101, 120, 116, 114, 97, 95, 98, 111, 100, 121, 61, 101, 120, 116, 114, 97, 95, 98, 111, 100, 121, 44, 32, 116, 105, 109, 101, 111, 117, 116, 61, 116, 105, 109, 101, 111, 117, 116, 10, 1000, 1000, 1000, 1000, 1000, 1000, 41, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 99, 97, 115, 116, 95, 116, 111, 61, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 115, 116, 114, 101, 97, 109, 61, 115, 116, 114, 101, 97, 109, 32, 111, 114, 32, 70, 97, 108, 115, 101, 44, 10, 1000, 1000, 1000, 1000, 1000, 1000, 115, 116, 114, 101, 97, 109, 95, 99, 108, 115, 61, 83, 116, 114, 101, 97, 109, 91, 67, 104, 97, 116, 67, 111, 109, 112, 108, 101, 116, 105, 111, 110, 67, 104, 117, 110, 107, 93, 44, 10, 1000, 1000, 1000, 1000, 41, 10, 96, 96, 96, 10]\n",
      "length:  4979\n"
     ]
    }
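   ],
   "source": [
    "def merge(ids, pair, idx):\n",
    "    \"\"\"\n",
    "    One BPE merge step: replace each non-overlapping occurrence of `pair` in `ids` (scanning left to right) with `idx`.\n",
    "    ids: list of integers (tokens)\n",
    "    pair: tuple of two consecutive token ids to merge\n",
    "    idx: new vocab token id that replaces the pair\n",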
    "    \"\"\"\n",
    "    new_ids = []\n",
    "    i = 0\n",
    "    while i < len(ids):\n",
    "        if i < len(ids) - 1 and ids[i] == pair[0] and ids[i+1] == pair[1]:\n",
    "            new_ids.append(idx)\n",
    "            i += 2\n",
    "        else:\n",
    "            new_ids.append(ids[i])\n",
    "            i += 1\n",
    "    return new_ids\n",
    "\n",
    "# merge the most common pair\n",
    "tokens2 = merge(tokens, (32, 32), 1000)\n",
    "print(tokens2)\n",
    "print('length: ',len(tokens2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "merge (32, 32) to 256\n",
      "merge (256, 256) to 257\n",
      "merge (257, 257) to 258\n",
      "merge (111, 110) to 259\n",
      "merge (101, 110) to 260\n",
      "merge (116, 105) to 261\n",
      "merge (10, 258) to 262\n",
      "merge (58, 32) to 263\n",
      "merge (44, 262) to 264\n",
      "merge (261, 259) to 265\n",
      "merge (101, 32) to 266\n",
      "merge (116, 111) to 267\n",
      "merge (32, 78) to 268\n",
      "merge (97, 116) to 269\n",
      "merge (115, 32) to 270\n",
      "merge (101, 114) to 271\n",
      "merge (114, 101) to 272\n",
      "merge (97, 108) to 273\n",
      "merge (116, 104) to 274\n",
      "merge (115, 116) to 275\n",
      "merge (97, 110) to 276\n",
      "merge (260, 32) to 277\n",
      "merge (97, 109) to 278\n",
      "merge (108, 101) to 279\n",
      "merge (32, 124) to 280\n",
      "merge (105, 110) to 281\n",
      "merge (34, 263) to 282\n",
      "merge (111, 109) to 283\n",
      "merge (61, 268) to 284\n",
      "merge (44, 32) to 285\n",
      "merge (280, 268) to 286\n",
      "merge (257, 34) to 287\n",
      "merge (264, 258) to 288\n",
      "merge (115, 101) to 289\n",
      "merge (108, 111) to 290\n",
      "merge (84, 95) to 291\n",
      "merge (105, 118) to 292\n",
      "merge (292, 277) to 293\n",
      "merge (112, 265) to 294\n",
      "merge (111, 116) to 295\n"
     ]
    }
   ],
   "source": [
    "# complete training loop: repeatedly merge the most common pair\n",
    "def get_stats(ids):\n",
    "    counts = {}\n",
    "    for pair in zip(ids, ids[1:]):\n",
    "        counts[pair] = counts.get(pair, 0) + 1\n",
    "    return counts\n",
    "\n",
    "def merge(ids, pair, idx):\n",
    "    newids = []\n",
    "    i = 0\n",
    "    while i < len(ids):\n",
    "        if i < len(ids) - 1 and ids[i] == pair[0] and ids[i+1] == pair[1]:\n",
    "            newids.append(idx)\n",
    "            i += 2\n",
    "        else:\n",
    "            newids.append(ids[i])\n",
    "            i += 1\n",
    "    return newids\n",
    "\n",
    "# repeatedly merge the most common pair, recording each merge to build the new vocab\n",
    "vocab_size = 296\n",
    "num_merges = vocab_size - 256 # the base vocab is the 256 possible byte values\n",
    "ids = list(tokens)\n",
    "\n",
    "\n",
    "merges = {}\n",
    "for i in range(num_merges):\n",
    "    stats = get_stats(ids)\n",
    "    pair = max(stats, key=stats.get) # get the most common pair\n",
    "    idx = 256 + i # new vocab token\n",
    "    print(f'merge {pair} to {idx}')\n",
    "    ids = merge(ids, pair, idx)\n",
    "    merges[pair] = idx\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tokens length:  5397\n",
      "new tokens length:  3365\n",
      "compression rate: 1.60X\n"
     ]
    }
   ],
   "source": [
    "print(\"tokens length: \", len(tokens))\n",
    "print(\"new tokens length: \", len(ids))\n",
    "print(f\"compression rate: {len(tokens) / len(ids):.2f}X\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### decoding\n",
    "\n",
    "Given a sequence of integers in the range [0, vocab_size), convert it back into a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---\n",
      "Autogen enables the next-gen LLM applications with a generic [multi-agent conversation](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) framework. It offers customizable and conversable agents that integrate LLMs, tools, and humans.\n",
      "By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code.\n",
      "\n",
      "Features of this use case include:\n",
      "\n",
      "- **Multi-agent conversations**: AutoGen agents can communicate with each other to solve tasks. This allows for more complex and sophisticated applications than would be possible with a single LLM.\n",
      "- **Customization**: AutoGen agents can be customized to meet the specific needs of an application. This includes the ability to choose the LLMs to use, the types of human input to allow, and the tools to employ.\n",
      "- **Human participation**: AutoGen seamlessly allows human participation. This means that humans can provide input and feedback to the agents as needed.\n",
      "\n",
      "For [example](https://github.com/microsoft/autogen/blob/main/test/twoagent.py),\n",
      "\n",
      "```python\n",
      "from autogen import AssistantAgent, UserProxyAgent, config_list_from_json\n",
      "# Load LLM inference endpoints from an env variable or a file\n",
      "# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints\n",
      "# and OAI_CONFIG_LIST_sample\n",
      "config_list = config_list_from_json(env_or_file=\"OAI_CONFIG_LIST\")\n",
      "# You can also set config_list directly as a list, for example, config_list = [{'model': 'gpt-4', 'api_key': '<your OpenAI API key here>'},]\n",
      "assistant = AssistantAgent(\"assistant\", llm_config={\"config_list\": config_list})\n",
      "user_proxy = UserProxyAgent(\"user_proxy\", code_execution_config={\"work_dir\": \"coding\", \"use_docker\": False}) # IMPORTANT: set to True to run code in docker, recommended\n",
      "user_proxy.initiate_chat(assistant, message=\"Plot a chart of NVDA and TESLA stock price change YTD.\")\n",
      "# This initiates an automated chat between the two agents to solve the task\n",
      "```\n",
      "\n",
      "more python code:\n",
      "\n",
      "```python\n",
      "    def create(\n",
      "        self,\n",
      "        *,\n",
      "        messages: Iterable[ChatCompletionMessageParam],\n",
      "        model: Union[str, ChatModel],\n",
      "        frequency_penalty: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        function_call: completion_create_params.FunctionCall | NotGiven = NOT_GIVEN,\n",
      "        functions: Iterable[completion_create_params.Function] | NotGiven = NOT_GIVEN,\n",
      "        logit_bias: Optional[Dict[str, int]] | NotGiven = NOT_GIVEN,\n",
      "        logprobs: Optional[bool] | NotGiven = NOT_GIVEN,\n",
      "        max_tokens: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        n: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        presence_penalty: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        response_format: completion_create_params.ResponseFormat | NotGiven = NOT_GIVEN,\n",
      "        seed: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        stop: Union[Optional[str], List[str]] | NotGiven = NOT_GIVEN,\n",
      "        stream: Optional[Literal[False]] | Literal[True] | NotGiven = NOT_GIVEN,\n",
      "        stream_options: Optional[ChatCompletionStreamOptionsParam] | NotGiven = NOT_GIVEN,\n",
      "        temperature: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        tool_choice: ChatCompletionToolChoiceOptionParam | NotGiven = NOT_GIVEN,\n",
      "        tools: Iterable[ChatCompletionToolParam] | NotGiven = NOT_GIVEN,\n",
      "        top_logprobs: Optional[int] | NotGiven = NOT_GIVEN,\n",
      "        top_p: Optional[float] | NotGiven = NOT_GIVEN,\n",
      "        user: str | NotGiven = NOT_GIVEN,\n",
      "        # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.\n",
      "        # The extra values given here take precedence over values defined on the client or passed to this method.\n",
      "        extra_headers: Headers | None = None,\n",
      "        extra_query: Query | None = None,\n",
      "        extra_body: Body | None = None,\n",
      "        timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,\n",
      "    ) -> ChatCompletion | Stream[ChatCompletionChunk]:\n",
      "        return self._post(\n",
      "            \"/chat/completions\",\n",
      "            body=maybe_transform(\n",
      "                {\n",
      "                    \"messages\": messages,\n",
      "                    \"model\": model,\n",
      "                    \"frequency_penalty\": frequency_penalty,\n",
      "                    \"function_call\": function_call,\n",
      "                    \"functions\": functions,\n",
      "                    \"logit_bias\": logit_bias,\n",
      "                    \"logprobs\": logprobs,\n",
      "                    \"max_tokens\": max_tokens,\n",
      "                    \"n\": n,\n",
      "                    \"presence_penalty\": presence_penalty,\n",
      "                    \"response_format\": response_format,\n",
      "                    \"seed\": seed,\n",
      "                    \"stop\": stop,\n",
      "                    \"stream\": stream,\n",
      "                    \"stream_options\": stream_options,\n",
      "                    \"temperature\": temperature,\n",
      "                    \"tool_choice\": tool_choice,\n",
      "                    \"tools\": tools,\n",
      "                    \"top_logprobs\": top_logprobs,\n",
      "                    \"top_p\": top_p,\n",
      "                    \"user\": user,\n",
      "                },\n",
      "                completion_create_params.CompletionCreateParams,\n",
      "            ),\n",
      "            options=make_request_options(\n",
      "                extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout\n",
      "            ),\n",
      "            cast_to=ChatCompletion,\n",
      "            stream=stream or False,\n",
      "            stream_cls=Stream[ChatCompletionChunk],\n",
      "        )\n",
      "```\n",
      "\n",
      "length:  5397\n"
     ]
    }
   ],
   "source": [
    "vocab = {idx: bytes([idx]) for idx in range(256)} # base vocab: one token per byte value\n",
    "for (p0, p1), idx in merges.items():\n",
    "    vocab[idx] = vocab[p0] + vocab[p1] # add the merged tokens (ids 256 to 295); relies on merges being iterated in insertion order\n",
    "\n",
    "def decode(ids):\n",
    "    bytetokens = b\"\".join(vocab[i] for i in ids)\n",
    "    text = bytetokens.decode(\"utf-8\", errors=\"replace\") # invalid byte sequences become U+FFFD, the replacement character\n",
    "    return text\n",
    "\n",
    "print('---')\n",
    "print(decode(ids))\n",
    "print('length: ', len(decode(ids)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### encoding\n",
    "\n",
    "Given a string, convert it into a sequence of token ids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[104, 107]\n"
     ]
    }
   ],
   "source": [
    "def encode(text):\n",
    "    tokens = list(text.encode('utf-8'))\n",
    "    while len(tokens) >= 2:\n",
    "        stats = get_stats(tokens)\n",
    "        pair = min(stats, key=lambda p: merges.get(p, float('inf'))) # select the pair with the lowest merge index, i.e. the earliest learned merge\n",
    "        if pair not in merges:\n",
    "            break\n",
    "        idx = merges[pair]\n",
    "        tokens = merge(tokens, pair, idx)\n",
    "    return tokens\n",
    "\n",
    "print(encode(\"hk\"))"
   ]
  },
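  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following line ensures that encoding applies merges in the order they were learned. Among the pairs present in the tokens, it selects the one with the lowest merge index; pairs that were never merged map to infinity, so one of them is returned only when no known merge remains.\n",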
    "```\n",
    "pair = min(stats, key=lambda p: merges.get(p, float('inf')))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   presence_penalty \n"
     ]
    }
   ],
   "source": [
    "print(decode(encode(\"   presence_penalty \")))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}