SilentProgrammer committed on
Commit 02d2bd7 · verified · 1 parent: 2aa5245

Upload 18 files

data/input.txt ADDED
@@ -0,0 +1,839 @@
+Q: What is the capital of France?
+A: Paris.
+
+Q: Who wrote Hamlet?
+A: William Shakespeare.
+
+Q: What is the largest planet?
+A: Jupiter.
+
+Q: What is the boiling point of water?
+A: 100 degrees Celsius.
+
+Q: What is the capital of Japan?
+A: Tokyo.
+
+Q: Who painted the Mona Lisa?
+A: Leonardo da Vinci.
+
+Q: What is the speed of light?
+A: 299,792 kilometers per second.
+
+Q: What is the capital of France?
+A: Paris.
+
+Q: What is the speed of light?
+A: 299,792 kilometers per second.
+
+Q: Who wrote Hamlet?
+A: William Shakespeare.
+
+Q: What is the largest planet?
+A: Jupiter.
+
+Q: What is the boiling point of water?
+A: 100 degrees Celsius.
+
+Q: What is the capital of Japan?
+A: Tokyo.
+
+Q: Who painted the Mona Lisa?
+A: Leonardo da Vinci.
+
+Q: What is the tallest mountain in the world?
+A: Mount Everest.
+
+Q: What is the chemical symbol for gold?
+A: Au.
+
+Q: What is the freezing point of water?
+A: 0 degrees Celsius.
+
+Q: Who invented the telephone?
+A: Alexander Graham Bell.
+
+Q: What is the largest ocean on Earth?
+A: The Pacific Ocean.
+
+Q: What is the process of photosynthesis?
+A: Photosynthesis is the process by which plants use sunlight to make food from carbon dioxide and water.
+
+Q: What is gravity?
+A: Gravity is the force that attracts objects toward the center of the Earth.
+
+Q: Who wrote Hamlet?
+A: William Shakespeare.
+
+Q: What is the capital of France?
+A: Paris.
+
+Q: What is the speed of light?
+A: 299,792 kilometers per second.
+
+Q: What is the largest ocean?
+A: The Pacific Ocean.
+
+Q: What is the boiling point of water?
+A: 100 degrees Celsius.
+
+Q: Who painted the Mona Lisa?
+A: Leonardo da Vinci.
+
+Q: What is the tallest mountain in the world?
+A: Mount Everest.
+
+Q: What is the chemical symbol for gold?
+A: Au.
+
+Q: What is the freezing point of water?
+A: 0 degrees Celsius.
+
+Q: Who invented the telephone?
+A: Alexander Graham Bell.
+
+Q: What is the largest planet?
+A: Jupiter.
+
+Q: What is gravity?
+A: Gravity is the force that attracts objects toward the center of the Earth.
+
+Q: What is the capital of France?
+A: Paris.
+
+Q: Who wrote Hamlet?
+A: William Shakespeare.
+
+Q: What is the speed of light?
+A: 299,792 kilometers per second.
+
+Q: What is the largest ocean?
+A: The Pacific Ocean.
+
+Q: What is the boiling point of water?
+A: 100 degrees Celsius.
+
+Q: Who painted the Mona Lisa?
+A: Leonardo da Vinci.
+
+Q: What is the tallest mountain in the world?
+A: Mount Everest.
+
+Q: What is the chemical symbol for gold?
+A: Au.
+
+Q: What is the freezing point of water?
+A: 0 degrees Celsius.
+
+Q: Who invented the telephone?
+A: Alexander Graham Bell.
+
+Q: What is the largest planet?
+A: Jupiter.
+
+Q: What is gravity?
+A: Gravity is the force that attracts objects toward the center of the Earth.
+
+Q: What is the main ingredient in bread?
+A: Flour.
+
+Q: What is the capital of Japan?
+A: Tokyo.
+
+Q: What is the largest continent?
+A: Asia.
+
+Q: What is the process of photosynthesis?
+A: Photosynthesis is the process by which plants use sunlight to make food from carbon dioxide and water.
+
+Q: What is the smallest prime number?
+A: 2.
+
+Q: What is the chemical symbol for water?
+A: H2O.
+
+Q: Who discovered penicillin?
+A: Alexander Fleming.
+
+Q: What is the tallest animal in the world?
+A: The giraffe.
+
+Q: What is the largest mammal?
+A: The blue whale.
+
+Q: What is the capital of Italy?
+A: Rome.
+
+Q: What is the largest desert in the world?
+A: The Sahara Desert.
+
+Q: What is the main gas in the air we breathe?
+A: Nitrogen.
+
+Q: What is the hardest natural substance?
+A: Diamond.
+
+Q: What is the largest country by area?
+A: Russia.
+
+Q: What is the longest river in the world?
+A: The Nile.
+
+Q: What is the main language spoken in Brazil?
+A: Portuguese.
+
+Q: What is the currency of the United States?
+A: The dollar.
+
+Q: What is the hottest planet in the solar system?
+A: Venus.
+
+Q: What is the largest island in the world?
+A: Greenland.
+
+Q: What is the capital of France?
+A: Paris.
+
+Q: Who wrote Hamlet?
+A: William Shakespeare.
+
+Q: What is the speed of light?
+A: 299,792 kilometers per second.
+
+Q: What is the largest ocean?
+A: The Pacific Ocean.
+
+Q: What is the boiling point of water?
+A: 100 degrees Celsius.
+
+Q: Who painted the Mona Lisa?
+A: Leonardo da Vinci.
+
+Q: What is the tallest mountain in the world?
+A: Mount Everest.
+
+Q: What is the chemical symbol for gold?
+A: Au.
+
+Q: What is the freezing point of water?
+A: 0 degrees Celsius.
+
+Q: Who invented the telephone?
+A: Alexander Graham Bell.
+
+Q: What is the largest planet?
+A: Jupiter.
+
+Q: What is gravity?
+A: Gravity is the force that attracts objects toward the center of the Earth.
+
+Q: What is the main ingredient in bread?
+A: Flour.
+
+Q: What is the capital of Japan?
+A: Tokyo.
+
+Q: What is the largest continent?
+A: Asia.
+
+Q: What is the process of photosynthesis?
+A: Photosynthesis is the process by which plants use sunlight to make food from carbon dioxide and water.
+
+Q: What is the smallest prime number?
+A: 2.
+
+Q: What is the chemical symbol for water?
+A: H2O.
+
+Q: Who discovered penicillin?
+A: Alexander Fleming.
+
+Q: What is the tallest animal in the world?
+A: The giraffe.
+
+Q: What is the largest mammal?
+A: The blue whale.
+
+Q: What is the capital of Italy?
+A: Rome.
+
+Q: What is the largest desert in the world?
+A: The Sahara Desert.
+
+Q: What is the main gas in the air we breathe?
+A: Nitrogen.
+
+Q: What is the hardest natural substance?
+A: Diamond.
+
+Q: What is the largest country by area?
+A: Russia.
+
+Q: What is the longest river in the world?
+A: The Nile.
+
+Q: What is the main language spoken in Brazil?
+A: Portuguese.
+
+Q: What is the currency of the United States?
+A: The dollar.
+
+Q: What is the hottest planet in the solar system?
+A: Venus.
+
+Q: What is the largest island in the world?
+A: Greenland.
+
+Q: What is the capital of Canada?
+A: Ottawa.
+
+Q: Who was the first President of the United States?
+A: George Washington.
+
+Q: What is the chemical symbol for iron?
+A: Fe.
+
+Q: What is the largest bone in the human body?
+A: The femur.
+
+Q: What is the main ingredient in glass?
+A: Silica.
+
+Q: What is the capital of Australia?
+A: Canberra.
+
+Q: What is the largest lake in the world?
+A: The Caspian Sea.
+
+Q: What is the main ingredient in chocolate?
+A: Cocoa.
+
+Q: What is the capital of Germany?
+A: Berlin.
+
+Q: What is the largest volcano in the world?
+A: Mauna Loa.
+
+Q: What is the main ingredient in pizza dough?
+A: Flour.
+
+Q: What is the capital of Russia?
+A: Moscow.
+
+Q: What is the largest bird in the world?
+A: The ostrich.
+
+Q: What is the main ingredient in sushi?
+A: Rice.
+
+Q: What is the capital of China?
+A: Beijing.
+
+Q: What is the largest fish in the world?
+A: The whale shark.
+
+Q: What is the main ingredient in guacamole?
+A: Avocado.
+
+Q: What is the capital of India?
+A: New Delhi.
+
+Q: What is the largest reptile in the world?
+A: The saltwater crocodile.
+
+Q: What is the main ingredient in hummus?
+A: Chickpeas.
+
+Q: What is the capital of Brazil?
+A: Brasília.
+
+Q: What is the largest flower in the world?
+A: Rafflesia arnoldii.
+
+Q: What is the main ingredient in bread?
+A: Flour.
+
+Q: What is the capital of France?
+A: Paris.
+
+Q: Who wrote Hamlet?
+A: William Shakespeare.
+
+Q: What is the speed of light?
+A: 299,792 kilometers per second.
+
+Q: What is the largest ocean?
+A: The Pacific Ocean.
+
+Q: What is the boiling point of water?
+A: 100 degrees Celsius.
+
+Q: Who painted the Mona Lisa?
+A: Leonardo da Vinci.
+
+Q: What is the tallest mountain in the world?
+A: Mount Everest.
+
+Q: What is the chemical symbol for gold?
+A: Au.
+
+Q: What is the freezing point of water?
+A: 0 degrees Celsius.
+
+Q: Who invented the telephone?
+A: Alexander Graham Bell.
+
+Q: What is the largest planet?
+A: Jupiter.
+
+Q: What is gravity?
+A: Gravity is the force that attracts objects toward the center of the Earth.
+
+Q: What is the main ingredient in bread?
+A: Flour.
+
+Q: What is the capital of Japan?
+A: Tokyo.
+
+Q: What is the largest continent?
+A: Asia.
+
+Q: What is the process of photosynthesis?
+A: Photosynthesis is the process by which plants use sunlight to make food from carbon dioxide and water.
+
+Q: What is the smallest prime number?
+A: 2.
+
+Q: What is the chemical symbol for water?
+A: H2O.
+
+Q: Who discovered penicillin?
+A: Alexander Fleming.
+
+Q: What is the tallest animal in the world?
+A: The giraffe.
+
+Q: What is the largest mammal?
+A: The blue whale.
+
+Q: What is the capital of Italy?
+A: Rome.
+
+Q: What is the largest desert in the world?
+A: The Sahara Desert.
+
+Q: What is the main gas in the air we breathe?
+A: Nitrogen.
+
+Q: What is the hardest natural substance?
+A: Diamond.
+
+Q: What is the largest country by area?
+A: Russia.
+
+Q: What is the longest river in the world?
+A: The Nile.
+
+Q: What is the main language spoken in Brazil?
+A: Portuguese.
+
+Q: What is the currency of the United States?
+A: The dollar.
+
+Q: What is the hottest planet in the solar system?
+A: Venus.
+
+Q: What is the largest island in the world?
+A: Greenland.
+
+Q: What is the capital of Canada?
+A: Ottawa.
+
+Q: Who was the first President of the United States?
+A: George Washington.
+
+Q: What is the chemical symbol for iron?
+A: Fe.
+
+Q: What is the largest bone in the human body?
+A: The femur.
+
+Q: What is the main ingredient in glass?
+A: Silica.
+
+Q: What is the capital of Australia?
+A: Canberra.
+
+Q: What is the largest lake in the world?
+A: The Caspian Sea.
+
+Q: What is the main ingredient in chocolate?
+A: Cocoa.
+
+Q: What is the capital of Germany?
+A: Berlin.
+
+Q: What is the largest volcano in the world?
+A: Mauna Loa.
+
+Q: What is the main ingredient in pizza dough?
+A: Flour.
+
+Q: What is the capital of Russia?
+A: Moscow.
+
+Q: What is the largest bird in the world?
+A: The ostrich.
+
+Q: What is the main ingredient in sushi?
+A: Rice.
+
+Q: What is the capital of China?
+A: Beijing.
+
+Q: What is the largest fish in the world?
+A: The whale shark.
+
+Q: What is the main ingredient in guacamole?
+A: Avocado.
+
+Q: What is the capital of India?
+A: New Delhi.
+
+Q: What is the largest reptile in the world?
+A: The saltwater crocodile.
+
+Q: What is the main ingredient in hummus?
+A: Chickpeas.
+
+Q: What is the capital of Brazil?
+A: Brasília.
+
+Q: What is the largest flower in the world?
+A: Rafflesia arnoldii.
+
+Q: What is the main ingredient in bread?
+A: Flour.
+
+Q: What is the capital of Spain?
+A: Madrid.
+
+Q: What is the largest peninsula in the world?
+A: The Arabian Peninsula.
+
+Q: What is the main ingredient in pasta?
+A: Wheat.
+
+Q: What is the capital of Egypt?
+A: Cairo.
+
+Q: What is the largest bay in the world?
+A: The Bay of Bengal.
+
+Q: What is the main ingredient in mayonnaise?
+A: Eggs.
+
+Q: What is the capital of South Africa?
+A: Pretoria.
+
+Q: What is the largest canyon in the world?
+A: The Grand Canyon.
+
+Q: What is the main ingredient in tofu?
+A: Soybeans.
+
+Q: What is the capital of Mexico?
+A: Mexico City.
+
+Q: What is the largest coral reef in the world?
+A: The Great Barrier Reef.
+
+Q: What is the main ingredient in ketchup?
+A: Tomatoes.
+
+Q: What is the capital of Argentina?
+A: Buenos Aires.
+
+Q: What is the largest waterfall in the world?
+A: Angel Falls.
+
+Q: What is the main ingredient in peanut butter?
+A: Peanuts.
+
+Q: What is the capital of Turkey?
+A: Ankara.
+
+Q: What is the largest stadium in the world?
+A: Rungrado 1st of May Stadium.
+
+Q: What is the main ingredient in miso soup?
+A: Miso paste.
+
+Q: What is the capital of Saudi Arabia?
+A: Riyadh.
+
+Q: What is the largest archipelago in the world?
+A: Indonesia.
+
+Q: What is the main ingredient in falafel?
+A: Chickpeas.
+
+Q: What is the capital of Thailand?
+A: Bangkok.
+
+Q: What is the largest cave in the world?
+A: Son Doong Cave.
+
+Q: What is the main ingredient in curry?
+A: Spices.
+
+Q: What is the capital of Sweden?
+A: Stockholm.
+
+Q: What is the largest glacier in the world?
+A: Lambert Glacier.
+
+Q: What is the main ingredient in risotto?
+A: Rice.
+
+Q: What is the capital of Norway?
+A: Oslo.
+
+Q: What is the largest hot desert in the world?
+A: The Sahara Desert.
+
+Q: What is the main ingredient in borscht?
+A: Beets.
+
+Q: What is the capital of Switzerland?
+A: Bern.
+
+Q: What is the largest freshwater lake in the world?
+A: Lake Superior.
+
+Q: What is the main ingredient in paella?
+A: Rice.
+
+Q: What is the capital of the Netherlands?
+A: Amsterdam.
+
+Q: What is the largest sea in the world?
+A: The Philippine Sea.
+
+Q: What is the main ingredient in pesto?
+A: Basil.
+
+Q: What is the capital of Greece?
+A: Athens.
+
+Q: What is the largest plateau in the world?
+A: The Tibetan Plateau.
+
+Q: What is the main ingredient in gazpacho?
+A: Tomatoes.
+
+Q: What is the capital of Portugal?
+A: Lisbon.
+
+Q: What is the largest delta in the world?
+A: The Ganges Delta.
+
+Q: What is the main ingredient in ramen?
+A: Noodles.
+
+Q: What is the capital of South Korea?
+A: Seoul.
+
+Q: What is the largest island country in the world?
+A: Indonesia.
+
+Q: What is the main ingredient in kimchi?
+A: Cabbage.
+
+Q: What is the capital of New Zealand?
+A: Wellington.
+
+Q: What is the largest peninsula in Europe?
+A: The Scandinavian Peninsula.
+
+Q: What is the main ingredient in gnocchi?
+A: Potatoes.
+
+Q: What is the capital of Denmark?
+A: Copenhagen.
+
+Q: What is the largest gulf in the world?
+A: The Gulf of Mexico.
+
+Q: What is the main ingredient in hummus?
+A: Chickpeas.
+
+Q: What is the capital of Finland?
+A: Helsinki.
+
+Q: What is the largest archipelago in Europe?
+A: The British Isles.
+
+Q: What is the main ingredient in moussaka?
+A: Eggplant.
+
+Q: What is the capital of Ireland?
+A: Dublin.
+
+Q: What is the largest lake in Africa?
+A: Lake Victoria.
+
+Q: What is the main ingredient in tabbouleh?
+A: Parsley.
+
+Q: What is the capital of Poland?
+A: Warsaw.
+
+Q: What is the largest river in Europe?
+A: The Volga River.
+
+Q: What is the main ingredient in sauerkraut?
+A: Cabbage.
+
+Q: What is the capital of Hungary?
+A: Budapest.
+
+Q: What is the largest city in the world by population?
+A: Tokyo.
+
+Q: What is the main ingredient in tempura?
+A: Batter.
+
+Q: What is the capital of Austria?
+A: Vienna.
+
+Q: What is the largest waterfall in Africa?
+A: Victoria Falls.
+
+Q: What is the main ingredient in baklava?
+A: Phyllo dough.
+
+Q: What is the capital of Belgium?
+A: Brussels.
+
+Q: What is the largest peninsula in North America?
+A: The Labrador Peninsula.
+
+Q: What is the main ingredient in couscous?
+A: Semolina.
+
+Q: What is the capital of Czech Republic?
+A: Prague.
+
+Q: What is the largest island in the Mediterranean Sea?
+A: Sicily.
+
+Q: What is the main ingredient in tzatziki?
+A: Yogurt.
+
+Q: What is the capital of Romania?
+A: Bucharest.
+
+Q: What is the largest city in Africa?
+A: Lagos.
+
+Q: What is the main ingredient in goulash?
+A: Beef.
+
+Q: What is the capital of Ukraine?
+A: Kyiv.
+
+Q: What is the largest city in South America?
+A: São Paulo.
+
+Q: What is the main ingredient in feijoada?
+A: Beans.
+
+Q: What is the capital of Chile?
+A: Santiago.
+
+Q: What is the largest city in Australia?
+A: Sydney.
+
+Q: What is the main ingredient in pavlova?
+A: Egg whites.
+
+Q: What is the capital of Peru?
+A: Lima.
+
+Q: What is the largest city in Canada?
+A: Toronto.
+
+Q: What is the main ingredient in poutine?
+A: French fries.
+
+Q: What is the capital of Colombia?
+A: Bogotá.
+
+Q: What is the largest city in India?
+A: Mumbai.
+
+Q: What is the main ingredient in biryani?
+A: Rice.
+
+Q: What is the capital of Pakistan?
+A: Islamabad.
+
+Q: What is the largest city in China?
+A: Shanghai.
+
+Q: What is the main ingredient in hot pot?
+A: Broth.
+
+Q: What is the capital of Indonesia?
+A: Jakarta.
+
+Q: What is the largest city in Russia?
+A: Moscow.
+
+Q: What is the main ingredient in borscht?
+A: Beets.
+
+Q: What is the capital of Saudi Arabia?
+A: Riyadh.
+
+Q: What is the largest city in Turkey?
+A: Istanbul.
+
+Q: What is the main ingredient in kebab?
+A: Meat.
+
+Q: What is the capital of Iran?
+A: Tehran.
+
+Q: What is the largest city in Egypt?
+A: Cairo.
+
+Q: What is the main ingredient in koshari?
+A: Rice.
+
+Q: What is the capital of Nigeria?
+A: Abuja.
+
+Q: What is the largest city in Nigeria?
+A: Lagos.
+
+Q: What is the main ingredient in jollof rice?
+A: Rice.
+
+Q: What is the capital of Kenya?
+A: Nairobi.
+
+Q: What is the largest city in Kenya?
+A: Nairobi.
+
+Q: What is the main ingredient in ugali?
+A: Maize flour.
+
+Q: What is the capital of Ethiopia?
+A: Addis Ababa.
+
+Q: What is the largest city in Ethiopia?
+A: Addis Ababa.
+
+Q: What is the main ingredient in injera?
+A: Teff flour.
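data/input.txt stores the pairs as plain `Q:` / `A:` lines separated by blank lines, while finetune.py loads `qa_data.jsonl` with `prompt` / `response` fields. A minimal sketch of converting one format into the other, assuming every pair is a `Q:` line immediately followed by its `A:` line; `parse_qa_text`, `dedupe`, `to_jsonl` and `convert` are hypothetical helpers, not part of this commit. Because input.txt repeats many pairs (the France/Paris pair alone appears several times), the sketch also drops exact duplicates while keeping the original order.

```python
import json


def parse_qa_text(text):
    """Turn 'Q: ...' / 'A: ...' line pairs into prompt/response records (hypothetical helper)."""
    records = []
    question = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[len("Q:"):].strip()
        elif line.startswith("A:") and question is not None:
            answer = line[len("A:"):].strip()
            # Same layout as qa_data.jsonl: the prompt ends with "\nA:"
            records.append({"prompt": f"Q: {question}\nA:", "response": answer})
            question = None
    return records


def dedupe(records):
    """Drop exact duplicate pairs, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        key = (record["prompt"], record["response"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def to_jsonl(records):
    # ensure_ascii=False keeps characters such as the "í" in "Brasília" readable
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


def convert(src_path, dst_path):
    """Read a Q/A text file and write a deduplicated JSONL file."""
    with open(src_path, encoding="utf-8") as f:
        records = dedupe(parse_qa_text(f.read()))
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))
    return len(records)
```

For example, `convert("data/input.txt", "qa_from_input.jsonl")` would write a file that `load_dataset("json", ...)` in finetune.py could read; the output filename is an assumption, chosen so the committed `qa_data.jsonl` is not overwritten.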
finetune.py ADDED
@@ -0,0 +1,58 @@
+import torch
+from datasets import load_dataset
+from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
+
+# Load your data
+dataset = load_dataset("json", data_files={"train": "qa_data.jsonl"})
+
+# Choose a model (GPT-2 small is easy to start with)
+model_name = "gpt2"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+
+# Add a pad token if missing (GPT-2 doesn't have one by default)
+if tokenizer.pad_token is None:
+    tokenizer.pad_token = tokenizer.eos_token
+
+# Tokenize
+def preprocess(example):
+    prompt = example["prompt"]
+    response = example["response"]
+    # End each example with EOS so the model learns where an answer stops
+    text = prompt + " " + response + tokenizer.eos_token
+    tokens = tokenizer(
+        text,
+        truncation=True,
+        padding="max_length",
+        max_length=128,
+    )
+    # Padding reuses the EOS id, so exclude padded positions from the loss (-100 is ignored)
+    tokens["labels"] = [
+        token_id if mask == 1 else -100
+        for token_id, mask in zip(tokens["input_ids"], tokens["attention_mask"])
+    ]
+    return tokens
+
+tokenized = dataset["train"].map(preprocess, remove_columns=["prompt", "response"])
+
+# Training arguments
+args = TrainingArguments(
+    output_dir="gpt2-finetuned-qa",
+    per_device_train_batch_size=2,
+    num_train_epochs=5,
+    logging_steps=10,
+    save_steps=50,
+    fp16=torch.cuda.is_available(),
+    report_to="none",
+)
+
+# Trainer
+trainer = Trainer(
+    model=model,
+    args=args,
+    train_dataset=tokenized,
+)
+
+trainer.train()
+model.save_pretrained("gpt2-finetuned-qa")
+tokenizer.save_pretrained("gpt2-finetuned-qa")
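finetune.py computes the loss over the whole `prompt + response` string. A common variation, swapped in here as an assumption rather than what this commit does, is to mask the prompt tokens as well so the loss covers only the answer and its end-of-text token. The sketch below works on plain lists of token ids; `build_example` is a hypothetical helper, and `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, which the Hugging Face causal-LM loss also skips.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(prompt_ids, response_ids, eos_id, pad_id, max_length=128):
    """Concatenate prompt + response + EOS, pad to max_length, and mask non-answer labels.

    Hypothetical helper: prompt and padding positions get IGNORE_INDEX, so only
    the response tokens and the closing EOS contribute to the loss.
    """
    input_ids = (prompt_ids + response_ids + [eos_id])[:max_length]
    n_prompt = min(len(prompt_ids), len(input_ids))
    labels = [IGNORE_INDEX] * n_prompt + input_ids[n_prompt:]
    attention_mask = [1] * len(input_ids)

    # Right-pad everything to a fixed length, as padding="max_length" does
    pad_len = max_length - len(input_ids)
    input_ids = input_ids + [pad_id] * pad_len
    attention_mask = attention_mask + [0] * pad_len
    labels = labels + [IGNORE_INDEX] * pad_len
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}
```

With the GPT-2 tokenizer, the ids could come from `tokenizer(example["prompt"])["input_ids"]` and `tokenizer(" " + example["response"])["input_ids"]`; the leading space mirrors how finetune.py joins the two strings, and tokenizing the parts separately keeps the prompt/answer boundary exact.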
generate.py ADDED
@@ -0,0 +1,14 @@
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model = AutoModelForCausalLM.from_pretrained("gpt2-finetuned-qa")
+tokenizer = AutoTokenizer.from_pretrained("gpt2-finetuned-qa")
+
+while True:
+    prompt = input("Q: ").strip()
+    if prompt.lower() in ["exit", "quit"]:
+        break
+    full_prompt = f"Q: {prompt}\nA:"
+    inputs = tokenizer(full_prompt, return_tensors="pt")
+    outputs = model.generate(**inputs, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
+    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+    print()
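generate.py prints the full decoded sequence, so every reply repeats the `Q: ...\nA:` prompt, and since decoding runs for up to 32 new tokens the model may also continue with an invented follow-up `Q:` line. A minimal post-processing sketch, assuming answers fit on one line as they do in the training data; `extract_answer` is a hypothetical helper, not part of this commit.

```python
def extract_answer(decoded, full_prompt):
    """Return only the model's answer from a decoded generation (hypothetical helper)."""
    # Drop the echoed prompt when the decoded text starts with it
    text = decoded[len(full_prompt):] if decoded.startswith(full_prompt) else decoded
    # Keep the first non-empty line: every answer in the training data is a single line
    return text.strip().split("\n", 1)[0].strip()
```

Inside the loop, `print(extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=True), full_prompt))` would replace the current `print`. Slicing off the prompt tokens before decoding, with `outputs[0][inputs["input_ids"].shape[1]:]`, is an alternative that does not depend on the decoded text reproducing the prompt string exactly.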
gpt2-finetuned-qa/config.json ADDED
@@ -0,0 +1,39 @@
+{
+  "_name_or_path": "gpt2",
+  "activation_function": "gelu_new",
+  "architectures": [
+    "GPT2LMHeadModel"
+  ],
+  "attn_pdrop": 0.1,
+  "bos_token_id": 50256,
+  "embd_pdrop": 0.1,
+  "eos_token_id": 50256,
+  "initializer_range": 0.02,
+  "layer_norm_epsilon": 1e-05,
+  "model_type": "gpt2",
+  "n_ctx": 1024,
+  "n_embd": 768,
+  "n_head": 12,
+  "n_inner": null,
+  "n_layer": 12,
+  "n_positions": 1024,
+  "reorder_and_upcast_attn": false,
+  "resid_pdrop": 0.1,
+  "scale_attn_by_inverse_layer_idx": false,
+  "scale_attn_weights": true,
+  "summary_activation": null,
+  "summary_first_dropout": 0.1,
+  "summary_proj_to_labels": true,
+  "summary_type": "cls_index",
+  "summary_use_proj": true,
+  "task_specific_params": {
+    "text-generation": {
+      "do_sample": true,
+      "max_length": 50
+    }
+  },
+  "torch_dtype": "float32",
+  "transformers_version": "4.47.1",
+  "use_cache": true,
+  "vocab_size": 50257
+}
gpt2-finetuned-qa/generation_config.json ADDED
@@ -0,0 +1,6 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 50256,
+  "eos_token_id": 50256,
+  "transformers_version": "4.47.1"
+}
gpt2-finetuned-qa/merges.txt ADDED
The diff for this file is too large to render.
 
gpt2-finetuned-qa/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5d794b48566c887388ebae13afc50e77184fd881152e032b90b71997dbc350f6
+size 497774208
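The LFS pointer's `size` can be sanity-checked against config.json. A rough sketch, assuming the standard Hugging Face GPT-2 layout (fused `c_attn` for Q/K/V, `lm_head` tied to the token embedding, and `"n_inner": null` meaning 4 * `n_embd`): the count comes to 124,439,808 parameters, or 497,759,232 bytes at float32 (`"torch_dtype": "float32"`). That is about 15 kB less than the 497,774,208-byte file; the remainder is presumably the safetensors header. `gpt2_param_count` is a hypothetical helper.

```python
def gpt2_param_count(n_embd=768, n_layer=12, vocab_size=50257, n_positions=1024, n_inner=None):
    """Count GPT-2 parameters from config.json values (hypothetical helper).

    Assumes lm_head shares weights with the token embedding, so it adds nothing.
    """
    inner = n_inner if n_inner is not None else 4 * n_embd  # "n_inner": null -> 4 * n_embd
    embeddings = vocab_size * n_embd + n_positions * n_embd  # wte + wpe
    per_layer = (
        2 * (2 * n_embd)                        # ln_1 and ln_2 (weight + bias)
        + n_embd * 3 * n_embd + 3 * n_embd      # attn.c_attn (fused Q, K, V)
        + n_embd * n_embd + n_embd              # attn.c_proj
        + n_embd * inner + inner                # mlp.c_fc
        + inner * n_embd + n_embd               # mlp.c_proj
    )
    final_ln = 2 * n_embd                       # ln_f
    return embeddings + n_layer * per_layer + final_ln


params = gpt2_param_count()
float32_bytes = 4 * params  # 4 bytes per float32 weight
print(f"{params:,} parameters, {float32_bytes:,} bytes as float32")
```

A match this close also suggests fine-tuning left the architecture unchanged; only the weight values differ from the base `gpt2` checkpoint.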
gpt2-finetuned-qa/special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+{
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}
gpt2-finetuned-qa/tokenizer.json ADDED
The diff for this file is too large to render.
 
gpt2-finetuned-qa/tokenizer_config.json ADDED
@@ -0,0 +1,21 @@
+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "50256": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "extra_special_tokens": {},
+  "model_max_length": 1024,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
gpt2-finetuned-qa/vocab.json ADDED
The diff for this file is too large to render.
 
model/__pycache__/transformer.cpython-312.pyc ADDED
Binary file (2.24 kB).
 
model/transformer.py ADDED
@@ -0,0 +1,33 @@
+import torch
+import torch.nn as nn
+
+class SimpleGPT(nn.Module):
+    def __init__(self, vocab_size, block_size=8, n_embd=128, n_layer=4, n_head=4):
+        super().__init__()
+        self.token_emb = nn.Embedding(vocab_size, n_embd)
+        self.pos_emb = nn.Embedding(block_size, n_embd)
+        # batch_first=True so each block takes (batch, time, n_embd), the shape the embeddings produce
+        self.blocks = nn.ModuleList([
+            nn.TransformerEncoderLayer(d_model=n_embd, nhead=n_head, dropout=0.1, batch_first=True)
+            for _ in range(n_layer)
+        ])
+        self.ln_f = nn.LayerNorm(n_embd)
+        self.head = nn.Linear(n_embd, vocab_size)
+        self.block_size = block_size
+
+    def forward(self, idx):
+        b, t = idx.size()
+        assert t <= self.block_size, "Sequence too long"
+        pos = torch.arange(0, t, dtype=torch.long, device=idx.device)
+        tok_emb = self.token_emb(idx)
+        pos_emb = self.pos_emb(pos)[None, :, :]
+        x = tok_emb + pos_emb
+        # Causal mask: -inf above the diagonal so position i cannot attend to positions j > i
+        causal_mask = torch.triu(
+            torch.full((t, t), float("-inf"), device=idx.device), diagonal=1
+        )
+        for block in self.blocks:
+            x = block(x, src_mask=causal_mask)
+        x = self.ln_f(x)
+        logits = self.head(x)
+        return logits
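SimpleGPT is not wired to a training script in this part of the commit. One way it could be trained, assumed here rather than taken from the repo, is character-level next-token prediction on a text such as data/input.txt: build a vocabulary, encode the text, and cut it into windows of `block_size` tokens whose targets are the same window shifted one position to the left. The helpers below (`build_vocab`, `encode`, `decode`, `make_windows`) are hypothetical and use only the standard library.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, sorted for a stable order."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


def make_windows(ids, block_size=8):
    """Return (x, y) pairs where y is x shifted one position to the left."""
    pairs = []
    for start in range(len(ids) - block_size):
        x = ids[start:start + block_size]
        y = ids[start + 1:start + block_size + 1]
        pairs.append((x, y))
    return pairs
```

Under these assumptions the model would be built as `SimpleGPT(vocab_size=len(stoi))`, each batch of `x` windows converted to a `torch.long` tensor of shape `(batch, block_size)`, and the loss taken as cross-entropy between the returned logits and the matching `y` windows. The windows must not exceed `block_size`, since `forward` asserts on longer sequences.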
qa_data.jsonl ADDED
@@ -0,0 +1,160 @@
+{"prompt": "Q: What is the capital of France?\nA:", "response": "Paris."}
+{"prompt": "Q: Who wrote Hamlet?\nA:", "response": "William Shakespeare."}
+{"prompt": "Q: What is the largest ocean?\nA:", "response": "The Pacific Ocean."}
+{"prompt": "Q: What is the boiling point of water?\nA:", "response": "100 degrees Celsius."}
+{"prompt": "Q: Who painted the Mona Lisa?\nA:", "response": "Leonardo da Vinci."}
+{"prompt": "Q: What is the tallest mountain in the world?\nA:", "response": "Mount Everest."}
+{"prompt": "Q: What is the chemical symbol for gold?\nA:", "response": "Au."}
+{"prompt": "Q: What is the freezing point of water?\nA:", "response": "0 degrees Celsius."}
+{"prompt": "Q: Who invented the telephone?\nA:", "response": "Alexander Graham Bell."}
+{"prompt": "Q: What is the largest planet?\nA:", "response": "Jupiter."}
+{"prompt": "Q: What is gravity?\nA:", "response": "Gravity is the force that attracts objects toward the center of the Earth."}
+{"prompt": "Q: What is the main ingredient in bread?\nA:", "response": "Flour."}
+{"prompt": "Q: What is the capital of Japan?\nA:", "response": "Tokyo."}
+{"prompt": "Q: What is the largest continent?\nA:", "response": "Asia."}
+{"prompt": "Q: What is the process of photosynthesis?\nA:", "response": "Photosynthesis is the process by which plants use sunlight to make food from carbon dioxide and water."}
+{"prompt": "Q: What is the smallest prime number?\nA:", "response": "2."}
+{"prompt": "Q: What is the chemical symbol for water?\nA:", "response": "H2O."}
+{"prompt": "Q: Who discovered penicillin?\nA:", "response": "Alexander Fleming."}
+{"prompt": "Q: What is the tallest animal in the world?\nA:", "response": "The giraffe."}
+{"prompt": "Q: What is the largest mammal?\nA:", "response": "The blue whale."}
+{"prompt": "Q: What is the capital of Italy?\nA:", "response": "Rome."}
+{"prompt": "Q: What is the largest desert in the world?\nA:", "response": "The Sahara Desert."}
+{"prompt": "Q: What is the main gas in the air we breathe?\nA:", "response": "Nitrogen."}
+{"prompt": "Q: What is the hardest natural substance?\nA:", "response": "Diamond."}
+{"prompt": "Q: What is the largest country by area?\nA:", "response": "Russia."}
+{"prompt": "Q: What is the longest river in the world?\nA:", "response": "The Nile."}
+{"prompt": "Q: What is the main language spoken in Brazil?\nA:", "response": "Portuguese."}
+{"prompt": "Q: What is the currency of the United States?\nA:", "response": "The dollar."}
+{"prompt": "Q: What is the hottest planet in the solar system?\nA:", "response": "Venus."}
+{"prompt": "Q: What is the largest island in the world?\nA:", "response": "Greenland."}
+{"prompt": "Q: What is the capital of Canada?\nA:", "response": "Ottawa."}
+{"prompt": "Q: Who was the first President of the United States?\nA:", "response": "George Washington."}
+{"prompt": "Q: What is the chemical symbol for iron?\nA:", "response": "Fe."}
+{"prompt": "Q: What is the largest bone in the human body?\nA:", "response": "The femur."}
+{"prompt": "Q: What is the main ingredient in glass?\nA:", "response": "Silica."}
+{"prompt": "Q: What is the capital of Australia?\nA:", "response": "Canberra."}
+{"prompt": "Q: What is the largest lake in the world?\nA:", "response": "The Caspian Sea."}
+{"prompt": "Q: What is the main ingredient in chocolate?\nA:", "response": "Cocoa."}
+{"prompt": "Q: What is the capital of Germany?\nA:", "response": "Berlin."}
+{"prompt": "Q: What is the largest volcano in the world?\nA:", "response": "Mauna Loa."}
+{"prompt": "Q: What is the main ingredient in pizza dough?\nA:", "response": "Flour."}
+{"prompt": "Q: What is the capital of Russia?\nA:", "response": "Moscow."}
+{"prompt": "Q: What is the largest bird in the world?\nA:", "response": "The ostrich."}
+{"prompt": "Q: What is the main ingredient in sushi?\nA:", "response": "Rice."}
+{"prompt": "Q: What is the capital of China?\nA:", "response": "Beijing."}
+{"prompt": "Q: What is the largest fish in the world?\nA:", "response": "The whale shark."}
+{"prompt": "Q: What is the main ingredient in guacamole?\nA:", "response": "Avocado."}
+{"prompt": "Q: What is the capital of India?\nA:", "response": "New Delhi."}
+{"prompt": "Q: What is the largest reptile in the world?\nA:", "response": "The saltwater crocodile."}
+{"prompt": "Q: What is the main ingredient in hummus?\nA:", "response": "Chickpeas."}
+{"prompt": "Q: What is the capital of Brazil?\nA:", "response": "Brasília."}
+{"prompt": "Q: What is the largest flower in the world?\nA:", "response": "Rafflesia arnoldii."}
+{"prompt": "Q: What is the capital of Spain?\nA:", "response": "Madrid."}
+{"prompt": "Q: What is the largest peninsula in the world?\nA:", "response": "The Arabian Peninsula."}
+{"prompt": "Q: What is the main ingredient in pasta?\nA:", "response": "Wheat."}
56
+ {"prompt": "Q: What is the capital of Egypt?\nA:", "response": "Cairo."}
57
+ {"prompt": "Q: What is the largest bay in the world?\nA:", "response": "The Bay of Bengal."}
58
+ {"prompt": "Q: What is the main ingredient in mayonnaise?\nA:", "response": "Oil."}
59
+ {"prompt": "Q: What is the capital of South Africa?\nA:", "response": "Pretoria."}
60
+ {"prompt": "Q: What is the largest canyon in the world?\nA:", "response": "The Grand Canyon."}
61
+ {"prompt": "Q: What is the main ingredient in tofu?\nA:", "response": "Soybeans."}
62
+ {"prompt": "Q: What is the capital of Mexico?\nA:", "response": "Mexico City."}
63
+ {"prompt": "Q: What is the largest coral reef in the world?\nA:", "response": "The Great Barrier Reef."}
64
+ {"prompt": "Q: What is the main ingredient in ketchup?\nA:", "response": "Tomatoes."}
65
+ {"prompt": "Q: What is the capital of Argentina?\nA:", "response": "Buenos Aires."}
66
+ {"prompt": "Q: What is the tallest waterfall in the world?\nA:", "response": "Angel Falls."}
67
+ {"prompt": "Q: What is the main ingredient in peanut butter?\nA:", "response": "Peanuts."}
68
+ {"prompt": "Q: What is the capital of Turkey?\nA:", "response": "Ankara."}
69
+ {"prompt": "Q: What is the largest stadium in the world?\nA:", "response": "Rungrado 1st of May Stadium."}
70
+ {"prompt": "Q: What is the main ingredient in miso soup?\nA:", "response": "Miso paste."}
71
+ {"prompt": "Q: What is the capital of Saudi Arabia?\nA:", "response": "Riyadh."}
72
+ {"prompt": "Q: What is the largest archipelago in the world?\nA:", "response": "Indonesia."}
73
+ {"prompt": "Q: What is the main ingredient in falafel?\nA:", "response": "Chickpeas."}
74
+ {"prompt": "Q: What is the capital of Thailand?\nA:", "response": "Bangkok."}
75
+ {"prompt": "Q: What is the largest cave in the world?\nA:", "response": "Son Doong Cave."}
76
+ {"prompt": "Q: What is the main ingredient in curry?\nA:", "response": "Spices."}
77
+ {"prompt": "Q: What is the capital of Sweden?\nA:", "response": "Stockholm."}
78
+ {"prompt": "Q: What is the largest glacier in the world?\nA:", "response": "Lambert Glacier."}
79
+ {"prompt": "Q: What is the main ingredient in risotto?\nA:", "response": "Rice."}
80
+ {"prompt": "Q: What is the capital of Norway?\nA:", "response": "Oslo."}
81
+ {"prompt": "Q: What is the largest hot desert in the world?\nA:", "response": "The Sahara Desert."}
82
+ {"prompt": "Q: What is the main ingredient in borscht?\nA:", "response": "Beets."}
83
+ {"prompt": "Q: What is the capital of Switzerland?\nA:", "response": "Bern."}
84
+ {"prompt": "Q: What is the largest freshwater lake in the world by surface area?\nA:", "response": "Lake Superior."}
85
+ {"prompt": "Q: What is the main ingredient in paella?\nA:", "response": "Rice."}
86
+ {"prompt": "Q: What is the capital of the Netherlands?\nA:", "response": "Amsterdam."}
87
+ {"prompt": "Q: What is the largest sea in the world?\nA:", "response": "The Philippine Sea."}
88
+ {"prompt": "Q: What is the main ingredient in pesto?\nA:", "response": "Basil."}
89
+ {"prompt": "Q: What is the capital of Greece?\nA:", "response": "Athens."}
90
+ {"prompt": "Q: What is the largest plateau in the world?\nA:", "response": "The Tibetan Plateau."}
91
+ {"prompt": "Q: What is the main ingredient in gazpacho?\nA:", "response": "Tomatoes."}
92
+ {"prompt": "Q: What is the capital of Portugal?\nA:", "response": "Lisbon."}
93
+ {"prompt": "Q: What is the largest delta in the world?\nA:", "response": "The Ganges Delta."}
94
+ {"prompt": "Q: What is the main ingredient in ramen?\nA:", "response": "Noodles."}
95
+ {"prompt": "Q: What is the capital of South Korea?\nA:", "response": "Seoul."}
96
+ {"prompt": "Q: What is the largest island country in the world?\nA:", "response": "Indonesia."}
97
+ {"prompt": "Q: What is the main ingredient in kimchi?\nA:", "response": "Cabbage."}
98
+ {"prompt": "Q: What is the capital of New Zealand?\nA:", "response": "Wellington."}
99
+ {"prompt": "Q: What is the largest peninsula in Europe?\nA:", "response": "The Scandinavian Peninsula."}
100
+ {"prompt": "Q: What is the main ingredient in gnocchi?\nA:", "response": "Potatoes."}
101
+ {"prompt": "Q: What is the capital of Denmark?\nA:", "response": "Copenhagen."}
102
+ {"prompt": "Q: What is the largest gulf in the world?\nA:", "response": "The Gulf of Mexico."}
103
+ {"prompt": "Q: What is the main ingredient in hummus?\nA:", "response": "Chickpeas."}
104
+ {"prompt": "Q: What is the capital of Finland?\nA:", "response": "Helsinki."}
105
+ {"prompt": "Q: What is the largest archipelago in Europe?\nA:", "response": "The British Isles."}
106
+ {"prompt": "Q: What is the main ingredient in moussaka?\nA:", "response": "Eggplant."}
107
+ {"prompt": "Q: What is the capital of Ireland?\nA:", "response": "Dublin."}
108
+ {"prompt": "Q: What is the largest lake in Africa?\nA:", "response": "Lake Victoria."}
109
+ {"prompt": "Q: What is the main ingredient in tabbouleh?\nA:", "response": "Parsley."}
110
+ {"prompt": "Q: What is the capital of Poland?\nA:", "response": "Warsaw."}
111
+ {"prompt": "Q: What is the largest river in Europe?\nA:", "response": "The Volga River."}
112
+ {"prompt": "Q: What is the main ingredient in sauerkraut?\nA:", "response": "Cabbage."}
113
+ {"prompt": "Q: What is the capital of Hungary?\nA:", "response": "Budapest."}
114
+ {"prompt": "Q: What is the largest city in the world by population?\nA:", "response": "Tokyo."}
115
+ {"prompt": "Q: What is the main ingredient in tempura?\nA:", "response": "Batter."}
116
+ {"prompt": "Q: What is the capital of Austria?\nA:", "response": "Vienna."}
117
+ {"prompt": "Q: What is the largest waterfall in Africa?\nA:", "response": "Victoria Falls."}
118
+ {"prompt": "Q: What is the main ingredient in baklava?\nA:", "response": "Phyllo dough."}
119
+ {"prompt": "Q: What is the capital of Belgium?\nA:", "response": "Brussels."}
120
+ {"prompt": "Q: What is the largest peninsula in North America?\nA:", "response": "The Labrador Peninsula."}
121
+ {"prompt": "Q: What is the main ingredient in couscous?\nA:", "response": "Semolina."}
122
+ {"prompt": "Q: What is the capital of the Czech Republic?\nA:", "response": "Prague."}
123
+ {"prompt": "Q: What is the largest island in the Mediterranean Sea?\nA:", "response": "Sicily."}
124
+ {"prompt": "Q: What is the main ingredient in tzatziki?\nA:", "response": "Yogurt."}
125
+ {"prompt": "Q: What is the capital of Romania?\nA:", "response": "Bucharest."}
126
+ {"prompt": "Q: What is the largest city in Africa?\nA:", "response": "Lagos."}
127
+ {"prompt": "Q: What is the main ingredient in goulash?\nA:", "response": "Beef."}
128
+ {"prompt": "Q: What is the capital of Ukraine?\nA:", "response": "Kyiv."}
129
+ {"prompt": "Q: What is the largest city in South America?\nA:", "response": "São Paulo."}
130
+ {"prompt": "Q: What is the main ingredient in feijoada?\nA:", "response": "Beans."}
131
+ {"prompt": "Q: What is the capital of Chile?\nA:", "response": "Santiago."}
132
+ {"prompt": "Q: What is the largest city in Australia?\nA:", "response": "Sydney."}
133
+ {"prompt": "Q: What is the main ingredient in pavlova?\nA:", "response": "Egg whites."}
134
+ {"prompt": "Q: What is the capital of Peru?\nA:", "response": "Lima."}
135
+ {"prompt": "Q: What is the largest city in Canada?\nA:", "response": "Toronto."}
136
+ {"prompt": "Q: What is the main ingredient in poutine?\nA:", "response": "French fries."}
137
+ {"prompt": "Q: What is the capital of Colombia?\nA:", "response": "Bogotá."}
138
+ {"prompt": "Q: What is the largest city in India?\nA:", "response": "Mumbai."}
139
+ {"prompt": "Q: What is the main ingredient in biryani?\nA:", "response": "Rice."}
140
+ {"prompt": "Q: What is the capital of Pakistan?\nA:", "response": "Islamabad."}
141
+ {"prompt": "Q: What is the largest city in China?\nA:", "response": "Shanghai."}
142
+ {"prompt": "Q: What is the main ingredient in hot pot?\nA:", "response": "Broth."}
143
+ {"prompt": "Q: What is the capital of Indonesia?\nA:", "response": "Jakarta."}
144
+ {"prompt": "Q: What is the largest city in Russia?\nA:", "response": "Moscow."}
145
+ {"prompt": "Q: What is the main ingredient in borscht?\nA:", "response": "Beets."}
146
+ {"prompt": "Q: What is the capital of Saudi Arabia?\nA:", "response": "Riyadh."}
147
+ {"prompt": "Q: What is the largest city in Turkey?\nA:", "response": "Istanbul."}
148
+ {"prompt": "Q: What is the main ingredient in kebab?\nA:", "response": "Meat."}
149
+ {"prompt": "Q: What is the capital of Iran?\nA:", "response": "Tehran."}
150
+ {"prompt": "Q: What is the largest city in Egypt?\nA:", "response": "Cairo."}
151
+ {"prompt": "Q: What is the main ingredient in koshari?\nA:", "response": "Rice."}
152
+ {"prompt": "Q: What is the capital of Nigeria?\nA:", "response": "Abuja."}
153
+ {"prompt": "Q: What is the largest city in Nigeria?\nA:", "response": "Lagos."}
154
+ {"prompt": "Q: What is the main ingredient in jollof rice?\nA:", "response": "Rice."}
155
+ {"prompt": "Q: What is the capital of Kenya?\nA:", "response": "Nairobi."}
156
+ {"prompt": "Q: What is the largest city in Kenya?\nA:", "response": "Nairobi."}
157
+ {"prompt": "Q: What is the main ingredient in ugali?\nA:", "response": "Maize flour."}
158
+ {"prompt": "Q: What is the capital of Ethiopia?\nA:", "response": "Addis Ababa."}
159
+ {"prompt": "Q: What is the largest city in Ethiopia?\nA:", "response": "Addis Ababa."}
160
+ {"prompt": "Q: What is the main ingredient in injera?\nA:", "response": "Teff flour."}
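Each record above is one JSON object per line. The `prompt` ends in `\nA:` and the `response` holds a short answer. The training code that reads this file is not shown here, so what follows is only a sketch, assuming plain JSON Lines input. It parses the records, checks that both keys are present, and reports repeated prompts. The file currently repeats the hummus, borscht and Saudi Arabia questions. The helper names and the single space used to join prompt and response are hypothetical.

```python
import json
from collections import Counter


def load_pairs(lines):
    """Parse JSONL lines into (prompt, response) tuples, skipping blank lines.

    Raises ValueError when a record lacks the 'prompt' or 'response' key.
    """
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "prompt" not in record or "response" not in record:
            raise ValueError(f"line {lineno}: expected 'prompt' and 'response' keys")
        pairs.append((record["prompt"], record["response"]))
    return pairs


def duplicate_prompts(pairs):
    """Return prompts that occur more than once, in first-seen order."""
    counts = Counter(prompt for prompt, _ in pairs)
    return [prompt for prompt in counts if counts[prompt] > 1]


def to_training_text(prompt, response):
    # Hypothetical format: the prompt already ends with "A:", so one space
    # separates it from the answer.
    return f"{prompt} {response}"


if __name__ == "__main__":
    sample = [
        '{"prompt": "Q: What is the capital of Japan?\\nA:", "response": "Tokyo."}',
        '{"prompt": "Q: What is the main ingredient in hummus?\\nA:", "response": "Chickpeas."}',
        '{"prompt": "Q: What is the main ingredient in hummus?\\nA:", "response": "Chickpeas."}',
    ]
    pairs = load_pairs(sample)
    print(len(pairs))
    print(duplicate_prompts(pairs))
    print(to_training_text(*pairs[0]))
```

For the real file, `load_pairs(open(path, encoding="utf-8"))` works the same way, because iterating a text file yields its lines.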
requirements.txt ADDED
@@ -0,0 +1,3 @@
1
+ torch
2
+ tokenizers
3
+ tqdm
tokenizer/merges.txt ADDED
@@ -0,0 +1,745 @@
1
+ #version: 0.2
2
+ č Ċ
3
+ Ġ i
4
+ h e
5
+ Ġ t
6
+ Ġt he
7
+ a t
8
+ Ġ W
9
+ ĠW h
10
+ Ġi s
11
+ ĠWh at
12
+ Ġi n
13
+ e s
14
+ a r
15
+ i n
16
+ a l
17
+ Ġ c
18
+ Ġ o
19
+ a n
20
+ i t
21
+ e n
22
+ e r
23
+ es t
24
+ Ġo f
25
+ Ġ l
26
+ r e
27
+ Ġ w
28
+ g est
29
+ o r
30
+ ar gest
31
+ Ġl argest
32
+ Ġ m
33
+ d i
34
+ a in
35
+ a p
36
+ Ġ p
37
+ en t
38
+ g re
39
+ Ġc ap
40
+ it al
41
+ Ġcap ital
42
+ Ġm ain
43
+ Ġ T
44
+ o n
45
+ Ġin gre
46
+ di ent
47
+ Ġingre dient
48
+ i c
49
+ l d
50
+ Ġ s
51
+ Ġw or
52
+ Ġwor ld
53
+ Ġ S
54
+ ĠT he
55
+ Ġ A
56
+ Ġ f
57
+ i a
58
+ r a
59
+ i l
60
+ Ġ C
61
+ o u
62
+ Ġ b
63
+ Ġ P
64
+ Ġ B
65
+ e l
66
+ u s
67
+ r o
68
+ o l
69
+ i s
70
+ an d
71
+ Ġ d
72
+ e d
73
+ e t
74
+ Ġ M
75
+ c e
76
+ Ġ R
77
+ Ġ g
78
+ it y
79
+ Ġ G
80
+ at er
81
+ n t
82
+ Ġ F
83
+ Ġ L
84
+ ĠWh o
85
+ Ġ E
86
+ a k
87
+ Ġw ater
88
+ a m
89
+ a s
90
+ p e
91
+ in g
92
+ o t
93
+ Ġ N
94
+ a d
95
+ l e
96
+ m b
97
+ o s
98
+ t s
99
+ al l
100
+ Ġf or
101
+ g h
102
+ l an
103
+ Ġ I
104
+ ar d
105
+ ou nt
106
+ c i
107
+ h ot
108
+ s t
109
+ t h
110
+ v er
111
+ Ġ D
112
+ Ġ V
113
+ Ġc ity
114
+ Ġp lan
115
+ Ġs y
116
+ el s
117
+ e c
118
+ l and
119
+ o m
120
+ u r
121
+ Ġ J
122
+ Ġp o
123
+ e gre
124
+ i us
125
+ u l
126
+ Ġ H
127
+ in t
128
+ ĠC els
129
+ Ġd egre
130
+ all est
131
+ Ġpo int
132
+ ĠCels ius
133
+ Ġdegre es
134
+ a g
135
+ a i
136
+ c h
137
+ c o
138
+ e a
139
+ i gh
140
+ m ic
141
+ o k
142
+ r i
143
+ v ity
144
+ Ġ k
145
+ he mic
146
+ Ġt o
147
+ Ġc hemic
148
+ Ġw h
149
+ ra vity
150
+ ce an
151
+ ak e
152
+ mb ol
153
+ Ġsy mbol
154
+ igh t
155
+ Ġchemic al
156
+ l ou
157
+ r an
158
+ s ia
159
+ Ġ 2
160
+ er t
161
+ on d
162
+ ia m
163
+ il l
164
+ Ġplan et
165
+ lou r
166
+ a b
167
+ c es
168
+ s is
169
+ t a
170
+ u m
171
+ u n
172
+ u p
173
+ x and
174
+ y nt
175
+ Ġ O
176
+ Ġ h
177
+ he sis
178
+ Ġt allest
179
+ es ert
180
+ Ġp ro
181
+ ic a
182
+ ic e
183
+ ĠA le
184
+ ĠC h
185
+ Ġb o
186
+ Ġb y
187
+ Ġb re
188
+ os ynt
189
+ hot osynt
190
+ ces s
191
+ xand er
192
+ Ġpro cess
193
+ ĠAle xander
194
+ hotosynt hesis
195
+ a ci
196
+ e m
197
+ g u
198
+ h ar
199
+ t e
200
+ Ġ 1
201
+ Ġ ar
202
+ Ġ di
203
+ it er
204
+ re e
205
+ on e
206
+ ĠS h
207
+ ĠP ar
208
+ ĠB ra
209
+ el l
210
+ ĠR ice
211
+ ĠF lour
212
+ ĠL is
213
+ 0 0
214
+ 7 9
215
+ 9 9
216
+ e p
217
+ e on
218
+ f f
219
+ i d
220
+ i r
221
+ l o
222
+ l et
223
+ m al
224
+ o c
225
+ s ul
226
+ t ed
227
+ y o
228
+ Ġ U
229
+ Ġ n
230
+ at t
231
+ ĠW ill
232
+ es pe
233
+ ar e
234
+ ar th
235
+ in a
236
+ in ci
237
+ in sul
238
+ en insul
239
+ er s
240
+ Ġl ight
241
+ Ġw ro
242
+ ain ted
243
+ Ġp er
244
+ Ġp ainted
245
+ ĠT ok
246
+ on a
247
+ ic h
248
+ Ġs pe
249
+ Ġs ec
250
+ ĠS a
251
+ ĠS t
252
+ il e
253
+ il ing
254
+ il om
255
+ ĠC an
256
+ ĠB e
257
+ us sia
258
+ Ġd a
259
+ et ers
260
+ ĠM ona
261
+ ĠR ussia
262
+ ĠF ran
263
+ ĠL eon
264
+ ĠE arth
265
+ ak espe
266
+ am let
267
+ ĠI n
268
+ ard o
269
+ ĠV inci
270
+ ĠJ up
271
+ ĠH amlet
272
+ Ġk ilom
273
+ Ġ2 99
274
+ Ġbo iling
275
+ Ġ1 00
276
+ ĠSh akespe
277
+ ĠPar is
278
+ ĠLis a
279
+ 79 2
280
+ ĠWill iam
281
+ eninsul a
282
+ Ġwro te
283
+ ĠTok yo
284
+ Ġspe ed
285
+ Ġsec ond
286
+ ĠFran ce
287
+ ĠLeon ardo
288
+ ĠJup iter
289
+ Ġkilom eters
290
+ ĠShakespe are
291
+ a y
292
+ b j
293
+ b on
294
+ c ts
295
+ f ic
296
+ h at
297
+ h am
298
+ h one
299
+ n it
300
+ o d
301
+ o ld
302
+ o od
303
+ r y
304
+ u b
305
+ v ent
306
+ w ard
307
+ z il
308
+ z ing
309
+ Ġ 0
310
+ Ġ hot
311
+ Ġ att
312
+ Ġt el
313
+ Ġt hat
314
+ at es
315
+ Ġis land
316
+ Ġin vent
317
+ es ia
318
+ al e
319
+ Ġc ent
320
+ Ġo cean
321
+ Ġo bj
322
+ Ġm ount
323
+ ap an
324
+ ĠA u
325
+ Ġf ree
326
+ ra cts
327
+ ra ham
328
+ ĠP aci
329
+ ĠB ell
330
+ ĠM ount
331
+ Ġg ravity
332
+ Ġg old
333
+ ĠG ravity
334
+ ĠG raham
335
+ ĠE g
336
+ ĠE ver
337
+ Ġfor ce
338
+ ec ts
339
+ ĠJ apan
340
+ ag e
341
+ Ġto ward
342
+ Ġwh ale
343
+ ĠO cean
344
+ Ġh um
345
+ Ġbre ad
346
+ ĠBra zil
347
+ ep hone
348
+ ĠU nit
349
+ ĠSt ates
350
+ Ġatt racts
351
+ Ġtel ephone
352
+ Ġinvent ed
353
+ Ġcent er
354
+ Ġobj ects
355
+ Ġmount ain
356
+ Ġfree zing
357
+ ĠPaci fic
358
+ ĠEver est
359
+ ĠUnit ed
360
+ a v
361
+ e an
362
+ f r
363
+ h i
364
+ i an
365
+ k pe
366
+ l ight
367
+ o x
368
+ p t
369
+ r en
370
+ t r
371
+ t u
372
+ t in
373
+ u mb
374
+ Ġ K
375
+ Ġ us
376
+ Ġ and
377
+ Ġ ri
378
+ ar bon
379
+ Ġc ount
380
+ Ġc ur
381
+ Ġc arbon
382
+ an i
383
+ Ġl ake
384
+ or tu
385
+ Ġm ake
386
+ Ġp hotosynthesis
387
+ ic kpe
388
+ Ġs un
389
+ ĠS ea
390
+ ĠA us
391
+ ĠA fr
392
+ Ġf ro
393
+ Ġf ood
394
+ ĠP hotosynthesis
395
+ ĠP ortu
396
+ Ġd esert
397
+ ĠM e
398
+ ĠR om
399
+ ĠG re
400
+ ad a
401
+ st an
402
+ ĠD esert
403
+ Ġplan ts
404
+ ai ro
405
+ ch i
406
+ Ġwh ich
407
+ ĠCh ickpe
408
+ har a
409
+ Ġdi ox
410
+ id e
411
+ ĠSa hara
412
+ Ġus e
413
+ Ġri ver
414
+ Ġcount ry
415
+ Ġsun light
416
+ ĠAus tr
417
+ ĠAfr ica
418
+ Ġfro m
419
+ ĠChickpe as
420
+ Ġdiox ide
421
+ b ab
422
+ c y
423
+ d on
424
+ e w
425
+ e y
426
+ g al
427
+ g en
428
+ i j
429
+ i ra
430
+ l u
431
+ l ar
432
+ l in
433
+ l as
434
+ m e
435
+ m us
436
+ m ing
437
+ m allest
438
+ o p
439
+ o es
440
+ p ok
441
+ r es
442
+ s co
443
+ t al
444
+ t est
445
+ t or
446
+ t on
447
+ u t
448
+ u ro
449
+ w ater
450
+ x ic
451
+ Ġ ai
452
+ Ġ ani
453
+ at he
454
+ at ur
455
+ at oes
456
+ es e
457
+ al ia
458
+ Ġc on
459
+ an y
460
+ an gu
461
+ it ro
462
+ en ic
463
+ en us
464
+ en land
465
+ Ġl on
466
+ Ġl angu
467
+ Ġw e
468
+ Ġm am
469
+ di a
470
+ Ġp ri
471
+ Ġp eninsula
472
+ Ġp enic
473
+ Ġs ol
474
+ Ġs ub
475
+ Ġs mallest
476
+ Ġs pok
477
+ ĠS ou
478
+ ĠA m
479
+ ĠA ra
480
+ ĠA sia
481
+ ou s
482
+ ou gh
483
+ Ġb ir
484
+ Ġb lu
485
+ ĠP eninsula
486
+ ĠB u
487
+ ĠB er
488
+ ol lar
489
+ is h
490
+ Ġd ough
491
+ Ġd ollar
492
+ ĠM os
493
+ ĠR i
494
+ Ġg as
495
+ Ġg ira
496
+ ĠF le
497
+ ĠE uro
498
+ ing ton
499
+ ĠN ile
500
+ ĠN ew
501
+ ĠN itro
502
+ ĠI tal
503
+ ard est
504
+ st em
505
+ ver ed
506
+ ĠD el
507
+ ĠD iam
508
+ ĠV enus
509
+ Ġsy stem
510
+ co w
511
+ ill in
512
+ ab b
513
+ Ġh ardest
514
+ ĠCh ina
515
+ Ġbre athe
516
+ gu ese
517
+ Ġar ea
518
+ Ġdi sco
519
+ ff e
520
+ Ġn umb
521
+ Ġn atur
522
+ ĠCan ada
523
+ ĠIn don
524
+ ĠIn dia
525
+ Ġhot test
526
+ ĠEg g
527
+ Ġhum mus
528
+ ren cy
529
+ tin ent
530
+ Ġcur rency
531
+ ĠPortu guese
532
+ ĠMe xic
533
+ ĠRom e
534
+ ĠGre enland
535
+ stan ce
536
+ ĠAustr alia
537
+ tor ia
538
+ Ġai r
539
+ Ġani mal
540
+ Ġcon tinent
541
+ Ġlon gest
542
+ Ġlangu age
543
+ Ġmam mal
544
+ Ġpri me
545
+ Ġpenic illin
546
+ Ġsol ar
547
+ Ġsub stance
548
+ Ġspok en
549
+ ĠSou th
550
+ ĠAra b
551
+ Ġblu e
552
+ ĠMos cow
553
+ Ġgira ffe
554
+ ĠFle ming
555
+ ĠEuro pe
556
+ ĠNitro gen
557
+ ĠItal y
558
+ ĠDiam ond
559
+ Ġdisco vered
560
+ Ġnumb er
561
+ Ġnatur al
562
+ ĠIndon esia
563
+ ĠMexic o
564
+ a c
565
+ a un
566
+ a ff
567
+ b i
568
+ b er
569
+ c an
570
+ d y
571
+ d di
572
+ e f
573
+ e or
574
+ f all
575
+ g e
576
+ g er
577
+ h oc
578
+ h ington
579
+ i m
580
+ i z
581
+ i op
582
+ i ger
583
+ k i
584
+ k ey
585
+ l at
586
+ l es
587
+ l ia
588
+ l aci
589
+ l esia
590
+ m any
591
+ n a
592
+ n ol
593
+ o a
594
+ o g
595
+ o co
596
+ p el
597
+ p ian
598
+ r on
599
+ r ich
600
+ s Ã
601
+ s ch
602
+ t o
603
+ t ta
604
+ t water
605
+ u di
606
+ u ac
607
+ v ol
608
+ v oc
609
+ w a
610
+ w er
611
+ y a
612
+ y ad
613
+ y pt
614
+ z a
615
+ Ġ r
616
+ Ġ re
617
+ Ġ vol
618
+ Ń lia
619
+ Ġi ron
620
+ at e
621
+ ĠW as
622
+ al twater
623
+ Ġc ro
624
+ Ġc hoc
625
+ Ġo st
626
+ an g
627
+ en ya
628
+ er ica
629
+ er many
630
+ Ġw as
631
+ or sch
632
+ di i
633
+ di le
634
+ di um
635
+ Ġp as
636
+ Ġp iz
637
+ ĠT e
638
+ ĠT om
639
+ ĠT ur
640
+ ic toria
641
+ Ġs us
642
+ Ġs har
643
+ Ġs altwater
644
+ ĠS p
645
+ ĠS w
646
+ ĠS il
647
+ ĠA n
648
+ ĠA bab
649
+ ĠA ddi
650
+ ĠA voc
651
+ Ġf lour
652
+ Ġf em
653
+ Ġf ir
654
+ Ġf lo
655
+ Ġf ish
656
+ ra d
657
+ ĠC as
658
+ ĠC airo
659
+ ĠC abb
660
+ ĠC oco
661
+ Ġb one
662
+ Ġb orsch
663
+ ĠP h
664
+ ĠP res
665
+ ĠB r
666
+ ol e
667
+ ol ate
668
+ is o
669
+ et s
670
+ ĠM aun
671
+ ĠR aff
672
+ Ġg las
673
+ Ġg uac
674
+ ĠG eor
675
+ ĠG ermany
676
+ ĠF e
677
+ ĠF all
678
+ ĠL ag
679
+ ĠL ake
680
+ ĠL oa
681
+ ĠE th
682
+ Ġwater fall
683
+ am ole
684
+ ĠN or
685
+ ĠN airo
686
+ ĠN iger
687
+ ad o
688
+ ĠI s
689
+ ĠV ictoria
690
+ ul f
691
+ ag o
692
+ co dile
693
+ ea u
694
+ ta dium
695
+ un g
696
+ ĠO tta
697
+ Ġbo dy
698
+ Ġar chi
699
+ Ġar nol
700
+ ĠBra sÃ
701
+ id ent
702
+ ĠSa udi
703
+ ĠCan ber
704
+ ĠBe ij
705
+ ĠBe ets
706
+ ĠEg ypt
707
+ Ġhum an
708
+ av e
709
+ pt ile
710
+ ĠK enya
711
+ ĠAm erica
712
+ Ġbir d
713
+ ĠBer lin
714
+ ĠRi yad
715
+ ĠDel hi
716
+ ĠArab ia
717
+ can o
718
+ iop ia
719
+ lat eau
720
+ laci er
721
+ pel ago
722
+ Ġre ptile
723
+ Ġvol cano
724
+ ĠWas hington
725
+ Ġcro codile
726
+ Ġchoc olate
727
+ Ġost rich
728
+ Ġpiz za
729
+ ĠTom atoes
730
+ ĠTur key
731
+ Ġsus hi
732
+ Ġshar k
733
+ ĠSil ica
734
+ ĠAbab a
735
+ ĠAddi s
736
+ ĠAvoc ado
737
+ Ġfem ur
738
+ Ġfir st
739
+ Ġflo wer
740
+ ĠCas pian
741
+ ĠCabb age
742
+ ĠCoco a
743
+ Ġborsch t
744
+ ĠPres ident
745
+ ĠMaun a
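merges.txt holds the `#version: 0.2` header followed by 744 merges, one per line, in the order they were learned. An earlier line has higher priority, so rank 0 is `č Ċ` (carriage return followed by line feed). Below is a minimal sketch of how such a ranked list is applied to a single pre-tokenized word. It assumes the usual GPT-2-style rule of repeatedly fusing the adjacent pair with the lowest rank. The `tokenizers` library does this in Rust and adds features such as caching and optional dropout, which are omitted here. `parse_merges`, `bpe` and `MERGES_HEAD` are illustrative names, and `MERGES_HEAD` simply copies the first ten merges from the file.

```python
MERGES_HEAD = """#version: 0.2
č Ċ
Ġ i
h e
Ġ t
Ġt he
a t
Ġ W
ĠW h
Ġi s
ĠWh at"""


def parse_merges(text):
    """Map each merge pair to its rank (0 = highest priority), skipping the header."""
    ranks = {}
    for line in text.splitlines():
        if not line or line.startswith("#version"):
            continue
        left, right = line.split()
        ranks[(left, right)] = len(ranks)
    return ranks


def bpe(word, ranks):
    """Fuse the lowest-ranked adjacent pair until no known merge applies."""
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no adjacent pair appears in the merge list
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


if __name__ == "__main__":
    ranks = parse_merges(MERGES_HEAD)
    for word in ["Ġthe", "ĠWhat", "Ġis", "Ġhat"]:
        print(word, bpe(word, ranks))
```

With only these ten merges, `Ġthe` collapses to a single token. `Ġhat` stops at `['Ġ', 'h', 'at']` because no listed merge joins `Ġ` with `h`.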
tokenizer/train_tokenizer.py ADDED
@@ -0,0 +1,10 @@
1
+ from tokenizers import ByteLevelBPETokenizer
2
+ import os
3
+
4
+ input_path = os.path.join("..", "data", "input.txt")
5
+ if not os.path.exists(input_path):
6
+     input_path = os.path.join("data", "input.txt")
7
+ tokenizer = ByteLevelBPETokenizer()
8
+ tokenizer.train(files=input_path, vocab_size=1000, min_frequency=2)
9
+ tokenizer.save_model(os.path.dirname(__file__))
10
+ print("Tokenizer trained and saved.")
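`tokenizer.train(files=input_path, vocab_size=1000, min_frequency=2)` starts from the 256-symbol byte alphabet. It adds one merge at a time until the vocabulary reaches 1,000 entries or no pair occurs at least `min_frequency` times. Here that leaves room for 744 merges, which matches the 744 merge lines in merges.txt and the ids 0 to 999 in vocab.json. The toy trainer below is a simplified sketch of that loop, not the `tokenizers` implementation. It assumes the words are already pre-tokenized into byte-level symbols with known counts. It also breaks ties by first occurrence, which may differ from the library's tie-breaking. `train_bpe` and the toy corpus are hypothetical.

```python
from collections import Counter


def train_bpe(word_counts, num_merges, min_frequency=2):
    """Toy BPE trainer returning learned merges in rank order.

    word_counts maps a word (a string of byte-level symbols) to its frequency.
    Training stops after num_merges merges, or earlier if the most frequent
    pair occurs fewer than min_frequency times.
    """
    words = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break
        # max() keeps the first pair seen among equal counts.
        best, best_count = max(pair_counts.items(), key=lambda item: item[1])
        if best_count < min_frequency:
            break
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_words = {}
        for symbols, count in words.items():
            out = []
            i = 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            key = tuple(out)
            new_words[key] = new_words.get(key, 0) + count
        words = new_words
    return merges


if __name__ == "__main__":
    # Hypothetical toy corpus with made-up frequencies.
    corpus = {"ĠWhat": 6, "Ġis": 6, "Ġthe": 6, "Ġcapital": 3, "Ġlargest": 2}
    vocab_size = 256 + 5  # five merges on top of the byte alphabet
    for rank, pair in enumerate(train_bpe(corpus, vocab_size - 256)):
        print(rank, " ".join(pair))
```

Each printed line has the same `left right` shape as a line of merges.txt.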
tokenizer/vocab.json ADDED
@@ -0,0 +1 @@
1
+ {"!":0,"\"":1,"#":2,"$":3,"%":4,"&":5,"'":6,"(":7,")":8,"*":9,"+":10,",":11,"-":12,".":13,"/":14,"0":15,"1":16,"2":17,"3":18,"4":19,"5":20,"6":21,"7":22,"8":23,"9":24,":":25,";":26,"<":27,"=":28,">":29,"?":30,"@":31,"A":32,"B":33,"C":34,"D":35,"E":36,"F":37,"G":38,"H":39,"I":40,"J":41,"K":42,"L":43,"M":44,"N":45,"O":46,"P":47,"Q":48,"R":49,"S":50,"T":51,"U":52,"V":53,"W":54,"X":55,"Y":56,"Z":57,"[":58,"\\":59,"]":60,"^":61,"_":62,"`":63,"a":64,"b":65,"c":66,"d":67,"e":68,"f":69,"g":70,"h":71,"i":72,"j":73,"k":74,"l":75,"m":76,"n":77,"o":78,"p":79,"q":80,"r":81,"s":82,"t":83,"u":84,"v":85,"w":86,"x":87,"y":88,"z":89,"{":90,"|":91,"}":92,"~":93,"¡":94,"¢":95,"£":96,"¤":97,"¥":98,"¦":99,"§":100,"¨":101,"©":102,"ª":103,"«":104,"¬":105,"®":106,"¯":107,"°":108,"±":109,"²":110,"³":111,"´":112,"µ":113,"¶":114,"·":115,"¸":116,"¹":117,"º":118,"»":119,"¼":120,"½":121,"¾":122,"¿":123,"À":124,"Á":125,"Â":126,"Ã":127,"Ä":128,"Å":129,"Æ":130,"Ç":131,"È":132,"É":133,"Ê":134,"Ë":135,"Ì":136,"Í":137,"Î":138,"Ï":139,"Ð":140,"Ñ":141,"Ò":142,"Ó":143,"Ô":144,"Õ":145,"Ö":146,"×":147,"Ø":148,"Ù":149,"Ú":150,"Û":151,"Ü":152,"Ý":153,"Þ":154,"ß":155,"à":156,"á":157,"â":158,"ã":159,"ä":160,"å":161,"æ":162,"ç":163,"è":164,"é":165,"ê":166,"ë":167,"ì":168,"í":169,"î":170,"ï":171,"ð":172,"ñ":173,"ò":174,"ó":175,"ô":176,"õ":177,"ö":178,"÷":179,"ø":180,"ù":181,"ú":182,"û":183,"ü":184,"ý":185,"þ":186,"ÿ":187,"Ā":188,"ā":189,"Ă":190,"ă":191,"Ą":192,"ą":193,"Ć":194,"ć":195,"Ĉ":196,"ĉ":197,"Ċ":198,"ċ":199,"Č":200,"č":201,"Ď":202,"ď":203,"Đ":204,"đ":205,"Ē":206,"ē":207,"Ĕ":208,"ĕ":209,"Ė":210,"ė":211,"Ę":212,"ę":213,"Ě":214,"ě":215,"Ĝ":216,"ĝ":217,"Ğ":218,"ğ":219,"Ġ":220,"ġ":221,"Ģ":222,"ģ":223,"Ĥ":224,"ĥ":225,"Ħ":226,"ħ":227,"Ĩ":228,"ĩ":229,"Ī":230,"ī":231,"Ĭ":232,"ĭ":233,"Į":234,"į":235,"İ":236,"ı":237,"IJ":238,"ij":239,"Ĵ":240,"ĵ":241,"Ķ":242,"ķ":243,"ĸ":244,"Ĺ":245,"ĺ":246,"Ļ":247,"ļ":248,"Ľ":249,"ľ":250,"Ŀ":251,"ŀ":252,"Ł":253,"ł":254,"Ń":255,"čĊ":256,"Ġi":257,"he":258,"Ġt":259,"Ġthe":260,"at":261
,"ĠW":262,"ĠWh":263,"Ġis":264,"ĠWhat":265,"Ġin":266,"es":267,"ar":268,"in":269,"al":270,"Ġc":271,"Ġo":272,"an":273,"it":274,"en":275,"er":276,"est":277,"Ġof":278,"Ġl":279,"re":280,"Ġw":281,"gest":282,"or":283,"argest":284,"Ġlargest":285,"Ġm":286,"di":287,"ain":288,"ap":289,"Ġp":290,"ent":291,"gre":292,"Ġcap":293,"ital":294,"Ġcapital":295,"Ġmain":296,"ĠT":297,"on":298,"Ġingre":299,"dient":300,"Ġingredient":301,"ic":302,"ld":303,"Ġs":304,"Ġwor":305,"Ġworld":306,"ĠS":307,"ĠThe":308,"ĠA":309,"Ġf":310,"ia":311,"ra":312,"il":313,"ĠC":314,"ou":315,"Ġb":316,"ĠP":317,"ĠB":318,"el":319,"us":320,"ro":321,"ol":322,"is":323,"and":324,"Ġd":325,"ed":326,"et":327,"ĠM":328,"ce":329,"ĠR":330,"Ġg":331,"ity":332,"ĠG":333,"ater":334,"nt":335,"ĠF":336,"ĠL":337,"ĠWho":338,"ĠE":339,"ak":340,"Ġwater":341,"am":342,"as":343,"pe":344,"ing":345,"ot":346,"ĠN":347,"ad":348,"le":349,"mb":350,"os":351,"ts":352,"all":353,"Ġfor":354,"gh":355,"lan":356,"ĠI":357,"ard":358,"ount":359,"ci":360,"hot":361,"st":362,"th":363,"ver":364,"ĠD":365,"ĠV":366,"Ġcity":367,"Ġplan":368,"Ġsy":369,"els":370,"ec":371,"land":372,"om":373,"ur":374,"ĠJ":375,"Ġpo":376,"egre":377,"ius":378,"ul":379,"ĠH":380,"int":381,"ĠCels":382,"Ġdegre":383,"allest":384,"Ġpoint":385,"ĠCelsius":386,"Ġdegrees":387,"ag":388,"ai":389,"ch":390,"co":391,"ea":392,"igh":393,"mic":394,"ok":395,"ri":396,"vity":397,"Ġk":398,"hemic":399,"Ġto":400,"Ġchemic":401,"Ġwh":402,"ravity":403,"cean":404,"ake":405,"mbol":406,"Ġsymbol":407,"ight":408,"Ġchemical":409,"lou":410,"ran":411,"sia":412,"Ġ2":413,"ert":414,"ond":415,"iam":416,"ill":417,"Ġplanet":418,"lour":419,"ab":420,"ces":421,"sis":422,"ta":423,"um":424,"un":425,"up":426,"xand":427,"ynt":428,"ĠO":429,"Ġh":430,"hesis":431,"Ġtallest":432,"esert":433,"Ġpro":434,"ica":435,"ice":436,"ĠAle":437,"ĠCh":438,"Ġbo":439,"Ġby":440,"Ġbre":441,"osynt":442,"hotosynt":443,"cess":444,"xander":445,"Ġprocess":446,"ĠAlexander":447,"hotosynthesis":448,"aci":449,"em":450,"gu":451,"har":452,"te":453,"Ġ1":454,"Ġar":455,"Ġdi":456
,"iter":457,"ree":458,"one":459,"ĠSh":460,"ĠPar":461,"ĠBra":462,"ell":463,"ĠRice":464,"ĠFlour":465,"ĠLis":466,"00":467,"79":468,"99":469,"ep":470,"eon":471,"ff":472,"id":473,"ir":474,"lo":475,"let":476,"mal":477,"oc":478,"sul":479,"ted":480,"yo":481,"ĠU":482,"Ġn":483,"att":484,"ĠWill":485,"espe":486,"are":487,"arth":488,"ina":489,"inci":490,"insul":491,"eninsul":492,"ers":493,"Ġlight":494,"Ġwro":495,"ainted":496,"Ġper":497,"Ġpainted":498,"ĠTok":499,"ona":500,"ich":501,"Ġspe":502,"Ġsec":503,"ĠSa":504,"ĠSt":505,"ile":506,"iling":507,"ilom":508,"ĠCan":509,"ĠBe":510,"ussia":511,"Ġda":512,"eters":513,"ĠMona":514,"ĠRussia":515,"ĠFran":516,"ĠLeon":517,"ĠEarth":518,"akespe":519,"amlet":520,"ĠIn":521,"ardo":522,"ĠVinci":523,"ĠJup":524,"ĠHamlet":525,"Ġkilom":526,"Ġ299":527,"Ġboiling":528,"Ġ100":529,"ĠShakespe":530,"ĠParis":531,"ĠLisa":532,"792":533,"ĠWilliam":534,"eninsula":535,"Ġwrote":536,"ĠTokyo":537,"Ġspeed":538,"Ġsecond":539,"ĠFrance":540,"ĠLeonardo":541,"ĠJupiter":542,"Ġkilometers":543,"ĠShakespeare":544,"ay":545,"bj":546,"bon":547,"cts":548,"fic":549,"hat":550,"ham":551,"hone":552,"nit":553,"od":554,"old":555,"ood":556,"ry":557,"ub":558,"vent":559,"ward":560,"zil":561,"zing":562,"Ġ0":563,"Ġhot":564,"Ġatt":565,"Ġtel":566,"Ġthat":567,"ates":568,"Ġisland":569,"Ġinvent":570,"esia":571,"ale":572,"Ġcent":573,"Ġocean":574,"Ġobj":575,"Ġmount":576,"apan":577,"ĠAu":578,"Ġfree":579,"racts":580,"raham":581,"ĠPaci":582,"ĠBell":583,"ĠMount":584,"Ġgravity":585,"Ġgold":586,"ĠGravity":587,"ĠGraham":588,"ĠEg":589,"ĠEver":590,"Ġforce":591,"ects":592,"ĠJapan":593,"age":594,"Ġtoward":595,"Ġwhale":596,"ĠOcean":597,"Ġhum":598,"Ġbread":599,"ĠBrazil":600,"ephone":601,"ĠUnit":602,"ĠStates":603,"Ġattracts":604,"Ġtelephone":605,"Ġinvented":606,"Ġcenter":607,"Ġobjects":608,"Ġmountain":609,"Ġfreezing":610,"ĠPacific":611,"ĠEverest":612,"ĠUnited":613,"av":614,"ean":615,"fr":616,"hi":617,"ian":618,"kpe":619,"light":620,"ox":621,"pt":622,"ren":623,"tr":624,"tu":625,"tin":626,"umb":627,"ĠK":628,"Ġus":629
,"Ġand":630,"Ġri":631,"arbon":632,"Ġcount":633,"Ġcur":634,"Ġcarbon":635,"ani":636,"Ġlake":637,"ortu":638,"Ġmake":639,"Ġphotosynthesis":640,"ickpe":641,"Ġsun":642,"ĠSea":643,"ĠAus":644,"ĠAfr":645,"Ġfro":646,"Ġfood":647,"ĠPhotosynthesis":648,"ĠPortu":649,"Ġdesert":650,"ĠMe":651,"ĠRom":652,"ĠGre":653,"ada":654,"stan":655,"ĠDesert":656,"Ġplants":657,"airo":658,"chi":659,"Ġwhich":660,"ĠChickpe":661,"hara":662,"Ġdiox":663,"ide":664,"ĠSahara":665,"Ġuse":666,"Ġriver":667,"Ġcountry":668,"Ġsunlight":669,"ĠAustr":670,"ĠAfrica":671,"Ġfrom":672,"ĠChickpeas":673,"Ġdioxide":674,"bab":675,"cy":676,"don":677,"ew":678,"ey":679,"gal":680,"gen":681,"ij":682,"ira":683,"lu":684,"lar":685,"lin":686,"las":687,"me":688,"mus":689,"ming":690,"mallest":691,"op":692,"oes":693,"pok":694,"res":695,"sco":696,"tal":697,"test":698,"tor":699,"ton":700,"ut":701,"uro":702,"water":703,"xic":704,"Ġai":705,"Ġani":706,"athe":707,"atur":708,"atoes":709,"ese":710,"alia":711,"Ġcon":712,"any":713,"angu":714,"itro":715,"enic":716,"enus":717,"enland":718,"Ġlon":719,"Ġlangu":720,"Ġwe":721,"Ġmam":722,"dia":723,"Ġpri":724,"Ġpeninsula":725,"Ġpenic":726,"Ġsol":727,"Ġsub":728,"Ġsmallest":729,"Ġspok":730,"ĠSou":731,"ĠAm":732,"ĠAra":733,"ĠAsia":734,"ous":735,"ough":736,"Ġbir":737,"Ġblu":738,"ĠPeninsula":739,"ĠBu":740,"ĠBer":741,"ollar":742,"ish":743,"Ġdough":744,"Ġdollar":745,"ĠMos":746,"ĠRi":747,"Ġgas":748,"Ġgira":749,"ĠFle":750,"ĠEuro":751,"ington":752,"ĠNile":753,"ĠNew":754,"ĠNitro":755,"ĠItal":756,"ardest":757,"stem":758,"vered":759,"ĠDel":760,"ĠDiam":761,"ĠVenus":762,"Ġsystem":763,"cow":764,"illin":765,"abb":766,"Ġhardest":767,"ĠChina":768,"Ġbreathe":769,"guese":770,"Ġarea":771,"Ġdisco":772,"ffe":773,"Ġnumb":774,"Ġnatur":775,"ĠCanada":776,"ĠIndon":777,"ĠIndia":778,"Ġhottest":779,"ĠEgg":780,"Ġhummus":781,"rency":782,"tinent":783,"Ġcurrency":784,"ĠPortuguese":785,"ĠMexic":786,"ĠRome":787,"ĠGreenland":788,"stance":789,"ĠAustralia":790,"toria":791,"Ġair":792,"Ġanimal":793,"Ġcontinent":794,"Ġlongest":795,"Ġlanguage":796
,"Ġmammal":797,"Ġprime":798,"Ġpenicillin":799,"Ġsolar":800,"Ġsubstance":801,"Ġspoken":802,"ĠSouth":803,"ĠArab":804,"Ġblue":805,"ĠMoscow":806,"Ġgiraffe":807,"ĠFleming":808,"ĠEurope":809,"ĠNitrogen":810,"ĠItaly":811,"ĠDiamond":812,"Ġdiscovered":813,"Ġnumber":814,"Ġnatural":815,"ĠIndonesia":816,"ĠMexico":817,"ac":818,"aun":819,"aff":820,"bi":821,"ber":822,"can":823,"dy":824,"ddi":825,"ef":826,"eor":827,"fall":828,"ge":829,"ger":830,"hoc":831,"hington":832,"im":833,"iz":834,"iop":835,"iger":836,"ki":837,"key":838,"lat":839,"les":840,"lia":841,"laci":842,"lesia":843,"many":844,"na":845,"nol":846,"oa":847,"og":848,"oco":849,"pel":850,"pian":851,"ron":852,"rich":853,"sÃ":854,"sch":855,"to":856,"tta":857,"twater":858,"udi":859,"uac":860,"vol":861,"voc":862,"wa":863,"wer":864,"ya":865,"yad":866,"ypt":867,"za":868,"Ġr":869,"Ġre":870,"Ġvol":871,"Ńlia":872,"Ġiron":873,"ate":874,"ĠWas":875,"altwater":876,"Ġcro":877,"Ġchoc":878,"Ġost":879,"ang":880,"enya":881,"erica":882,"ermany":883,"Ġwas":884,"orsch":885,"dii":886,"dile":887,"dium":888,"Ġpas":889,"Ġpiz":890,"ĠTe":891,"ĠTom":892,"ĠTur":893,"ictoria":894,"Ġsus":895,"Ġshar":896,"Ġsaltwater":897,"ĠSp":898,"ĠSw":899,"ĠSil":900,"ĠAn":901,"ĠAbab":902,"ĠAddi":903,"ĠAvoc":904,"Ġflour":905,"Ġfem":906,"Ġfir":907,"Ġflo":908,"Ġfish":909,"rad":910,"ĠCas":911,"ĠCairo":912,"ĠCabb":913,"ĠCoco":914,"Ġbone":915,"Ġborsch":916,"ĠPh":917,"ĠPres":918,"ĠBr":919,"ole":920,"olate":921,"iso":922,"ets":923,"ĠMaun":924,"ĠRaff":925,"Ġglas":926,"Ġguac":927,"ĠGeor":928,"ĠGermany":929,"ĠFe":930,"ĠFall":931,"ĠLag":932,"ĠLake":933,"ĠLoa":934,"ĠEth":935,"Ġwaterfall":936,"amole":937,"ĠNor":938,"ĠNairo":939,"ĠNiger":940,"ado":941,"ĠIs":942,"ĠVictoria":943,"ulf":944,"ago":945,"codile":946,"eau":947,"tadium":948,"ung":949,"ĠOtta":950,"Ġbody":951,"Ġarchi":952,"Ġarnol":953,"ĠBrasÃ":954,"ident":955,"ĠSaudi":956,"ĠCanber":957,"ĠBeij":958,"ĠBeets":959,"ĠEgypt":960,"Ġhuman":961,"ave":962,"ptile":963,"ĠKenya":964,"ĠAmerica":965,"Ġbird":966,"ĠBerlin":967,"ĠRiyad":968,"ĠDelhi":969
,"ĠArabia":970,"cano":971,"iopia":972,"lateau":973,"lacier":974,"pelago":975,"Ġreptile":976,"Ġvolcano":977,"ĠWashington":978,"Ġcrocodile":979,"Ġchocolate":980,"Ġostrich":981,"Ġpizza":982,"ĠTomatoes":983,"ĠTurkey":984,"Ġsushi":985,"Ġshark":986,"ĠSilica":987,"ĠAbaba":988,"ĠAddis":989,"ĠAvocado":990,"Ġfemur":991,"Ġfirst":992,"Ġflower":993,"ĠCaspian":994,"ĠCabbage":995,"ĠCocoa":996,"Ġborscht":997,"ĠPresident":998,"ĠMauna":999}
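Ids 0 to 255 in vocab.json are the byte-level alphabet. Every possible byte is mapped to a printable Unicode character, so BPE never operates on raw whitespace or control bytes. That is why a leading space shows up as `Ġ` and CRLF as `čĊ` (token 256). The UTF-8 bytes of the "í" in "Brasília" become `Ã` followed by `Ń`, as seen in the tokens `ĠBrasÃ` and `Ńlia`. The sketch below rebuilds the GPT-2 byte-to-unicode table that these entries appear to follow. It also assumes that ids 0 to 255 are those 256 symbols sorted by code point, which matches the ids visible above (`!` is 0, `Ġ` is 220, `Ń` is 255). The function names are illustrative.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each byte value (0-255) to a printable character."""
    # Printable bytes ('!'..'~', 0xA1..0xAC, 0xAE..0xFF) map to themselves.
    printable = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    table = {b: chr(b) for b in printable}
    # The other 68 bytes (0x00-0x20, 0x7F-0xA0, 0xAD) move to U+0100 and up, in order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


BYTE_ENCODER = bytes_to_unicode()


def to_byte_symbols(text):
    """Render text as the byte-level symbol string that the BPE merges act on."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))


def base_vocab():
    """Assumed layout of ids 0-255: the 256 symbols sorted by code point."""
    return {ch: i for i, ch in enumerate(sorted(BYTE_ENCODER.values()))}


if __name__ == "__main__":
    print(to_byte_symbols(" the"))
    print(to_byte_symbols("\r\n"))
    print(to_byte_symbols("Brasília"))
    print(base_vocab()["Ġ"])
```

Feeding `to_byte_symbols(" " + word)` into the merge-ranking step reproduces the `Ġ`-prefixed tokens that dominate this vocabulary.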