Upload tokenizer

Files changed (4)

README.md +199 -0
special_tokens_map.json +7 -0
tokenizer.json +1013 -0
tokenizer_config.json +52 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "bos_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}
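
As a reading aid, the five roles above can be resolved to token ids. The sketch below is not part of the commit: the id table is copied by hand from the `added_tokens` list in this commit's tokenizer.json, and the names `special_tokens_map`, `added_token_ids` and `resolve_roles` are illustrative.

```python
# Role -> token string, as declared in special_tokens_map.json.
special_tokens_map = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "mask_token": "<mask>",
    "pad_token": "<pad>",
    "unk_token": "<unk>",
}

# Token string -> id, copied from "added_tokens" in tokenizer.json.
added_token_ids = {"<s>": 0, "</s>": 1, "<unk>": 2, "<pad>": 3, "<mask>": 4}


def resolve_roles(roles, ids):
    """Return role -> id, failing loudly if a role names an undeclared token."""
    missing = [tok for tok in roles.values() if tok not in ids]
    if missing:
        raise KeyError(f"special tokens without an id: {missing}")
    return {role: ids[tok] for role, tok in roles.items()}


role_to_id = resolve_roles(special_tokens_map, added_token_ids)
print(role_to_id)
```

A check like this can catch a role that points at a string missing from the vocabulary before the files are pushed.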

tokenizer.json ADDED

	@@ -0,0 +1,1013 @@

+{
+  "version": "1.0",
+  "truncation": null,
+  "padding": null,
+  "added_tokens": [
+    {
+      "id": 0,
+      "content": "<s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 1,
+      "content": "</s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 2,
+      "content": "<unk>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 3,
+      "content": "<pad>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 4,
+      "content": "<mask>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    }
+  ],
+  "normalizer": null,
+  "pre_tokenizer": {
+    "type": "ByteLevel",
+    "add_prefix_space": true,
+    "trim_offsets": true,
+    "use_regex": true
+  },
+  "post_processor": {
+    "type": "TemplateProcessing",
+    "single": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      }
+    ],
+    "pair": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      },
+      {
+        "Sequence": {
+          "id": "B",
+          "type_id": 1
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 1
+        }
+      }
+    ],
+    "special_tokens": {
+      "</s>": {
+        "id": "</s>",
+        "ids": [
+          1
+        ],
+        "tokens": [
+          "</s>"
+        ]
+      },
+      "<s>": {
+        "id": "<s>",
+        "ids": [
+          0
+        ],
+        "tokens": [
+          "<s>"
+        ]
+      }
+    }
+  },
+  "decoder": {
+    "type": "ByteLevel",
+    "add_prefix_space": true,
+    "trim_offsets": true,
+    "use_regex": true
+  },
+  "model": {
+    "type": "BPE",
+    "dropout": null,
+    "unk_token": "<unk>",
+    "continuing_subword_prefix": null,
+    "end_of_word_suffix": null,
+    "fuse_unk": false,
+    "byte_fallback": false,
+    "vocab": {
+      "<s>": 0,
+      "</s>": 1,
+      "<unk>": 2,
+      "<pad>": 3,
+      "<mask>": 4,
+      "!": 5,
+      "%": 6,
+      "'": 7,
+      "+": 8,
+      ",": 9,
+      "-": 10,
+      ".": 11,
+      "/": 12,
+      "0": 13,
+      "1": 14,
+      "2": 15,
+      "3": 16,
+      "4": 17,
+      "5": 18,
+      "6": 19,
+      "7": 20,
+      "8": 21,
+      "9": 22,
+      ":": 23,
+      "?": 24,
+      "A": 25,
+      "B": 26,
+      "C": 27,
+      "D": 28,
+      "E": 29,
+      "F": 30,
+      "G": 31,
+      "H": 32,
+      "I": 33,
+      "J": 34,
+      "K": 35,
+      "L": 36,
+      "M": 37,
+      "N": 38,
+      "O": 39,
+      "P": 40,
+      "Q": 41,
+      "R": 42,
+      "S": 43,
+      "T": 44,
+      "U": 45,
+      "V": 46,
+      "W": 47,
+      "X": 48,
+      "Y": 49,
+      "Z": 50,
+      "a": 51,
+      "b": 52,
+      "c": 53,
+      "d": 54,
+      "e": 55,
+      "f": 56,
+      "g": 57,
+      "h": 58,
+      "i": 59,
+      "j": 60,
+      "k": 61,
+      "l": 62,
+      "m": 63,
+      "n": 64,
+      "o": 65,
+      "p": 66,
+      "q": 67,
+      "r": 68,
+      "s": 69,
+      "t": 70,
+      "u": 71,
+      "v": 72,
+      "w": 73,
+      "x": 74,
+      "y": 75,
+      "z": 76,
+      "¡": 77,
+      "¢": 78,
+      "£": 79,
+      "¤": 80,
+      "¥": 81,
+      "§": 82,
+      "¨": 83,
+      "©": 84,
+      "ª": 85,
+      "«": 86,
+      "®": 87,
+      "¯": 88,
+      "±": 89,
+      "³": 90,
+      "´": 91,
+      "¶": 92,
+      "º": 93,
+      "¼": 94,
+      "½": 95,
+      "¾": 96,
+      "Â": 97,
+      "Ã": 98,
+      "Ä": 99,
+      "Å": 100,
+      "Ç": 101,
+      "È": 102,
+      "â": 103,
+      "Ġ": 104,
+      "Ģ": 105,
+      "ģ": 106,
+      "Ĥ": 107,
+      "ĥ": 108,
+      "Ħ": 109,
+      "ĩ": 110,
+      "Ī": 111,
+      "ī": 112,
+      "Į": 113,
+      "į": 114,
+      "İ": 115,
+      "ı": 116,
+      "ĵ": 117,
+      "ĸ": 118,
+      "Ĺ": 119,
+      "ĺ": 120,
+      "Ļ": 121,
+      "ļ": 122,
+      "Ľ": 123,
+      "ŀ": 124,
+      "Ł": 125,
+      "ł": 126,
+      "Ń": 127,
+      "ÃŃ": 128,
+      "Ã¡": 129,
+      "Ġp": 130,
+      "Ġ,": 131,
+      "Ġt": 132,
+      "Ġs": 133,
+      "ÄĽ": 134,
+      "Ġn": 135,
+      "Ġv": 136,
+      "Ġj": 137,
+      "Ġa": 138,
+      "ro": 139,
+      "Ã©": 140,
+      "ÅĻ": 141,
+      "Å¾": 142,
+      "nÃŃ": 143,
+      "Ġz": 144,
+      "ch": 145,
+      "st": 146,
+      "Ġd": 147,
+      "Ġm": 148,
+      "Ġk": 149,
+      "ov": 150,
+      "Äį": 151,
+      "le": 152,
+      "li": 153,
+      "Å¡": 154,
+      "Ġo": 155,
+      "Ġpo": 156,
+      "Ã½": 157,
+      "me": 158,
+      "at": 159,
+      "Ġb": 160,
+      "Ġje": 161,
+      "Ġ.": 162,
+      "Ġto": 163,
+      "Å¾e": 164,
+      "la": 165,
+      "ou": 166,
+      "Ġna": 167,
+      "ÅĻe": 168,
+      "ak": 169,
+      "Ġpro": 170,
+      "te": 171,
+      "ra": 172,
+      "Å¯": 173,
+      "it": 174,
+      "ce": 175,
+      "sk": 176,
+      "Ġne": 177,
+      "Ġse": 178,
+      "in": 179,
+      "to": 180,
+      "no": 181,
+      "ho": 182,
+      "rÃ¡": 183,
+      "uj": 184,
+      "de": 185,
+      "nÄĽ": 186,
+      "Ġza": 187,
+      "Ġu": 188,
+      "an": 189,
+      "em": 190,
+      "ci": 191,
+      "Ġdo": 192,
+      "Ġtak": 193,
+      "ĠÅ¾e": 194,
+      "ni": 195,
+      "po": 196,
+      "ÅĻÃŃ": 197,
+      "dy": 198,
+      "mi": 199,
+      "Ġko": 200,
+      "en": 201,
+      "Ġby": 202,
+      "Ġkte": 203,
+      "lo": 204,
+      "Ġjs": 205,
+      "by": 206,
+      "ÃŃm": 207,
+      "ti": 208,
+      "rop": 209,
+      "Ġh": 210,
+      "vrop": 211,
+      "ku": 212,
+      "Ġst": 213,
+      "ĠE": 214,
+      "ĠnÃ¡": 215,
+      "Ã¡nÃŃ": 216,
+      "ĠpÅĻe": 217,
+      "ÅĻi": 218,
+      "ĠÄį": 219,
+      "Ġvy": 220,
+      "nÃ©": 221,
+      "Ġve": 222,
+      "ky": 223,
+      "us": 224,
+      "Ã½ch": 225,
+      "Ġaby": 226,
+      "Å¡e": 227,
+      "Ġbu": 228,
+      "ÄĽk": 229,
+      "Ġob": 230,
+      "na": 231,
+      "ru": 232,
+      "ovat": 233,
+      "re": 234,
+      "Å¡ÃŃ": 235,
+      "Ġro": 236,
+      "nost": 237,
+      "cÃŃ": 238,
+      "ĠEvrop": 239,
+      "se": 240,
+      "Ġjak": 241,
+      "skÃ©": 242,
+      "uji": 243,
+      "nÃ¡": 244,
+      "do": 245,
+      "vo": 246,
+      "ck": 247,
+      "Ġkter": 248,
+      "Ġe": 249,
+      "uje": 250,
+      "ÃŃme": 251,
+      "ne": 252,
+      "Ġpod": 253,
+      "ar": 254,
+      "Ġmus": 255,
+      "mÄĽ": 256,
+      "al": 257,
+      "ko": 258,
+      "ĠpÅĻi": 259,
+      "ovÃ¡nÃŃ": 260,
+      "Ġte": 261,
+      "cho": 262,
+      "jÃŃ": 263,
+      "dÄĽ": 264,
+      "Ġch": 265,
+      "ĠvÃ¡": 266,
+      "ĠzÃ¡": 267,
+      "vÄĽ": 268,
+      "va": 269,
+      "ĠpÅĻed": 270,
+      "ka": 271,
+      "Ġmo": 272,
+      "ĠtakÃ©": 273,
+      "vÃ¡": 274,
+      "Ġta": 275,
+      "tÄĽ": 276,
+      "ĠdÄĽk": 277,
+      "Ġale": 278,
+      "ny": 279,
+      "ve": 280,
+      "mu": 281,
+      "or": 282,
+      "lÃ¡": 283,
+      "du": 284,
+      "ĠdÄĽkuji": 285,
+      "Ġhla": 286,
+      "nu": 287,
+      "nt": 288,
+      "Ġbude": 289,
+      "Ġod": 290,
+      "Ãº": 291,
+      "co": 292,
+      "ĠprÃ¡": 293,
+      "da": 294,
+      "Ġsi": 295,
+      "ĠmÃ¡": 296,
+      "lu": 297,
+      "Ġi": 298,
+      "dnÃŃ": 299,
+      "mo": 300,
+      "ĠÃº": 301,
+      "je": 302,
+      "ĠstÃ¡": 303,
+      "Ġhlas": 304,
+      "ÃŃt": 305,
+      "ÅĻeb": 306,
+      "ĠkterÃ©": 307,
+      "Ġspo": 308,
+      "ze": 309,
+      "vnÃŃ": 310,
+      "Ġjsme": 311,
+      "lnÃŃ": 312,
+      "ly": 313,
+      "Ġf": 314,
+      "Ġin": 315,
+      "ova": 316,
+      "sti": 317,
+      "ĠpÅĻÃŃ": 318,
+      "vÃŃ": 319,
+      "Ġre": 320,
+      "Ġmy": 321,
+      "Ġjsou": 322,
+      "er": 323,
+      "vi": 324,
+      "Ġpan": 325,
+      "ĠmÄĽ": 326,
+      "Å¡i": 327,
+      "Ġjed": 328,
+      "Ġco": 329,
+      "chom": 330,
+      "Ġroz": 331,
+      "pra": 332,
+      "ovÃ©": 333,
+      "ez": 334,
+      "ji": 335,
+      "Ġkon": 336,
+      "Å¾it": 337,
+      "Ġpot": 338,
+      "bo": 339,
+      "Ġproto": 340,
+      "tu": 341,
+      "Ġzd": 342,
+      "hod": 343,
+      "Ã½m": 344,
+      "ĠmusÃŃme": 345,
+      "vr": 346,
+      "lÃ©": 347,
+      "gi": 348,
+      "ovo": 349,
+      "Å¾i": 350,
+      "nou": 351,
+      "Ġ2": 352,
+      "kra": 353,
+      "ĠvÃ½": 354,
+      "Ġpa": 355,
+      "ri": 356,
+      "Ġce": 357,
+      "pe": 358,
+      "sta": 359,
+      "ty": 360,
+      "Äįe": 361,
+      "Ġli": 362,
+      "ÅĪ": 363,
+      "sa": 364,
+      "kÃ¡": 365,
+      "ĠhlasovÃ¡nÃŃ": 366,
+      "ĠtÃ©": 367,
+      "Ġ?": 368,
+      "dou": 369,
+      "ĠtÄĽ": 370,
+      "ĠdÅ¯": 371,
+      "ati": 372,
+      "sl": 373,
+      "ĠjÃ¡": 374,
+      "Ġpra": 375,
+      "dÃ¡": 376,
+      "Ã¡lnÃŃ": 377,
+      "Ġty": 378,
+      "Ġ1": 379,
+      "ĠvÄĽ": 380,
+      "Ġzem": 381,
+      "nosti": 382,
+      "Ġtady": 383,
+      "vrh": 384,
+      "si": 385,
+      "Ġdi": 386,
+      "ĠvÅ¡e": 387,
+      "Ġproti": 388,
+      "pa": 389,
+      "kla": 390,
+      "ato": 391,
+      "Ġsou": 392,
+      "leÅ¾it": 393,
+      "lÃŃ": 394,
+      "Ġnej": 395,
+      "stu": 396,
+      "len": 397,
+      "ĠpanÃŃ": 398,
+      "Ġtedy": 399,
+      "di": 400,
+      "ujeme": 401,
+      "ĠpÅĻij": 402,
+      "ment": 403,
+      "ckÃ©": 404,
+      "Ġevrop": 405,
+      "ĠÅĻe": 406,
+      "Å¯m": 407,
+      "ĠpotÅĻeb": 408,
+      "tÃŃ": 409,
+      "so": 410,
+      "ma": 411,
+      "Å¯Å¾e": 412,
+      "Ġuni": 413,
+      "Ġpr": 414,
+      "Ġabychom": 415,
+      "nÃ½": 416,
+      "Ġkdy": 417,
+      "Ġjako": 418,
+      "ĠnÃ¡vrh": 419,
+      "Ġle": 420,
+      "Äı": 421,
+      "ĠpÅĻedse": 422,
+      "ry": 423,
+      "ĠbÃ½": 424,
+      "itu": 425,
+      "ujÃŃ": 426,
+      "ÄįnÃŃ": 427,
+      "ĠP": 428,
+      "Ġmi": 429,
+      "ĠvÅ¡ech": 430,
+      "as": 431,
+      "Ġtom": 432,
+      "ĠbÃ½t": 433,
+      "Ġpos": 434,
+      "vÃ©": 435,
+      "liti": 436,
+      "zi": 437,
+      "Ġdal": 438,
+      "Ġbudou": 439,
+      "leÄį": 440,
+      "ĠK": 441,
+      "Å¡tÄĽ": 442,
+      "nov": 443,
+      "Ġten": 444,
+      "leg": 445,
+      "Ġtoho": 446,
+      "tÃ¡": 447,
+      "Ġvel": 448,
+      "Ã¡st": 449,
+      "Ġbo": 450,
+      "Ġlid": 451,
+      "skÃ¡": 452,
+      "Äįan": 453,
+      "ate": 454,
+      "my": 455,
+      "Ġpoliti": 456,
+      "Ġpodpo": 457,
+      "Ġtu": 458,
+      "stup": 459,
+      "ĠprotoÅ¾e": 460,
+      "za": 461,
+      "ĠobÄįan": 462,
+      "prÃ¡": 463,
+      "nit": 464,
+      "ĠspoleÄį": 465,
+      "Ġpane": 466,
+      "ĠR": 467,
+      "rov": 468,
+      "Ġjsem": 469,
+      "ĠÅĻÃŃ": 470,
+      "ĠdÃ¡": 471,
+      "ta": 472,
+      "ĠEvropskÃ©": 473,
+      "Ġkdo": 474,
+      "ĠÄįlen": 475,
+      "Ġkomi": 476,
+      "led": 477,
+      "Ġsku": 478,
+      "ĠnÄĽ": 479,
+      "ĠmÃ¡me": 480,
+      "Ġkoleg": 481,
+      "skÃ½ch": 482,
+      "ĠU": 483,
+      "Ġpoku": 484,
+      "ĠnaÅ¡e": 485,
+      "ĠdalÅ¡ÃŃ": 486,
+      "ha": 487,
+      "vy": 488,
+      "Ġpar": 489,
+      "rÅ¾e": 490,
+      "ĠÄįÃ¡st": 491,
+      "ĠrÃ¡": 492,
+      "jin": 493,
+      "Å¾ÃŃ": 494,
+      "ste": 495,
+      "Ġsl": 496,
+      "Ġni": 497,
+      "ĠkterÃ¡": 498,
+      "ovÃ¡": 499
+    },
+    "merges": [
+      "Ã Ń",
+      "Ã ¡",
+      "Ġ p",
+      "Ġ ,",
+      "Ġ t",
+      "Ġ s",
+      "Ä Ľ",
+      "Ġ n",
+      "Ġ v",
+      "Ġ j",
+      "Ġ a",
+      "r o",
+      "Ã ©",
+      "Å Ļ",
+      "Å ¾",
+      "n ÃŃ",
+      "Ġ z",
+      "c h",
+      "s t",
+      "Ġ d",
+      "Ġ m",
+      "Ġ k",
+      "o v",
+      "Ä į",
+      "l e",
+      "l i",
+      "Å ¡",
+      "Ġ o",
+      "Ġp o",
+      "Ã ½",
+      "m e",
+      "a t",
+      "Ġ b",
+      "Ġj e",
+      "Ġ .",
+      "Ġt o",
+      "Å¾ e",
+      "l a",
+      "o u",
+      "Ġn a",
+      "ÅĻ e",
+      "a k",
+      "Ġp ro",
+      "t e",
+      "r a",
+      "Å ¯",
+      "i t",
+      "c e",
+      "s k",
+      "Ġn e",
+      "Ġs e",
+      "i n",
+      "t o",
+      "n o",
+      "h o",
+      "r Ã¡",
+      "u j",
+      "d e",
+      "n ÄĽ",
+      "Ġz a",
+      "Ġ u",
+      "a n",
+      "e m",
+      "c i",
+      "Ġd o",
+      "Ġt ak",
+      "Ġ Å¾e",
+      "n i",
+      "p o",
+      "ÅĻ ÃŃ",
+      "d y",
+      "m i",
+      "Ġk o",
+      "e n",
+      "Ġb y",
+      "Ġk te",
+      "l o",
+      "Ġj s",
+      "b y",
+      "ÃŃ m",
+      "t i",
+      "ro p",
+      "Ġ h",
+      "v rop",
+      "k u",
+      "Ġs t",
+      "Ġ E",
+      "Ġn Ã¡",
+      "Ã¡ nÃŃ",
+      "Ġp ÅĻe",
+      "ÅĻ i",
+      "Ġ Äį",
+      "Ġv y",
+      "n Ã©",
+      "Ġv e",
+      "k y",
+      "u s",
+      "Ã½ ch",
+      "Ġa by",
+      "Å¡ e",
+      "Ġb u",
+      "ÄĽ k",
+      "Ġo b",
+      "n a",
+      "r u",
+      "ov at",
+      "r e",
+      "Å¡ ÃŃ",
+      "Ġ ro",
+      "no st",
+      "c ÃŃ",
+      "ĠE vrop",
+      "s e",
+      "Ġj ak",
+      "sk Ã©",
+      "uj i",
+      "n Ã¡",
+      "d o",
+      "v o",
+      "c k",
+      "Ġkte r",
+      "Ġ e",
+      "uj e",
+      "ÃŃ me",
+      "n e",
+      "Ġpo d",
+      "a r",
+      "Ġm us",
+      "m ÄĽ",
+      "a l",
+      "k o",
+      "Ġp ÅĻi",
+      "ov Ã¡nÃŃ",
+      "Ġt e",
+      "ch o",
+      "j ÃŃ",
+      "d ÄĽ",
+      "Ġ ch",
+      "Ġv Ã¡",
+      "Ġz Ã¡",
+      "v ÄĽ",
+      "v a",
+      "ĠpÅĻe d",
+      "k a",
+      "Ġm o",
+      "Ġtak Ã©",
+      "v Ã¡",
+      "Ġt a",
+      "t ÄĽ",
+      "Ġd ÄĽk",
+      "Ġa le",
+      "n y",
+      "v e",
+      "m u",
+      "o r",
+      "l Ã¡",
+      "d u",
+      "ĠdÄĽk uji",
+      "Ġh la",
+      "n u",
+      "n t",
+      "Ġbu de",
+      "Ġo d",
+      "Ã º",
+      "c o",
+      "Ġp rÃ¡",
+      "d a",
+      "Ġs i",
+      "Ġm Ã¡",
+      "l u",
+      "Ġ i",
+      "d nÃŃ",
+      "m o",
+      "Ġ Ãº",
+      "j e",
+      "Ġst Ã¡",
+      "Ġhla s",
+      "ÃŃ t",
+      "ÅĻe b",
+      "Ġkter Ã©",
+      "Ġs po",
+      "z e",
+      "v nÃŃ",
+      "Ġjs me",
+      "l nÃŃ",
+      "l y",
+      "Ġ f",
+      "Ġ in",
+      "ov a",
+      "st i",
+      "Ġp ÅĻÃŃ",
+      "v ÃŃ",
+      "Ġ re",
+      "Ġm y",
+      "Ġjs ou",
+      "e r",
+      "v i",
+      "Ġp an",
+      "Ġm ÄĽ",
+      "Å¡ i",
+      "Ġje d",
+      "Ġ co",
+      "cho m",
+      "Ġro z",
+      "p ra",
+      "ov Ã©",
+      "e z",
+      "j i",
+      "Ġko n",
+      "Å¾ it",
+      "Ġpo t",
+      "b o",
+      "Ġpro to",
+      "t u",
+      "Ġz d",
+      "ho d",
+      "Ã½ m",
+      "Ġmus ÃŃme",
+      "v r",
+      "l Ã©",
+      "g i",
+      "ov o",
+      "Å¾ i",
+      "n ou",
+      "Ġ 2",
+      "k ra",
+      "Ġv Ã½",
+      "Ġp a",
+      "r i",
+      "Ġ ce",
+      "p e",
+      "st a",
+      "t y",
+      "Äį e",
+      "Ġ li",
+      "Å Ī",
+      "s a",
+      "k Ã¡",
+      "Ġhlas ovÃ¡nÃŃ",
+      "Ġt Ã©",
+      "Ġ ?",
+      "d ou",
+      "Ġt ÄĽ",
+      "Ġd Å¯",
+      "at i",
+      "s l",
+      "Ġj Ã¡",
+      "Ġp ra",
+      "d Ã¡",
+      "Ã¡ lnÃŃ",
+      "Ġt y",
+      "Ġ 1",
+      "Ġv ÄĽ",
+      "Ġz em",
+      "nost i",
+      "Ġta dy",
+      "vr h",
+      "s i",
+      "Ġd i",
+      "Ġv Å¡e",
+      "Ġpro ti",
+      "p a",
+      "k la",
+      "at o",
+      "Ġs ou",
+      "le Å¾it",
+      "l ÃŃ",
+      "Ġne j",
+      "st u",
+      "le n",
+      "Ġpa nÃŃ",
+      "Ġte dy",
+      "d i",
+      "uje me",
+      "ĠpÅĻi j",
+      "me nt",
+      "ck Ã©",
+      "Ġe vrop",
+      "Ġ ÅĻe",
+      "Å¯ m",
+      "Ġpot ÅĻeb",
+      "t ÃŃ",
+      "s o",
+      "m a",
+      "Å¯ Å¾e",
+      "Ġu ni",
+      "Ġp r",
+      "Ġaby chom",
+      "n Ã½",
+      "Ġk dy",
+      "Ġjak o",
+      "ĠnÃ¡ vrh",
+      "Ġ le",
+      "Ä ı",
+      "ĠpÅĻed se",
+      "r y",
+      "Ġb Ã½",
+      "it u",
+      "uj ÃŃ",
+      "Äį nÃŃ",
+      "Ġ P",
+      "Ġm i",
+      "ĠvÅ¡e ch",
+      "a s",
+      "Ġto m",
+      "ĠbÃ½ t",
+      "Ġpo s",
+      "v Ã©",
+      "li ti",
+      "z i",
+      "Ġd al",
+      "Ġbu dou",
+      "le Äį",
+      "Ġ K",
+      "Å¡ tÄĽ",
+      "n ov",
+      "Ġt en",
+      "le g",
+      "Ġto ho",
+      "t Ã¡",
+      "Ġve l",
+      "Ã¡ st",
+      "Ġb o",
+      "Ġli d",
+      "sk Ã¡",
+      "Äį an",
+      "at e",
+      "m y",
+      "Ġpo liti",
+      "Ġpod po",
+      "Ġt u",
+      "stu p",
+      "Ġproto Å¾e",
+      "z a",
+      "Ġob Äįan",
+      "p rÃ¡",
+      "n it",
+      "Ġspo leÄį",
+      "Ġpan e",
+      "Ġ R",
+      "ro v",
+      "Ġjs em",
+      "Ġ ÅĻÃŃ",
+      "Ġd Ã¡",
+      "t a",
+      "ĠEvrop skÃ©",
+      "Ġk do",
+      "ĠÄį len",
+      "Ġko mi",
+      "le d",
+      "Ġs ku",
+      "Ġn ÄĽ",
+      "ĠmÃ¡ me",
+      "Ġko leg",
+      "sk Ã½ch",
+      "Ġ U",
+      "Ġpo ku",
+      "Ġna Å¡e",
+      "Ġdal Å¡ÃŃ",
+      "h a",
+      "v y",
+      "Ġp ar",
+      "r Å¾e",
+      "ĠÄį Ã¡st",
+      "Ġ rÃ¡",
+      "j in",
+      "Å¾ ÃŃ",
+      "st e",
+      "Ġs l",
+      "Ġn i",
+      "Ġkte rÃ¡",
+      "ov Ã¡"
+    ]
+  }
+}
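
Vocabulary entries such as `ĠdÄĽkuji` look garbled because a ByteLevel tokenizer stores each UTF-8 byte as a printable character (`Ġ` stands for the space byte). The sketch below is a simplified, standard-library-only illustration of the pipeline this file describes, not the `tokenizers` implementation. It assumes the usual GPT-2 style byte table, handles a single word at a time, and copies only a handful of merges and vocabulary entries from the diff. All function and variable names are made up for the example.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping each byte (0-255) to a printable character."""
    keep = (list(range(ord("!"), ord("~") + 1))
            + list(range(0xA1, 0xAC + 1))
            + list(range(0xAE, 0xFF + 1)))
    table, shift = {}, 0
    for b in range(256):
        if b in keep:
            table[b] = chr(b)            # printable bytes map to themselves
        else:
            table[b] = chr(256 + shift)  # the rest are shifted past U+00FF
            shift += 1
    return table


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_symbols(text):
    """Text -> list of one-character byte symbols (what BPE starts from)."""
    return [BYTE_TO_CHAR[b] for b in text.encode("utf-8")]


def from_symbols(token):
    """Byte-level token string -> readable text."""
    return bytes(CHAR_TO_BYTE[c] for c in token).decode("utf-8")


# Subset of "merges", kept in file order (earlier means higher priority).
MERGES = ["Ġ p", "Ġ t", "r o", "Ġt o", "Ġp ro", "t o"]
RANK = {tuple(m.split(" ")): i for i, m in enumerate(MERGES)}
# Subset of "vocab"; any token outside it raises KeyError in this sketch.
VOCAB = {"</s>": 1, "Ġto": 163, "Ġpro": 170}


def bpe(symbols):
    """Repeatedly merge the adjacent pair with the best (lowest) rank."""
    symbols = list(symbols)
    while len(symbols) > 1:
        ranked = [(RANK.get(pair, float("inf")), i)
                  for i, pair in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(ranked)
        if rank == float("inf"):
            break                        # no known merge applies any more
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def encode(word):
    """One word -> ids, following the "single" template: A then </s>."""
    if not word.startswith(" "):
        word = " " + word                # add_prefix_space: true
    ids = [VOCAB[s] for s in bpe(to_symbols(word))]
    return ids + [VOCAB["</s>"]]


print(encode("to"), encode("pro"))
print(repr(from_symbols("ĠdÄĽkuji")), repr(from_symbols("ĠhlasovÃ¡nÃŃ")))
```

The `post_processor` also declares `<s>` under `special_tokens`, but neither the `single` nor the `pair` template references it, so this sketch appends only `</s>` and prepends nothing.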

tokenizer_config.json ADDED

	@@ -0,0 +1,52 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<mask>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "unk_token": "<unk>"
+}
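
The `model_max_length` value above is not a real limit. It equals `int(1e30)`, which `transformers` uses as a "no maximum length configured" sentinel. The sketch below reproduces the number and shows one hedged way to test for it. The helper `has_length_limit` is a hypothetical name for illustration, not a library function.

```python
# Value copied from tokenizer_config.json in this commit.
model_max_length = 1000000000000000019884624838656

# 1e30 is not exactly representable as a float, so converting it to int
# yields the odd-looking trailing digits seen in the config.
VERY_LARGE_INTEGER = int(1e30)


def has_length_limit(max_length):
    """True only when a real (finite) maximum length was configured."""
    return max_length < VERY_LARGE_INTEGER


print(model_max_length == VERY_LARGE_INTEGER)
print(has_length_limit(model_max_length), has_length_limit(512))
```

Code that truncates or pads to `model_max_length` may want a guard like this, since padding to the sentinel value is not feasible.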