Update README.md
README.md CHANGED
@@ -54,11 +54,13 @@ model = GPT2LMHeadModel.from_pretrained(path_to_folder_with_checkpoint_files)
 You should first pretokenize your text using the [MosesTokenizer](https://pypi.org/project/mosestokenizer/):
 
 ```python
+from mosestokenizer import MosesTokenizer
+
 with MosesTokenizer('en') as pretokenize:
     pretokenized_text = " ".join(pretokenize(text_string))
 ```
 
-
+Then, to BPE tokenize your text for this model, you should use the [tokenizer trained on Wikitext-103](https://huggingface.co/Kristijan/wikitext-103-tokenizer_v2):
 
 ```python
 from transformers import GPT2TokenizerFast