Update README.md
README.md
CHANGED
@@ -1,3 +1,6 @@
+---
+pipeline_tag: feature-extraction
+---
 # Kurmanji Tokenizer
 
 This repository contains the Kurmanji Tokenizer trained on a 50 million token text corpus. The tokenizer was specifically developed to support the Kurmanji dialect of Kurdish, ensuring accurate and efficient tokenization for natural language processing tasks in this language.
@@ -34,4 +37,4 @@ tokenizer = PreTrainedTokenizerFast.from_pretrained("asosoft/KurmanjiTokenizer-W
 # Example usage
 text = "Navê min Ali ye."
 tokens = tokenizer.encode(text)
-print(tokens)
+print(tokens)
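The substantive change in this commit is the YAML front matter block that sets `pipeline_tag: feature-extraction` at the top of the README. As a rough illustration of how such a block can be read back, here is a minimal sketch using only the Python standard library. The helper name `read_front_matter` is hypothetical, and the sketch assumes flat `key: value` pairs only; it is not how the Hub itself parses metadata, and nested YAML would need a real YAML parser.

```python
# Minimal sketch: read flat `key: value` pairs from a README's front matter.
# Assumption: the block is delimited by '---' lines and contains no nested YAML.
# `read_front_matter` is a hypothetical helper, not part of any library.

def read_front_matter(readme_text):
    """Return the key/value pairs found between the leading '---' delimiters."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # opening delimiter without a matching close


# The first lines of the README as they appear after this commit.
readme = """---
pipeline_tag: feature-extraction
---
# Kurmanji Tokenizer
"""

metadata = read_front_matter(readme)
print(metadata["pipeline_tag"])  # prints: feature-extraction
```

A README without a leading `---` line, or with an unclosed block, yields an empty dictionary rather than a partial result.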