asosoft/KurmanjiTokenizer-Whisper


abdulhade committed on Sep 2, 2024

Commit f434b1d (verified) · 1 Parent(s): 5f145ff

Update README.md

Files changed (1)

README.md +4 -1


@@ -1,3 +1,6 @@
+---
+pipeline_tag: feature-extraction
+---
 # Kurmanji Tokenizer
 This repository contains the Kurmanji Tokenizer trained on a 50 million token text corpus. The tokenizer was specifically developed to support the Kurmanji dialect of Kurdish, ensuring accurate and efficient tokenization for natural language processing tasks in this language.
@@ -34,4 +37,4 @@ tokenizer = PreTrainedTokenizerFast.from_pretrained("asosoft/KurmanjiTokenizer-W
 # Example usage
 text = "Navê min Ali ye."
 tokens = tokenizer.encode(text)
-print(tokens)
+print(tokens)
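
The three added lines at the top of README.md form a metadata block: everything between the two `---` delimiters is read as model card metadata, and `pipeline_tag: feature-extraction` is the only key this commit sets. As a rough illustration of how such a block is delimited, here is a minimal sketch using only the Python standard library. The helper name `read_front_matter` is hypothetical, and the parser is a hand-rolled stand-in that handles only flat `key: value` lines; it is not the Hub's actual parser and not a full YAML implementation.

```python
def read_front_matter(readme_text):
    """Return flat `key: value` pairs from a leading '---' ... '---' block.

    Hypothetical helper for illustration only: it does not handle nested
    YAML, lists, or quoted values.
    """
    lines = readme_text.splitlines()
    # The block only counts if the very first line is the opening delimiter.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return metadata
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the file as having no block.
    return {}


# README content as it looks after this commit (first lines only).
readme = """---
pipeline_tag: feature-extraction
---
# Kurmanji Tokenizer
"""

metadata = read_front_matter(readme)
print(metadata)  # {'pipeline_tag': 'feature-extraction'}
```

A README without the leading delimiter, such as the version before this commit, yields an empty dictionary from this sketch.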