--- license: cc-by-4.0 language: - he inference: false --- # DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew State-of-the-art language model for Hebrew, released [here](https://arxiv.org/abs/2308.16687). This is the fine-tuned model for the prefix segmentation task. For the base model, see [here](https://huggingface.co/dicta-il/dictabert). For the morphology model, see [here](https://huggingface.co/dicta-il/dictabert-morph). For the QA model, see [here](https://huggingface.co/dicta-il/dictabert-heq). Sample usage: ```python from transformers import AutoModel, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-seg') model = AutoModel.from_pretrained('dicta-il/dictabert-seg', trust_remote_code=True) model.eval() sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים' print(model.predict([sentence], tokenizer)) ``` Output: ```json [ [ [ "[CLS]" ], [ "ב","שנת" ], [ "1948" ], [ "השלים" ], [ "אפרים" ], [ "קישון" ], [ "את" ], [ "לימודיו" ], [ "ב","פיסול" ], [ "מתכת" ], [ "וב","תולדות" ], [ "ה","אמנות" ], [ "ו","החל" ], [ "לפרסם" ], [ "מאמרים" ], [ "הומוריסטיים" ], [ "[SEP]" ] ] ] ``` ## Citation If you use DictaBERT in your research, please cite ```DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew``` **BibTeX:** ```bibtex @misc{shmidman2023dictabert, title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew}, author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel}, year={2023}, eprint={2308.16687}, archivePrefix={arXiv}, primaryClass={cs.CL} } ``` ## License Shield: [![CC BY 4.0][cc-by-shield]][cc-by] This work is licensed under a [Creative Commons Attribution 4.0 International License][cc-by]. [![CC BY 4.0][cc-by-image]][cc-by] [cc-by]: http://creativecommons.org/licenses/by/4.0/ [cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png [cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg