dicta-il/dictabert


Shaltiel committed on Aug 29, 2023

Commit 7ea3a1a · 1 Parent(s): 1a3a03d

Update README.md

Files changed (1)

README.md +53 -0

@@ -1,3 +1,56 @@
 ---
 license: cc-by-4.0
+language:
+- he
 ---
+# DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew
+State-of-the-art language model for Hebrew, as released [here](link to arxiv).
+This is the base model pretrained with the masked-language-modeling objective.
+Sample usage:
+```
+import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert')
+model = AutoModelForMaskedLM.from_pretrained('dicta-il/dictabert')
+model.eval()
+sentence = 'בשנת 1948 השלים אפרים קישון את [MASK] בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
+output = model(tokenizer.encode(sentence, return_tensors='pt'))
+# the [MASK] is at index 7 (counting [CLS] as index 0)
+top_2 = torch.topk(output.logits[0, 7, :], 2)[1]
+print('\n'.join(tokenizer.convert_ids_to_tokens(top_2))) # should print לימודיו / התמחותו
+```
+## Citation
+If you use DictaBERT in your research, please cite ```DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew```
+**BibTeX:**
+To add
+## License
+Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
+This work is licensed under a
+[Creative Commons Attribution 4.0 International License][cc-by].
+[![CC BY 4.0][cc-by-image]][cc-by]
+[cc-by]: http://creativecommons.org/licenses/by/4.0/
+[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
+[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg
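
The sample usage in the README hard-codes the `[MASK]` position as index 7, which only holds for that exact sentence and tokenization. A minimal sketch of one way to avoid this is to look the position up from the token ids instead. The helper names `find_mask_positions`, `top_k_indices`, and `predict_mask` are hypothetical and not part of the model card; `predict_mask` assumes `torch` and `transformers` are installed and downloads the model on first use.

```python
from typing import List, Sequence


def find_mask_positions(input_ids: Sequence[int], mask_token_id: int) -> List[int]:
    """Return every index in input_ids that holds the mask token id."""
    return [i for i, tok in enumerate(input_ids) if tok == mask_token_id]


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def predict_mask(sentence: str, k: int = 2,
                 model_name: str = "dicta-il/dictabert") -> List[str]:
    """Return the top-k candidate tokens for the first [MASK] in the sentence."""
    # Imported lazily so the two helpers above work without these packages.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    encoded = tokenizer(sentence, return_tensors="pt")
    ids = encoded["input_ids"][0].tolist()

    # Locate the mask from the ids rather than assuming a fixed index.
    positions = find_mask_positions(ids, tokenizer.mask_token_id)
    if not positions:
        raise ValueError("sentence contains no mask token")

    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**encoded).logits

    best = top_k_indices(logits[0, positions[0]].tolist(), k)
    return tokenizer.convert_ids_to_tokens(best)
```

Calling `predict_mask(sentence)` with the sentence from the README should reproduce the README's two candidates, provided the tokenizer splits the sentence as the README assumes. The two pure helpers can be checked without the model: with made-up ids, `find_mask_positions([2, 10, 11, 4, 12, 3], 4)` returns `[3]`.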