Shaltiel committed on
Commit 7ea3a1a
1 Parent(s): 1a3a03d

Update README.md

Files changed (1)
  1. README.md +53 -0
README.md CHANGED

---
license: cc-by-4.0
language:
- he
---

# DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

DictaBERT is a state-of-the-art language model for Modern Hebrew, as released [here](link to arxiv).

This is the base model, pretrained with the masked-language-modeling (MLM) objective.
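
The masked-language-modeling objective trains the model to recover tokens that were hidden in its input. As a rough illustration, here is a minimal plain-Python sketch of the masking step from the original BERT recipe: select about 15% of the non-special positions, replace 80% of those with `[MASK]`, 10% with a random token, and leave 10% unchanged. This README does not say which masking settings DictaBERT used, so treat these proportions as an assumption. The function name `mask_tokens`, the toy token ids, and the `-100` "ignore" label (the default `ignore_index` of PyTorch's cross-entropy loss) are illustrative choices, not part of the DictaBERT release.

```python
import random
from typing import Collection, List, Optional, Sequence, Tuple

IGNORE_INDEX = -100  # label value for positions excluded from the MLM loss


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Collection[int] = (),
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking (illustrative). Returns (corrupted input ids, labels)."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never select special tokens such as [CLS]/[SEP];
        # select every other position with probability mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to predict the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80% of selected positions: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token id
        # remaining 10%: leave the original token in place
    return inputs, labels


# Hypothetical ids: 2 and 3 stand in for [CLS] and [SEP], 4 for [MASK].
example = [2, 101, 102, 103, 104, 105, 106, 3]
masked, targets = mask_tokens(example, mask_id=4, vocab_size=1000,
                              special_ids={2, 3}, rng=random.Random(0))
print(masked)
print(targets)
```

Only positions with a real label contribute to the loss. In the BERT recipe, the random-token and unchanged cases exist to reduce the mismatch between pre-training, where `[MASK]` appears in the input, and fine-tuning, where it never does.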

Sample usage:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert')
model = AutoModelForMaskedLM.from_pretrained('dicta-il/dictabert')

model.eval()

sentence = 'בשנת 1948 השלים אפרים קישון את [MASK] בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'

# inference only, so skip gradient tracking
with torch.no_grad():
    output = model(tokenizer.encode(sentence, return_tensors='pt'))

# the [MASK] is at index 7 (index 0 is [CLS])
top_2 = torch.topk(output.logits[0, 7, :], 2).indices
print('\n'.join(tokenizer.convert_ids_to_tokens(top_2.tolist())))  # should print לימודיו / התמחותו
```
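
The hard-coded index 7 in the sample relies on each word before the mask being a single token: `[CLS]` sits at position 0, the six words before the mask occupy positions 1 to 6, so `[MASK]` lands at position 7. If any of those words were split into several word pieces, the index would shift. The sketch below uses plain Python with hypothetical token ids and a made-up logits row to show how the mask position can be found from the ids instead, and what `torch.topk(...).indices` returns (the vocabulary ids of the k highest-scoring entries, best first). The helper names `find_mask_position` and `top_k_ids` are illustrative, not part of Transformers or PyTorch.

```python
from typing import List, Sequence


def find_mask_position(input_ids: Sequence[int], mask_token_id: int) -> int:
    """Return the index of the single [MASK] token (index 0 is [CLS])."""
    positions = [i for i, tok in enumerate(input_ids) if tok == mask_token_id]
    if len(positions) != 1:
        raise ValueError(f"expected exactly one mask token, found {len(positions)}")
    return positions[0]


def top_k_ids(logits_row: Sequence[float], k: int) -> List[int]:
    """Ids of the k largest logits, highest first (mirrors torch.topk(...).indices)."""
    return sorted(range(len(logits_row)), key=lambda i: logits_row[i], reverse=True)[:k]


# Hypothetical ids: 1 = [CLS], 2 = [SEP], 4 = [MASK]; six "word" ids precede the mask.
ids = [1, 50, 51, 52, 53, 54, 55, 4, 60, 61, 2]
print(find_mask_position(ids, mask_token_id=4))  # 7

# A made-up logits row over a 6-entry vocabulary for the mask position.
row = [0.1, 2.5, -1.0, 3.7, 0.0, 1.2]
print(top_k_ids(row, 2))  # [3, 1]
```

With the real tensors from the sample, a similar lookup could be written as `(input_ids[0] == tokenizer.mask_token_id).nonzero()`, where `input_ids` is the tensor returned by `tokenizer.encode(sentence, return_tensors='pt')`; this is a suggested variant, not part of the original example.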

## Citation

If you use DictaBERT in your research, please cite *DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew*.

**BibTeX:**

To be added.

## License

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg