Commit 10430bb (verified), committed by dingerstner
1 parent: ae95370

Create README.md

Files changed (1)
  1. README.md +97 -0
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ license: cc-by-4.0
+ language:
+ - he
+ ---
+ # DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew
+
+ State-of-the-art language model for Hebrew, released [here](https://arxiv.org/abs/2308.16687).
+
+ This is the fine-tuned BERT-base model for the named-entity-recognition task.
+
+ For the bert-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).
+
+ Sample usage:
+
+ ```python
+ from transformers import pipeline
+ oracle = pipeline('ner', model='dicta-il/dictabert-ner', aggregation_strategy='simple')
+ # if we set aggregation_strategy to simple, we need to define a decoder for the tokenizer. Note that the last wordpiece of a group will still be emitted
+ from tokenizers.decoders import WordPiece
+ oracle.tokenizer.backend_tokenizer.decoder = WordPiece()
+ sentence = '''דוד בן-גוריון (16 באוקטובר 1886 - ו' בכסלו תשל"ד) היה מדינאי ישראלי וראש הממשלה הראשון של מדינת ישראל.'''
+ oracle(sentence)
+ ```
+
+ Output:
+ ```json
+ [
+   {
+     "entity_group": "PER",
+     "score": 0.9999443,
+     "word": "דוד בן - גוריון",
+     "start": 0,
+     "end": 13
+   },
+   {
+     "entity_group": "TIMEX",
+     "score": 0.99987966,
+     "word": "16 באוקטובר 1886",
+     "start": 15,
+     "end": 31
+   },
+   {
+     "entity_group": "TIMEX",
+     "score": 0.9998579,
+     "word": "ו' בכסלו תשל\"ד",
+     "start": 34,
+     "end": 48
+   },
+   {
+     "entity_group": "TTL",
+     "score": 0.99963045,
+     "word": "וראש הממשלה",
+     "start": 68,
+     "end": 79
+   },
+   {
+     "entity_group": "GPE",
+     "score": 0.9997943,
+     "word": "ישראל",
+     "start": 96,
+     "end": 101
+   }
+ ]
+ ```
+
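The `word` field in the output above is rebuilt from wordpieces by the decoder, so its spacing can differ from the input (the first entity has spaces around the hyphen that the sentence does not). A minimal sketch, assuming the output format shown above, recovers the exact surface text from the `start`/`end` character offsets instead and groups it by label. The helper name `surface_spans` is hypothetical and not part of the README or of transformers; the entity list is copied from the sample output with scores omitted, so the block runs without downloading the model.

```python
from collections import defaultdict

sentence = '''דוד בן-גוריון (16 באוקטובר 1886 - ו' בכסלו תשל"ד) היה מדינאי ישראלי וראש הממשלה הראשון של מדינת ישראל.'''

# Entities copied from the sample output (scores omitted for brevity).
entities = [
    {"entity_group": "PER", "word": "דוד בן - גוריון", "start": 0, "end": 13},
    {"entity_group": "TIMEX", "word": "16 באוקטובר 1886", "start": 15, "end": 31},
    {"entity_group": "TIMEX", "word": "ו' בכסלו תשל\"ד", "start": 34, "end": 48},
    {"entity_group": "TTL", "word": "וראש הממשלה", "start": 68, "end": 79},
    {"entity_group": "GPE", "word": "ישראל", "start": 96, "end": 101},
]


def surface_spans(text, ents):
    """Group entities by label, slicing the original text with start/end.

    Hypothetical helper: it ignores the decoded `word` field and trusts
    only the character offsets reported by the pipeline.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


spans = surface_spans(sentence, entities)
for label, texts in spans.items():
    print(label, texts)
```

With a live pipeline, the same helper could be called as `surface_spans(sentence, oracle(sentence))`, since each returned item carries the `entity_group`, `start`, and `end` keys used here.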
+ ## Citation
+
+ If you use DictaBERT in your research, please cite ```DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew```
+
+ **BibTeX:**
+
+ ```bibtex
+ @misc{shmidman2023dictabert,
+   title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
+   author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
+   year={2023},
+   eprint={2308.16687},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
+
+ ## License
+
+ Shield: [![CC BY 4.0][cc-by-shield]][cc-by]
+
+ This work is licensed under a
+ [Creative Commons Attribution 4.0 International License][cc-by].
+
+ [![CC BY 4.0][cc-by-image]][cc-by]
+
+ [cc-by]: http://creativecommons.org/licenses/by/4.0/
+ [cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
+ [cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg
+
+