---
license: apache-2.0
language:
- he
library_name: transformers
tags:
- bert
---

> Update 2023-5-23: This model is `BEREL` version 1.0. We are now happy to provide a much improved `BEREL_2.0`.


# Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

When using BEREL, please cite:


Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel, "Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language", Aug 2022 [arXiv:2208.01875]



1. Usage:

```python
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained('dicta-il/BEREL')
model = BertForMaskedLM.from_pretrained('dicta-il/BEREL')

# for evaluation, disable dropout
model.eval()
```

> NOTE: This code will **not** work correctly, and will produce bad results, if you use `BertTokenizer`. Please use `AutoTokenizer` or `BertTokenizerFast`.
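
Once the model is loaded as above, a masked-word prediction can be sketched roughly as follows. This is a minimal illustration, not the authors' reference code: `mask_word` and `predict_masked` are hypothetical helper names, splitting the sentence on whitespace is a simplifying assumption, and the default of three predictions is simply a convenient choice. It assumes `torch` and `transformers` are installed.

```python
def mask_word(sentence, index, mask_token="[MASK]"):
    """Replace the whitespace-separated word at `index` with the mask token."""
    words = sentence.split()
    if not 0 <= index < len(words):
        raise IndexError(f"word index {index} out of range for {len(words)} words")
    words[index] = mask_token
    return " ".join(words)


def predict_masked(sentence, index, top_k=3, model_name="dicta-il/BEREL"):
    """Return the top_k predicted tokens for the word at `index` (hypothetical helper)."""
    # Heavy dependencies are imported here so that mask_word works without them.
    import torch
    from transformers import AutoTokenizer, BertForMaskedLM

    # AutoTokenizer is used on purpose; see the note about BertTokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = BertForMaskedLM.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    masked = mask_word(sentence, index, tokenizer.mask_token)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Find the position of the single mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_pos = is_mask.nonzero(as_tuple=True)[0][0]

    # Take the highest-scoring vocabulary entries at that position.
    top_ids = torch.topk(logits[0, mask_pos], top_k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

Calling `predict_masked(sentence, index)` with a Rabbinic Hebrew sentence and the zero-based index of the word to hide returns a list of candidate tokens; raising `top_k` returns more candidates. Loading the model inside the function keeps the sketch short; in real use you would load it once and reuse it across calls.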

2. Demo site:
You can experiment with the model in a GUI here: https://dicta-bert-demo.netlify.app/?genre=rabbinic
- The main part of the GUI consists of word buttons that visualize the tokenization of the sentence. Clicking a button masks that word, and a bubble then shows three BEREL predictions for it. Clicking the bubble expands it to 10 predictions; alternatively, ctrl-clicking the initial bubble expands it to 30 predictions.
- Ctrl-clicking adjacent word buttons combines them into a single token for the mask.
- The edit box at the top contains the input sentence; it can be modified at will, and the word buttons adjust accordingly.