---
language:
- hu
tags:
- fill-mask
license: cc-by-nc-4.0
widget:
- text: "Elmesélek egy történetet a nyelvtechnológiáról."
---

# PULI BERT-Large

For further details, see [our demo site](https://juniper.nytud.hu/demo/nlp).

- Hungarian BERT large model
- Trained with [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed)
- Dataset: 36.3 billion words
- Checkpoint: 150,000 steps

## Limitations

- max_seq_length = 1024
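
Inputs longer than this limit must be truncated before they reach the model. A minimal sketch using the tokenizer's standard truncation arguments (`long_text` is an illustrative placeholder, not part of the model card):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('NYTK/PULI-BERT-large')

# Placeholder input; substitute any document that may exceed the limit.
long_text = "Elmesélek egy történetet a nyelvtechnológiáról. " * 500

# Truncate to the model's maximum sequence length of 1024 tokens.
encoded = tokenizer(long_text, truncation=True, max_length=1024, return_tensors='pt')
print(encoded['input_ids'].shape)  # at most (1, 1024)
```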

## Citation

If you use this model, please cite the following paper:

```bibtex
@inproceedings{yang-gpt3,
    title = {Jönnek a nagyok! GPT-3, GPT-2 és BERT large nyelvmodellek magyar nyelvre},
    booktitle = {XIX. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2023)},
    year = {2023},
    publisher = {Szegedi Tudományegyetem},
    address = {Szeged, Hungary},
    author = {Yang, Zijian Győző and Dodé, Réka and Ferenczi, Gergő and Héja, Enikő and Kőrös, Ádám and Laki, László János and Ligeti-Nagy, Noémi and Jelencsik-Mátyus, Kinga and Vadász, Noémi and Váradi, Tamás},
    pages = {0}
}
```

## Usage

```python
from transformers import BertTokenizer, MegatronBertModel

# Load the tokenizer and the Megatron-BERT encoder from the Hub.
tokenizer = BertTokenizer.from_pretrained('NYTK/PULI-BERT-large')
model = MegatronBertModel.from_pretrained('NYTK/PULI-BERT-large')

# Encode a sample sentence and run it through the model.
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)  # contains last_hidden_state and pooler_output
```
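
Since the model is tagged for fill-mask, masked-token prediction can also be run through the pipeline API. A minimal sketch, assuming the checkpoint ships with its masked-LM head (loaded here via `MegatronBertForMaskedLM`; the Hungarian sentence is just an illustrative prompt):

```python
from transformers import BertTokenizer, MegatronBertForMaskedLM, pipeline

tokenizer = BertTokenizer.from_pretrained('NYTK/PULI-BERT-large')
model = MegatronBertForMaskedLM.from_pretrained('NYTK/PULI-BERT-large')

# Build a fill-mask pipeline and score candidates for the [MASK] slot.
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
for candidate in fill_mask('Elmesélek egy [MASK] a nyelvtechnológiáról.'):
    print(candidate['token_str'], round(candidate['score'], 4))
```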