---
license: apache-2.0
pipeline_tag: fill-mask
---


# manchuBERT

manchuBERT is a BERT-base model trained from scratch on romanized Manchu data.  
[ManNER & ManPOS](https://aclanthology.org/2024.lrec-main.961.pdf) are manchuBERT models fine-tuned for Manchu named entity recognition and part-of-speech tagging.
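
## Usage

Since the model card declares `pipeline_tag: fill-mask`, manchuBERT can be queried with the 🤗 Transformers fill-mask pipeline. The snippet below is a minimal sketch: the model id is taken from the citation URL, `[MASK]` is assumed to be the mask token (the BERT tokenizer default), and the input is an illustrative romanized Manchu fragment.

```
from transformers import pipeline

# Load manchuBERT for masked-token prediction.
fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Illustrative romanized Manchu input; [MASK] marks the token to predict.
predictions = fill_mask("manju [MASK] be taciki")

# Each prediction carries the filled-in token and its probability.
for pred in predictions:
    print(pred["token_str"], round(pred["score"], 4))
```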


## Data

manchuBERT is trained on data augmented using the method from [Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data](https://arxiv.org/pdf/2311.17492.pdf).

|           **Data**           | **Number of Sentences (before augmentation)** |
|:----------------------------:|:----------------------------------------------:|
|    Manwén Lǎodàng–Taizong    |                     2,220                      |
|      Ilan gurun i bithe      |                     41,904                     |
|      Gin ping mei bithe      |                     21,376                     |
|      Yùzhì Qīngwénjiàn       |                     11,954                     |
|  Yùzhì Zēngdìng Qīngwénjiàn  |                     18,420                     |
|    Manwén Lǎodàng–Taizu      |                     22,578                     |
|   Manchu-Korean Dictionary   |                     40,583                     |

## Citation
```
@misc{jean_seo_2024,
	author    = {Jean Seo},
	title     = {manchuBERT (Revision 64133be)},
	year      = 2024,
	url       = {https://huggingface.co/seemdog/manchuBERT},
	doi       = {10.57967/hf/1599},
	publisher = {Hugging Face}
}
```