---
license: apache-2.0
pipeline_tag: fill-mask
---


# manchuBERT

manchuBERT is a BERT-base model trained from scratch on romanized Manchu data.  
[ManNER & ManPOS](https://aclanthology.org/2024.lrec-main.961.pdf) are manchuBERT models fine-tuned for Manchu named entity recognition and part-of-speech tagging.
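
## Usage

Since the model card declares `pipeline_tag: fill-mask`, manchuBERT can be queried with the 🤗 Transformers fill-mask pipeline. The snippet below is a minimal sketch: the model id is taken from the citation URL, `[MASK]` is assumed to be the mask token (the BERT tokenizer default), and the input is an illustrative romanized Manchu fragment.

```
from transformers import pipeline

# Load manchuBERT for masked-token prediction.
fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Illustrative romanized Manchu input; [MASK] marks the token to predict.
predictions = fill_mask("manju [MASK] be taciki")

# Each prediction carries the filled-in token and its probability.
for pred in predictions:
    print(pred["token_str"], round(pred["score"], 4))
```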


## Data

manchuBERT is trained on data augmented using the method from [Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data](https://arxiv.org/pdf/2311.17492.pdf).

|           **Data**           | **Number of Sentences (before augmentation)** |
|:----------------------------:|:----------------------------------------------:|
|    Manwén Lǎodàng–Taizong    |                     2,220                      |
|      Ilan gurun i bithe      |                     41,904                     |
|      Gin ping mei bithe      |                     21,376                     |
|      Yùzhì Qīngwénjiàn       |                     11,954                     |
|  Yùzhì Zēngdìng Qīngwénjiàn  |                     18,420                     |
|    Manwén Lǎodàng–Taizu      |                     22,578                     |
|   Manchu-Korean Dictionary   |                     40,583                     |

## Citation
```
@misc{jean_seo_2024,
	author    = {Jean Seo},
	title     = {manchuBERT (Revision 64133be)},
	year      = 2024,
	url       = {https://huggingface.co/seemdog/manchuBERT},
	doi       = {10.57967/hf/1599},
	publisher = {Hugging Face}
}
```