File size: 2,215 Bytes
7c4963e 4225bca 7c4963e 4225bca 1a1a1b5 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
---
library_name: transformers
tags:
- legal
- roberta
- phobert
license: apache-2.0
datasets:
- NghiemAbe/Legal-corpus-indexing
language:
- vi
pipeline_tag: fill-mask
---
# Phobert Base model with Legal domain
**Experiment performed with Transformers version 4.38.2**\
Vi-Legal-PhoBert model for Legal domain based on [vinai/phobert-base-v2](https://huggingface.co/vinai/phobert-base-v2), then continued MLM pretraining for 154600 steps with token-level on [Legal Corpus](https://huggingface.co/datasets/NghiemAbe/Legal-corpus-indexing) so the model can learn to legal domain.
## Usage
Fill mask example:
```python:
from transformers import RobertaForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("NghiemAbe/Vi-Legal-PhoBert")
model = RobertaLongForMaskedLM.from_pretrained("NghiemAbe/Vi-Legal-PhoBert")
```
## Metric
I evaluated my [Dev-Legal-Dataset](https://huggingface.co/datasets/NghiemAbe/dev_legal) and here are the results:
| Model | Paramaters | Language Type | Length | R@1 | R@5 | R@10 | R@20 | R@100 | MRR@5 | MRR@10 | MRR@20 | MRR@100 | Accuracy Masked|
|-------------------------------|------------|---------------|--------|-------|-------|-------|-------|-------|-------|--------|--------|---------|---------|
| vinai/phobert-base-v2 | 125M | vi | 256 | 0.266 | 0.482 | 0.601 | 0.702 | 0.841 | 0.356 | 0.372 | 0.379 | 0.382 | 0.522|
| FacebookAI/xlm-roberta-base | 279M | mul | 512 | 0.012 | 0.042 | 0.064 | 0.091 | 0.207 | 0.025 | 0.028 | 0.030 | 0.033 | x|
| Geotrend/bert-base-vi-cased | 179M | vi | 512 | 0.098 | 0.175 | 0.202 | 0.241 | 0.356 | 0.131 | 0.136 | 0.139 | 0.142 | x|
| NlpHUST/roberta-base-vn | x | vi | 512 | 0.050 | 0.097 | 0.126 | 0.163 | 0.369 | 0.071 | 0.076 | 0.078 | 0.083 | x|
| aisingapore/sealion-bert-base| x | mul | 512 | 0.002 | 0.007 | 0.021 | 0.036 | 0.106 | 0.003 | 0.005 | 0.006 | 0.008 | x|
| **Vi-Legal-PhoBert** | 125M | vi | 256 | **0.290**| **0.560**| **0.707**| **0.819**| **0.935**| **0.410**| **0.430**| **0.437**| **0.440**|**0.8401**|
|