---
language:
- vi
thumbnail: "https://raw.githubusercontent.com/kldarek/polbert/master/img/polbert.png"
tags:
- transformer
- sbert
- legaltext
- vietnamese
license: "mit"
datasets:
- vietnamese-legal-text
---

# Vietnamese Legal Text BERT

#### Table of contents

1. [Introduction](#introduction)
2. [Using Vietnamese Legal Text BERT](#transformers)
    - [Installation](#install2)
    - [Pre-trained models](#models2)
    - [Example usage](#usage2)


# <a name="introduction"></a> Introduction

`hmthanh/VietnamLegalText-SBERT` is a sentence-embedding (SBERT) transformer model for Vietnamese legal text.

## <a name="transformers"></a> Using Vietnamese Legal Text BERT with `transformers`

### Installation <a name="install2"></a>

- Install `transformers` with pip: `pip install transformers`
- Install `tokenizers` with pip: `pip install tokenizers`
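
The usage example in this README also imports `torch`, so it may be worth confirming that all three packages are importable before loading the model. A minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of these libraries:

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


# Packages imported (directly or indirectly) by the usage example in this README
required = ["torch", "transformers", "tokenizers"]
missing = missing_packages(required)

if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All required packages are installed.")
```
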

### Pre-trained models <a name="models2"></a>

| Model | #params | Arch. | Max length | Pre-training data |
|---|---|---|---|---|
| `hmthanh/VietnamLegalText-SBERT` | 135M | base | 256 | 20GB of texts |
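
The maximum length of 256 means longer legal passages have to be truncated or split before encoding. A minimal sketch of splitting a list of token ids into overlapping windows; the helper `split_into_windows`, the overlap of 32, and the 2 positions reserved for special tokens are assumptions for illustration, not documented properties of this model:

```python
def split_into_windows(token_ids, max_length=256, overlap=32, num_special=2):
    """Split token ids into overlapping windows that fit within max_length.

    num_special positions are reserved for the special tokens a tokenizer
    typically adds around each sequence (an assumption; check your tokenizer).
    """
    window = max_length - num_special
    if window <= overlap:
        raise ValueError("max_length is too small for the requested overlap")
    step = window - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + window])
        # Stop once the current window already reaches the end of the input
        if start + window >= len(token_ids):
            break
    return windows


# Dummy ids standing in for a tokenized 600-token document
dummy_ids = list(range(600))
chunks = split_into_windows(dummy_ids)
print([len(chunk) for chunk in chunks])  # [254, 254, 156]
```

Each window can then be encoded separately; how the per-window embeddings are combined (for example by averaging) is a design choice left to the application.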

### Example usage <a name="usage2"></a>

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the pre-trained model and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("hmthanh/VietnamLegalText-SBERT")
tokenizer = AutoTokenizer.from_pretrained("hmthanh/VietnamLegalText-SBERT")

# "How much is the fine for running a red light?"
sentence = 'Vượt đèn đỏ bị phạt bao nhiêu tiền?'

# Encode the sentence as a batch containing a single sequence of token ids
input_ids = torch.tensor([tokenizer.encode(sentence)])

with torch.no_grad():
    # features[0] holds the token-level hidden states
    features = model(input_ids)
```
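
Since this is a sentence-embedding (SBERT) model, the token-level output usually has to be pooled into one vector per sentence. A minimal sketch of attention-mask-aware mean pooling, a common choice for SBERT-style models; whether this particular model was trained with mean pooling is an assumption, and the random tensors below merely stand in for real model output:

```python
import torch


def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings per sentence, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # avoid division by zero
    return summed / counts


# Dummy stand-ins: 2 sentences, 5 token positions, hidden size 8
token_embeddings = torch.randn(2, 5, 8)
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1]])

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings.shape)  # torch.Size([2, 8])

# Cosine similarity between the two sentence vectors
similarity = torch.nn.functional.cosine_similarity(
    sentence_embeddings[0], sentence_embeddings[1], dim=0
)
print(similarity.item())
```

With the real model, `token_embeddings` would be `features[0]` and `attention_mask` would come from calling the tokenizer with `return_tensors="pt"`.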