metadata
language:
- vi
thumbnail: https://raw.githubusercontent.com/kldarek/polbert/master/img/polbert.png
tags:
- transformer
- sbert
- legaltext
- vietnamese
license: mit
datasets:
- vietnamese-legal-text
Vietnamese Legal Text BERT
Table of contents
Using Vietnamese Legal Text BERT: `hmthanh/VietnamLegalText-SBERT`
Using Vietnamese Legal Text BERT with `transformers`
Installation
- Install `transformers` with pip: `pip install transformers`
- Install `tokenizers` with pip: `pip install tokenizers`
Pre-trained models
Model | #params | Arch. | Max length | Pre-training data |
---|---|---|---|---|
hmthanh/VietnamLegalText-SBERT | 135M | base | 256 | 20GB of texts |
Example usage
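import torch
from transformers import AutoModel, AutoTokenizer

# Load the pre-trained model and its matching tokenizer from the Hugging Face Hub
phobert = AutoModel.from_pretrained("hmthanh/VietnamLegalText-SBERT")
tokenizer = AutoTokenizer.from_pretrained("hmthanh/VietnamLegalText-SBERT")

# "How much is the fine for running a red light?"
sentence = 'Vượt đèn đỏ bị phạt bao nhiêu tiền?'

# Encode the sentence to token ids and add a batch dimension
input_ids = torch.tensor([tokenizer.encode(sentence)])

with torch.no_grad():  # inference only, no gradients needed
    features = phobert(input_ids)  # Model outputs are tuple-like; features[0] holds the token embeddings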
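
The example above returns one embedding per token. To compare whole sentences, those token embeddings usually need to be pooled into a single vector. The sketch below shows one common way to do that, attention-mask-aware mean pooling. It assumes mean pooling suits this checkpoint, which the model card does not state. `mean_pool` and `embed` are hypothetical helper names, and the `max_length=256` default is taken from the "Max length" column of the table above.

```python
import torch


def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    # Broadcast the mask over the hidden dimension so padding contributes zero.
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    # Clamp to avoid division by zero for an all-padding row.
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


def embed(sentences, model_name="hmthanh/VietnamLegalText-SBERT", max_length=256):
    """Hypothetical helper: encode a list of sentences into one vector each.

    Assumption: mean pooling is an appropriate pooling strategy for this model.
    """
    # Imported here so that mean_pool can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # Pad to the longest sentence in the batch and truncate to max_length tokens.
    batch = tokenizer(
        sentences,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**batch)
    # outputs[0] holds the per-token embeddings (last hidden state).
    return mean_pool(outputs[0], batch["attention_mask"])
```

Calling `embed([...])` downloads the model on first use. The resulting vectors can then be compared with, for example, `torch.nn.functional.cosine_similarity`.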