metadata
license: mit
Vietnamese Legal Text BERT
Table of contents
Using Vietnamese Legal Text BERT hmthanh/VietnamLegalText-SBERT
Pre-trained PhoBERT models are state-of-the-art language models for Vietnamese (Pho, i.e. "Phở", is a popular food in Vietnam).
Using Vietnamese Legal Text BERT with transformers
Installation
- Install `transformers` with pip: `pip install transformers`
- Install `tokenizers` with pip: `pip install tokenizers`
Pre-trained models
| Model | #params | Arch. | Max length | Pre-training data |
|---|---|---|---|---|
| hmthanh/VietnamLegalText-SBERT | 135M | base | 256 | 20GB of texts |
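The maximum length of 256 in the table means longer inputs need to be shortened before they reach the model. In practice the `transformers` tokenizer can do this when called with `truncation=True, max_length=256`. The sketch below only illustrates the idea on a plain list of token ids, so it runs without downloading anything. The helper `truncate_ids` and the toy ids are hypothetical, and it assumes the first and last ids are the start and end tokens added by the tokenizer.

```python
def truncate_ids(ids, max_len=256):
    """Trim a list of token ids to at most max_len, keeping the final id.

    Assumption: the first and last ids are special tokens (start and end
    of sequence) added by the tokenizer, so the last one is re-attached.
    """
    if max_len < 2:
        raise ValueError("max_len must be at least 2")
    if len(ids) <= max_len:
        return list(ids)
    # Keep the first max_len - 1 ids, then re-attach the closing token.
    return list(ids[:max_len - 1]) + [ids[-1]]


# Toy ids: 0 stands in for the start token, 2 for the end token.
toy_ids = [0] + list(range(10, 310)) + [2]
trimmed = truncate_ids(toy_ids)
print(len(toy_ids), len(trimmed))
```

Inputs that already fit are returned unchanged, so the helper can be applied to every sequence.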
Example usage
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub
phobert = AutoModel.from_pretrained("hmthanh/VietnamLegalText-SBERT")
tokenizer = AutoTokenizer.from_pretrained("hmthanh/VietnamLegalText-SBERT")

# The input sentence is already word-segmented (underscores join the syllables of a word)
sentence = 'Chúng_tôi là những nghiên_cứu_viên .'

input_ids = torch.tensor([tokenizer.encode(sentence)])

with torch.no_grad():
    features = phobert(input_ids)  # tuple-like output; features[0] is the last hidden state
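The `features` output above holds one vector per token. A common way to turn these into a single sentence vector is masked mean pooling, followed by cosine similarity to compare two sentences. Whether this model was trained with mean pooling is an assumption, not something stated here. The sketch below uses plain Python lists and toy numbers so it runs without the model; `mean_pool` and `cosine` are hypothetical helper names.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy data standing in for per-token vectors: 3 tokens, 2 dimensions.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]  # the last position is padding and is ignored
sentence_vec = mean_pool(tokens, mask)
print(sentence_vec)
```

With real model output, the same pooling would be applied to `features[0]` together with the tokenizer's attention mask.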