license: mit

Vietnamese Legal Text BERT

Table of contents

  1. Introduction
  2. Using Vietnamese Legal Text BERT

Introduction to Vietnamese Legal Text BERT (hmthanh/VietnamLegalText-SBERT)

Pre-trained PhoBERT models are state-of-the-art language models for Vietnamese ("Pho", i.e. "Phở", is a popular food in Vietnam).

Using Vietnamese Legal Text BERT with transformers

Installation

  • Install transformers with pip: pip install transformers

  • Install tokenizers with pip: pip install tokenizers

  • Install PyTorch (imported as torch in the usage example) with pip: pip install torch
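
To confirm that the packages are visible to the current Python environment, a small check along the following lines may help. This is only a sketch: `installed_version` is a hypothetical helper name, and `torch` is included on the assumption that the usage example's `import torch` requires it.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is an illustrative helper, not part of any library.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of each package the usage example relies on.
for name in ("transformers", "tokenizers", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Any package reported as not installed can be added with the corresponding pip command.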

Pre-trained models

| Model | #params | Arch. | Max length | Pre-training data |
|---|---|---|---|---|
| hmthanh/VietnamLegalText-SBERT | 135M | base | 256 | 20GB of texts |
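
Legal documents are often longer than the 256-token maximum listed above. One possible way to handle this, sketched here under assumptions, is to split a long sequence of token ids into overlapping windows and encode each window separately. The helper name `chunk_token_ids` and the `overlap` parameter are hypothetical and not part of this model or of transformers; the sketch also assumes that any special tokens are already counted in `max_length`.

```python
from typing import List


def chunk_token_ids(
    token_ids: List[int], max_length: int = 256, overlap: int = 32
) -> List[List[int]]:
    """Split token ids into windows of at most `max_length` ids.

    Consecutive windows share `overlap` ids so that text cut at a window
    boundary still appears with some context in the next window.
    Hypothetical helper for illustration only.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must satisfy 0 <= overlap < max_length")

    step = max_length - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_length])
        # Stop once a window has reached the end of the sequence.
        if start + max_length >= len(token_ids):
            break
    return chunks


# Stand-in ids for a long document; real ids would come from the tokenizer.
example_ids = list(range(600))
windows = chunk_token_ids(example_ids)
print([len(w) for w in windows])
```

How the per-window outputs are combined afterwards (for example by averaging) is a separate design choice that the model card does not specify.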

Example usage

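import torch
from transformers import AutoModel, AutoTokenizer

# Load the pre-trained model and its tokenizer from the Hugging Face Hub
phobert = AutoModel.from_pretrained("hmthanh/VietnamLegalText-SBERT")
tokenizer = AutoTokenizer.from_pretrained("hmthanh/VietnamLegalText-SBERT")

# The input is word-segmented: syllables of one word are joined by "_"
sentence = 'Chúng_tôi là những nghiên_cứu_viên .'

# Encode the sentence into token ids and add a batch dimension
input_ids = torch.tensor([tokenizer.encode(sentence)])

# Run inference without tracking gradients
with torch.no_grad():
    features = phobert(input_ids)  # features[0] holds the per-token hidden states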
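
The example above yields one vector per token. To obtain a single vector per sentence, a common approach is mean pooling over the token vectors, ignoring padding. The sketch below shows that technique on dummy tensors so that it runs without downloading the model. Note that this model card does not state which pooling strategy the model was trained with, so mean pooling is an assumption here, and `mean_pool` is a hypothetical helper name.

```python
import torch


def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden) float tensor, e.g. features[0]
    attention_mask:   (batch, seq_len) tensor with 1 for real tokens, 0 for padding
    Hypothetical helper for illustration only.
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)  # guard against division by zero
    return summed / counts


# Dummy data standing in for model output: 2 sentences, 3 token slots, 4 dimensions.
dummy_hidden = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
dummy_mask = torch.tensor([[1, 1, 1], [1, 1, 0]])  # second sentence has one padded slot

sentence_vectors = mean_pool(dummy_hidden, dummy_mask)

# Cosine similarity between the two pooled sentence vectors.
similarity = torch.nn.functional.cosine_similarity(
    sentence_vectors[0], sentence_vectors[1], dim=0
)
print(sentence_vectors.shape, float(similarity))
```

With the real model, `token_embeddings` would be `features[0]`, and the attention mask would come from calling the tokenizer with padding enabled.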