# NepaliBERT (Phase 1)
|
NepaliBERT is a state-of-the-art language model for Nepali based on the BERT architecture. The model was trained with a masked language modeling (MLM) objective.
|
|
|
# Loading the model and tokenizer |
|
1. Clone the model repo (or load directly from the Hub, as sketched after these steps):
|
```
git lfs install
git clone https://huggingface.co/Rajan/NepaliBERT
```
|
2. Load the tokenizer:
|
```
from transformers import BertTokenizer

vocab_file_dir = './NepaliBERT/'
tokenizer = BertTokenizer.from_pretrained(vocab_file_dir,
                                          strip_accents=False,
                                          clean_text=False)
```
|
3. Load the model:
|
```
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('./NepaliBERT')
```
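
If you prefer not to clone the repo locally, the same tokenizer and model should also load straight from the Hugging Face Hub. This is a minimal sketch, assuming the `Rajan/NepaliBERT` repo id taken from the clone URL above:

```
from transformers import BertTokenizer, BertForMaskedLM

# Download from the Hub instead of reading a local clone; the repo id
# is assumed from the clone URL above.
tokenizer = BertTokenizer.from_pretrained('Rajan/NepaliBERT',
                                          strip_accents=False,
                                          clean_text=False)
model = BertForMaskedLM.from_pretrained('Rajan/NepaliBERT')
```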
|
|
|
The easiest way to check whether our language model is learning anything interesting is via the `FillMaskPipeline`.
|
|
|
Pipelines are simple wrappers around tokenizers and models. The `fill-mask` pipeline lets you input a sequence containing a masked token (here, `[MASK]`) and returns a list of the most probable filled sequences, with their probabilities.
|
|
|
```
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer
)
```
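
A quick sanity check might look like the following; the Nepali sentence and its completions are purely illustrative, not outputs reported by the authors:

```
# Mask one word in a Nepali sentence (illustrative example, roughly
# "Kathmandu is Nepal's [MASK].") and inspect the top predictions.
results = fill_mask("काठमाडौं नेपालको [MASK] हो।")

# Each entry carries the filled-in sequence, the predicted token,
# and the model's probability for that token.
for r in results:
    print(f"{r['sequence']}  (score: {r['score']:.4f})")
```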
|
For more info, visit the [GitHub repo 🤗](https://github.com/R4j4n/NepaliBERT).