MuRIL - Unofficial

Multilingual Representations for Indian Languages: Google open-sourced this BERT model, pre-trained on 17 Indian languages and their transliterated counterparts.

The model was trained using a self-supervised masked language modeling task. We do whole word masking with a maximum of 80 predictions. The model was trained for 1000K steps, with a batch size of 4096, and a max sequence length of 512.
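Whole-word masking can be illustrated with a minimal Python sketch. This is not MuRIL's actual pretraining code. It assumes WordPiece tokens whose continuation pieces start with `##`, which is how BERT's reference pre-processing groups sub-words into words. It also always substitutes `[MASK]`, omitting BERT's 80/10/10 mask/random/keep split. The function name `whole_word_mask` and the `mask_prob=0.15` default are illustrative choices. The `max_predictions=80` default mirrors the cap stated above.

```python
import random
from typing import List, Optional, Tuple

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def whole_word_mask(
    tokens: List[str],
    max_predictions: int = 80,
    mask_prob: float = 0.15,
    mask_token: str = "[MASK]",
    seed: Optional[int] = None,
) -> Tuple[List[str], List[int], List[str]]:
    """Illustrative whole-word masking over WordPiece tokens (hypothetical helper).

    Returns (masked_tokens, masked_positions, original_labels).
    """
    rng = random.Random(seed)

    # Group token positions into whole words: a "##" piece joins the previous word.
    words: List[List[int]] = []
    for i, tok in enumerate(tokens):
        if tok in SPECIAL_TOKENS:
            continue
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    rng.shuffle(words)
    # Prediction budget: a fraction of the sequence, capped at max_predictions.
    budget = min(max_predictions, max(1, round(len(tokens) * mask_prob)))

    masked = list(tokens)
    positions: List[int] = []
    for word in words:
        if len(positions) >= budget:
            break
        # Skip a word that would overflow the budget, so no word is ever split.
        if len(positions) + len(word) > budget:
            continue
        for i in word:
            masked[i] = mask_token
            positions.append(i)

    positions.sort()
    labels = [tokens[i] for i in positions]
    return masked, positions, labels
```

A word is masked only if all of its pieces fit in the remaining budget. As a result, a sub-word such as `##able` is never predicted on its own; the model must recover the entire word.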

Original model on TFHub: https://tfhub.dev/google/MuRIL/1

Official release, now on HuggingFace (March 2021): https://huggingface.co/google/muril-base-cased

License: Apache 2.0

About this upload

I ported the TFHub .pb model to .h5 and then pytorch_model.bin for compatibility with Transformers.
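The following minimal sketch covers the last conversion step (.h5 to PyTorch) and loading this upload with Hugging Face Transformers. It makes several assumptions:

- `transformers` and `torch` are installed, plus `tensorflow` for `from_tf=True`.
- The .h5 directory holds a `config.json` and a `tf_model.h5`.
- The repo also ships tokenizer files.

The helper names `convert_tf_to_pytorch` and `load_muril` are hypothetical. The repo id is taken from the model tree listed on this page. The .pb to .h5 step is not shown.

```python
def convert_tf_to_pytorch(tf_dir: str, out_dir: str) -> None:
    """Re-save a TF 2 BERT checkpoint (.h5) as PyTorch weights (hypothetical helper).

    Assumes tf_dir contains config.json and tf_model.h5.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import BertModel

    # from_tf=True loads TensorFlow weights into the PyTorch model (needs TensorFlow).
    model = BertModel.from_pretrained(tf_dir, from_tf=True)
    # safe_serialization=False writes pytorch_model.bin; recent Transformers
    # releases otherwise default to model.safetensors.
    model.save_pretrained(out_dir, safe_serialization=False)


def load_muril(model_id: str = "monsoon-nlp/muril-adapted-local"):
    """Load tokenizer and encoder from the Hub (hypothetical helper).

    Assumes the repo ships tokenizer files alongside the weights.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

For example, `tokenizer, model = load_muril()` followed by `model(**tokenizer("नमस्ते", return_tensors="pt"))` returns the encoder outputs, including `last_hidden_state`. This requires network access to the Hub.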

Model size: 238M params (Safetensors; tensor types I64, F32)

Model tree for monsoon-nlp/muril-adapted-local

Finetunes: 1 model