Update README.md
README.md CHANGED
@@ -19,14 +19,23 @@ In total, our dataset contains around 5.4 million Indian legal documents (all in
 The raw text corpus size is around 27 GB.
 
 ### Training Objective
-This model is initialized with the [LEGAL-BERT-SC model](https://huggingface.co/nlpaueb/legal-bert-base-uncased) from the paper [LEGAL-BERT: The Muppets straight out of Law School](https://aclanthology.org/2020.findings-emnlp.261/)
+This model is initialized with the [LEGAL-BERT-SC model](https://huggingface.co/nlpaueb/legal-bert-base-uncased) from the paper [LEGAL-BERT: The Muppets straight out of Law School](https://aclanthology.org/2020.findings-emnlp.261/). In our work, we refer to this model as LegalBERT, and our re-trained model as InLegalBERT.
 
 ### Usage
-Using the tokenizer (same as LegalBERT
+Using the tokenizer (same as [LegalBERT](https://huggingface.co/nlpaueb/legal-bert-base-uncased))
 ```python
-from transformers import AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("
-
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("law-ai/InLegalBERT")
+```
+Using the model to get embeddings/representations for a sentence
+```python
+from transformers import AutoModel
+model = AutoModel.from_pretrained("law-ai/InLegalBERT")
+```
+Using the model for further pre-training with MLM and NSP
+```python
+from transformers import BertForPreTraining
+model_with_pretraining_heads = BertForPreTraining.from_pretrained("law-ai/InLegalBERT")
 ```
 
 ### Citation
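The updated Usage snippets load the tokenizer and the encoder separately; here is a minimal sketch of putting them together to get representations for a sentence. Only the `law-ai/InLegalBERT` checkpoint name and the `transformers` classes come from the README above; the example sentence and variable names are illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("law-ai/InLegalBERT")
model = AutoModel.from_pretrained("law-ai/InLegalBERT")

# Illustrative input; any Indian legal text works the same way
text = "The appellant filed a writ petition before the High Court."
encoded = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoded)

# Token-level representations: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```

The `[CLS]` vector (`outputs.last_hidden_state[:, 0]`) or a mean over token vectors are common choices when a single sentence embedding is needed.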
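For the MLM-and-NSP route, a sketch of what the `BertForPreTraining` forward pass exposes, assuming the same checkpoint. The sentence pair is illustrative; actual further pre-training would additionally need masked-token labels, next-sentence labels, and a training loop.

```python
import torch
from transformers import AutoTokenizer, BertForPreTraining

tokenizer = AutoTokenizer.from_pretrained("law-ai/InLegalBERT")
model_with_pretraining_heads = BertForPreTraining.from_pretrained("law-ai/InLegalBERT")

# Illustrative sentence pair for the NSP head
encoded = tokenizer(
    "The court allowed the appeal.",
    "Costs were awarded to the appellant.",
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model_with_pretraining_heads(**encoded)

# MLM head logits: (batch_size, sequence_length, vocab_size)
print(outputs.prediction_logits.shape)
# NSP head logits: (batch_size, 2)
print(outputs.seq_relationship_logits.shape)
```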