suayptalha committed
Commit dd8f135 · verified · 1 Parent(s): 1508e2e

Update README.md

Files changed (1):
  1. README.md +17 -17
README.md CHANGED
@@ -12,30 +12,30 @@ tags:
  - math
  ---

- # **mathBERT-base**

- This repository contains a BERT-based model, **mathBERT-base**, fine-tuned on the *ddrg/named_math_formulas* dataset for the task of **Masked Language Modeling (MLM)**. The model is trained to predict masked tokens in mathematical formulas and expressions. The goal of this project is to improve the model's understanding and generation of math-related formulas in natural language contexts.

  ## **Model Architecture**
  - **Base Model**: `bert-base-uncased`
- - **Task**: Masked Language Modeling (MLM) for mathematical formulas
  - **Tokenizer**: BERT's WordPiece tokenizer

  ## **Usage**

  ### **Loading the Pre-trained Model**

- You can load the pre-trained **mathBERT-base** model using the Hugging Face `transformers` library:

- ```python
  from transformers import BertTokenizer, BertForMaskedLM
  import torch

- tokenizer = BertTokenizer.from_pretrained('suayptalha/mathBERT-base')
- model = BertForMaskedLM.from_pretrained('suayptalha/mathBERT-base').to("cuda")

- input_text = "The area of a circle is given by the formula A = πr^2."
- masked_text = input_text.replace("circle", tokenizer.mask_token)

  inputs = tokenizer(masked_text, return_tensors='pt').to("cuda")

@@ -45,19 +45,19 @@ predicted_token_id = torch.argmax(outputs.logits, dim=-1)

  predicted_token = tokenizer.decode(predicted_token_id[0, inputs['input_ids'].shape[1] - 1])
  print(predicted_token)
- ```

  ### **Fine-tuning the Model**

- To fine-tune the **mathBERT-base** model on your own dataset, follow these steps:

- 1. Prepare your dataset (e.g., mathematical formulas) in text format.
  2. Tokenize the dataset and apply masking.
  3. Train the model using the provided training loop.

  Here's the training code:

- https://github.com/suayptalha/mathBERT-base/blob/main/mathBERT-base.ipynb

  ## **Training Details**

@@ -68,10 +68,10 @@ https://github.com/suayptalha/mathBERT-base/blob/main/mathBERT-base.ipynb
  - **Max Sequence Length**: 512 tokens

  ### **Dataset**
- - **Dataset Name**: *ddrg/named_math_formulas*
- - **Task**: Masked Language Modeling (MLM) on mathematical formulas

  ## **Acknowledgements**

- - The *ddrg/named_math_formulas* dataset is available on the Hugging Face dataset hub and provides a rich collection of mathematical formulas for training.
- - This model uses the Hugging Face `transformers` library, which is a state-of-the-art library for NLP models
 
  - math
  ---

+ # **medBERT-base**

+ This repository contains a BERT-based model, **medBERT-base**, fine-tuned on the *gayanin/pubmed-gastro-maskfilling* dataset for the task of **Masked Language Modeling (MLM)**. The model is trained to predict masked tokens in medical and gastroenterological texts. The goal of this project is to improve the model's understanding and generation of medical information in natural language contexts.

  ## **Model Architecture**
  - **Base Model**: `bert-base-uncased`
+ - **Task**: Masked Language Modeling (MLM) for medical texts
  - **Tokenizer**: BERT's WordPiece tokenizer
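
As a quick illustration of how the WordPiece tokenizer handles domain terms (a minimal sketch; the exact subword split depends on the `bert-base-uncased` vocabulary):

```python
from transformers import BertTokenizer

# Load the WordPiece tokenizer shipped with the repository.
tokenizer = BertTokenizer.from_pretrained('suayptalha/medBERT-base')

# Rare medical terms are split into subword pieces prefixed with '##'.
print(tokenizer.tokenize("The endoscopy revealed gastroesophageal reflux."))
```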

  ## **Usage**

  ### **Loading the Pre-trained Model**

+ You can load the pre-trained **medBERT-base** model using the Hugging Face `transformers` library:

+ ```python
  from transformers import BertTokenizer, BertForMaskedLM
  import torch

+ tokenizer = BertTokenizer.from_pretrained('suayptalha/medBERT-base')
+ model = BertForMaskedLM.from_pretrained('suayptalha/medBERT-base').to("cuda")

+ input_text = "The patient was diagnosed with gastric cancer after a thorough examination."
+ masked_text = input_text.replace("gastric cancer", tokenizer.mask_token)

  inputs = tokenizer(masked_text, return_tensors='pt').to("cuda")

  predicted_token = tokenizer.decode(predicted_token_id[0, inputs['input_ids'].shape[1] - 1])
  print(predicted_token)
+ ```
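
The example above reads the prediction at the final position of the input; a variant that looks up the `[MASK]` position explicitly is sketched below (an illustrative sketch, not part of the commit; it runs on CPU and assumes a single mask token):

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('suayptalha/medBERT-base')
model = BertForMaskedLM.from_pretrained('suayptalha/medBERT-base')

# Put a single [MASK] token where the missing span should go.
masked_text = f"The patient was diagnosed with {tokenizer.mask_token} after a thorough examination."
inputs = tokenizer(masked_text, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and decode the highest-scoring token there.
mask_index = (inputs['input_ids'][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```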

  ### **Fine-tuning the Model**

+ To fine-tune the **medBERT-base** model on your own medical dataset, follow these steps:

+ 1. Prepare your dataset (e.g., medical texts or gastroenterology-related information) in text format.
  2. Tokenize the dataset and apply masking.
  3. Train the model using the provided training loop.

  Here's the training code:

+ https://github.com/suayptalha/medBERT-base/blob/main/medBERT-base.ipynb
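
A minimal MLM fine-tuning sketch along the lines of steps 1-3, using the `datasets` library and the `Trainer` API (the split name, text column name, and hyperparameters below are assumptions for illustration and may differ from the linked notebook):

```python
from datasets import load_dataset
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# 1. Prepare the dataset ('train' split and 'text' column are assumptions).
dataset = load_dataset('gayanin/pubmed-gastro-maskfilling', split='train')

# 2. Tokenize; the collator applies random [MASK] corruption to each batch.
def tokenize(batch):
    return tokenizer(batch['text'], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# 3. Train (hyperparameters here are illustrative, not the ones used for this model).
args = TrainingArguments(output_dir='medBERT-base', per_device_train_batch_size=16,
                         num_train_epochs=1, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```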

  ## **Training Details**

  - **Max Sequence Length**: 512 tokens

  ### **Dataset**
+ - **Dataset Name**: *gayanin/pubmed-gastro-maskfilling*
+ - **Task**: Masked Language Modeling (MLM) on medical texts

  ## **Acknowledgements**

+ - The *gayanin/pubmed-gastro-maskfilling* dataset is available on the Hugging Face Hub and provides a rich collection of medical and gastroenterology-related information for training.
+ - This model uses the Hugging Face `transformers` library, a state-of-the-art library for NLP models.