LLMGeneLinker_LGL_V1 / README.md

hiverlab-nicholastkb

added demo.ipynb

a54ef97 almost 2 years ago


	---
	title: LLMGeneLinker (LGL)
	language: en
	sdk: gradio
	tags:
	- Named Entity Recognition
	- SciBERT
	- Drug-Target interaction
	- Drugs
	- Genes
	- Proteins
	- Medical
	datasets:
	- bigbio/ncbi_disease
	- bigbio/bc5cdr
	- bigbio/genetag
	- bigbio/drugprot
	- allenai/drug-combo-extraction
	---

	# LLMGeneLinker (LGL): a Fine-Tuned SciBERT Model for Named Entity Recognition

	LLMGeneLinker uses SciBERT, a domain-specific transformer, fine-tuned on the AllenAI Drug-Combo-Extraction, BC5CDR disease, NCBI disease, DrugProt, and GeneTAG datasets. The resulting model performs Named Entity Recognition (NER) to tag drugs, proteins, genes, and diseases in input text. Sentence embedding of SciBERT is then fed into BERT

	## Table of Contents

	- [Model Overview](#model-overview)
	- [Usage](#usage)
	- [Installation](#installation)
	- [Dataset](#dataset)
	- [Contributing](#contributing)
	- [License](#license)

	## Model Overview

	The model is based on the [SciBERT](https://github.com/allenai/scibert) architecture, a BERT model pre-trained on scientific text, most of it biomedical. Fine-tuning SciBERT on the labeled datasets listed under [Dataset](#dataset) yields a specialized NER model that recognizes drugs, genes, proteins, and diseases in biomedical text.
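To make the tagging output concrete, here is a minimal sketch of how per-token NER predictions are commonly collapsed into entity mentions. It assumes a BIO tagging scheme; the label names (`DRUG`, `GENE`, `DISEASE`) and the example sentence are illustrative assumptions, not the model's confirmed label set.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the span that is currently open, if any.
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            flush()
    flush()
    return entities


# Hypothetical tokens and labels, for illustration only.
tokens = ["Imatinib", "inhibits", "BCR", "-", "ABL", "in",
          "chronic", "myeloid", "leukemia", "."]
tags = ["B-DRUG", "O", "B-GENE", "I-GENE", "I-GENE", "O",
        "B-DISEASE", "I-DISEASE", "I-DISEASE", "O"]
entities = decode_bio(tokens, tags)
print(entities)
```

An `I-` tag that does not continue an open span of the same type is treated as outside any entity here; other decoders may choose to start a new span instead.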

	## Usage
	You can query the fine-tuned LGL model through an interactive web interface [here](spacelink). To load the model yourself, see [Installation](#installation) below.

	## Installation
	To run LGL locally or fine-tune it further, install the required dependencies and download the model files. Follow the steps below to set up the environment:

	1. Clone this repository to your local machine. If you do not have Python installed, download it from the official sources. Anaconda is recommended if you often use scientific packages.

	If using Anaconda, set up a new conda environment after installation (replace `myname` with your own environment name):
	```conda create --name myname python=3.8```

	2. Activate your venv or conda environment (if using one) and install the required Python packages with `pip`:

	```pip install -r requirements_local.txt```

	3. To use the fine-tuned NER model to recognize drugs, genes, and diseases, start Jupyter Lab with ```jupyter lab``` and open `demo.ipynb`. The notebook takes text input as a string and returns the identified entities along with their labels.
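For readers who want to script the same kind of query outside the notebook, the following is a rough sketch using the Hugging Face `transformers` token-classification pipeline. It assumes the fine-tuned weights are saved in a directory that `transformers` can load for token classification; `model_dir` is a hypothetical path, and `demo.ipynb` may load and call the model differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_entities(predictions: Iterable[dict]) -> Dict[str, List[str]]:
    """Group aggregated pipeline predictions by label, dropping duplicate mentions."""
    grouped = defaultdict(list)
    for pred in predictions:
        label, word = pred["entity_group"], pred["word"]
        if word not in grouped[label]:
            grouped[label].append(word)
    return dict(grouped)


def tag_text(text: str, model_dir: str) -> Dict[str, List[str]]:
    """Run NER on `text` with the model stored in `model_dir` (hypothetical path)."""
    # Imported lazily so group_entities works without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_dir,
        aggregation_strategy="simple",  # merge word pieces into whole mentions
    )
    return group_entities(tagger(text))


# Hand-written predictions in the shape the pipeline returns, for illustration only.
sample = [
    {"entity_group": "DRUG", "word": "aspirin", "score": 0.99, "start": 0, "end": 7},
    {"entity_group": "DISEASE", "word": "colorectal cancer", "score": 0.97, "start": 16, "end": 33},
    {"entity_group": "DRUG", "word": "aspirin", "score": 0.98, "start": 40, "end": 47},
]
grouped = group_entities(sample)
print(grouped)
```

Calling `tag_text("...", model_dir="path/to/model")` would then return a dictionary keyed by entity label, with the label names determined by the model's own configuration.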

	## Dataset

	The following datasets were processed and used for training and evaluation. Most were sourced from `BigBIO` ([GitHub](https://github.com/bigscience-workshop/biomedical/blob/main/README.md), [HF](https://huggingface.co/bigbio)):

	| Task Type | Dataset | Link |
	|:---------:|:-------:|:----:|
	| NER | NCBI-disease | [Link](https://huggingface.co/datasets/bigbio/ncbi_disease) |
	| NER | BC5-disease | [Link](https://huggingface.co/datasets/bigbio/bc5cdr) |
	| NER | Genetag | [Link](https://huggingface.co/datasets/bigbio/genetag) |
	| NER/RE | Drugprot | [Link](https://huggingface.co/datasets/bigbio/drugprot) |
	| NER/RE | AllenAI Drug-Combo-Extraction | [Link](https://huggingface.co/datasets/allenai/drug-combo-extraction) |
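
The exact preprocessing applied to these datasets is not described here. As one plausible step, the sketch below converts character-offset entity annotations into token-level BIO labels suitable for NER fine-tuning. The `(start, end, type)` span format, the whitespace tokenization, and the label names are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Label whitespace tokens with BIO tags from (start, end, type) character spans."""
    labeled = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, ent_type in spans:
            # A token belongs to a span when the two character ranges overlap.
            if match.start() < end and match.end() > start:
                # The first overlapping token opens the span; later ones continue it.
                prefix = "B-" if match.start() <= start else "I-"
                tag = prefix + ent_type
                break
        labeled.append((match.group(), tag))
    return labeled


# Hypothetical sentence and annotations, for illustration only.
text = "Aspirin reduces colorectal cancer risk"
spans = [(0, 7, "DRUG"), (16, 33, "DISEASE")]
labeled = spans_to_bio(text, spans)
print(labeled)
```

Whitespace tokenization keeps trailing punctuation attached to a token, so a real pipeline would more likely align the spans against the model tokenizer's offset mapping instead.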