metadata

title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false

Medical Knowledge Graph Construction (medKGC)

Overview

An automated annotation tool that uses LLMs to help medical annotators annotate input radiology reports.

The tool covers named entity recognition, relation extraction, and named entity normalization; the final result is output as a knowledge graph.

medKGC is a medical text knowledge graph construction and review system. It supports entity recognition, relation extraction, and visualization of medical reports, providing a convenient review interface.

Deployment

Installation

Create conda environment

conda create -n medkgc python=3.10
conda activate medkgc

Install dependencies

pip install -r requirements.txt

Run application

streamlit run app.py

Core Features

1. Data Processing

Position Conversion: Support word-level and char-level position conversion
Entity Conversion: Convert between JSON format and Selection objects
Relation Extraction: Entity ID-based relation mapping and reconstruction

2. Entity Annotation

Label Types:
- OBS-DP: Observation definitely present (Red)
- ANAT-DP: Anatomy definitely present (Cyan)
- OBS-U: Observation uncertain (Yellow)
- OBS-DA: Observation definitely absent (Gray)
Interactive Annotation: Support entity selection and annotation

3. Relation Visualization

Node Merging: Automatically merge entities with same text
Color Coding: Different colors for different entity types
Dynamic Updates: Support real-time graph updates
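
The node merging step can be illustrated with a minimal sketch. It assumes entities follow the entity data format documented in this README and that two entities merge only when their text matches exactly. The function name `merge_nodes` and the relation name `located_at` are hypothetical and used only for illustration; this is not necessarily how the app does it.

```python
def merge_nodes(entities):
    """Collapse entities with identical text into one graph node.

    Returns (nodes, edges): nodes maps node text to its label,
    edges is a sorted list of (source_text, relation_type, target_text).
    """
    id_to_text = {}
    nodes = {}
    for ent_id, ent in entities.items():
        text = ent["tokens"]
        id_to_text[ent_id] = text
        # Assumption: the first label seen for a given text is kept.
        nodes.setdefault(text, ent["label"])

    edges = set()
    for ent_id, ent in entities.items():
        for rel_type, target_id in ent.get("relations", []):
            if target_id in id_to_text:  # skip dangling targets
                edges.add((id_to_text[ent_id], rel_type, id_to_text[target_id]))
    return nodes, sorted(edges)


example_entities = {
    "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 3, "end_ix": 3,
          "relations": [["located_at", "2"]]},
    "2": {"tokens": "lung", "label": "ANAT-DP", "start_ix": 1, "end_ix": 1,
          "relations": []},
    "3": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 9, "end_ix": 9,
          "relations": [["located_at", "2"]]},
}
nodes, edges = merge_nodes(example_entities)
```

Because edges are stored in a set, duplicate relations that arise after merging (entities 1 and 3 above) collapse into a single edge.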

4. Review Process

Report Selection: Display pending and reviewed reports separately
Status Saving: Automatically save review status and modifications
Batch Processing: Support continuous review of multiple reports

Technical Implementation

Data Structures

Entity Data

{
    "entities": {
        "1": {
            "tokens": "entity text",
            "label": "entity type",
            "start_ix": "word-level start position",
            "end_ix": "word-level end position",
            "relations": [["relation type", "target entity ID"]]
        }
    }
}
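
As an illustration of how this structure might be checked before review, here is a small validation sketch. It assumes `start_ix` and `end_ix` are stored as integers and that labels are limited to the four types listed under Entity Annotation. The function `validate_entities` and the relation name `located_at` are hypothetical and not part of the project.

```python
VALID_LABELS = {"OBS-DP", "ANAT-DP", "OBS-U", "OBS-DA"}


def validate_entities(data):
    """Return a list of human-readable problems found in the entity data."""
    problems = []
    entities = data.get("entities", {})
    for ent_id, ent in entities.items():
        if ent.get("label") not in VALID_LABELS:
            problems.append(f"{ent_id}: unknown label {ent.get('label')!r}")

        start, end = ent.get("start_ix"), ent.get("end_ix")
        if not (isinstance(start, int) and isinstance(end, int) and start <= end):
            problems.append(f"{ent_id}: invalid span {start!r}..{end!r}")

        # Every relation must point at an entity ID that exists.
        for rel_type, target_id in ent.get("relations", []):
            if target_id not in entities:
                problems.append(
                    f"{ent_id}: relation {rel_type!r} targets missing entity {target_id!r}"
                )
    return problems


sample = {
    "entities": {
        "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 3, "end_ix": 3,
              "relations": [["located_at", "2"], ["located_at", "7"]]},
        "2": {"tokens": "lung", "label": "ANAT-DP", "start_ix": 1, "end_ix": 1,
              "relations": []},
    }
}
issues = validate_entities(sample)
```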

Selection Object

from dataclasses import dataclass
from typing import List

@dataclass
class Selection:
    start: int  # char-level start position
    end: int    # char-level end position
    text: str   # entity text
    labels: List[str]  # entity type list
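
To show how a Selection might map back to the entity data format, the sketch below converts a char-level selection into word-level indices. It assumes whitespace tokenization, an exclusive `end` on the Selection, and an inclusive `end_ix` on the entity. The helper name `selection_to_entity` is hypothetical, and the real conversion may differ.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int
    end: int
    text: str
    labels: List[str]


def selection_to_entity(text, sel):
    """Build an entity dict from a char-level Selection over `text`."""
    # Indices of the whitespace-delimited words that overlap [sel.start, sel.end).
    covered = [
        i for i, m in enumerate(re.finditer(r"\S+", text))
        if m.start() < sel.end and m.end() > sel.start
    ]
    if not covered:
        raise ValueError("selection does not cover any word")
    return {
        "tokens": sel.text,
        "label": sel.labels[0] if sel.labels else "",
        "start_ix": covered[0],
        "end_ix": covered[-1],
        "relations": [],  # relations are rebuilt separately
    }


report = "Small left pleural effusion ."
sel = Selection(start=11, end=27, text="pleural effusion", labels=["OBS-DP"])
entity = selection_to_entity(report, sel)
```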

Core Algorithms

Position Conversion

def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""

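Only the signature is given above. One possible body, assuming whitespace tokenization, an inclusive `end_ix`, and an exclusive character end offset, could look like the following sketch (the project's actual implementation may handle tokenization differently):

```python
import re


def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range."""
    # Each match is one whitespace-delimited word with its char offsets.
    words = list(re.finditer(r"\S+", text))
    if not (0 <= start_ix <= end_ix < len(words)):
        raise IndexError(f"word span {start_ix}..{end_ix} is out of range")
    return words[start_ix].start(), words[end_ix].end()


report = "Small left pleural effusion ."
start, end = word_to_char_span(report, 2, 3)
```

Slicing the report with the returned offsets (`report[start:end]`) yields the entity text, which gives a quick sanity check for any converted span.
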
Relation Reconstruction

def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""

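The README documents only the signature and docstring. The sketch below is one reading of it, under these assumptions: `entities` is a list of Selection objects kept after review, `entities_data` is the inner `"entities"` mapping of the original JSON, and a relation survives only if the text of both its source and target is still selected. The return shape (a list of text triples) and the relation name `located_at` are also assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int
    end: int
    text: str
    labels: List[str]


def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching."""
    kept_texts = {sel.text for sel in entities}
    relations = []
    for ent in entities_data.values():
        if ent["tokens"] not in kept_texts:
            continue  # source entity was removed during review
        for rel_type, target_id in ent.get("relations", []):
            target = entities_data.get(target_id)
            if target is not None and target["tokens"] in kept_texts:
                relations.append((ent["tokens"], rel_type, target["tokens"]))
    return relations


original = {
    "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 3, "end_ix": 3,
          "relations": [["located_at", "2"]]},
    "2": {"tokens": "pleural", "label": "ANAT-DP", "start_ix": 2, "end_ix": 2,
          "relations": []},
    "3": {"tokens": "Small", "label": "OBS-DP", "start_ix": 0, "end_ix": 0,
          "relations": [["modify", "1"]]},
}
reviewed = [
    Selection(11, 18, "pleural", ["ANAT-DP"]),
    Selection(19, 27, "effusion", ["OBS-DP"]),
]
rebuilt = find_relations_with_entities(reviewed, original)
```

Matching on text rather than on IDs means that two entities with the same text cannot be told apart; this is consistent with the node merging described under Relation Visualization, but it is a trade-off to keep in mind.
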
TODO

Add data export functionality
Named Entity Recognition
1. Add an input box
2. Call LLMs
Relation Extraction
1. Add relation editing functionality
Where the data lives
1. Read it from somewhere, e.g. a Git repository
2. Save it somewhere; saving is a bit awkward (via commits?)

Contributing

Contributions are welcome. You can:

Submit Issues for bug reports or suggestions
Submit Pull Requests to improve code
Improve documentation and comments

License

MIT License