title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false

Medical Knowledge Graph Construction (medKGC)

Overview

An automated annotation tool that uses LLMs to help medical annotators annotate input radiology reports.

The tool covers named entity recognition, relation extraction, and named entity normalization. The final result is output as a knowledge graph.

medKGC is a medical text knowledge graph construction and review system. It supports entity recognition, relation extraction, and visualization of medical reports, providing a convenient review interface.

Deployment

Installation

  1. Create conda environment
conda create -n medkgc python=3.10
conda activate medkgc
  2. Install dependencies
pip install -r requirements.txt
  3. Run application
streamlit run app.py

Core Features

1. Data Processing

  • Position Conversion: Support word-level and char-level position conversion
  • Entity Conversion: Convert between JSON format and Selection objects
  • Relation Extraction: Entity ID-based relation mapping and reconstruction

2. Entity Annotation

  • Label Types:
    • OBS-DP: Observation definitely present (Red)
    • ANAT-DP: Anatomy definitely present (Cyan)
    • OBS-U: Observation uncertain (Yellow)
    • OBS-DA: Observation definitely absent (Gray)
  • Interactive Annotation: Support entity selection and annotation
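As an illustration of the label scheme above, the mapping from label to display color could be kept in a small lookup table. This is only a sketch: the names `LABEL_COLORS` and `color_for` are hypothetical, and the CSS color names stand in for whatever exact shades the app uses.

```python
# Hypothetical lookup table; labels and colors follow the list above.
LABEL_COLORS = {
    "OBS-DP": "red",     # Observation definitely present
    "ANAT-DP": "cyan",   # Anatomy definitely present
    "OBS-U": "yellow",   # Observation uncertain
    "OBS-DA": "gray",    # Observation definitely absent
}


def color_for(label, default="black"):
    """Return the display color for a label, or `default` for unknown labels."""
    return LABEL_COLORS.get(label, default)


if __name__ == "__main__":
    for label in LABEL_COLORS:
        print(label, "->", color_for(label))
```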

3. Relation Visualization

  • Node Merging: Automatically merge entities with same text
  • Color Coding: Different colors for different entity types
  • Dynamic Updates: Support real-time graph updates
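Node merging can be pictured with a small sketch that works on the entity JSON format described under Data Structures. It assumes that "same text" means an exact string match on `tokens` and that the first label seen wins when merged entities disagree. The function name `merge_nodes` and the relation name `located_at` are illustrative, not taken from the project.

```python
def merge_nodes(entities_data):
    """Collapse entities with identical text into one graph node.

    entities_data: the "entities" mapping (entity ID -> entity dict).
    Returns (nodes, edges): nodes maps entity text to its label, edges is a
    sorted list of unique (source_text, relation_type, target_text) triples.
    """
    nodes = {}
    edges = set()
    for entity in entities_data.values():
        # First label seen for a given text is kept (assumption).
        nodes.setdefault(entity["tokens"], entity["label"])
    for entity in entities_data.values():
        for rel_type, target_id in entity["relations"]:
            target = entities_data.get(target_id)
            if target is not None:  # skip relations pointing at missing IDs
                edges.add((entity["tokens"], rel_type, target["tokens"]))
    return nodes, sorted(edges)


if __name__ == "__main__":
    sample = {
        "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 1,
              "end_ix": 1, "relations": [["located_at", "2"]]},
        "2": {"tokens": "pleural", "label": "ANAT-DP", "start_ix": 0,
              "end_ix": 0, "relations": []},
        "3": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 7,
              "end_ix": 7, "relations": [["located_at", "2"]]},
    }
    print(merge_nodes(sample))
```

The two "effusion" entities end up as a single node, and their duplicate edges to "pleural" collapse into one.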

4. Review Process

  • Report Selection: Display pending and reviewed reports separately
  • Status Saving: Automatically save review status and modifications
  • Batch Processing: Support continuous review of multiple reports
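One simple way to keep track of review status is a JSON file that maps report IDs to a status string. The sketch below is an assumption about how this could work, not a description of the app's storage: the file layout, the `"reviewed"` status value, and the function names are all hypothetical.

```python
import json
from pathlib import Path


def load_status(path):
    """Load the status mapping, or return an empty one if the file is missing."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}


def mark_reviewed(path, report_id):
    """Record a report as reviewed and write the mapping back to disk."""
    status = load_status(path)
    status[report_id] = "reviewed"
    Path(path).write_text(json.dumps(status, indent=2), encoding="utf-8")


def split_reports(report_ids, status):
    """Split report IDs into (pending, reviewed), preserving input order."""
    pending = [r for r in report_ids if status.get(r) != "reviewed"]
    reviewed = [r for r in report_ids if status.get(r) == "reviewed"]
    return pending, reviewed
```

Writing the file after every change keeps the saved state in step with the interface, at the cost of one small disk write per review.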

Technical Implementation

Data Structures

  1. Entity Data
{
    "entities": {
        "1": {
            "tokens": "entity text",
            "label": "entity type",
            "start_ix": "word-level start position",
            "end_ix": "word-level end position",
            "relations": [["relation type", "target entity ID"]]
        }
    }
}

Indices start from 0.

  2. Selection Object
from dataclasses import dataclass
from typing import List

@dataclass
class Selection:
    start: int  # char-level start position
    end: int    # char-level end position
    text: str   # entity text
    labels: List[str]  # entity type list
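To make the link between the two structures concrete, here is a minimal sketch of converting a `Selection` back into an entity entry. It assumes words are whitespace-delimited, that `end` is an exclusive character offset, that `end_ix` is an inclusive word index, and that the first label of the selection becomes the entity label. The helper name `selection_to_entity` is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int  # char-level start position
    end: int    # char-level end position (treated as exclusive here)
    text: str   # entity text
    labels: List[str]  # entity type list


def selection_to_entity(report: str, sel: Selection) -> dict:
    """Build an entity dict with word-level indices from a char-level Selection."""
    # Indices of all whitespace-delimited words overlapping [sel.start, sel.end).
    covered = [
        i for i, m in enumerate(re.finditer(r"\S+", report))
        if m.start() < sel.end and m.end() > sel.start
    ]
    if not covered:
        raise ValueError("selection does not cover any word")
    return {
        "tokens": sel.text,
        "label": sel.labels[0] if sel.labels else "",
        "start_ix": covered[0],
        "end_ix": covered[-1],
        "relations": [],  # relations are rebuilt separately
    }


if __name__ == "__main__":
    report = "Mild cardiomegaly is present"
    print(selection_to_entity(report, Selection(5, 17, "cardiomegaly", ["OBS-DP"])))
```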

Core Algorithms

  1. Position Conversion
def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""
  2. Relation Reconstruction
def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""

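The two signatures above come without bodies, so the following is only one plausible way to fill them in. It assumes whitespace tokenization, an inclusive `end_ix`, and an exclusive returned end offset. For the second function it further assumes that `entities` is a list of entity texts kept after review and that `entities_data` is the "entities" mapping from the JSON format. The relation name `located_at` in the example is illustrative.

```python
import re


def word_to_char_span(text, start_ix, end_ix):
    """Convert an inclusive word-level span to a (start, end) character range.

    The returned end offset is exclusive, so text[start:end] is the span.
    """
    words = list(re.finditer(r"\S+", text))
    if not 0 <= start_ix <= end_ix < len(words):
        raise IndexError("word span out of range")
    return words[start_ix].start(), words[end_ix].end()


def find_relations_with_entities(entities, entities_data):
    """Rebuild relations between kept entities by matching on entity text.

    Returns a list of (source_text, relation_type, target_text) triples and
    drops any relation whose source or target text is not in `entities`.
    """
    kept = set(entities)
    relations = []
    for entity in entities_data.values():
        if entity["tokens"] not in kept:
            continue
        for rel_type, target_id in entity["relations"]:
            target = entities_data.get(target_id)
            if target is not None and target["tokens"] in kept:
                relations.append((entity["tokens"], rel_type, target["tokens"]))
    return relations


if __name__ == "__main__":
    text = "No focal consolidation is seen"
    start, end = word_to_char_span(text, 1, 2)
    print(text[start:end])
```

Matching on text rather than on IDs means relations survive when entity IDs are reassigned during review, but two distinct entities with identical text become indistinguishable.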
TODO

  1. Add data export functionality
  2. Named Entity Recognition
    1. Add an input box
    2. Call LLMs
  3. Relation Extraction
    1. Add relation editing functionality
  4. Data storage location
    1. Read from somewhere, e.g. from git
    2. Save to somewhere; saving is a bit awkward (via commit?)

Contributing

Contributions are welcome through:

  1. Submit Issues for bug reports or suggestions
  2. Submit Pull Requests to improve code
  3. Improve documentation and comments

License

MIT License