---
title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---

# Medical Knowledge Graph Construction (medKGC)

## Overview
An automated annotation tool that uses LLMs to help medical annotators annotate radiology reports.

The tool covers named entity recognition, relation extraction, and named entity normalization, and outputs the final results as a knowledge graph.

medKGC is a system for constructing and reviewing knowledge graphs from medical text. It supports entity recognition, relation extraction, and visualization of medical reports, and provides an interface for reviewing the results.

## Deployment

### Installation
1. Create a conda environment
```bash
conda create -n medkgc python=3.10
conda activate medkgc
```

2. Install dependencies
```bash
pip install -r requirements.txt
```

3. Run the application
```bash
streamlit run app.py
```




## Core Features

### 1. Data Processing
- **Position Conversion**: Converts positions between word level and character level
- **Entity Conversion**: Converts entities between the JSON format and `Selection` objects
- **Relation Extraction**: Maps and reconstructs relations based on entity IDs

### 2. Entity Annotation
- **Label Types**:
  - OBS-DP: Observation definitely present (Red)
  - ANAT-DP: Anatomy definitely present (Cyan)
  - OBS-U: Observation uncertain (Yellow)
  - OBS-DA: Observation definitely absent (Gray)
- **Interactive Annotation**: Supports interactive entity selection and annotation
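
The label set and its display colors fit naturally in a single lookup table. This is a minimal sketch assuming a plain dict; the hex codes and the helper `color_for` are hypothetical stand-ins for the color names listed above, not values taken from the app.

```python
from typing import Dict

# Hypothetical hex codes matching the color names above; the app's palette may differ.
LABEL_COLORS: Dict[str, str] = {
    "OBS-DP": "#ff4b4b",   # red: observation definitely present
    "ANAT-DP": "#00c0f2",  # cyan: anatomy definitely present
    "OBS-U": "#ffd700",    # yellow: observation uncertain
    "OBS-DA": "#a0a0a0",   # gray: observation definitely absent
}


def color_for(label: str, default: str = "#ffffff") -> str:
    """Return the display color for an entity label, or `default` for unknown labels."""
    return LABEL_COLORS.get(label, default)
```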

### 3. Relation Visualization
- **Node Merging**: Automatically merges entities with the same text into a single node
- **Color Coding**: Uses a different color for each entity type
- **Dynamic Updates**: Updates the graph in real time
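
One way to implement node merging is to key graph nodes by entity text, so repeated mentions collapse into one node, and to translate the ID-based relations into edges between those text keys. This is a minimal sketch over the entity JSON format shown under Data Structures; `build_graph` is a hypothetical name, relation targets are assumed to be entity-ID strings, and when mentions with the same text carry different labels the first one seen wins.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source text, relation type, target text)


def build_graph(entities: Dict[str, dict]) -> Tuple[Dict[str, str], List[Edge]]:
    """Merge same-text entities into one node and collect deduplicated edges.

    Returns (nodes, edges): `nodes` maps entity text to the label of the first
    entity seen with that text; `edges` lists (source, relation, target) texts.
    """
    nodes: Dict[str, str] = {}
    for ent in entities.values():
        # setdefault keeps the first label seen for a given text
        nodes.setdefault(ent["tokens"], ent["label"])

    edges: List[Edge] = []
    seen = set()
    for ent in entities.values():
        for rel_type, target_id in ent.get("relations", []):
            target = entities.get(str(target_id))
            if target is None:
                continue  # skip relations pointing at a missing entity ID
            edge = (ent["tokens"], rel_type, target["tokens"])
            if edge not in seen:  # merged nodes can yield duplicate edges
                seen.add(edge)
                edges.append(edge)
    return nodes, edges
```

For example, two mentions of "effusion" that both relate to "pleural" produce two nodes and a single edge. The relation name `located_at` in the test data is illustrative.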

### 4. Review Process
- **Report Selection**: Displays pending and reviewed reports separately
- **Status Saving**: Automatically saves review status and modifications
- **Batch Processing**: Supports reviewing multiple reports in succession
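
Status saving could be as simple as a JSON file mapping each report ID to a reviewed flag, which also drives the pending/reviewed split. The file layout and the function names `load_status`, `mark_reviewed`, and `split_reports` are assumptions for illustration only; where data should be stored is still listed as open in the TODO section.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def load_status(path: Path) -> Dict[str, bool]:
    """Read the report_id -> reviewed map; a missing file means nothing is reviewed yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def mark_reviewed(path: Path, report_id: str) -> None:
    """Persist that `report_id` has been reviewed (hypothetical storage format)."""
    status = load_status(path)
    status[report_id] = True
    path.write_text(json.dumps(status, indent=2), encoding="utf-8")


def split_reports(
    report_ids: List[str], status: Dict[str, bool]
) -> Tuple[List[str], List[str]]:
    """Split report IDs into (pending, reviewed), preserving input order."""
    pending = [r for r in report_ids if not status.get(r, False)]
    reviewed = [r for r in report_ids if status.get(r, False)]
    return pending, reviewed
```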

## Technical Implementation

### Data Structures
1. **Entity Data**
```json
{
    "entities": {
        "1": {
            "tokens": "entity text",
            "label": "entity type",
            "start_ix": "word-level start position",
            "end_ix": "word-level end position",
            "relations": [["relation type", "target entity ID"]]
        }
    }
}
```

2. **Selection Object**
```python
from dataclasses import dataclass
from typing import List

@dataclass
class Selection:
    start: int  # char-level start position
    end: int    # char-level end position
    text: str   # entity text
    labels: List[str]  # entity type list
```
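
Converting a `Selection` back into the word-level entity format means mapping character offsets to word indices. This sketch assumes whitespace tokenization, an exclusive char-level `end`, and an inclusive word-level `end_ix`; `word_spans` and `selection_to_entity` are hypothetical helpers, and the entity label is taken from the first entry of `labels`. The `Selection` dataclass is repeated so the snippet runs on its own.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Selection:
    start: int  # char-level start position
    end: int    # char-level end position (assumed exclusive)
    text: str   # entity text
    labels: List[str]  # entity type list


def word_spans(text: str) -> List[Tuple[int, int]]:
    """Char-level (start, end) of every whitespace-separated word; end is exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def selection_to_entity(text: str, sel: Selection) -> dict:
    """Convert a char-level Selection into the word-level entity dict (inclusive end_ix)."""
    spans = word_spans(text)
    # indices of words that overlap the selected character range
    covered = [i for i, (s, e) in enumerate(spans) if s < sel.end and e > sel.start]
    if not covered:
        raise ValueError(f"selection {sel.start}-{sel.end} covers no word")
    return {
        "tokens": sel.text,
        "label": sel.labels[0] if sel.labels else "",
        "start_ix": covered[0],
        "end_ix": covered[-1],
        "relations": [],  # relations are re-attached separately
    }
```

For the text `"Small left pleural effusion ."`, a selection over characters 11 to 27 (`"pleural effusion"`) becomes an entity with `start_ix` 2 and `end_ix` 3.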

### Core Algorithms
1. **Position Conversion**
```python
def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""
```
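
A possible body for `word_to_char_span`, assuming words are separated by whitespace and `end_ix` is inclusive. The returned end offset is exclusive, so the result can be used directly as a slice. The tokenization rule is an assumption: if reports were tokenized differently (for example, punctuation split off without surrounding spaces), the offsets will not line up.

```python
import re
from typing import Tuple


def word_to_char_span(text: str, start_ix: int, end_ix: int) -> Tuple[int, int]:
    """Convert a word-level span (inclusive end_ix) to a char-level range (exclusive end).

    Assumes words are whitespace-separated.
    """
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    if not (0 <= start_ix <= end_ix < len(spans)):
        raise IndexError(
            f"word span {start_ix}-{end_ix} out of range for {len(spans)} words"
        )
    # start of the first word, end of the last word
    return spans[start_ix][0], spans[end_ix][1]
```

For `"Small left pleural effusion ."`, the word span (2, 3) maps to characters (11, 27), which slices to `"pleural effusion"`.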

2. **Relation Reconstruction**
```python
def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""
```
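
A sketch of text-based relation reconstruction, assuming `entities` is the list of current `Selection` annotations and `entities_data` is the original entity dict with ID-based relations. Each relation is re-attached to the first selection whose text matches its source and target; relations whose endpoints no longer match any selection are dropped. Returning index triples into `entities` is a design choice made here for illustration, not necessarily what the app returns, and the relation names in the test data are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Selection:
    start: int  # char-level start position
    end: int    # char-level end position
    text: str   # entity text
    labels: List[str]  # entity type list


def find_relations_with_entities(
    entities: List[Selection], entities_data: Dict[str, dict]
) -> List[Tuple[int, str, int]]:
    """Rebuild (source_index, relation_type, target_index) triples by text matching."""
    # first selection index for each entity text
    index_by_text: Dict[str, int] = {}
    for i, sel in enumerate(entities):
        index_by_text.setdefault(sel.text, i)

    relations: List[Tuple[int, str, int]] = []
    for ent in entities_data.values():
        src = index_by_text.get(ent["tokens"])
        if src is None:
            continue  # source entity no longer selected
        for rel_type, target_id in ent.get("relations", []):
            target = entities_data.get(str(target_id))
            if target is None:
                continue  # dangling target ID
            tgt = index_by_text.get(target["tokens"])
            if tgt is not None and (src, rel_type, tgt) not in relations:
                relations.append((src, rel_type, tgt))
    return relations
```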

## TODO
1. [ ] Add data export functionality
2. [ ] Named Entity Recognition
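   1. [ ] Add an input box
   2. [ ] Call LLMs
3. [ ] Relation Extraction
   1. [ ] Add relation editing functionality
4. [ ] Data storage: where should the data live?
   1. [ ] Read data from an external source, e.g. a git repository
   2. [ ] Save data somewhere (saving is somewhat cumbersome; via commits?)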

## Contributing
Contributions are welcome in the following ways:
1. Submit Issues for bug reports or suggestions
2. Submit Pull Requests to improve code
3. Improve documentation and comments

## License
MIT License