---
title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---
# Medical Knowledge Graph Construction (medKGC)
## Overview
An automated annotation tool that uses LLMs to help medical annotators annotate input radiology reports.
The tool covers named entity recognition, relation extraction, and named entity normalization; the final result is output as a knowledge graph.
medKGC is a medical text knowledge graph construction and review system. It supports entity recognition, relation extraction, and visualization of medical reports, providing a convenient review interface.
## Deployment
### Installation
1. Create conda environment
```bash
conda create -n medkgc python=3.10
conda activate medkgc
```
2. Install dependencies
```bash
pip install -r requirements.txt
```
3. Run application
```bash
streamlit run app.py
```
## Core Features
### 1. Data Processing
- **Position Conversion**: Support word-level and char-level position conversion
- **Entity Conversion**: Convert between JSON format and Selection objects
- **Relation Extraction**: Entity ID-based relation mapping and reconstruction
### 2. Entity Annotation
- **Label Types**:
  - OBS-DP: Observation definitely present (Red)
  - ANAT-DP: Anatomy definitely present (Cyan)
  - OBS-U: Observation uncertain (Yellow)
  - OBS-DA: Observation definitely absent (Gray)
- **Interactive Annotation**: Support entity selection and annotation
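
The label-to-color assignment listed above could be captured in a small lookup table. This is a minimal sketch, not the project's actual code: the names `LABEL_COLORS` and `color_for`, the lowercase color strings, and the fallback color are all assumptions made for illustration.

```python
# Colors follow the label list in this README.
# LABEL_COLORS and color_for are hypothetical names, not taken from the project.
LABEL_COLORS = {
    "OBS-DP": "red",      # observation definitely present
    "ANAT-DP": "cyan",    # anatomy definitely present
    "OBS-U": "yellow",    # observation uncertain
    "OBS-DA": "gray",     # observation definitely absent
}


def color_for(label: str, default: str = "lightgray") -> str:
    """Return the display color for a label; the fallback color is an assumption."""
    return LABEL_COLORS.get(label, default)
```

Keeping the mapping in one place means the annotation view and the graph view can share the same colors.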
### 3. Relation Visualization
- **Node Merging**: Automatically merge entities with same text
- **Color Coding**: Different colors for different entity types
- **Dynamic Updates**: Support real-time graph updates
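
As an illustration of node merging, the sketch below collapses entities that share the same text into a single graph node and remaps their relations onto the merged nodes. It assumes the entity JSON layout described under Data Structures; the function name `merge_nodes`, the case-insensitive matching, and the `located_at` relation type are assumptions for the example, not confirmed project behavior.

```python
from collections import defaultdict


def merge_nodes(entities_data):
    """Merge entities with the same text into one node and remap relations.

    Hypothetical helper: matching is case-insensitive, which is an assumption.
    """
    entities = entities_data["entities"]
    # Map every entity ID to the key of the node it is merged into.
    node_of = {eid: ent["tokens"].strip().lower() for eid, ent in entities.items()}

    nodes = defaultdict(lambda: {"labels": set(), "ids": []})
    for eid, ent in entities.items():
        node = nodes[node_of[eid]]
        node["labels"].add(ent["label"])
        node["ids"].append(eid)

    # A set removes duplicate edges that appear once entities are merged.
    edges = set()
    for eid, ent in entities.items():
        for rel_type, target_id in ent.get("relations", []):
            if target_id in node_of:  # skip relations pointing at unknown IDs
                edges.add((node_of[eid], rel_type, node_of[target_id]))
    return dict(nodes), sorted(edges)


# Illustrative data only; "located_at" is an assumed relation type.
example = {
    "entities": {
        "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 3, "end_ix": 3,
              "relations": [["located_at", "2"]]},
        "2": {"tokens": "pleural", "label": "ANAT-DP", "start_ix": 2, "end_ix": 2,
              "relations": []},
        "3": {"tokens": "Effusion", "label": "OBS-DP", "start_ix": 7, "end_ix": 7,
              "relations": [["located_at", "2"]]},
    }
}
nodes, edges = merge_nodes(example)
```

Here entities `"1"` and `"3"` end up in the same node, and their two identical relations become a single edge.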
### 4. Review Process
- **Report Selection**: Display pending and reviewed reports separately
- **Status Saving**: Automatically save review status and modifications
- **Batch Processing**: Support continuous review of multiple reports
## Technical Implementation
### Data Structures
1. **Entity Data**
```json
{
  "entities": {
    "1": {
      "tokens": "entity text",
      "label": "entity type",
      "start_ix": "word-level start position",
      "end_ix": "word-level end position",
      "relations": [["relation type", "target entity ID"]]
    }
  }
}
```
Word-level indices (`start_ix`, `end_ix`) start from 0.
2. **Selection Object**
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int         # char-level start position
    end: int           # char-level end position
    text: str          # entity text
    labels: List[str]  # entity type list
```
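
To show how the two structures relate, here is a hedged sketch of converting a char-level `Selection` back into a word-level entity record. The helper `selection_to_entity` is hypothetical. It assumes whitespace tokenization, an exclusive `end` offset on the `Selection`, and an inclusive `end_ix` in the entity record; none of these conventions are stated in this README.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int         # char-level start position
    end: int           # char-level end position (assumed exclusive)
    text: str          # entity text
    labels: List[str]  # entity type list


def selection_to_entity(text: str, sel: Selection) -> dict:
    """Convert a char-level Selection into a word-level entity record.

    Hypothetical helper; assumes words are separated by whitespace.
    """
    # Collect the indices of all words that overlap the selected char range.
    word_ixs = [
        i
        for i, m in enumerate(re.finditer(r"\S+", text))
        if m.start() < sel.end and m.end() > sel.start
    ]
    if not word_ixs:
        raise ValueError("selection does not overlap any word")
    return {
        "tokens": sel.text,
        "label": sel.labels[0] if sel.labels else "",
        "start_ix": word_ixs[0],
        "end_ix": word_ixs[-1],  # inclusive end index (assumption)
        "relations": [],
    }


report = "Small left pleural effusion ."
begin = report.index("pleural effusion")
sel = Selection(begin, begin + len("pleural effusion"), "pleural effusion", ["OBS-DP"])
entity = selection_to_entity(report, sel)
```

With this report, `"pleural"` and `"effusion"` are words 2 and 3 (counting from 0), so the record gets `start_ix` 2 and `end_ix` 3.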
### Core Algorithms
1. **Position Conversion**
```python
def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""
```
2. **Relation Reconstruction**
```python
def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""
```
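
The two signatures above only show docstrings, so the bodies below are a guess at how they might work, not the project's implementation. The sketch assumes whitespace tokenization, an inclusive `end_ix`, that `entities` is an iterable of entity text strings, and that relations are returned as `(source text, relation type, target text)` tuples. The `located_at` relation type in the example is also an assumption.

```python
import re


def word_to_char_span(text, start_ix, end_ix):
    """Convert an inclusive word-level range to a (start, end) char range.

    Assumes whitespace-separated words; the returned end offset is exclusive.
    """
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    if not 0 <= start_ix <= end_ix < len(spans):
        raise IndexError("word index out of range")
    return spans[start_ix][0], spans[end_ix][1]


def find_relations_with_entities(entities, entities_data):
    """Rebuild relations whose source and target texts are both in `entities`.

    `entities` is assumed to be an iterable of entity text strings.
    """
    records = entities_data["entities"]
    kept = {text.lower() for text in entities}
    relations = []
    for ent in records.values():
        if ent["tokens"].lower() not in kept:
            continue
        for rel_type, target_id in ent.get("relations", []):
            target = records.get(target_id)
            # Keep the relation only if the target entity is still present.
            if target and target["tokens"].lower() in kept:
                relations.append((ent["tokens"], rel_type, target["tokens"]))
    return relations


report = "Small left pleural effusion ."
span = word_to_char_span(report, 2, 3)

# Illustrative data only.
data = {
    "entities": {
        "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 3, "end_ix": 3,
              "relations": [["located_at", "2"]]},
        "2": {"tokens": "pleural", "label": "ANAT-DP", "start_ix": 2, "end_ix": 2,
              "relations": []},
    }
}
kept_relations = find_relations_with_entities(["effusion", "pleural"], data)
```

Slicing `report[span[0]:span[1]]` yields `"pleural effusion"`. If a reviewer removes `"pleural"` from the entity list, the relation pointing at it is dropped rather than left dangling.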
## TODO
1. [ ] Add data export functionality
2. [ ] Named Entity Recognition
   1. [ ] Add an input box
   2. [ ] Call LLMs
3. [ ] Relation Extraction
   1. [ ] Add relation editing functionality
4. [ ] Decide where the data lives
   1. [ ] Read from somewhere, e.g. from git
   2. [ ] Save to somewhere; saving is a bit tricky (commit?)
## Contributing
Contributions are welcome. You can:
1. Submit Issues for bug reports or suggestions
2. Submit Pull Requests to improve code
3. Improve documentation and comments
## License
MIT License