---
title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---
# Medical Knowledge Graph Construction (medKGC)
## Overview
An automated annotation tool that uses LLMs to help medical annotators annotate radiology reports.

The tool covers named entity recognition, relation extraction, and named entity normalization, and outputs the final result as a knowledge graph.

medKGC is a knowledge graph construction and review system for medical text. It supports entity recognition, relation extraction, and visualization of medical reports, and provides an interface for reviewing the results.
## Deployment
### Installation
1. Create a conda environment
```bash
conda create -n medkgc python=3.10
conda activate medkgc
```
2. Install the dependencies
```bash
pip install -r requirements.txt
```
3. Run the application
```bash
streamlit run app.py
```
## Core Features
### 1. Data Processing
- **Position Conversion**: Convert entity positions between word-level and character-level offsets
- **Entity Conversion**: Convert between the JSON format and `Selection` objects
- **Relation Extraction**: Map and reconstruct relations based on entity IDs
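ID-based relation mapping can be illustrated with a hypothetical helper, `renumber_entities`, that reassigns consecutive IDs after an entity has been deleted and remaps every relation target accordingly. It assumes IDs are numeric strings, as in the entity JSON example under Technical Implementation; the relation type names in the usage example are illustrative only.

```python
from typing import Dict


def renumber_entities(entities: Dict[str, dict]) -> Dict[str, dict]:
    """Reassign consecutive string IDs ("1", "2", ...) and remap relations.

    Hypothetical helper. Relations whose target ID no longer exists are dropped.
    """
    # Order by original numeric ID so the output is deterministic.
    old_ids = sorted(entities, key=int)
    id_map = {old: str(new) for new, old in enumerate(old_ids, start=1)}
    result = {}
    for old in old_ids:
        ent = dict(entities[old])  # shallow copy so the input is not mutated
        ent["relations"] = [
            [rel_type, id_map[target]]
            for rel_type, target in ent.get("relations", [])
            if target in id_map  # drop relations pointing at deleted entities
        ]
        result[id_map[old]] = ent
    return result


# Entity "2" was deleted; "3" and "4" become "2" and "3".
entities = {
    "1": {"tokens": "small", "label": "OBS-DP", "relations": [["modify", "3"]]},
    "3": {"tokens": "effusion", "label": "OBS-DP", "relations": []},
    "4": {"tokens": "opacity", "label": "OBS-U", "relations": [["located_at", "2"]]},
}
renumbered = renumber_entities(entities)
```

Remapping through a single old-to-new dictionary keeps every relation consistent in one pass, and copying each entity avoids changing the data the caller still holds.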
### 2. Entity Annotation
- **Label Types**:
  - OBS-DP: Observation definitely present (Red)
  - ANAT-DP: Anatomy definitely present (Cyan)
  - OBS-U: Observation uncertain (Yellow)
  - OBS-DA: Observation definitely absent (Gray)
- **Interactive Annotation**: Select entities in the report text and assign labels to them
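The label scheme above can be held in a single lookup table so that annotation and visualization share the same colors. This is a minimal sketch; `LABEL_COLORS` and `color_for` are hypothetical names, and the color strings simply restate the README's color names.

```python
# Hypothetical table of the README's label types and their display colors.
LABEL_COLORS = {
    "OBS-DP": "red",    # Observation definitely present
    "ANAT-DP": "cyan",  # Anatomy definitely present
    "OBS-U": "yellow",  # Observation uncertain
    "OBS-DA": "gray",   # Observation definitely absent
}


def color_for(label: str) -> str:
    """Return the display color for a label, rejecting unknown labels."""
    try:
        return LABEL_COLORS[label]
    except KeyError:
        raise ValueError(f"unknown label type: {label!r}") from None


print(color_for("OBS-U"))  # prints yellow
```

Raising on unknown labels surfaces typos in annotation data early instead of silently drawing them in a default color.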
### 3. Relation Visualization
- **Node Merging**: Entities with the same text are automatically merged into one node
- **Color Coding**: Each entity type is shown in its own color
- **Dynamic Updates**: The graph updates in real time as annotations change
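Node merging can be sketched as keying graph nodes by entity text instead of entity ID, so repeated mentions collapse into one node and their relations become edges between the merged nodes. `build_graph` is a hypothetical helper that works on the entity dictionary from the JSON format under Technical Implementation; it does not depend on any particular graph library, and the relation names in the usage example are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_graph(
    entities: Dict[str, dict],
) -> Tuple[Dict[str, Set[str]], List[Tuple[str, str, str]]]:
    """Merge entities with identical text into one node and collect edges.

    Returns (nodes, edges): nodes maps entity text to the set of labels seen
    for that text; edges are (source_text, relation_type, target_text).
    """
    nodes: Dict[str, Set[str]] = defaultdict(set)
    for ent in entities.values():
        nodes[ent["tokens"]].add(ent["label"])  # same text -> same node
    edges: List[Tuple[str, str, str]] = []
    for ent in entities.values():
        for rel_type, target_id in ent.get("relations", []):
            edge = (ent["tokens"], rel_type, entities[target_id]["tokens"])
            if edge not in edges:  # merged nodes can yield duplicate edges
                edges.append(edge)
    return dict(nodes), edges


entities = {
    "1": {"tokens": "effusion", "label": "OBS-DP", "relations": [["located_at", "2"]]},
    "2": {"tokens": "left", "label": "ANAT-DP", "relations": []},
    "3": {"tokens": "effusion", "label": "OBS-DP", "relations": [["located_at", "4"]]},
    "4": {"tokens": "left", "label": "ANAT-DP", "relations": []},
}
nodes, edges = build_graph(entities)
```

Keeping a set of labels per node makes a conflict visible when the same text was annotated with different types, which a color-coded view can then highlight.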
### 4. Review Process
- **Report Selection**: Pending and reviewed reports are listed separately
- **Status Saving**: Review status and modifications are saved automatically
- **Batch Processing**: Multiple reports can be reviewed one after another
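Splitting reports into pending and reviewed lists only needs a per-report "reviewed" flag. The sketch below assumes that flag is persisted as a JSON object mapping report ID to a boolean; that file format and the helper names are hypothetical, since the README does not say how review status is stored.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def split_reports(
    report_ids: List[str], status: Dict[str, bool]
) -> Tuple[List[str], List[str]]:
    """Split report IDs into (pending, reviewed) using a reviewed-flag map."""
    pending = [r for r in report_ids if not status.get(r, False)]
    reviewed = [r for r in report_ids if status.get(r, False)]
    return pending, reviewed


def save_status(path: Path, status: Dict[str, bool]) -> None:
    """Persist review status as JSON (hypothetical on-disk format)."""
    path.write_text(json.dumps(status, indent=2), encoding="utf-8")


def load_status(path: Path) -> Dict[str, bool]:
    """Load review status; a missing file means nothing is reviewed yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

For example, `split_reports(["r1", "r2", "r3"], {"r2": True})` returns `(["r1", "r3"], ["r2"])`. Treating unknown IDs as pending means newly added reports show up in the review queue without any migration step.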
## Technical Implementation
### Data Structures
1. **Entity Data**
```json
{
  "entities": {
    "1": {
      "tokens": "entity text",
      "label": "entity type",
      "start_ix": "word-level start position",
      "end_ix": "word-level end position",
      "relations": [["relation type", "target entity ID"]]
    }
  }
}
```
Word-level positions (`start_ix`, `end_ix`) are zero-based.
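A concrete instance of this format helps pin down the indexing. The sketch below fills the schema for a short illustrative sentence and checks that each entity's word span matches its `tokens` and that every relation points to an existing entity. It assumes whitespace tokenization and an inclusive `end_ix`, neither of which the README states; the relation name `located_at` is illustrative, and `find_problems` is a hypothetical helper.

```python
from typing import List

# Illustrative report and annotation in the entity JSON format.
# Assumptions: words are whitespace-delimited and end_ix is inclusive.
report = "No pleural effusion is seen ."
entities_data = {
    "entities": {
        "1": {"tokens": "pleural", "label": "ANAT-DP",
              "start_ix": 1, "end_ix": 1, "relations": []},
        "2": {"tokens": "effusion", "label": "OBS-DA",
              "start_ix": 2, "end_ix": 2, "relations": [["located_at", "1"]]},
    }
}


def find_problems(text: str, data: dict) -> List[str]:
    """List entities whose word span or relation targets are inconsistent."""
    words = text.split()
    entities = data["entities"]
    problems = []
    for ent_id, ent in entities.items():
        # Zero-based indices; +1 because end_ix is treated as inclusive.
        span = " ".join(words[ent["start_ix"]:ent["end_ix"] + 1])
        if span != ent["tokens"]:
            problems.append(f"{ent_id}: span {span!r} != tokens {ent['tokens']!r}")
        for rel_type, target_id in ent["relations"]:
            if target_id not in entities:
                problems.append(f"{ent_id}: {rel_type} -> missing entity {target_id}")
    return problems


print(find_problems(report, entities_data))  # prints []
```

Under these assumptions, "pleural" is word 1 and "effusion" is word 2, because "No" occupies index 0.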
2. **Selection Object**
```python
from dataclasses import dataclass
from typing import List

@dataclass
class Selection:
    start: int          # char-level start position
    end: int            # char-level end position
    text: str           # entity text
    labels: List[str]   # entity type list
```
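Entity Conversion in the `Selection` to JSON direction can be sketched as follows. The block redefines `Selection` so it runs on its own. `selection_to_entity` is a hypothetical helper that assumes whitespace-delimited words, an exclusive `Selection.end`, an inclusive `end_ix`, and that the first entry in `labels` becomes the entity type; the README does not specify any of these.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int          # char-level start position
    end: int            # char-level end position (assumed exclusive)
    text: str           # entity text
    labels: List[str]   # entity type list


def selection_to_entity(text: str, sel: Selection) -> dict:
    """Convert a Selection into an entity dict in the entity JSON format."""
    covered = [
        i for i, m in enumerate(re.finditer(r"\S+", text))
        if m.start() < sel.end and m.end() > sel.start  # word overlaps selection
    ]
    if not covered or not sel.labels:
        raise ValueError("selection covers no word or has no label")
    return {
        "tokens": sel.text,
        "label": sel.labels[0],
        "start_ix": covered[0],
        "end_ix": covered[-1],
        "relations": [],  # relations are attached separately
    }


report = "No pleural effusion is seen ."
entity = selection_to_entity(report, Selection(3, 19, "pleural effusion", ["OBS-DA"]))
```

Using overlap rather than exact boundaries means a selection that starts or ends mid-word still snaps to whole words.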
### Core Algorithms
1. **Position Conversion**
```python
def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""
```
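The README gives only the signature of `word_to_char_span`. One way a function with this signature could work is sketched below; it is not the project's implementation. It assumes whitespace tokenization, an inclusive `end_ix`, and returns an exclusive character end so that `text[start:end]` recovers the entity text.

```python
import re
from typing import Tuple


def word_to_char_span(text: str, start_ix: int, end_ix: int) -> Tuple[int, int]:
    """Convert inclusive word indices to a character range [start, end).

    Sketch only: assumes whitespace-delimited words.
    """
    words = list(re.finditer(r"\S+", text))  # one match per word, with offsets
    if not (0 <= start_ix <= end_ix < len(words)):
        raise IndexError("word indices out of range")
    return words[start_ix].start(), words[end_ix].end()


report = "No pleural effusion is seen ."
start, end = word_to_char_span(report, 1, 2)
print(report[start:end])  # prints pleural effusion
```

Taking offsets from `re.finditer` avoids re-searching for each word, so repeated words (for example two mentions of "left") still map to the correct positions.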
2. **Relation Reconstruction**
```python
def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""
```
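Text-based relation reconstruction can be sketched with a hypothetical helper, `rebuild_relations`, that re-attaches relations from the original entity data to an edited entity list by matching entity text. This is an illustration of the idea, not the body of `find_relations_with_entities`. It assumes the edited entities are available as a list of texts; when a text occurs more than once, the first occurrence is used. The relation name `located_at` is illustrative.

```python
from typing import Dict, List, Tuple


def rebuild_relations(
    current_texts: List[str], original_entities: Dict[str, dict]
) -> List[Tuple[int, str, int]]:
    """Re-attach original relations to the current entities by text.

    Returns (source_index, relation_type, target_index) triples indexing into
    current_texts. A relation survives only if both endpoint texts still exist.
    """
    first_index: Dict[str, int] = {}
    for i, t in enumerate(current_texts):
        first_index.setdefault(t, i)  # remember the first position of each text
    rebuilt = []
    for ent in original_entities.values():
        src = first_index.get(ent["tokens"])
        for rel_type, target_id in ent.get("relations", []):
            target = original_entities.get(target_id)
            if src is None or target is None:
                continue  # source text was removed or target ID is dangling
            dst = first_index.get(target["tokens"])
            if dst is not None:
                rebuilt.append((src, rel_type, dst))
    return rebuilt


original = {
    "1": {"tokens": "pleural", "label": "ANAT-DP", "relations": []},
    "2": {"tokens": "effusion", "label": "OBS-DA", "relations": [["located_at", "1"]]},
    "3": {"tokens": "opacity", "label": "OBS-U", "relations": [["located_at", "1"]]},
}
# The annotator reordered the entities and deleted "opacity".
relations = rebuild_relations(["effusion", "pleural"], original)
```

Matching on text rather than ID lets relations survive edits that change entity IDs or order, at the cost of ambiguity when the same text appears more than once.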
## TODO
1. [ ] Add data export functionality
2. [ ] Named Entity Recognition
   1. [ ] Add an input box
   2. [ ] Call LLMs
3. [ ] Relation Extraction
   1. [ ] Add relation editing functionality
4. [ ] Decide where the data lives
   1. [ ] Read it from somewhere, e.g. from git
   2. [ ] Save it somewhere; saving is somewhat troublesome (via a commit?)
## Contributing
Contributions are welcome. You can:
1. Submit issues to report bugs or make suggestions
2. Submit pull requests to improve the code
3. Improve the documentation and comments
## License
MIT License