---
title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---
# Medical Knowledge Graph Construction (medKGC)
## Overview
An automated annotation tool that uses LLMs to help medical annotators annotate input radiology reports.
The tool covers named entity recognition, relation extraction, and named entity normalization, and outputs the final result as a knowledge graph.
medKGC is a medical text knowledge graph construction and review system. It supports entity recognition, relation extraction, and visualization of medical reports, providing a convenient review interface.
## Deployment
### Installation
1. Create conda environment
```bash
conda create -n medkgc python=3.10
conda activate medkgc
```
2. Install dependencies
```bash
pip install -r requirements.txt
```
3. Run application
```bash
streamlit run app.py
```
## Core Features
### 1. Data Processing
- **Position Conversion**: Support word-level and char-level position conversion
- **Entity Conversion**: Convert between JSON format and Selection objects
- **Relation Extraction**: Entity ID-based relation mapping and reconstruction
### 2. Entity Annotation
- **Label Types**:
  - OBS-DP: Observation definitely present (Red)
  - ANAT-DP: Anatomy definitely present (Cyan)
  - OBS-U: Observation uncertain (Yellow)
  - OBS-DA: Observation definitely absent (Gray)
- **Interactive Annotation**: Support entity selection and annotation
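
The label-to-color scheme listed in this section could be kept in a small lookup table. The sketch below is illustrative only: `LABEL_COLORS` and `color_for` are hypothetical names, the values are plain CSS color names chosen to match the list, and the fallback color for unknown labels is an assumption.

```python
from typing import Dict

# Hypothetical lookup table; color names follow the label list in this section.
LABEL_COLORS: Dict[str, str] = {
    "OBS-DP": "red",      # observation definitely present
    "ANAT-DP": "cyan",    # anatomy definitely present
    "OBS-U": "yellow",    # observation uncertain
    "OBS-DA": "gray",     # observation definitely absent
}


def color_for(label: str, default: str = "lightgray") -> str:
    """Return the display color for a label, or a fallback for unknown labels."""
    return LABEL_COLORS.get(label, default)


print(color_for("ANAT-DP"))
```

Keeping the mapping in one place means the annotation view and the graph view can share the same colors.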
### 3. Relation Visualization
- **Node Merging**: Automatically merge entities with same text
- **Color Coding**: Different colors for different entity types
- **Dynamic Updates**: Support real-time graph updates
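
Node merging can be sketched as follows. This is a minimal illustration, not the project's actual code: `merge_nodes` is a hypothetical helper, it assumes entities are merged on an exact match of their `tokens` text, that the first label seen wins, and the `located_at` relation type in the example is made up for demonstration.

```python
from typing import Dict, List, Set, Tuple


def merge_nodes(
    entities: Dict[str, dict],
) -> Tuple[Dict[str, dict], List[Tuple[str, str, str]]]:
    """Hypothetical helper: collapse entities that share the same text into one node."""
    nodes: Dict[str, dict] = {}
    for ent_id, ent in entities.items():
        # The entity text is the node key, so repeated mentions share one node.
        # Assumption: the label of the first mention is kept for the node.
        node = nodes.setdefault(ent["tokens"], {"label": ent["label"], "ids": []})
        node["ids"].append(ent_id)

    edges: Set[Tuple[str, str, str]] = set()
    for ent in entities.values():
        for rel_type, target_id in ent.get("relations", []):
            target = entities.get(target_id)
            if target is not None:
                # Edges connect merged nodes; the set drops duplicate edges.
                edges.add((ent["tokens"], rel_type, target["tokens"]))
    return nodes, sorted(edges)


example = {
    "1": {"tokens": "opacity", "label": "OBS-DP", "relations": [["located_at", "2"]]},
    "2": {"tokens": "lung", "label": "ANAT-DP", "relations": []},
    "3": {"tokens": "opacity", "label": "OBS-DP", "relations": [["located_at", "2"]]},
}
nodes, edges = merge_nodes(example)
print(nodes)
print(edges)
```

In this example the two "opacity" mentions become one node, and their identical relations to "lung" become one edge.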
### 4. Review Process
- **Report Selection**: Display pending and reviewed reports separately
- **Status Saving**: Automatically save review status and modifications
- **Batch Processing**: Support continuous review of multiple reports
## Technical Implementation
### Data Structures
1. **Entity Data**
```json
{
  "entities": {
    "1": {
      "tokens": "entity text",
      "label": "entity type",
      "start_ix": "word-level start position",
      "end_ix": "word-level end position",
      "relations": [["relation type", "target entity ID"]]
    }
  }
}
```
`start_ix` and `end_ix` are word-level indices that start from 0.
2. **Selection Object**
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int  # char-level start position
    end: int  # char-level end position
    text: str  # entity text
    labels: List[str]  # entity type list
```
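
Converting a `Selection` back into an entity record requires mapping its character range onto word indices. The sketch below shows one way this might work. `selection_to_entity` is a hypothetical helper, and it assumes that words are whitespace-delimited, that `end_ix` is inclusive, that only the first label is stored, and that relations are filled in separately.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Selection:
    start: int  # char-level start position
    end: int  # char-level end position (exclusive, by assumption)
    text: str  # entity text
    labels: List[str]  # entity type list


def selection_to_entity(text: str, sel: Selection) -> Dict[str, object]:
    """Hypothetical helper: build one entity record from a Selection."""
    # Indices of whitespace-delimited words that overlap the selected characters.
    hits = [
        i
        for i, match in enumerate(re.finditer(r"\S+", text))
        if match.start() < sel.end and match.end() > sel.start
    ]
    if not hits:
        raise ValueError("selection does not cover any word")
    return {
        "tokens": sel.text,
        "label": sel.labels[0] if sel.labels else "",
        "start_ix": hits[0],
        "end_ix": hits[-1],  # inclusive, by assumption
        "relations": [],  # relations are rebuilt in a separate step
    }


report = "No focal consolidation is seen ."
selection = Selection(start=3, end=22, text="focal consolidation", labels=["OBS-DA"])
entity = selection_to_entity(report, selection)
print(entity)
```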
### Core Algorithms
1. **Position Conversion**
```python
def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""
```
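
A possible body for `word_to_char_span` is sketched here. It is not the project's implementation: it assumes words are whitespace-delimited, that `start_ix` and `end_ix` are 0-based and inclusive, and that the returned character range is half-open, as in Python slicing.

```python
import re
from typing import Tuple


def word_to_char_span(text: str, start_ix: int, end_ix: int) -> Tuple[int, int]:
    """Convert an inclusive word-level range to a half-open character-level range."""
    # Character offsets of every whitespace-delimited word in the text.
    spans = [match.span() for match in re.finditer(r"\S+", text)]
    if not 0 <= start_ix <= end_ix < len(spans):
        raise IndexError(f"word range ({start_ix}, {end_ix}) is outside the text")
    # Start of the first word, end of the last word.
    return spans[start_ix][0], spans[end_ix][1]


report = "No focal consolidation is seen ."
start, end = word_to_char_span(report, 1, 2)
print(start, end, report[start:end])
```

For the sample sentence, words 1 through 2 map to characters 3 through 22, which slice out "focal consolidation".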
2. **Relation Reconstruction**
```python
def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""
```
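
The sketch below shows one possible reading of relation reconstruction. The argument types are assumptions: `entities` is taken to be the current list of `Selection` objects and `entities_data` the stored JSON in the entity data format. A stored relation is kept only when both its source and target text still appear among the current selections. The `located_at` relation type in the example is illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Selection:
    start: int
    end: int
    text: str
    labels: List[str]


def find_relations_with_entities(
    entities: List[Selection], entities_data: Dict
) -> List[Tuple[str, str, str]]:
    """Rebuild (source text, relation type, target text) triples by text matching."""
    current_texts = {sel.text for sel in entities}
    stored = entities_data.get("entities", {})
    relations = []
    for ent in stored.values():
        if ent["tokens"] not in current_texts:
            continue  # the source entity was removed during review
        for rel_type, target_id in ent.get("relations", []):
            target = stored.get(target_id)
            if target is not None and target["tokens"] in current_texts:
                relations.append((ent["tokens"], rel_type, target["tokens"]))
    return relations


data = {
    "entities": {
        "1": {"tokens": "opacity", "label": "OBS-DP", "relations": [["located_at", "2"]]},
        "2": {"tokens": "lung", "label": "ANAT-DP", "relations": []},
        "3": {"tokens": "effusion", "label": "OBS-U", "relations": [["located_at", "2"]]},
    }
}
# "effusion" is no longer selected, so its relation is dropped.
current = [
    Selection(0, 7, "opacity", ["OBS-DP"]),
    Selection(15, 19, "lung", ["ANAT-DP"]),
]
rebuilt = find_relations_with_entities(current, data)
print(rebuilt)
```

Matching on text rather than on entity IDs tolerates IDs changing after edits, at the cost of ambiguity when the same text occurs more than once in a report.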
## TODO
1. [ ] Add data export functionality
2. [ ] Named Entity Recognition
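   1. [ ] Add an input box
   2. [ ] Call LLMs
3. [ ] Relation Extraction
   1. [ ] Add relation editing functionality
4. [ ] Data storage location
   1. [ ] Read data from a remote location (on git)
   2. [ ] Save data to a remote location; saving is harder (via commit?)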
## Contributing
Contributions are welcome. You can:
1. Submit Issues for bug reports or suggestions
2. Submit Pull Requests to improve code
3. Improve documentation and comments
## License
MIT License