---
title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---

# Medical Knowledge Graph Construction (medKGC)

## Overview
An automated annotation tool that uses LLMs to help medical annotators annotate input radiology reports.

The tool covers named entity recognition, relation extraction, and named entity normalization; the final result is output as a knowledge graph.

medKGC is a medical text knowledge graph construction and review system. It supports entity recognition, relation extraction, and visualization of medical reports, providing a convenient review interface.

## Deployment

### Installation
1. Create conda environment
```bash
conda create -n medkgc python=3.10
conda activate medkgc
```

2. Install dependencies
```bash
pip install -r requirements.txt
```

3. Run application
```bash
streamlit run app.py
```




## Core Features

### 1. Data Processing
- **Position Conversion**: Support word-level and char-level position conversion
- **Entity Conversion**: Convert between JSON format and Selection objects
- **Relation Extraction**: Entity ID-based relation mapping and reconstruction

### 2. Entity Annotation
- **Label Types**:
  - OBS-DP: Observation definitely present (Red)
  - ANAT-DP: Anatomy definitely present (Cyan)
  - OBS-U: Observation uncertain (Yellow)
  - OBS-DA: Observation definitely absent (Gray)
- **Interactive Annotation**: Support entity selection and annotation

### 3. Relation Visualization
- **Node Merging**: Automatically merge entities with the same text
- **Color Coding**: Different colors for different entity types
- **Dynamic Updates**: Support real-time graph updates
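
The node-merging idea can be sketched in plain Python. This is only an illustration, not the project's actual code: the function name `build_merged_graph` is hypothetical, the input is assumed to follow the entity dictionary format described under Data Structures, "same text" is assumed to mean an exact string match, and the relation name `located_at` is sample data.

```python
from typing import Dict, List, Tuple


def build_merged_graph(
    entities_data: Dict[str, dict],
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Merge entities that share the same text into a single graph node.

    Hypothetical helper: returns (nodes, edges), where nodes maps entity text
    to its label and edges are (source_text, relation_type, target_text).
    """
    # One node per distinct entity text; the first label seen is kept.
    nodes: Dict[str, str] = {}
    for entity in entities_data.values():
        nodes.setdefault(entity["tokens"], entity["label"])

    # Re-point every relation from entity IDs to the merged text nodes.
    edges = set()
    for entity in entities_data.values():
        for relation_type, target_id in entity.get("relations", []):
            target = entities_data.get(str(target_id))
            if target is not None:
                edges.add((entity["tokens"], relation_type, target["tokens"]))
    return nodes, sorted(edges)


# Illustrative sample: entities "1" and "3" share the text "effusion".
sample = {
    "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 1, "end_ix": 1,
          "relations": [["located_at", "2"]]},
    "2": {"tokens": "pleural", "label": "ANAT-DP", "start_ix": 0, "end_ix": 0,
          "relations": []},
    "3": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 7, "end_ix": 7,
          "relations": [["located_at", "2"]]},
}
nodes, edges = build_merged_graph(sample)
```

Storing edges in a set means that duplicate relations between merged nodes collapse into a single edge, which keeps the rendered graph uncluttered.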

### 4. Review Process
- **Report Selection**: Display pending and reviewed reports separately
- **Status Saving**: Automatically save review status and modifications
- **Batch Processing**: Support continuous review of multiple reports

## Technical Implementation

### Data Structures
1. **Entity Data**
```json
{
    "entities": {
        "1": {
            "tokens": "entity text",
            "label": "entity type",
            "start_ix": "word-level start position",
            "end_ix": "word-level end position",
            "relations": [["relation type", "target entity ID"]]
        }
    }
}
```
Indices start from 0.

2. **Selection Object**
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int  # char-level start position
    end: int    # char-level end position
    text: str   # entity text
    labels: List[str]  # entity type list
```

### Core Algorithms
1. **Position Conversion**
```python
def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""
```
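
The stub above only gives the signature. A minimal sketch of one possible body follows, under several assumptions that the source does not confirm: words are delimited by whitespace, indices are 0-based, `end_ix` is inclusive, and the returned character range has an exclusive end so it can be used directly for slicing.

```python
import re
from typing import Tuple


def word_to_char_span(text: str, start_ix: int, end_ix: int) -> Tuple[int, int]:
    """Convert word-level position to character-level range.

    Assumptions (not confirmed by the project): whitespace tokenization,
    0-based indices, inclusive end_ix, exclusive character end.
    """
    # Character (start, end) offsets of every whitespace-delimited word.
    spans = [match.span() for match in re.finditer(r"\S+", text)]
    if not 0 <= start_ix <= end_ix < len(spans):
        raise IndexError(
            f"word range [{start_ix}, {end_ix}] is invalid for {len(spans)} words"
        )
    return spans[start_ix][0], spans[end_ix][1]


report = "No focal consolidation or pleural effusion"
start, end = word_to_char_span(report, 4, 5)
print(report[start:end])  # pleural effusion
```

Matching `\S+` instead of calling `str.split` keeps the original character offsets intact even when the report contains repeated spaces or line breaks.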

2. **Relation Reconstruction**
```python
def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""
```
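
One way the stub could be filled in is sketched below. The argument types and the return shape are assumptions: `entities` is taken to be the list of current `Selection` objects, `entities_data` the `"entities"` dictionary shown under Data Structures, and the result a list of `(source text, relation type, target text)` triples. The relation name `located_at` is illustrative sample data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Selection:
    start: int
    end: int
    text: str
    labels: List[str]


def find_relations_with_entities(
    entities: List[Selection], entities_data: Dict[str, dict]
) -> List[Tuple[str, str, str]]:
    """Rebuild relations based on entity text matching.

    Sketch only: a relation is kept when the texts of both its source and
    its target still appear among the current selections.
    """
    kept_texts = {selection.text for selection in entities}
    relations = []
    for entity in entities_data.values():
        source_text = entity["tokens"]
        if source_text not in kept_texts:
            continue
        for relation_type, target_id in entity.get("relations", []):
            # Relations reference their target by entity ID.
            target = entities_data.get(str(target_id))
            if target is not None and target["tokens"] in kept_texts:
                relations.append((source_text, relation_type, target["tokens"]))
    return relations


entities_data = {
    "1": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 1, "end_ix": 1,
          "relations": [["located_at", "2"]]},
    "2": {"tokens": "pleural", "label": "ANAT-DP", "start_ix": 0, "end_ix": 0,
          "relations": []},
}
selections = [
    Selection(start=0, end=7, text="pleural", labels=["ANAT-DP"]),
    Selection(start=8, end=16, text="effusion", labels=["OBS-DP"]),
]
rebuilt = find_relations_with_entities(selections, entities_data)
```

If a reviewer deletes one endpoint of a relation, that relation is dropped automatically, because its text no longer appears in the selection list.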

## TODO
1. [ ] Add data export functionality
2. [ ] Named Entity Recognition
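   1. [ ] Add an input box
   2. [ ] Call LLMs
3. [ ] Relation Extraction
   1. [ ] Add relation editing functionality
4. [ ] Decide where the data lives
   1. [ ] Read it from somewhere, e.g. a git repository
   2. [ ] Save it somewhere; saving is somewhat awkward (via commits?)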

## Contributing
Contributions are welcome. You can:
1. Submit Issues for bug reports or suggestions
2. Submit Pull Requests to improve code
3. Improve documentation and comments

## License
MIT License