---
title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---

# Medical Knowledge Graph Construction (medKGC)

## Overview
An automated annotation tool that uses LLMs to help medical annotators annotate radiology reports.

The tool covers named entity recognition, relation extraction, and named entity normalization, and outputs the final result as a knowledge graph.

medKGC is a knowledge graph construction and review system for medical text. It supports entity recognition, relation extraction, and visualization of medical reports, and provides a convenient review interface.

## Deployment

### Installation
1. Create a conda environment
```bash
conda create -n medkgc python=3.10
conda activate medkgc
```

2. Install dependencies
```bash
pip install -r requirements.txt
```

3. Run the application
```bash
streamlit run app.py
```

## Core Features

### 1. Data Processing
- Position Conversion: Converts entity positions between word level and character level
- Entity Conversion: Converts between the JSON format and `Selection` objects
- Relation Extraction: Maps and reconstructs relations based on entity IDs

### 2. Entity Annotation
- Label Types:
  - OBS-DP: Observation definitely present (Red)
  - ANAT-DP: Anatomy definitely present (Cyan)
  - OBS-U: Observation uncertain (Yellow)
  - OBS-DA: Observation definitely absent (Gray)
- Interactive Annotation: Supports selecting and annotating entities
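One way to keep the label set and its colors in a single place is a lookup table that both the annotation view and the graph can share. The sketch below is hypothetical: `LabelStyle`, `LABELS`, and `label_color` are illustrative names rather than the app's code, and using CSS color names is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LabelStyle:
    description: str
    color: str  # display color; CSS color names are an assumption


# Hypothetical label table mirroring the label list above.
LABELS: Dict[str, LabelStyle] = {
    "OBS-DP": LabelStyle("Observation definitely present", "red"),
    "ANAT-DP": LabelStyle("Anatomy definitely present", "cyan"),
    "OBS-U": LabelStyle("Observation uncertain", "yellow"),
    "OBS-DA": LabelStyle("Observation definitely absent", "gray"),
}


def label_color(label: str, default: str = "lightgray") -> str:
    """Return the display color for an entity label, or a fallback for unknown labels."""
    style = LABELS.get(label)
    return style.color if style else default
```

Returning a fallback color instead of raising keeps the UI usable if an LLM proposes a label outside this set.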

### 3. Relation Visualization
- Node Merging: Automatically merges entities with the same text into one node
- Color Coding: Uses a different color for each entity type
- Dynamic Updates: Supports real-time graph updates
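Node merging can be sketched in plain Python by keying nodes on entity text and resolving each relation's target ID against the entity JSON described under Technical Implementation. `build_graph` is a hypothetical helper; rendering is left out, and keeping the first label seen for a merged node is an assumption.

```python
from typing import Dict, List, Tuple


def build_graph(entities: Dict[str, dict]) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Build a merged graph from the entity JSON.

    Nodes are keyed by entity text, so entities with identical text collapse
    into one node. Returns (nodes, edges): nodes maps text -> label, and
    edges are (source_text, relation, target_text) triples.
    """
    nodes: Dict[str, str] = {}
    for ent in entities.values():
        # Keep the first label seen for a merged text (simplifying assumption).
        nodes.setdefault(ent["tokens"], ent["label"])

    edges: List[Tuple[str, str, str]] = []
    seen = set()
    for ent in entities.values():
        for relation, target_id in ent.get("relations", []):
            target = entities.get(target_id)
            if target is None:
                continue  # skip relations that point at missing entity IDs
            edge = (ent["tokens"], relation, target["tokens"])
            if edge not in seen:  # merged nodes can produce duplicate edges
                seen.add(edge)
                edges.append(edge)
    return nodes, edges
```

For example, two `effusion` entities that both point at the same `pleural` entity yield one `effusion` node and a single edge.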

### 4. Review Process
- Report Selection: Displays pending and reviewed reports separately
- Status Saving: Automatically saves review status and modifications
- Batch Processing: Supports reviewing multiple reports in succession
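A minimal sketch of the pending/reviewed split and status saving, assuming review state lives in a single JSON file that maps report IDs to a reviewed flag and stores the edited entities alongside. The file layout and the names `split_reports` and `save_review` are hypothetical; the app may persist state differently.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def split_reports(status: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Split report IDs into (pending, reviewed) lists, sorted for stable display."""
    pending = sorted(rid for rid, done in status.items() if not done)
    reviewed = sorted(rid for rid, done in status.items() if done)
    return pending, reviewed


def save_review(path: Path, report_id: str, entities: dict) -> None:
    """Persist one report's reviewed entities and mark the report as reviewed.

    Assumed (hypothetical) layout: {"status": {...}, "annotations": {...}}.
    """
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    data.setdefault("status", {})[report_id] = True
    data.setdefault("annotations", {})[report_id] = {"entities": entities}
    # Write to a temporary file first, then replace the original,
    # so an interrupted write does not corrupt the existing file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)
```

Saving after every report, rather than at the end of a session, is what makes continuous review of many reports safe to interrupt.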

## Technical Implementation

### Data Structures
1. Entity Data
```json
{
  "entities": {
    "1": {
      "tokens": "entity text",
      "label": "entity type",
      "start_ix": "word-level start position",
      "end_ix": "word-level end position",
      "relations": [["relation type", "target entity ID"]]
    }
  }
}
```

2. Selection Object
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    start: int  # char-level start position
    end: int  # char-level end position
    text: str  # entity text
    labels: List[str]  # entity type list
```

### Core Algorithms
1. Position Conversion
```python
def word_to_char_span(text, start_ix, end_ix):
    """Convert word-level position to character-level range"""
```
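A minimal sketch of `word_to_char_span`, together with the JSON-to-`Selection` conversion listed under Data Processing. It assumes that `start_ix`/`end_ix` are inclusive indices into the whitespace-separated words of the report and that `Selection.end` is exclusive; neither convention is stated above, so check both against the data. `entities_to_selections` is a hypothetical helper name.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Selection:
    start: int  # char-level start position
    end: int  # char-level end position (exclusive; assumed)
    text: str  # entity text
    labels: List[str]  # entity type list


def word_to_char_span(text: str, start_ix: int, end_ix: int) -> Tuple[int, int]:
    """Convert an inclusive word-index range to a [start, end) character range.

    Assumes words are the whitespace-separated tokens of `text`.
    """
    # (start, end) character offsets of every whitespace-delimited token.
    tokens = [m.span() for m in re.finditer(r"\S+", text)]
    if not 0 <= start_ix <= end_ix < len(tokens):
        raise IndexError(f"word range {start_ix}..{end_ix} is out of bounds")
    return tokens[start_ix][0], tokens[end_ix][1]


def entities_to_selections(text: str, entities: Dict[str, dict]) -> List[Selection]:
    """Convert the entity JSON into Selection objects ordered by position."""
    selections = []
    for ent in entities.values():
        # int() also accepts positions stored as strings.
        start, end = word_to_char_span(text, int(ent["start_ix"]), int(ent["end_ix"]))
        selections.append(Selection(start, end, text[start:end], [ent["label"]]))
    return sorted(selections, key=lambda s: s.start)
```

Taking `Selection.text` from the report slice rather than from the `tokens` field makes any mismatch between positions and stored text visible during review.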

2. Relation Reconstruction
```python
def find_relations_with_entities(entities, entities_data):
    """Rebuild relations based on entity text matching"""
```
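One possible reading of `find_relations_with_entities`, assuming `entities` is the current list of `Selection` objects (only their `text` is used) and `entities_data` is the original entity dictionary in the format shown under Data Structures. Returning `(source_text, relation_type, target_text)` triples and dropping duplicates are assumptions, not documented behavior of the app.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class HasText(Protocol):
    text: str  # any object with a `text` attribute, e.g. a Selection


def find_relations_with_entities(
    entities: Iterable[HasText], entities_data: Dict[str, dict]
) -> List[Tuple[str, str, str]]:
    """Rebuild relations for the current entities by matching entity text.

    A relation from the original JSON is kept only if both its source text
    and its target text still appear among the current entities.
    """
    current_texts = {e.text for e in entities}
    relations: List[Tuple[str, str, str]] = []
    for ent in entities_data.values():
        source = ent["tokens"]
        if source not in current_texts:
            continue  # source entity was removed during review
        for relation, target_id in ent.get("relations", []):
            target = entities_data.get(target_id)
            if target is not None and target["tokens"] in current_texts:
                triple = (source, relation, target["tokens"])
                if triple not in relations:
                    relations.append(triple)
    return relations
```

Matching on text rather than on IDs lets relations survive edits that renumber entities, at the cost of conflating repeated mentions of the same text, the same trade-off as the node merging described above.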

## TODO
1. [ ] Add data export functionality
2. [ ] Named Entity Recognition
   1. [ ] Add an input box
   2. [ ] Call LLMs
3. [ ] Relation Extraction
   1. [ ] Add relation editing functionality
4. [ ] Where the data lives
   1. [ ] Read it from somewhere, e.g. on Git
   2. [ ] Save it somewhere; saving is a bit tricky (commit it?)

## Contributing
Contributions are welcome. You can:
1. Submit issues for bug reports or suggestions
2. Submit pull requests to improve the code
3. Improve documentation and comments

## License
MIT License