---
title: MedKGC
emoji: 🐠
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.39.0
app_file: app.py
pinned: false
---

# Medical Knowledge Graph Construction (medKGC)

## Overview

An automated annotation tool that uses LLMs to help medical annotators annotate radiology reports. The tool covers named entity recognition, relation extraction, and named entity normalization, and outputs the final result as a knowledge graph.

medKGC is a knowledge graph construction and review system for medical text. It supports entity recognition, relation extraction, and visualization of medical reports, and provides a review interface for annotators.

## Deployment

### Installation

1. Create a conda environment

   ```bash
   conda create -n medkgc python=3.10
   conda activate medkgc
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application

   ```bash
   streamlit run app.py
   ```

## Core Features

### 1. Data Processing

- **Position Conversion**: Convert positions between word-level and char-level indices
- **Entity Conversion**: Convert between the JSON format and Selection objects
- **Relation Extraction**: Map and reconstruct relations by entity ID

### 2. Entity Annotation

- **Label Types**:
  - OBS-DP: Observation definitely present (Red)
  - ANAT-DP: Anatomy definitely present (Cyan)
  - OBS-U: Observation uncertain (Yellow)
  - OBS-DA: Observation definitely absent (Gray)
- **Interactive Annotation**: Select and label entities interactively

### 3. Relation Visualization

- **Node Merging**: Automatically merge entities with the same text
- **Color Coding**: Use a different color for each entity type
- **Dynamic Updates**: Update the graph in real time

### 4. Review Process

- **Report Selection**: Display pending and reviewed reports separately
- **Status Saving**: Automatically save review status and modifications
- **Batch Processing**: Review multiple reports in succession

## Technical Implementation

### Data Structures
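The entity JSON and the `Selection` object listed in this section differ mainly in how they index positions (word-level vs. char-level). The following is a minimal sketch of that conversion, not the project's implementation. It assumes that words are whitespace-separated tokens, that `end_ix` is inclusive (a single-word entity has `start_ix == end_ix`), and that `Selection.end` is exclusive. `word_to_char_span` reuses the name from Core Algorithms, but this body is illustrative; `entities_to_selections` and `relation_triples` are hypothetical helpers, and the relation names in the example data (`modify`, `located_at`) are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Selection:
    start: int  # char-level start position
    end: int  # char-level end position (assumed exclusive)
    text: str  # entity text
    labels: List[str]  # entity type list


def word_to_char_span(text: str, start_ix: int, end_ix: int) -> Tuple[int, int]:
    """Convert a word-level span to a character-level range [start, end).

    Assumptions (not confirmed by the README): words are whitespace-separated
    tokens and end_ix is inclusive.
    """
    # Character offsets of every non-whitespace run, in order.
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    if not 0 <= start_ix <= end_ix < len(spans):
        raise ValueError(f"word span {start_ix}..{end_ix} out of range")
    return spans[start_ix][0], spans[end_ix][1]


def entities_to_selections(text: str, entities: Dict[str, dict]) -> List[Selection]:
    """Hypothetical helper: turn entity JSON into char-level Selection objects."""
    selections = []
    for ent in entities.values():
        # int() also accepts positions stored as numeric strings.
        start, end = word_to_char_span(text, int(ent["start_ix"]), int(ent["end_ix"]))
        selections.append(Selection(start, end, text[start:end], [ent["label"]]))
    return selections


def relation_triples(entities: Dict[str, dict]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: resolve target entity IDs into (source, relation, target) text triples."""
    triples = []
    for ent in entities.values():
        for rel_type, target_id in ent.get("relations", []):
            target = entities.get(str(target_id))
            if target is not None:  # skip relations that point at a missing ID
                triples.append((ent["tokens"], rel_type, target["tokens"]))
    return triples


if __name__ == "__main__":
    report = "Small left pleural effusion"
    # Illustrative annotation; relation names are placeholders.
    entities = {
        "1": {"tokens": "left", "label": "ANAT-DP", "start_ix": 1, "end_ix": 1, "relations": []},
        "2": {"tokens": "pleural", "label": "ANAT-DP", "start_ix": 2, "end_ix": 2,
              "relations": [["modify", "1"]]},
        "3": {"tokens": "effusion", "label": "OBS-DP", "start_ix": 3, "end_ix": 3,
              "relations": [["located_at", "2"]]},
    }
    for sel in entities_to_selections(report, entities):
        print(sel)
    print(relation_triples(entities))
```

Running the script prints one `Selection` per entity, for example `Selection(start=6, end=10, text='left', labels=['ANAT-DP'])`, followed by the resolved triples. The sketch takes offsets from the regex matches instead of summing word lengths, so repeated spaces or line breaks in a report do not shift the character positions.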
1. **Entity Data**

   ```json
   {
     "entities": {
       "1": {
         "tokens": "entity text",
         "label": "entity type",
         "start_ix": "word-level start position",
         "end_ix": "word-level end position",
         "relations": [["relation type", "target entity ID"]]
       }
     }
   }
   ```

2. **Selection Object**

   ```python
   from dataclasses import dataclass
   from typing import List

   @dataclass
   class Selection:
       start: int  # char-level start position
       end: int  # char-level end position
       text: str  # entity text
       labels: List[str]  # entity type list
   ```

### Core Algorithms

1. **Position Conversion**

   ```python
   def word_to_char_span(text, start_ix, end_ix):
       """Convert word-level position to character-level range"""
   ```

2. **Relation Reconstruction**

   ```python
   def find_relations_with_entities(entities, entities_data):
       """Rebuild relations based on entity text matching"""
   ```

## TODO

1. [ ] Add data export functionality
2. [ ] Named Entity Recognition
   1. [ ] Add an input box
   2. [ ] Call LLMs
3. [ ] Relation Extraction
   1. [ ] Add relation editing functionality
4. [ ] Decide where the data lives
   1. [ ] Read data from somewhere (e.g. a Git repository)
   2. [ ] Save data somewhere (saving is tricky; commit it?)

## Contributing

Contributions are welcome. You can:

1. Submit issues for bug reports or suggestions
2. Submit pull requests to improve the code
3. Improve documentation and comments

## License

MIT License