import streamlit as st
import base64
st.sidebar.markdown('''
# Sections
- [How it works](#how-it-works)
- [Schematic overview of ProtHGT](#schematic-overview)
''', unsafe_allow_html=True)
st.markdown('''
## ProtHGT: Heterogeneous Graph Transformers for Automated Protein Function Prediction Using Knowledge Graphs and Language Models
''')
st.markdown(
"""
[GitHub Repository](https://github.com/HUBioDataLab/ProtHGT)
""")
st.markdown('<p style="font-size:18px; font-weight:bold">Developers: Erva Ulusoy, Tunca Dogan</p>', unsafe_allow_html=True)
st.subheader('How it works', anchor='how-it-works')
st.markdown(
"""
ProtHGT is a **heterogeneous graph transformer-based model** for automated protein function prediction. It integrates diverse biological data—proteins, pathways, domains, GO terms, and more—into **a unified knowledge graph**, enabling accurate and interpretable predictions.
Using transformer-based message passing, ProtHGT models complex biological relationships by propagating information across proteins and their functional associations in a structured graph. The model represents proteins with initial embeddings from **protein language models (e.g., TAPE, ProtT5)** while integrating contextual information from pathways, domains, and molecular interactions. By employing **graph attention mechanisms** over the knowledge graph, ProtHGT learns to prioritize the most biologically relevant connections, improving both prediction accuracy and interpretability.
ProtHGT outperforms existing sequence- and graph-based methods, as demonstrated in evaluations on the CAFA3 and DeepHGAT benchmark datasets. By incorporating a broader biological context through its knowledge graph, **the model improves function prediction across all Gene Ontology (GO) sub-ontologies**. Additionally, its attention-based framework allows researchers to trace predictions back to key contributing relationships in the graph, making it possible to explore new functional links, validate known annotations, and generate testable biological hypotheses.
The overall workflow of ProtHGT is shown below.
""")
st.subheader('Schematic overview of ProtHGT', anchor='schematic-overview')
st.image('figures/ProtHGT_workflow.png')
st.markdown(
'<p style="text-align:center"><em><strong>Schematic representation of the ProtHGT framework. a)</strong> Diverse biological datasets, including proteins, pathways, domains, and GO terms, are integrated into a unified knowledge graph; <strong>b)</strong> the heterogeneous graph is constructed, capturing multi-relational biological associations; <strong>c)</strong> feature vectors for each node type are generated using state-of-the-art embedding methods; <strong>d)</strong> protein function prediction models are trained separately for molecular function, biological process, and cellular component sub-ontologies; <strong>e)</strong> heterogeneous graph transformer (HGT) layers process and refine node representations through multi-relational message passing. Final protein function predictions are obtained by linking proteins to GO terms based on learned embeddings and attention-weighted relationships.</em></p>', unsafe_allow_html=True)