#!/usr/bin/env python
# coding: utf-8

# # Cell type annotation transfer in multi-omics
# 
# In multi-omics research, transferring cell type annotations from one data modality to another is a crucial step. For instance, when annotating cell types in single-cell ATAC sequencing (scATAC-seq) data, it is often desirable to reuse the cell type labels already assigned in single-cell RNA sequencing (scRNA-seq) data. This requires integrating information from both modalities.
# 
# GLUE is a prominent algorithm for cross-modality integration: it embeds cells from different omics layers into a shared latent space. However, GLUE does not itself provide a method for transferring cell type labels from scRNA-seq to scATAC-seq data. To address this limitation, omicverse implements an approach based on K-nearest neighbours (KNN).
# 
# Because GLUE places scRNA-seq and scATAC-seq cells in the same embedding (stored in `obsm['X_glue']`), cells from either modality can be compared directly in that space, without building a separate graph for each modality. A KNN classifier is trained on the annotated scRNA-seq cells. For each scATAC-seq cell, its K nearest scRNA-seq neighbours in the shared embedding are retrieved, and the cell is assigned the label favoured by a weighted vote over those neighbours. How strongly the neighbours agree also gives an uncertainty score for each transferred label.
# 
# Colab reproducibility: https://colab.research.google.com/drive/1aIMmSgyIw-PGjJ65WvMgz4Ob3EtoK_UV?usp=sharing
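
# The transfer step described above can be illustrated with a minimal sketch on synthetic data. It is not the omicverse implementation. It assumes a shared embedding in which reference (RNA) and query (ATAC) cells are directly comparable, finds each query cell's K nearest reference cells by Euclidean distance, and takes an unweighted majority vote. The function `toy_knn_transfer` and the toy data are hypothetical. The brute-force distance matrix keeps the sketch short but only suits small inputs.

# In[ ]:


import numpy as np


def toy_knn_transfer(ref_emb, ref_labels, query_emb, k=3):
    """Hypothetical helper: give each query cell the most common label among its k nearest reference cells."""
    # Pairwise Euclidean distances between query and reference cells, shape (n_query, n_ref)
    dists = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=-1)
    # Indices of the k closest reference cells for every query cell
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    transferred = []
    for row in nn_idx:
        # Majority vote over the neighbours' labels
        values, counts = np.unique(ref_labels[row], return_counts=True)
        transferred.append(values[np.argmax(counts)])
    return np.array(transferred)


# Two well-separated reference clusters in a 2-D stand-in for the shared embedding
toy_ref_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
toy_ref_labels = np.array(["B cell", "B cell", "B cell",
                           "T cell", "T cell", "T cell"])
# One query cell near each cluster
toy_query_emb = np.array([[0.05, 0.05], [4.9, 5.0]])

print(toy_knn_transfer(toy_ref_emb, toy_ref_labels, toy_query_emb, k=3))  # prints ['B cell' 'T cell']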

# In[3]:


import omicverse as ov
import matplotlib.pyplot as plt
import scanpy as sc
ov.ov_plot_set()


# ## Loading the data preprocessed with GLUE
# 
# Here, we load two output files of the GLUE cross-modal integration. Both store the shared GLUE embedding in `obsm['X_glue']`, and the RNA data has already been annotated with cell types.

# In[4]:


rna=sc.read("data/analysis_lymph/rna-emb.h5ad")
atac=sc.read("data/analysis_lymph/atac-emb.h5ad")


# To visualize how well GLUE integrated the two modalities, we concatenate them and compute a two-dimensional MDE (minimum-distortion embedding) of `X_glue`.

# In[5]:


# Concatenate RNA and ATAC cells into one AnnData object for joint plotting
combined=sc.concat([rna,atac],merge='same')
combined


# In[6]:


# Reduce the shared GLUE embedding to 2-D with MDE for visualization
combined.obsm['X_mde']=ov.utils.mde(combined.obsm['X_glue'])


# The RNA and ATAC cells overlap in the MDE plot, indicating that GLUE has aligned the two modalities.

# In[8]:


ov.utils.embedding(combined,
                   basis='X_mde',
                   color='domain',
                   title='Layers',
                   show=False,
                   palette=ov.utils.red_color,
                   frameon='small'
                  )
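

# The overlap can also be quantified. The sketch below computes a simple neighbourhood mixing score. This is an illustrative metric chosen here, not part of omicverse or GLUE. For each cell, it takes the fraction of the cell's k nearest neighbours in the embedding that come from the other modality, and then averages over cells. With two equally sized, well-mixed modalities the score is roughly 0.5. Values near 0 indicate that the modalities form separate islands. The helper name and synthetic data are hypothetical. The brute-force distance matrix only suits small inputs, so applying this to `combined.obsm['X_glue']` would need a dedicated nearest-neighbour search.

# In[ ]:


import numpy as np


def toy_mixing_score(emb, domains, k=5):
    """Hypothetical helper: mean fraction of each cell's k nearest neighbours from another domain."""
    domains = np.asarray(domains)
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a cell is not its own neighbour
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    # True where a neighbour belongs to a different modality than the cell itself
    other = domains[nn_idx] != domains[:, None]
    return other.mean()


toy_rng = np.random.default_rng(0)
# Well-mixed case: both modalities drawn from the same distribution
toy_mixed_emb = toy_rng.normal(size=(200, 2))
# Separated case: the second modality is shifted far away
toy_separated_emb = np.vstack([toy_rng.normal(size=(100, 2)),
                               toy_rng.normal(size=(100, 2)) + 20])
toy_domains = np.array(["RNA"] * 100 + ["ATAC"] * 100)

print(toy_mixing_score(toy_mixed_emb, toy_domains))      # roughly 0.5
print(toy_mixing_score(toy_separated_emb, toy_domains))  # 0.0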


# The RNA modality already carries cell type annotations in `obs['major_celltype']`.

# In[22]:


ov.utils.embedding(rna,
                   basis='X_mde',
                   color='major_celltype',
                   title='Cell type',
                   show=False,
                   frameon='small'
                  )


# ## Celltype transfer
# 
# We train a weighted KNN classifier on the RNA cells, using their `X_glue` embedding as features and 15 neighbours.

# In[13]:


knn_transformer=ov.utils.weighted_knn_trainer(
    train_adata=rna,
    train_adata_emb='X_glue',
    n_neighbors=15,
)


# In[14]:


labels,uncert=ov.utils.weighted_knn_transfer(
    query_adata=atac,
    query_adata_emb='X_glue',
    label_keys='major_celltype',
    knn_model=knn_transformer,
    ref_adata_obs=rna.obs,
)
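

# `weighted_knn_transfer` returns both a label and an uncertainty for every query cell. The sketch below shows one plausible way to compute such a pair. It is an assumption for illustration, not a description of omicverse's internals. Each of the K neighbours votes with a Gaussian-kernel weight of its distance (the bandwidth `sigma` is a hypothetical parameter), and the weights are normalised to sum to 1 per query cell. The label with the largest total weight wins, and the uncertainty is 1 minus that winning weight. The helper name and toy data are hypothetical.

# In[ ]:


import numpy as np


def toy_weighted_knn_transfer(ref_emb, ref_labels, query_emb, k=4, sigma=1.0):
    """Hypothetical helper: distance-weighted KNN vote returning (labels, uncertainties)."""
    dists = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=-1)
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dists, nn_idx, axis=1)  # sorted ascending per row
    # Gaussian kernel weights; subtracting the nearest squared distance avoids
    # underflow without changing the normalised weights
    weights = np.exp(-(nn_dist ** 2 - nn_dist[:, :1] ** 2) / (2 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)  # each row now sums to 1
    out_labels, out_unc = [], []
    for row_idx, row_w in zip(nn_idx, weights):
        neighbour_labels = ref_labels[row_idx]
        classes = np.unique(neighbour_labels)
        # Total weight supporting each candidate label
        support = np.array([row_w[neighbour_labels == c].sum() for c in classes])
        best = np.argmax(support)
        out_labels.append(classes[best])
        out_unc.append(1.0 - support[best])  # close to 0 when all neighbours agree
    return np.array(out_labels), np.array(out_unc)


toy_ref_emb = np.array([[0.0, 0.0], [0.0, 0.2], [2.0, 0.0], [2.0, 0.2]])
toy_ref_labels = np.array(["B cell", "B cell", "T cell", "T cell"])
# The first query cell sits inside the B cluster, the second between the clusters
toy_query_emb = np.array([[0.0, 0.1], [1.2, 0.1]])

toy_labels, toy_unc = toy_weighted_knn_transfer(toy_ref_emb, toy_ref_labels, toy_query_emb)
print(toy_labels)  # prints ['B cell' 'T cell']
print(toy_unc)     # the in-between cell has the higher uncertainty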


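# `weighted_knn_transfer` returns two tables indexed by cell name: the transferred labels and their uncertainties. We store both in `atac.obs`. `unc` stands for uncertainty. A higher value means the neighbouring RNA cells agree less on a label, so the transfer is less reliable. Such a cell may have a mixed or transitional identity, or may belong to a cell type that is absent from the RNA reference.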

# In[15]:


atac.obs["transf_celltype"]=labels.loc[atac.obs.index,"major_celltype"]
atac.obs["transf_celltype_unc"]=uncert.loc[atac.obs.index,"major_celltype"]
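

# The uncertainty scores stored above can be used to withhold labels that the neighbours do not clearly support. The sketch below replaces any label whose uncertainty exceeds a cut-off with a placeholder. The threshold of 0.2, the placeholder `'Unknown'` and the helper name are hypothetical choices for illustration, not omicverse defaults. The same logic could be applied to the `transf_celltype` and `transf_celltype_unc` columns of `atac.obs`.

# In[ ]:


import numpy as np


def toy_flag_uncertain(labels, uncertainties, threshold=0.2, unknown="Unknown"):
    """Hypothetical helper: replace labels whose uncertainty exceeds `threshold` with `unknown`."""
    labels = np.asarray(labels, dtype=object)
    uncertainties = np.asarray(uncertainties, dtype=float)
    # Keep the label where uncertainty is at or below the threshold
    return np.where(uncertainties > threshold, unknown, labels)


toy_labels = ["B cell", "T cell", "NK cell"]
toy_unc = [0.05, 0.45, 0.15]
print(toy_flag_uncertain(toy_labels, toy_unc))  # the T cell label is withheld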


# In[24]:


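# Store the transferred labels under the same column name as in `rna`,
# so that both modalities share a `major_celltype` column after merging
atac.obs["major_celltype"]=atac.obs["transf_celltype"].copy()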


# In[27]:


ov.utils.embedding(atac,
                   basis='X_umap',
                   color=['transf_celltype_unc','transf_celltype'],
                   show=False,
                   palette=ov.palette()[11:],
                   frameon='small'
                  )


# ## Visualization
# 
# After label transfer, we merge the RNA and ATAC data again and check on the MDE plot whether cell types are consistent across the two modalities.

# In[28]:


# Re-concatenate the modalities now that ATAC cells carry transferred labels
combined1=sc.concat([rna,atac],merge='same')
combined1


# In[29]:


combined1.obsm['X_mde']=ov.utils.mde(combined1.obsm['X_glue'])


# The transferred ATAC labels agree with the labels of the RNA cells they overlap with in the shared embedding. This suggests that the KNN classifier effectively transfers cell type labels from RNA to ATAC.

# In[31]:


ov.utils.embedding(combined1,
                   basis='X_mde',
                   color=['domain','major_celltype'],
                   title=['Layers','Cell type'],
                   show=False,
                   palette=ov.palette()[11:],
                   frameon='small'
                  )
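

# As a rough complementary check, one can compare the cell type composition of the RNA reference with that of the transferred ATAC labels. A large gap for a cell type may indicate that the transfer over- or under-assigns it, although genuine compositional differences between the two datasets can produce the same signal. This sketch uses hypothetical helper names and toy label lists. On the real data, the inputs would be `rna.obs['major_celltype']` and `atac.obs['transf_celltype']`.

# In[ ]:


import numpy as np


def toy_composition(labels):
    """Hypothetical helper: return a {label: fraction} dict for an array of cell type labels."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return dict(zip(values.tolist(), (counts / counts.sum()).tolist()))


def toy_composition_gap(ref_labels, query_labels):
    """Hypothetical helper: absolute difference in cell type fraction between reference and query."""
    ref = toy_composition(ref_labels)
    query = toy_composition(query_labels)
    # Cell types missing from one side count as fraction 0 there
    return {c: abs(ref.get(c, 0.0) - query.get(c, 0.0))
            for c in sorted(set(ref) | set(query))}


toy_rna = ["B cell"] * 60 + ["T cell"] * 40
toy_atac = ["B cell"] * 45 + ["T cell"] * 55
print(toy_composition_gap(toy_rna, toy_atac))  # both cell types differ by about 0.15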

