#!/usr/bin/env python
# coding: utf-8

# # Celltype annotation transfer in multi-omics
#
# In multi-omics research, transferring cell type annotations from one data modality to another is a crucial step. For instance, when annotating cell types in single-cell ATAC sequencing (scATAC-seq) data, it is often desirable to reuse the cell type labels already assigned in single-cell RNA sequencing (scRNA-seq) data. Doing so requires integrating information from both modalities.
#
# GLUE is a prominent algorithm for cross-modality integration that lets researchers combine data from different omics modalities. However, GLUE does not itself provide a method for transferring cell type labels from scRNA-seq to scATAC-seq data. To address this limitation, omicverse implements a label transfer approach based on K-nearest neighbors (KNN).
#
# As used in this notebook, a KNN model is fitted on the scRNA-seq cells in the shared GLUE embedding (`X_glue`). Each scATAC-seq cell is then assigned the cell type supported by its K nearest scRNA-seq neighbors in that embedding, together with an uncertainty score.
#
# Colab_Reproducibility: https://colab.research.google.com/drive/1aIMmSgyIw-PGjJ65WvMgz4Ob3EtoK_UV?usp=sharing

# In[3]:


import omicverse as ov
import matplotlib.pyplot as plt
import scanpy as sc

ov.ov_plot_set()


# ## Loading the data preprocessed with GLUE
#
# Here we use two output files from the GLUE cross-modal integration. Both contain the shared embedding in `obsm['X_glue']`, and the RNA data has already been annotated with cell types.

# In[4]:


rna = sc.read("data/analysis_lymph/rna-emb.h5ad")
atac = sc.read("data/analysis_lymph/atac-emb.h5ad")


# We can visualize how well GLUE integrated the two modalities with a 2D embedding computed from `X_glue` (stored in `X_mde`)

# In[5]:


import scanpy as sc

# Concatenate both modalities; merge='same' keeps only annotations identical in both
combined = sc.concat([rna, atac], merge='same')
combined


# In[6]:


combined.obsm['X_mde'] = ov.utils.mde(combined.obsm['X_glue'])


# We can see that the two layers (RNA and ATAC) are well aligned

# In[8]:


ov.utils.embedding(
    combined,
    basis='X_mde',
    color='domain',
    title='Layers',
    show=False,
    palette=ov.utils.red_color,
    frameon='small'
)


# The RNA modality already carries annotated cell type labels

# In[22]:


ov.utils.embedding(
    rna,
    basis='X_mde',
    color='major_celltype',
    title='Cell type',
    show=False,
    #palette=ov.utils.red_color,
    frameon='small'
)


# ## Celltype transfer
#
# We train a weighted KNN classifier on the `X_glue` embedding of the RNA cells

# In[13]:


knn_transformer = ov.utils.weighted_knn_trainer(
    train_adata=rna,
    train_adata_emb='X_glue',
    n_neighbors=15,
)


# In[14]:


labels, uncert = ov.utils.weighted_knn_transfer(
    query_adata=atac,
    query_adata_emb='X_glue',
    label_keys='major_celltype',
    knn_model=knn_transformer,
    ref_adata_obs=rna.obs,
)


# We store the predictions of the KNN classifier in `atac.obs`. `unc` stands for uncertainty: a higher value indicates a less reliable transfer, suggesting that the cell in question may have a double-fate signature or may be some other type of cell.

# In[15]:


atac.obs["transf_celltype"] = labels.loc[atac.obs.index, "major_celltype"]
atac.obs["transf_celltype_unc"] = uncert.loc[atac.obs.index, "major_celltype"]


# In[24]:


# Copy the transferred labels under the same key used in rna so that merge='same' works on both
atac.obs["major_celltype"] = atac.obs["transf_celltype"].copy()


# In[27]:


ov.utils.embedding(
    atac,
    basis='X_umap',
    color=['transf_celltype_unc', 'transf_celltype'],
    #title='Cell type Un',
    show=False,
    palette=ov.palette()[11:],
    frameon='small'
)


# ## Visualization
#
# After transferring the annotation, we can merge atac and rna again and check on the embedding plot whether the cell types are consistent across the two modalities.

# In[28]:


import scanpy as sc

combined1 = sc.concat([rna, atac], merge='same')
combined1


# In[29]:


combined1.obsm['X_mde'] = ov.utils.mde(combined1.obsm['X_glue'])


# The cell types are consistent across the two modalities, suggesting that the KNN classifier we constructed can effectively transfer cell type labels from RNA to ATAC.

# In[31]:


ov.utils.embedding(
    combined1,
    basis='X_mde',
    color=['domain', 'major_celltype'],
    title=['Layers', 'Cell type'],
    show=False,
    palette=ov.palette()[11:],
    frameon='small'
)
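

# The weighted KNN transfer used in this notebook can be illustrated with a small, self-contained sketch. This is not omicverse's implementation: the helper name `toy_weighted_knn_transfer`, the Gaussian distance weighting, and the definition of uncertainty as one minus the winning label's share of the neighbor weights are all assumptions chosen for illustration. It uses only the standard library (Python 3.8+ for `math.dist`) and toy 2D coordinates in place of `X_glue`.

# In[ ]:


import math
from collections import defaultdict


def toy_weighted_knn_transfer(ref_emb, ref_labels, query_emb, n_neighbors=3):
    """Label each query point by distance-weighted voting among its nearest
    reference points (hypothetical helper, for illustration only).

    Returns a list of (label, uncertainty) tuples, one per query point.
    """
    results = []
    for q in query_emb:
        # Distances to all reference points, keeping the n_neighbors closest.
        neighbors = sorted(
            (math.dist(q, r), label) for r, label in zip(ref_emb, ref_labels)
        )[:n_neighbors]
        # Gaussian kernel weights; the bandwidth (mean neighbor distance) is an assumption.
        bandwidth = sum(d for d, _ in neighbors) / len(neighbors) or 1.0
        votes = defaultdict(float)
        for d, label in neighbors:
            votes[label] += math.exp(-(d / bandwidth) ** 2)
        total = sum(votes.values())
        best = max(votes, key=votes.get)
        # Uncertainty = share of the neighbor weight that did NOT vote for the winner.
        results.append((best, 1.0 - votes[best] / total))
    return results


# Toy reference: two annotated "RNA" groups on a line; three unlabeled "ATAC" queries.
toy_ref_emb = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
toy_ref_labels = ["B cell", "B cell", "T cell", "T cell"]
toy_query_emb = [(0.2, 0.0), (2.4, 0.0), (4.8, 0.0)]

toy_pred = toy_weighted_knn_transfer(toy_ref_emb, toy_ref_labels, toy_query_emb)
for toy_point, (toy_label, toy_unc) in zip(toy_query_emb, toy_pred):
    print(toy_point, toy_label, round(toy_unc, 3))


# In this toy example the query lying between the two groups still receives a label, but with a clearly higher uncertainty than the queries sitting next to one group, which mirrors how `transf_celltype_unc` is meant to be read.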