#!/usr/bin/env python
# coding: utf-8

# # Cell type auto-annotation with MetaTiME
# 
# MetaTiME learns data-driven, interpretable, and reproducible gene programs by integrating millions of single cells from hundreds of tumor scRNA-seq datasets. The idea is to learn a map of single-cell space with biologically meaningful directions from large-scale data, which helps in understanding functional cell states and transfers knowledge to the analysis of new data. MetaTiME provides pretrained meta-components (MeCs) to automatically annotate fine-grained cell states and plot the signature continuum for new single cells from the tumor microenvironment.
# 
# Here, we integrate MetaTiME into omicverse. This tutorial demonstrates how to use [MetaTiME (original code)](https://github.com/yi-zhang/MetaTiME/blob/main/docs/notebooks/metatime_annotator.ipynb) to annotate cell types in the tumor microenvironment (TME).
# 
# Paper: [MetaTiME integrates single-cell gene expression to characterize the meta-components of the tumor immune microenvironment](https://www.nature.com/articles/s41467-023-38333-8)
# 
# Code: https://github.com/yi-zhang/MetaTiME
# 
# Colab_Reproducibility: https://colab.research.google.com/drive/1isvjTfSFM2cy6GzHWAwbuvSjveEJijzP?usp=sharing
# 
# ![metatime](https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41467-023-38333-8/MediaObjects/41467_2023_38333_Fig1_HTML.png)

# In[1]:


import omicverse as ov
ov.utils.ov_plot_set()


# ## Data normalization and batch removal
# 
# The sample data contains multiple patients, so we can apply batch correction across patients. Here, we use [scVI](https://docs.scvi-tools.org/en/stable/) to remove the batch effect.
# 
# Note: If your data contains a count matrix, we provide a wrapped function for pre-processing the data. Otherwise, if the data is already depth-normalized and log-transformed, and the cells are filtered, we can skip this step.
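# Whether the pre-processing step mentioned in the note applies depends on whether the matrix still holds raw counts. A rough heuristic, sketched here, is that raw counts are non-negative whole numbers, whereas depth-normalized, log-transformed values are generally fractional. The helper name `looks_like_raw_counts` is hypothetical and is not part of omicverse or MetaTiME; it accepts any flat iterable of numbers, for example a small sample of values taken from the expression matrix.

```python
from typing import Iterable


def looks_like_raw_counts(values: Iterable[float]) -> bool:
    """Heuristic: True if every value is a non-negative whole number.

    Hypothetical helper for illustration only (not an omicverse function).
    An empty input returns False, since nothing can be concluded from it.
    """
    seen_any = False
    for v in values:
        seen_any = True
        # Negative, fractional, NaN, or infinite values rule out raw counts.
        if v < 0 or not float(v).is_integer():
            return False
    return seen_any


# Toy values, made up for this example.
raw_sample = [0, 3, 1, 0, 12, 7]              # integer, UMI-like counts
log_sample = [0.0, 1.386, 0.693, 0.0, 2.565]  # log-transformed values

print(looks_like_raw_counts(raw_sample))  # True
print(looks_like_raw_counts(log_sample))  # False
```

# This is only a sanity check: it cannot tell which normalization was applied, so it is best treated as a hint before deciding whether to skip pre-processing.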

# In[ ]:


'''
import scvi
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="patient")
vae = scvi.model.SCVI(adata, n_layers=2, n_latent=30, gene_likelihood="nb")
vae.train()
adata.obsm["X_scVI"] = vae.get_latent_representation()
'''


# Example data can be obtained from figshare: https://figshare.com/ndownloader/files/41440050

# In[2]:


import scanpy as sc
adata=sc.read('TiME_adata_scvi.h5ad')
adata


# It is recommended that malignant cells be identified and removed first, as best practice for cell state annotation.
# 
# In the BCC data, the cluster of malignant cells was identified with `inferCNV`. We can use the pre-saved column 'isTME' to keep only the tumor microenvironment cells.
# 
# That is the original authors' recommendation. However, tests have found that the annotation results differ little even when the malignant cells are not removed, so I consider this step optional.

# In[3]:


#adata = adata[adata.obs['isTME']]


# ## Calculating the neighborhood graph
# 
# Because scVI was used earlier to remove the batch effect, we need to recalculate the neighborhood graph from the representation stored in `adata.obsm['X_scVI']`. If you are not using scVI and instead compute the neighborhood graph from another representation, such as `X_pca`, replace `X_scVI` with `X_pca`.
# 
# ```
# #Example
# #sc.tl.pca(adata)
# #sc.pp.neighbors(adata, use_rep="X_pca")
# ```

# In[4]:


sc.pp.neighbors(adata, use_rep="X_scVI")


# To visualize the scVI latent embedding, we use the `pymde` package wrapper in omicverse. This is an alternative to UMAP that is GPU-accelerated.
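
# Both the neighborhood graph and the MDE embedding read their input from a key in `adata.obsm`, so the `X_scVI` versus `X_pca` choice described earlier can be made once and reused. The sketch here shows one way to do that; `pick_representation` is a hypothetical helper (not part of scanpy or omicverse) that works on plain key names, and the preference order is an assumption for illustration.

```python
from typing import Iterable, Sequence


def pick_representation(
    available: Iterable[str],
    preferred: Sequence[str] = ("X_scVI", "X_pca"),
) -> str:
    """Return the first preferred key that is present in `available`.

    Hypothetical helper for illustration; raises KeyError if none is found.
    """
    available_set = set(available)
    for key in preferred:
        if key in available_set:
            return key
    raise KeyError(
        f"none of {list(preferred)} found in {sorted(available_set)}"
    )


# With an AnnData object one would pass `adata.obsm.keys()`;
# plain lists are used here so the example runs on its own.
print(pick_representation(["X_pca", "X_scVI"]))  # X_scVI
print(pick_representation(["X_pca"]))            # X_pca
```

# The returned key could then be passed as `use_rep` to `sc.pp.neighbors` and used to index `adata.obsm` for the embedding step.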

# In[5]:


adata.obsm["X_mde"] = ov.utils.mde(adata.obsm["X_scVI"])


# In[6]:


sc.pl.embedding(
    adata,
    basis="X_mde",
    color=["patient"],
    frameon=False,
    ncols=1,
)


# In[7]:


#adata.write_h5ad('adata_mde.h5ad',compression='gzip')
#adata=sc.read('adata_mde.h5ad')


# ## MetaTiME model initialization
# 
# Next, let's load the pre-computed MetaTiME meta-components (MeCs) and their functional annotation.

# In[8]:


TiME_object=ov.single.MetaTiME(adata,mode='table')


# We can over-cluster the cells, which is useful for fine-grained cell state annotation.
# 
# The number of clusters increases with the resolution.

# In[9]:


TiME_object.overcluster(resolution=8,clustercol = 'overcluster',)


# ## Predicting TME cell types
# 
# We use `TiME_object.predictTiME()` to predict the latent cell types in the TME.
# 
# - The minor cell type will be stored in `adata.obs['MetaTiME']`
# - The major cell type will be stored in `adata.obs['Major_MetaTiME']`

# In[10]:


TiME_object.predictTiME(save_obs_name='MetaTiME')


# ## Visualize
# 
# The original authors provide a plotting function that effectively avoids overlapping labels. Here I have extended its parameters so that embeddings other than `X_umap` can be used for the visualization.

# In[13]:


fig,ax=TiME_object.plot(cluster_key='MetaTiME',basis='X_mde',dpi=80)
#fig.save


# We can also use `sc.pl.embedding` to visualize the cell types.

# In[15]:


sc.pl.embedding(
    adata,
    basis="X_mde",
    color=["Major_MetaTiME"],
    frameon=False,
    ncols=1,
)


# In[ ]:
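
# To give some intuition for why over-clustering helps with fine-grained annotation, here is a toy sketch of cluster-level labeling. It assumes that each cluster takes the label of the signature with the highest mean per-cell score; this is a simplification chosen for illustration and not a description of the MetaTiME or omicverse internals. The function name, the signature names `sig_A` and `sig_B`, and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def label_clusters_by_top_score(
    clusters: Sequence[str],
    scores: Sequence[Dict[str, float]],
) -> Dict[str, str]:
    """Give each cluster the signature with the highest mean score.

    Toy illustration only; the real annotation procedure may differ.
    """
    # cluster -> signature -> list of per-cell scores
    per_cluster: Dict[str, Dict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for cluster, cell_scores in zip(clusters, scores):
        for signature, value in cell_scores.items():
            per_cluster[cluster][signature].append(value)

    labels: Dict[str, str] = {}
    for cluster, by_signature in per_cluster.items():
        means = {sig: mean(vals) for sig, vals in by_signature.items()}
        labels[cluster] = max(means, key=means.get)
    return labels


# Hypothetical toy data: four cells in two over-clusters, two signatures.
clusters = ["0", "0", "1", "1"]
scores = [
    {"sig_A": 2.0, "sig_B": 0.5},
    {"sig_A": 1.0, "sig_B": 1.5},  # noisy cell: sig_B is higher on its own
    {"sig_A": 0.1, "sig_B": 0.9},
    {"sig_A": 0.3, "sig_B": 1.1},
]

cluster_labels = label_clusters_by_top_score(clusters, scores)
cell_labels = [cluster_labels[c] for c in clusters]

print(cluster_labels)  # {'0': 'sig_A', '1': 'sig_B'}
print(cell_labels)     # ['sig_A', 'sig_A', 'sig_B', 'sig_B']
```

# In this toy case the second cell would be labeled `sig_B` if scored alone, but it inherits `sig_A` from its cluster. Smaller clusters (a higher resolution) average over fewer cells, which keeps labels fine-grained while still smoothing per-cell noise.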