#!/usr/bin/env python
# coding: utf-8
# # Celltype auto annotation with MetaTiME
#
# MetaTiME learns data-driven, interpretable, and reproducible gene programs by integrating millions of single cells from hundreds of tumor scRNA-seq data. The idea is to learn a map of single-cell space with biologically meaningful directions from large-scale data, which helps understand functional cell states and transfers knowledge to new data analysis. MetaTiME provides pretrained meta-components (MeCs) to automatically annotate fine-grained cell states and plot signature continuum for new single-cells of tumor microenvironment.
#
# Here, we integrate MetaTiME in omicverse. This tutorial demonstrates how to use [MetaTiME (original code)](https://github.com/yi-zhang/MetaTiME/blob/main/docs/notebooks/metatime_annotator.ipynb) to annotate celltype in TME
#
# Paper: [MetaTiME integrates single-cell gene expression to characterize the meta-components of the tumor immune microenvironment](https://www.nature.com/articles/s41467-023-38333-8)
#
# Code: https://github.com/yi-zhang/MetaTiME
#
# Colab_Reproducibility:https://colab.research.google.com/drive/1isvjTfSFM2cy6GzHWAwbuvSjveEJijzP?usp=sharing
#
# ![metatime](https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41467-023-38333-8/MediaObjects/41467_2023_38333_Fig1_HTML.png)
# In[1]:
import omicverse as ov
ov.utils.ov_plot_set()
# ## Data normalize and Batch remove
#
# The sample data has multiple patients , and we can use batch correction on patients. Here, we using [scVI](https://docs.scvi-tools.org/en/stable/) to remove batch.
#
#
#
Note
#
# If your data contains count matrix, we provide a wrapped function for pre-processing the data. Otherwise, if the data is already depth-normalized, log-transformed, and cells are filtered, we can skip this step.
#
#
# In[ ]:
'''
import scvi
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="patient")
vae = scvi.model.SCVI(adata, n_layers=2, n_latent=30, gene_likelihood="nb")
vae.train()
adata.obsm["X_scVI"] = vae.get_latent_representation()
'''
# Example data can be obtained from figshare: https://figshare.com/ndownloader/files/41440050
# In[2]:
import scanpy as sc
adata=sc.read('TiME_adata_scvi.h5ad')
adata
# It is recommended that malignant cells are identified first and removed for best practice in cell state annotation.
#
# In the BCC data, the cluster of malignant cells are identified with `inferCNV`. We can use the pre-saved column 'isTME' to keep Tumor Microenvironment cells.
#
# These are the authors' exact words, but tests have found that the difference in annotation effect is not that great even without removing the malignant cells
#
# But I think this step is not necessary
# In[3]:
#adata = adata[adata.obs['isTME']]
# ## Neighborhood graph calculated
#
# We note that scVI was used earlier to remove the batch effect from the data, so we need to recalculate the neighbourhood map based on what is stored in `adata.obsm['X_scVI']`. Note that if you are not using scVI but using another method to calculate the neighbourhood map, such as `X_pca`, then you need to change `X_scVI` to `X_pca` to complete the calculation
#
# ```
# #Example
# #sc.tl.pca(adata)
# #sc.pp.neighbors(adata, use_rep="X_pca")
# ```
# In[4]:
sc.pp.neighbors(adata, use_rep="X_scVI")
# To visualize the PCA’s embeddings, we use the `pymde` package wrapper in omicverse. This is an alternative to UMAP that is GPU-accelerated.
# In[5]:
adata.obsm["X_mde"] = ov.utils.mde(adata.obsm["X_scVI"])
# In[6]:
sc.pl.embedding(
adata,
basis="X_mde",
color=["patient"],
frameon=False,
ncols=1,
)
# In[7]:
#adata.write_h5ad('adata_mde.h5ad',compression='gzip')
#adata=sc.read('adata_mde.h5ad')
# ## MeteTiME model init
#
# Next, let's load the pre-computed MetaTiME MetaComponents (MeCs), and their functional annotation.
# In[8]:
TiME_object=ov.single.MetaTiME(adata,mode='table')
# We can over-cluster the cells which is useful for fine-grained cell state annotation.
#
# As the resolution gets larger, the number of clusters gets larger
# In[9]:
TiME_object.overcluster(resolution=8,clustercol = 'overcluster',)
# ## TME celltype predicted
#
# We using `TiME_object.predictTiME()` to predicted the latent celltype in TME.
#
# - The minor celltype will be stored in `adata.obs['MetaTiME']`
# - The major celltype will be stored in `adata.obs['Major_MetaTiME']`
# In[10]:
TiME_object.predictTiME(save_obs_name='MetaTiME')
# ## Visualize
#
# The original author provides a drawing function that effectively avoids overlapping labels. Here I have expanded its parameters so that it can be visualised using parameters other than X_umap
# In[13]:
fig,ax=TiME_object.plot(cluster_key='MetaTiME',basis='X_mde',dpi=80)
#fig.save
# We can also use `sc.pl.embedding` to visualize the celltype
# In[15]:
sc.pl.embedding(
adata,
basis="X_mde",
color=["Major_MetaTiME"],
frameon=False,
ncols=1,
)
# In[ ]: