#!/usr/bin/env python
# coding: utf-8

# # Timing-associated geneset analysis with cellfategenie
# 
# In single-cell analysis, we study the underlying temporal state of the cells, which we call pseudotime. Identifying the genes associated with pseudotime is key to unravelling models of dynamic gene regulation. Traditional analyses use either correlation coefficients or gene dynamics model fitting. The correlation coefficient approach favours genes at the beginning and end of the time series, and the gene dynamics model requires RNA velocity information. Identifying timing-associated genes without bias, and without additional dependency information, remains a challenge in current pseudotime analyses.
# 
# Here, we developed CellFateGenie, which first removes potential noise from the data through metacells and then constructs an adaptive ridge regression model to find the minimum set of genes needed to fit the pseudotime. CellFateGenie has accuracy similar to gene dynamics models while eliminating the preference for the start and end of the time series.
# 
# We use AUCell to evaluate the genesets of adata.
# 
# Colab_Reproducibility:https://colab.research.google.com/drive/1upcKKZHsZMS78eOliwRAddbaZ9ICXSrc?usp=sharing

# In[ ]:


import omicverse as ov
import scvelo as scv
import matplotlib.pyplot as plt
ov.ov_plot_set()


# ## Data preprocessed
# 
# We use the dentate gyrus dataset from scvelo to demonstrate the timing-associated geneset analysis. First, we use `ov.pp.qc` and `ov.pp.preprocess` to preprocess the dataset.
# 
# Then we use `ov.pp.scale` and `ov.pp.pca` to compute the principal components of the data.

# In[2]:


adata=ov.read('data/tutorial_meta_den.h5ad')
adata=adata.raw.to_adata()
adata


# ## Geneset evaluation

# In[3]:


import omicverse as ov
pathway_dict=ov.utils.geneset_prepare('../placenta/genesets/GO_Biological_Process_2021.txt',organism='Mouse')
len(pathway_dict.keys())


# In[ ]:


## Assess all pathways
adata_aucs=ov.single.pathway_aucell_enrichment(adata,
                                                pathways_dict=pathway_dict,
                                                num_workers=8)


# In[11]:


# Copy the cell-level annotations from adata onto the AUCell score matrix
adata_aucs.obs=adata[adata_aucs.obs.index].obs
adata_aucs.obsm=adata[adata_aucs.obs.index].obsm
adata_aucs.obsp=adata[adata_aucs.obs.index].obsp
adata_aucs.uns=adata[adata_aucs.obs.index].uns
adata_aucs


# ## Timing-associated genes analysis
# 
# We have encapsulated the cellfategenie algorithm in omicverse, so the analysis can be run directly from omicverse. Because `adata_aucs` is passed in, the features analysed here are genesets rather than individual genes.

# In[12]:


cfg_obj=ov.single.cellfategenie(adata_aucs,pseudotime='pt_via')
cfg_obj.model_init()


# We used Adaptive Threshold Regression to calculate the minimum number of genesets that gives the same accuracy as the regression model constructed from all of them.

# In[13]:


cfg_obj.ATR(stop=500)


# In[14]:


fig,ax=cfg_obj.plot_filtering(color='#5ca8dc')
ax.set_title('Dentategyrus Metacells\nCellFateGenie')


# In[15]:


res=cfg_obj.model_fit()


# ## Visualization
# 
# We prepared a series of functions to visualize the results. We can use `plot_color_fitting` to observe how the different cell types transition along pseudotime.

# In[16]:


cfg_obj.plot_color_fitting(type='raw',cluster_key='celltype')


# In[17]:


cfg_obj.plot_color_fitting(type='filter',cluster_key='celltype')


# ## Kendalltau test
# 
# We can further narrow down the set of genes that satisfy the maximum regression coefficient. We used the Kendall tau test to calculate the trend significance for each gene.

# In[18]:


kt_filter=cfg_obj.kendalltau_filter()
kt_filter.head()


# In[21]:


var_name=kt_filter.loc[kt_filter['pvalue']