#!/usr/bin/env python
# coding: utf-8

# # Timing-associated geneset analysis with cellfategenie
#
# In single-cell analysis, we infer the underlying temporal state of each cell, which we call pseudotime. Identifying the genes associated with pseudotime is key to unravelling models of dynamic gene regulation. Traditionally, we would use correlation coefficients or gene dynamics model fitting. The correlation coefficient approach is biased toward genes at the beginning and end of the time series, and the gene dynamics model requires RNA velocity information. Identifying timing-associated genes without bias and without additional dependency information has become a challenge in current temporal analyses.
#
# Here, we developed CellFateGenie, which first removes potential noise from the data through metacells, and then constructs an adaptive ridge regression model to find the minimum set of genes needed to satisfy the timing fit. CellFateGenie has similar accuracy to gene dynamics models while eliminating the preference for the start and end of the time series.
#
# We provide AUCell to evaluate the gene sets of `adata`.
#
# Colab_Reproducibility: https://colab.research.google.com/drive/1upcKKZHsZMS78eOliwRAddbaZ9ICXSrc?usp=sharing

# In[ ]:


import omicverse as ov
import scvelo as scv
import matplotlib.pyplot as plt
ov.ov_plot_set()


# ## Data preprocessing
#
# We use the dentate gyrus dataset from scvelo to demonstrate timing-associated gene analysis. First, we use `ov.pp.qc` and `ov.pp.preprocess` to preprocess the dataset.
#
# Then we use `ov.pp.scale` and `ov.pp.pca` to compute the principal components of the data.

# In[2]:


adata=ov.read('data/tutorial_meta_den.h5ad')
adata=adata.raw.to_adata()
adata


# ## Geneset evaluation

# In[3]:


import omicverse as ov
pathway_dict=ov.utils.geneset_prepare('../placenta/genesets/GO_Biological_Process_2021.txt',organism='Mouse')
len(pathway_dict.keys())


# In[ ]:


## Assess all pathways
adata_aucs=ov.single.pathway_aucell_enrichment(adata,
                                               pathways_dict=pathway_dict,
                                               num_workers=8)


# In[11]:


# Copy the cell-level annotations from the original object to the AUCell result
adata_aucs.obs=adata[adata_aucs.obs.index].obs
adata_aucs.obsm=adata[adata_aucs.obs.index].obsm
adata_aucs.obsp=adata[adata_aucs.obs.index].obsp
adata_aucs.uns=adata[adata_aucs.obs.index].uns
adata_aucs


# ## Timing-associated genes analysis
#
# We have encapsulated the cellfategenie algorithm in omicverse, so the analysis can be run directly through omicverse.

# In[12]:


cfg_obj=ov.single.cellfategenie(adata_aucs,pseudotime='pt_via')
cfg_obj.model_init()


# We use Adaptive Threshold Regression to calculate the minimum number of gene sets that gives the same accuracy as the regression model constructed from all genes.

# In[13]:


cfg_obj.ATR(stop=500)


# In[14]:


fig,ax=cfg_obj.plot_filtering(color='#5ca8dc')
ax.set_title('Dentategyrus Metacells\nCellFateGenie')


# In[15]:


res=cfg_obj.model_fit()


# ## Visualization
#
# We prepared a series of functions to visualize the result. We can use `plot_color_fitting` to observe how the different cells transition along the pseudotime.

# In[16]:


cfg_obj.plot_color_fitting(type='raw',cluster_key='celltype')


# In[17]:


cfg_obj.plot_color_fitting(type='filter',cluster_key='celltype')


# ## Kendalltau test
#
# We can further narrow down the set of genes that passed the regression coefficient filter. We use the Kendall tau test to calculate the trend significance for each gene.
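
# As a rough illustration of the idea behind this test, the cell below applies
# `scipy.stats.kendalltau` to synthetic data. It is only a sketch and does not
# reproduce the internals of `cfg_obj.kendalltau_filter`. A score that rises with
# pseudotime should get a tau close to 1 and a small p-value, while a pure-noise
# score should get a tau close to 0. The names `toy_pseudotime`, `rising_score`
# and `noise_score` are hypothetical and exist only for this example.

# In[ ]:


import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
toy_pseudotime = np.linspace(0, 1, 200)

# One score that follows pseudotime (plus noise) and one that ignores it
rising_score = toy_pseudotime + rng.normal(scale=0.1, size=200)
noise_score = rng.normal(size=200)

# kendalltau returns the rank correlation and its two-sided p-value
tau_rising, p_rising = kendalltau(toy_pseudotime, rising_score)
tau_noise, p_noise = kendalltau(toy_pseudotime, noise_score)
print(f"rising: tau={tau_rising:.2f}, p={p_rising:.2e}")
print(f"noise:  tau={tau_noise:.2f}, p={p_noise:.2e}")
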

# In[18]:


kt_filter=cfg_obj.kendalltau_filter()
kt_filter.head()


# In[21]:


# Keep the features whose p-value is below the mean p-value
var_name=kt_filter.loc[kt_filter['pvalue']<kt_filter['pvalue'].mean()].index.tolist()
gt_obj=ov.single.gene_trends(adata_aucs,'pt_via',var_name)
gt_obj.calculate(n_convolve=10)


# In[22]:


print(f"Dimension: {len(var_name)}")


# In[23]:


fig,ax=gt_obj.plot_trend(color=ov.utils.blue_color[3])
ax.set_title('Dentategyrus meta\nCellfategenie',fontsize=13)


# In[25]:


g=ov.utils.plot_heatmap(adata_aucs,var_names=var_name,
                        sortby='pt_via',col_color='celltype',
                        n_convolve=10,figsize=(1,6),show=False)

g.fig.set_size_inches(2, 6)
g.fig.suptitle('CellFateGenie',x=0.25,y=0.83,
               horizontalalignment='left',fontsize=12,fontweight='bold')
g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(),fontsize=12)
plt.show()


# In[32]:


gw_obj1=ov.utils.geneset_wordcloud(adata=adata_aucs[:,var_name],
                                   cluster_key='celltype',pseudotime='pt_via',figsize=(3,6))
gw_obj1.get()


# In[33]:


g=gw_obj1.plot_heatmap(figwidth=6,cmap='RdBu_r')
plt.suptitle('CellFateGenie',x=0.18,y=0.95,
             horizontalalignment='left',fontsize=12,fontweight='bold')


# In[ ]:
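
# The Adaptive Threshold Regression step used earlier can be pictured with a small
# NumPy-only sketch. This is an illustration under stated assumptions, not
# omicverse's implementation: it fits a closed-form ridge regression of a toy
# pseudotime on all features, then adds features in order of absolute coefficient
# until the refitted model's R^2 comes within a tolerance of the full model's R^2.
# `ridge_fit`, `r2`, `minimal_feature_set`, `alpha` and `tol` are hypothetical
# names chosen for this example.

# In[ ]:


import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge coefficients for centered X and y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def r2(X, y, coef):
    """Coefficient of determination for centered y."""
    residual = y - X @ coef
    return 1.0 - (residual @ residual) / (y @ y)


def minimal_feature_set(X, y, alpha=1.0, tol=0.01):
    """Smallest top-|coef| feature subset whose R^2 is within `tol` of the full fit."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    full_coef = ridge_fit(Xc, yc, alpha)
    full_r2 = r2(Xc, yc, full_coef)
    order = np.argsort(-np.abs(full_coef))  # largest coefficients first
    for k in range(1, X.shape[1] + 1):
        keep = order[:k]
        coef = ridge_fit(Xc[:, keep], yc, alpha)
        if r2(Xc[:, keep], yc, coef) >= full_r2 - tol:
            return np.sort(keep), full_r2
    return np.sort(order), full_r2


# Toy data: 300 "cells", 20 features, pseudotime driven by features 0, 1 and 2 only
rng = np.random.default_rng(0)
toy_X = rng.normal(size=(300, 20))
toy_pt = (2.0 * toy_X[:, 0] - 1.5 * toy_X[:, 1] + 1.0 * toy_X[:, 2]
          + rng.normal(scale=0.1, size=300))

selected, full_r2 = minimal_feature_set(toy_X, toy_pt)
print("selected features:", selected.tolist())
print("full-model R^2:", round(float(full_r2), 3))
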