#!/usr/bin/env python
# coding: utf-8

# # Timing-associated geneset analysis with cellfategenie
# 
# In single-cell analysis, we study the underlying temporal state of each cell, which we call pseudotime. Identifying the genes associated with pseudotime is the key to unravelling models of dynamic gene regulation. Traditional analyses rely on correlation coefficients or on fitting gene dynamics models. The correlation coefficient approach favours genes at the beginning and end of the time series, and the gene dynamics model requires RNA velocity information. Identifying timing-associated genes without this bias, and without additional dependency information, remains a challenge in current pseudotime analyses.
# 
# Here, we developed CellFateGenie, which first removes potential noise from the data through metacells and then constructs an adaptive ridge regression model to find the minimum set of genes needed to fit the timing. CellFateGenie has similar accuracy to gene dynamics models while eliminating the preference for the start and end of the time series.
# 
# In this tutorial we use AUCell to score gene sets in `adata` and then run CellFateGenie on those gene set scores.
# 
# Colab_Reproducibility: https://colab.research.google.com/drive/1upcKKZHsZMS78eOliwRAddbaZ9ICXSrc?usp=sharing

# In[ ]:


import omicverse as ov
import scvelo as scv
import matplotlib.pyplot as plt
ov.ov_plot_set()


# ## Data preprocessing
# 
# We use the dentate gyrus dataset from scvelo to demonstrate the timing-associated gene analysis. First, we use `ov.pp.qc` and `ov.pp.preprocess` to preprocess the dataset.
# 
# Then we use `ov.pp.scale` and `ov.pp.pca` to compute the principal components of the data. The cell below loads a saved `.h5ad` file and recovers the expression matrix stored in `adata.raw`.

# In[2]:


adata=ov.read('data/tutorial_meta_den.h5ad')
adata=adata.raw.to_adata()
adata


# ## Geneset evaluation

# In[3]:


import omicverse as ov
pathway_dict=ov.utils.geneset_prepare('../placenta/genesets/GO_Biological_Process_2021.txt',organism='Mouse')
len(pathway_dict.keys())


# In[ ]:


# Assess all pathways with AUCell
adata_aucs=ov.single.pathway_aucell_enrichment(adata,
                                                pathways_dict=pathway_dict,
                                                num_workers=8)
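

# As a rough intuition for what an AUCell-style score measures, the next cell gives a minimal NumPy sketch. It is not the routine called by `ov.single.pathway_aucell_enrichment`. It ranks the genes of one cell by expression, builds the recovery curve of a gene set within the top-ranked genes, and normalises the area under that curve by the best achievable area. The function name `aucell_like_score`, the `top_frac` parameter and the toy data are assumptions made for illustration, and the gene set is assumed to be non-empty.

# In[ ]:


import numpy as np


def aucell_like_score(expr, gene_set_idx, top_frac=0.05):
    """Simplified AUCell-style score for one cell (illustrative only)."""
    expr = np.asarray(expr, dtype=float)
    n_top = max(1, int(round(top_frac * expr.size)))
    # Indices of genes sorted from highest to lowest expression.
    order = np.argsort(-expr, kind="stable")
    # Recovery curve: gene-set members found among the top-k genes, k = 1..n_top.
    recovery = np.cumsum(np.isin(order[:n_top], gene_set_idx))
    # Best case: every gene-set member sits at the very top of the ranking.
    n_hits_max = min(len(gene_set_idx), n_top)
    best = np.cumsum(np.arange(n_top) < n_hits_max)
    return recovery.sum() / best.sum()


# Toy cell with 20 genes; gene 0 is the most expressed, gene 19 the least.
demo_expr = np.arange(20, 0, -1)
demo_score_high = aucell_like_score(demo_expr, [0, 1], top_frac=0.25)
demo_score_low = aucell_like_score(demo_expr, [18, 19], top_frac=0.25)
print(demo_score_high, demo_score_low)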


# In[11]:


# Copy the annotations from the original object, matched on the cell index
adata_aucs.obs=adata[adata_aucs.obs.index].obs
adata_aucs.obsm=adata[adata_aucs.obs.index].obsm
adata_aucs.obsp=adata[adata_aucs.obs.index].obsp
adata_aucs.uns=adata[adata_aucs.obs.index].uns

adata_aucs


# ## Timing-associated genes analysis
# 
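# We have encapsulated the cellfategenie algorithm in omicverse, so the analysis can be run directly from omicverse. Because `adata_aucs` holds AUCell scores, each feature in this section is a gene set rather than a single gene.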

# In[12]:


cfg_obj=ov.single.cellfategenie(adata_aucs,pseudotime='pt_via')
cfg_obj.model_init()


# We use Adaptive Threshold Regression (ATR) to find the minimum number of gene sets that gives the same accuracy as the regression model built on all of them.

# In[13]:


cfg_obj.ATR(stop=500)
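

# The idea of shrinking the feature set while keeping the fit can be sketched with plain NumPy. This is not the ATR implementation in omicverse. The sketch fits a closed-form ridge regression on all features, ranks the features by absolute coefficient, and returns the smallest top-ranked subset whose in-sample R^2 stays within `tol` of the full model. The names `ridge_r2` and `smallest_feature_set`, the `alpha` and `tol` parameters, and the simulated data are assumptions made for illustration.

# In[ ]:


import numpy as np


def ridge_r2(X, y, alpha=1.0):
    """Fit ridge regression in closed form; return coefficients and in-sample R^2."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    resid = yc - Xc @ coef
    return coef, 1.0 - (resid @ resid) / (yc @ yc)


def smallest_feature_set(X, y, tol=0.01, alpha=1.0):
    """Smallest top-ranked feature subset whose R^2 is within tol of the full fit."""
    coef, full_r2 = ridge_r2(X, y, alpha)
    ranked = np.argsort(-np.abs(coef))  # largest absolute coefficient first
    for k in range(1, X.shape[1] + 1):
        _, r2_k = ridge_r2(X[:, ranked[:k]], y, alpha)
        if r2_k >= full_r2 - tol:
            return ranked[:k], r2_k, full_r2
    return ranked, full_r2, full_r2


# Simulated data: 200 cells, 30 features, only features 0 and 1 drive the target.
demo_rng = np.random.default_rng(0)
demo_X = demo_rng.normal(size=(200, 30))
demo_time = 3.0 * demo_X[:, 0] - 2.0 * demo_X[:, 1] + 0.1 * demo_rng.normal(size=200)
demo_keep, demo_r2_small, demo_r2_full = smallest_feature_set(demo_X, demo_time)
print(sorted(demo_keep.tolist()), round(demo_r2_small, 3), round(demo_r2_full, 3))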


# In[14]:


fig,ax=cfg_obj.plot_filtering(color='#5ca8dc')
ax.set_title('Dentategyrus Metacells\nCellFateGenie')


# In[15]:


res=cfg_obj.model_fit()


# ## Visualization
# 
# We provide a series of functions to visualize the results. Use `plot_color_fitting` to see how the different cell types transition along pseudotime.

# In[16]:


cfg_obj.plot_color_fitting(type='raw',cluster_key='celltype')


# In[17]:


cfg_obj.plot_color_fitting(type='filter',cluster_key='celltype')


# ## Kendall tau test
# 
# We can further narrow down the set of genes retained by the regression model. We use the Kendall tau test to calculate the trend significance of each gene along pseudotime.

# In[18]:


kt_filter=cfg_obj.kendalltau_filter()
kt_filter.head()
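

# To make the statistic concrete, the next cell computes Kendall's tau by hand for two toy trends. It is a dependency-free sketch and says nothing about what `kendalltau_filter` does internally; in practice `scipy.stats.kendalltau` is the usual library routine. The function name `kendall_tau_test` and the toy data are assumptions, the sketch assumes there are no ties, and the p-value uses the normal approximation.

# In[ ]:


import math

import numpy as np


def kendall_tau_test(x, y):
    """Kendall tau-a with a normal-approximation p-value (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    upper = np.triu_indices(n, k=1)
    # +1 for each concordant pair, -1 for each discordant pair.
    pair_sign = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    tau = pair_sign[upper].sum() / (n * (n - 1) / 2)
    # Standardise tau with its variance under the null of no association.
    z = tau / math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    pvalue = math.erfc(abs(z) / math.sqrt(2))
    return tau, pvalue


demo_pt = np.linspace(0, 1, 50)   # toy pseudotime
demo_rising = demo_pt ** 2        # monotonically increasing score
demo_falling = -demo_pt           # monotonically decreasing score
demo_tau_up, demo_p_up = kendall_tau_test(demo_pt, demo_rising)
demo_tau_down, demo_p_down = kendall_tau_test(demo_pt, demo_falling)
print(demo_tau_up, demo_tau_down)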


# In[21]:


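# Keep the features whose Kendall tau p-value is below the mean p-value
var_name=kt_filter.loc[kt_filter['pvalue']<kt_filter['pvalue'].mean()].index.tolist()
# Compute the trends of the retained features along pseudotime
gt_obj=ov.single.gene_trends(adata_aucs,'pt_via',var_name)
gt_obj.calculate(n_convolve=10)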
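

# The effect of a convolution window on a noisy trend can be sketched as follows. The sketch assumes that `n_convolve` is the width of a moving-average kernel applied along the pseudotime-ordered cells; that reading is an assumption and is not confirmed here. The helper `moving_average` and the toy signal are hypothetical.

# In[ ]:


import numpy as np


def moving_average(values, n_convolve=10):
    """Smooth a 1-D series with a flat kernel of width n_convolve."""
    kernel = np.ones(n_convolve) / n_convolve
    # mode="same" keeps the input length; the edges are damped by zero padding.
    return np.convolve(values, kernel, mode="same")


demo_rng_trend = np.random.default_rng(1)
demo_signal = np.linspace(0, 1, 100) + 0.2 * demo_rng_trend.normal(size=100)
demo_smooth = moving_average(demo_signal, n_convolve=10)
# Compare the point-to-point roughness before and after smoothing.
print(np.abs(np.diff(demo_signal)).mean(), np.abs(np.diff(demo_smooth[10:-10])).mean())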


# In[22]:


print(f"Dimension: {len(var_name)}")


# In[23]:


fig,ax=gt_obj.plot_trend(color=ov.utils.blue_color[3])
ax.set_title('Dentategyrus meta\nCellfategenie',fontsize=13)


# In[25]:


g=ov.utils.plot_heatmap(adata_aucs,var_names=var_name,
                  sortby='pt_via',col_color='celltype',
                 n_convolve=10,figsize=(1,6),show=False)

g.fig.set_size_inches(2, 6)
g.fig.suptitle('CellFateGenie',x=0.25,y=0.83,
               horizontalalignment='left',fontsize=12,fontweight='bold')
g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(),fontsize=12)
plt.show()


# In[32]:


gw_obj1=ov.utils.geneset_wordcloud(adata=adata_aucs[:,var_name],
                                  cluster_key='celltype',pseudotime='pt_via',figsize=(3,6))
gw_obj1.get()


# In[33]:


g=gw_obj1.plot_heatmap(figwidth=6,cmap='RdBu_r')
plt.suptitle('CellFateGenie',x=0.18,y=0.95,
               horizontalalignment='left',fontsize=12,fontweight='bold')

