canergen committed · Commit a3868d6 · verified · 1 Parent(s): ef70667

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +225 -0
README.md ADDED
@@ -0,0 +1,225 @@
+ ---
+ library_name: scvi-tools
+ license: cc-by-4.0
+ tags:
+ - biology
+ - genomics
+ - single-cell
+ - model_cls_name:TOTALVI
+ - scvi_version:1.2.0
+ - anndata_version:0.11.1
+ - modality:rna
+ - modality:protein
+ - tissue:thymus
+ - annotated:True
+ ---
+
+
+ TotalVI is a variational inference model for paired single-cell RNA-seq and protein data. It can
+ learn an underlying latent space, integrate technical batches, impute dropouts, and predict
+ protein expression from gene expression alone, or impute missing proteins when gene expression and
+ a subset of proteins were measured.
+ The learned low-dimensional latent representation of the data can be used for visualization and
+ clustering.
+
+ TotalVI takes as input a scRNA-seq gene expression matrix (cells × genes) and a protein expression
+ matrix (cells × proteins).
+ We provide an extensive [user guide](https://docs.scvi-tools.org/en/1.2.0/user_guide/models/totalvi.html).
+
+ - See our original manuscript for further details of the model:
+ [TotalVI manuscript](https://www.nature.com/articles/s41592-020-01050-x).
+ - See our manuscript on [scvi-hub](https://www.biorxiv.org/content/10.1101/2024.03.01.582887v2)
+ for how to leverage pre-trained models.
+
+ This model can be used for fine-tuning on new data using our Arches framework:
+ [Arches tutorial](https://docs.scvi-tools.org/en/1.0.0/tutorials/notebooks/scarches_scvi_tools.html).
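As a rough sketch of how a pre-trained model like this one might be loaded, the helper below wraps `scvi.hub.HubModel.pull_from_huggingface_hub`. The repository id is not stated in this card, so it is left as the `repo_name` parameter. The helper name `load_pretrained_totalvi` is hypothetical, and the scvi-tools import is deferred so that defining the function does not require the library.

```python
from typing import Optional


def load_pretrained_totalvi(repo_name: str, cache_dir: Optional[str] = None):
    """Pull a model repository from the Hugging Face Hub and return the model.

    `repo_name` is the "<user>/<repo>" id of the model repository. It is an
    assumption here, because this README does not state it.
    """
    # Imported lazily: requires scvi-tools (this card lists version 1.2.0).
    from scvi.hub import HubModel

    hub_model = HubModel.pull_from_huggingface_hub(repo_name, cache_dir=cache_dir)
    # Accessing `.model` triggers loading of the trained model. Depending on how
    # the repository was uploaded, the data may need to be supplied separately.
    return hub_model.model
```

The returned object could then serve as the reference model for the fine-tuning workflow described in the Arches tutorial.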
+
+
+ # Model Description
+
+ CITE-seq was used to measure RNA and surface proteins in thymocytes from wild-type and T cell lineage-restricted mice, generating a comprehensive timeline of cell states for each T cell lineage.
+
+ # Metrics
+
+ We provide here key performance metrics for the uploaded model, if provided by the data uploader.
+
+ <details>
+ <summary><strong>Coefficient of variation</strong></summary>
+
+ The cell-wise coefficient of variation summarizes how well variation between different cells is
+ preserved by the generated model expression. If the squared Pearson correlation coefficient is
+ below 0.4, we recommend not using the generated data for downstream analysis, although the learned
+ latent space might still be useful for analysis.
+
+ **Cell-wise Coefficient of Variation**:
+
+ Modality: protein
+
+ | Metric | Training Value | Validation Value |
+ |-------------------------|----------------|------------------|
+ | Mean Absolute Error | 0.32 | 0.33 |
+ | Pearson Correlation | 0.52 | 0.51 |
+ | Spearman Correlation | 0.49 | 0.49 |
+ | R² (R-Squared) | -0.01 | -0.01 |
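To make the threshold concrete: the cell-wise Pearson correlations in the table above, 0.52 (training) and 0.51 (validation), square to about 0.27 and 0.26, both below the 0.4 guideline. The sketch below uses made-up toy vectors to show how a coefficient of variation, a Pearson correlation, and R² can be computed, and why R² can be negative even when the correlation is positive. It is an illustration under these assumptions, not the scvi-tools implementation; all function names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population std, for simplicity)."""
    return pstdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# CV of one toy cell's counts across features.
print(coefficient_of_variation([2, 4, 4, 6]))

# Toy per-cell CVs: "observed" from raw counts, "generated" from model samples.
observed = [0.8, 1.0, 1.2, 1.4, 1.6]
generated = [1.3, 1.4, 1.5, 1.6, 1.7]  # right trend, but offset and compressed

r = pearson(observed, generated)
print(f"Pearson r = {r:.2f}, squared = {r * r:.2f}")
# R² penalizes the offset and the compressed range, so it drops below zero here.
print(f"R^2 = {r_squared(observed, generated):.3f}")

# Squared Pearson values implied by the table above.
print(round(0.52 ** 2, 2), round(0.51 ** 2, 2))
```

Because R² compares predictions to the observed values directly, a systematic shift in the generated values lowers it even when the ranking of cells is preserved, which is one possible reading of the near-zero R² reported alongside a moderate correlation.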
+
+
+
+ The gene-wise coefficient of variation summarizes how well variation between different genes is
+ preserved by the generated model expression. This value is usually quite high.
+
+ **Gene-wise Coefficient of Variation**:
+
+ Modality: protein
+
+ | Metric | Training Value |
+ |-------------------------|----------------|
+ | Mean Absolute Error | 0.32 |
+ | Pearson Correlation | 0.87 |
+ | Spearman Correlation | 0.95 |
+ | R² (R-Squared) | 0.16 |
+
+
+
+ </details>
+
+ <details>
+ <summary><strong>Differential expression metric</strong></summary>
+
+ The differential expression metric provides a summary of the differential expression analysis
+ between cell types or input clusters. We report the F1-score, the Pearson and Spearman correlation
+ coefficients of the log-fold changes, and the area under the precision-recall curve (AUPRC) for a
+ differential expression analysis using the Wilcoxon rank-sum test for each cell type. The table
+ also lists the mean absolute error of the log-fold changes (lfc_mae), the area under the ROC curve
+ (roc_auc), and the number of cells per cell type (n_cells).
+
+ **Differential expression**:
+
+ Modality: protein
+
+ | Cell type | gene_f1 | lfc_mae | lfc_pearson | lfc_spearman | roc_auc | pr_auc | n_cells |
+ | --- | --- | --- | --- | --- | --- | --- | --- |
+ | DP (Q2) | 0.91 | 0.09 | 0.99 | 0.98 | 0.50 | 0.99 | 10864 |
+ | DP (Sig.) | 0.91 | 0.09 | 0.97 | 0.93 | 0.21 | 0.93 | 9824 |
+ | DP (Q1) | 1.00 | 0.08 | 0.99 | 0.98 | 0.61 | 0.98 | 8556 |
+ | Mature CD4 | 0.91 | 0.13 | 0.99 | 0.98 | 0.57 | 0.98 | 6525 |
+ | Immature CD8 | 0.82 | 0.08 | 0.98 | 0.96 | 0.35 | 0.95 | 5686 |
+ | DP (P) | 1.00 | 0.12 | 0.98 | 0.92 | 0.52 | 0.92 | 5593 |
+ | Immature CD4 | 1.00 | 0.10 | 0.99 | 0.94 | 0.32 | 0.94 | 5164 |
+ | Mature CD8 | 0.91 | 0.13 | 0.99 | 0.97 | 0.40 | 0.96 | 4234 |
+ | DN | 0.82 | 0.14 | 0.99 | 0.94 | 0.57 | 0.92 | 2395 |
+ | GD T | 0.82 | 0.13 | 0.99 | 0.95 | 0.39 | 0.93 | 2279 |
+ | Treg | 0.91 | 0.12 | 0.98 | 0.98 | 0.44 | 0.95 | 1966 |
+ | Neg. sel. (2) | 0.91 | 0.10 | 0.99 | 0.97 | 0.25 | 0.90 | 1560 |
+ | Dying | 0.82 | 0.13 | 0.93 | 0.91 | 0.52 | 0.93 | 1552 |
+ | Neg. sel. (1) | 0.82 | 0.13 | 0.97 | 0.95 | 0.27 | 0.87 | 1206 |
+ | Mature cycling | 0.73 | 0.17 | 0.97 | 0.94 | 0.27 | 0.89 | 992 |
+ | Interferon sig. | 0.91 | 0.09 | 0.94 | 0.78 | 0.15 | 0.91 | 984 |
+ | NKT | 0.82 | 0.18 | 0.95 | 0.95 | 0.56 | 0.93 | 928 |
+ | Myeloid | 1.00 | 0.18 | 0.97 | 0.93 | 0.66 | 0.97 | 908 |
+ | Doublet | 0.55 | 0.35 | 0.60 | 0.46 | 0.81 | 0.99 | 677 |
+ | B | 0.73 | 0.60 | 0.93 | 0.81 | 0.40 | 0.78 | 106 |
+ | Erythrocyte | 0.55 | 0.74 | 0.79 | 0.69 | 0.50 | 0.59 | 43 |
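As a hedged illustration of two of the columns above, the sketch below computes an F1-score between two sets of top differentially expressed features and the mean absolute error between two log-fold-change vectors. It assumes that `gene_f1` compares top-ranked marker sets from the observed and the model-generated data, which this card does not spell out. The protein names and values are invented, and this is not the scvi-tools implementation.

```python
def f1_score_sets(reference, predicted):
    """F1 between two sets: harmonic mean of precision and recall."""
    reference, predicted = set(reference), set(predicted)
    true_pos = len(reference & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(x, y):
    """Average absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical top markers for one cell type (observed vs. model-generated data).
observed_top = ["CD4", "CD8a", "CD5", "CD69", "TCRb"]
generated_top = ["CD4", "CD8a", "CD5", "CD69", "CD24"]
print(f1_score_sets(observed_top, generated_top))  # 4 of 5 markers are shared

# Hypothetical log-fold changes for the same five proteins in both analyses.
lfc_observed = [1.2, -0.5, 0.8, 2.0, -1.1]
lfc_generated = [1.0, -0.4, 0.9, 1.8, -1.0]
print(mean_absolute_error(lfc_observed, lfc_generated))
```

With equally sized marker sets, precision and recall coincide, so the F1-score reduces to the fraction of shared markers.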
+
+
+
+ </details>
+
+ # Model Properties
+
+ We provide here key parameters used to set up and train the model.
+
+ <details>
+ <summary><strong>Model Parameters</strong></summary>
+
+ Settings used to set up the original model:
+ ```json
+ {
+     "n_latent": 20,
+     "gene_dispersion": "gene",
+     "protein_dispersion": "protein",
+     "gene_likelihood": "nb",
+     "latent_distribution": "normal",
+     "empirical_protein_background_prior": null,
+     "override_missing_proteins": false
+ }
+ ```
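The JSON above maps naturally onto Python keyword arguments (`null` becomes `None`, `false` becomes `False`). The sketch below parses it with the standard library. Passing the resulting dict to `scvi.model.TOTALVI` is shown only as a comment, since it assumes scvi-tools is installed and that a data object (here called `mdata`, a hypothetical name) has already been registered with the model's setup method.

```python
import json

# The model parameters from this card, copied verbatim.
MODEL_PARAMS_JSON = """
{
    "n_latent": 20,
    "gene_dispersion": "gene",
    "protein_dispersion": "protein",
    "gene_likelihood": "nb",
    "latent_distribution": "normal",
    "empirical_protein_background_prior": null,
    "override_missing_proteins": false
}
"""

model_kwargs = json.loads(MODEL_PARAMS_JSON)
for name, value in model_kwargs.items():
    print(f"{name} = {value!r}")

# Hypothetical usage, assuming scvi-tools is installed and `mdata` is set up:
#   import scvi
#   model = scvi.model.TOTALVI(mdata, **model_kwargs)
```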
+
+ </details>
+
+ <details>
+ <summary><strong>Setup Data Arguments</strong></summary>
+
+ Arguments passed to setup_anndata of the original model:
+ ```json
+ {
+     "rna_layer": "counts",
+     "protein_layer": null,
+     "batch_key": "sample_id",
+     "size_factor_key": null,
+     "categorical_covariate_keys": null,
+     "continuous_covariate_keys": null,
+     "modalities": {
+         "rna_layer": "rna",
+         "protein_layer": "protein",
+         "batch_key": "rna"
+     }
+ }
+ ```
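One way to read the `modalities` block is that each entry names the modality in which the corresponding argument is looked up. The sketch below is a plain-Python illustration of that reading, not scvi-tools code; the helper `describe_locations` and the variable name `mdata` are hypothetical. A `null` layer is assumed to mean the modality's `.X` matrix, which is consistent with the `proteins` entry in the Data Registry.

```python
# Subset of the setup arguments from this card (null keys omitted for brevity).
setup_args = {
    "rna_layer": "counts",
    "protein_layer": None,
    "batch_key": "sample_id",
    "modalities": {"rna_layer": "rna", "protein_layer": "protein", "batch_key": "rna"},
}


def describe_locations(args):
    """Return a human-readable location for each field named in `modalities`."""
    locations = {}
    for field, modality in args["modalities"].items():
        value = args[field]
        base = f"mdata.mod['{modality}']"
        if field.endswith("_layer"):
            # A missing layer name is assumed to mean the main matrix `.X`.
            locations[field] = f"{base}.layers['{value}']" if value else f"{base}.X"
        else:
            # Keys such as `batch_key` are assumed to refer to columns in `.obs`.
            locations[field] = f"{base}.obs['{value}']"
    return locations


for field, location in describe_locations(setup_args).items():
    print(f"{field}: {location}")
```

Under this reading, the batch labels come from the `sample_id` column of the RNA modality, while the registry stores an encoded copy under `_scvi_batch`.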
+
+ </details>
+
+ <details>
+ <summary><strong>Data Registry</strong></summary>
+
+ Registry elements for AnnData manager:
+
+ | Registry Key | scvi-tools Location |
+ |-------------------|--------------------------------------|
+ | X | adata.mod['rna'].layers['counts'] |
+ | batch | adata.mod['rna'].obs['_scvi_batch'] |
+ | labels | adata.obs['_scvi_labels'] |
+ | latent_qzm | adata.obsm['totalvi_latent_qzm'] |
+ | latent_qzv | adata.obsm['totalvi_latent_qzv'] |
+ | minify_type | adata.uns['_scvi_adata_minify_type'] |
+ | observed_lib_size | adata.obs['observed_lib_size'] |
+ | proteins | adata.mod['protein'].X |
+
+ - **Data is Minified**: False
+
+ </details>
+
+ <details>
+ <summary><strong>Summary Statistics</strong></summary>
+
+ | Summary Stat Key | Value |
+ |--------------------------|-------|
+ | n_batch | 17 |
+ | n_cells | 72042 |
+ | n_extra_categorical_covs | 0 |
+ | n_extra_continuous_covs | 0 |
+ | n_labels | 1 |
+ | n_latent_qzm | 20 |
+ | n_latent_qzv | 20 |
+ | n_proteins | 111 |
+ | n_vars | 4000 |
+
+ </details>
+
+
+ <details>
+ <summary><strong>Training</strong></summary>
+
+ <!-- If your model is not uploaded with any data (e.g., minified data) on the Model Hub, then make
+ sure to provide this field if you want users to be able to access your training data. See the
+ scvi-tools documentation for details. -->
+ **Training data URL**: Not provided by uploader
+
+ For those interested in understanding or replicating the training process, the training code is
+ linked below if the original uploader provided it.
+
+ **Training Code URL**: https://github.com/YosefLab/Thymus_CITE-seq/blob/main/totalVI_AllData/totalVI_thymus111.ipynb
+
+ </details>
+
+
+ # References
+
+ Steier, Z., Aylard, D.A., McIntyre, L.L. et al. Single-cell multiomic analysis of thymocyte development reveals drivers of CD4+ T cell and CD8+ T cell lineage commitment. Nat Immunol 24, 1579–1590 (2023). https://doi.org/10.1038/s41590-023-01584-0.