scvi-tools/mouse_thymus_cite_totalvi

+---
+library_name: scvi-tools
+license: cc-by-4.0
+tags:
+- biology
+- genomics
+- single-cell
+- model_cls_name:TOTALVI
+- scvi_version:1.2.0
+- anndata_version:0.11.1
+- modality:rna
+- modality:protein
+- tissue:thymus
+- annotated:True
+---
+TotalVI is a variational inference model for paired single-cell RNA-seq and protein data. It can
+learn an underlying latent space, integrate technical batches, impute dropouts, and predict
+protein expression from gene expression alone, or impute missing proteins from gene expression
+together with measurements for a subset of proteins.
+The learned low-dimensional latent representation of the data can be used for visualization and
+clustering.
+TotalVI takes as input a scRNA-seq gene expression matrix (cells by genes) and a protein
+expression matrix (cells by proteins).
+We provide an extensive [user guide](https://docs.scvi-tools.org/en/1.2.0/user_guide/models/totalvi.html).
+- See our original manuscript for further details of the model:
+[TotalVI manuscript](https://www.nature.com/articles/s41592-020-01050-x).
+- See our manuscript on [scvi-hub](https://www.biorxiv.org/content/10.1101/2024.03.01.582887v2)
+for how to leverage pre-trained models.
+This model can be fine-tuned on new data using our Arches framework:
+[Arches tutorial](https://docs.scvi-tools.org/en/1.0.0/tutorials/notebooks/scarches_scvi_tools.html).
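A minimal sketch of pulling this pre-trained model from the Hugging Face Hub. It assumes that scvi-tools (this card lists version 1.2.0) is installed with its hub dependencies, and that the repository ID is `scvi-tools/mouse_thymus_cite_totalvi`, which is inferred from the title of this page. The helper name `pull_hub_model` and its `cache_dir` argument are illustrative and not part of scvi-tools.

```python
# Repository ID inferred from the page title (assumption).
REPO_NAME = "scvi-tools/mouse_thymus_cite_totalvi"


def pull_hub_model(cache_dir=None):
    """Download the model files and return a scvi.hub.HubModel wrapper.

    `pull_hub_model` is a hypothetical helper; only
    `HubModel.pull_from_huggingface_hub` comes from scvi-tools.
    """
    # Imported lazily so this snippet can be loaded without scvi-tools installed.
    from scvi.hub import HubModel

    return HubModel.pull_from_huggingface_hub(
        repo_name=REPO_NAME,
        cache_dir=cache_dir,
    )
```

Calling `pull_hub_model()` needs network access. The returned object's `model` attribute is expected to hold the `TOTALVI` instance, provided the data required to initialise it is available; this card states that no training data URL was provided, so that step may need your own data.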
+# Model Description
+This model was trained on CITE-seq data measuring RNA and surface proteins in thymocytes from wild-type and T cell lineage-restricted mice, collected to generate a comprehensive timeline of cell states for each T cell lineage.
+# Metrics
+We provide here key performance metrics for the uploaded model, if provided by the data uploader.
+<details>
+<summary><strong>Coefficient of variation</strong></summary>
+The cell-wise coefficient of variation summarizes how well variation between different cells is
+preserved by the generated model expression. Below a squared Pearson correlation coefficient of
+0.4, we recommend not using the generated data for downstream analysis, although the generated
+latent space might still be useful for analysis.
+**Cell-wise Coefficient of Variation**:
+Modality: protein
+| Metric                  | Training Value | Validation Value |
+|-------------------------|----------------|------------------|
+| Mean Absolute Error | 0.32  | 0.33           |
+| Pearson Correlation | 0.52  | 0.51  |
+| Spearman Correlation | 0.49 | 0.49  |
+| R² (R-Squared) | -0.01  | -0.01      |
+The gene-wise coefficient of variation summarizes how well variation between different genes is
+preserved by the generated model expression. This value is usually quite high.
+**Gene-wise Coefficient of Variation**:
+Modality: protein
+| Metric                  | Training Value |
+|-------------------------|----------------|
+| Mean Absolute Error | 0.32   |
+| Pearson Correlation | 0.87  |
+| Spearman Correlation | 0.95 |
+| R² (R-Squared) | 0.16  |
+</details>
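The coefficient-of-variation comparison described above can be sketched in plain Python. This is an illustration under stated assumptions, not the scvi-tools implementation: it takes "cell-wise" to mean one CV per row of a cells-by-features matrix and "gene-wise" to mean one CV per column, and the toy matrices are made-up numbers. All function names are hypothetical.

```python
import math


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cv_per_row(matrix):
    """One CV per cell (row), computed across features."""
    return [coefficient_of_variation(row) for row in matrix]


def cv_per_column(matrix):
    """One CV per feature (column), computed across cells."""
    return [coefficient_of_variation(col) for col in zip(*matrix)]


def passes_threshold(pearson_r, threshold=0.4):
    """Apply the card's rule of thumb to the squared Pearson correlation."""
    return pearson_r ** 2 >= threshold


# Toy cells-by-features matrices (made-up numbers, for illustration only).
observed = [[1, 2, 3], [2, 2, 2], [0, 4, 8], [5, 5, 6]]
generated = [[1, 2, 4], [2, 3, 2], [1, 4, 7], [5, 4, 6]]

cell_wise_r = pearson(cv_per_row(observed), cv_per_row(generated))
feature_wise_r = pearson(cv_per_column(observed), cv_per_column(generated))
print(cell_wise_r, passes_threshold(cell_wise_r))
print(feature_wise_r, passes_threshold(feature_wise_r))
```

If the 0.4 threshold is read as applying to the square of the Pearson correlation reported in the table, the cell-wise training value of 0.52 gives 0.52² ≈ 0.27, which falls below it. The R² row in the table is a separate quantity and is not the squared Pearson correlation.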
+<details>
+<summary><strong>Differential expression metric</strong></summary>
+The differential expression metric provides a summary of the differential expression analysis
+between cell types or input clusters. We provide here the F1-score, the mean absolute error and
+the Pearson and Spearman correlation coefficients of the log-fold changes, and the areas under
+the ROC curve and the precision-recall curve (AUPRC) for the differential expression analysis
+using the Wilcoxon rank-sum test for each cell type.
+**Differential expression**:
+Modality: protein
+| Index | gene_f1 | lfc_mae | lfc_pearson | lfc_spearman | roc_auc | pr_auc | n_cells |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| DP (Q2) | 0.91 | 0.09 | 0.99 | 0.98 | 0.50 | 0.99 | 10864 |
+| DP (Sig.) | 0.91 | 0.09 | 0.97 | 0.93 | 0.21 | 0.93 | 9824 |
+| DP (Q1) | 1.00 | 0.08 | 0.99 | 0.98 | 0.61 | 0.98 | 8556 |
+| Mature CD4 | 0.91 | 0.13 | 0.99 | 0.98 | 0.57 | 0.98 | 6525 |
+| Immature CD8 | 0.82 | 0.08 | 0.98 | 0.96 | 0.35 | 0.95 | 5686 |
+| DP (P) | 1.00 | 0.12 | 0.98 | 0.92 | 0.52 | 0.92 | 5593 |
+| Immature CD4 | 1.00 | 0.10 | 0.99 | 0.94 | 0.32 | 0.94 | 5164 |
+| Mature CD8 | 0.91 | 0.13 | 0.99 | 0.97 | 0.40 | 0.96 | 4234 |
+| DN | 0.82 | 0.14 | 0.99 | 0.94 | 0.57 | 0.92 | 2395 |
+| GD T | 0.82 | 0.13 | 0.99 | 0.95 | 0.39 | 0.93 | 2279 |
+| Treg | 0.91 | 0.12 | 0.98 | 0.98 | 0.44 | 0.95 | 1966 |
+| Neg. sel. (2) | 0.91 | 0.10 | 0.99 | 0.97 | 0.25 | 0.90 | 1560 |
+| Dying | 0.82 | 0.13 | 0.93 | 0.91 | 0.52 | 0.93 | 1552 |
+| Neg. sel. (1) | 0.82 | 0.13 | 0.97 | 0.95 | 0.27 | 0.87 | 1206 |
+| Mature cycling | 0.73 | 0.17 | 0.97 | 0.94 | 0.27 | 0.89 | 992 |
+| Interferon sig. | 0.91 | 0.09 | 0.94 | 0.78 | 0.15 | 0.91 | 984 |
+| NKT | 0.82 | 0.18 | 0.95 | 0.95 | 0.56 | 0.93 | 928 |
+| Myeloid | 1.00 | 0.18 | 0.97 | 0.93 | 0.66 | 0.97 | 908 |
+| Doublet | 0.55 | 0.35 | 0.60 | 0.46 | 0.81 | 0.99 | 677 |
+| B | 0.73 | 0.60 | 0.93 | 0.81 | 0.40 | 0.78 | 106 |
+| Erythrocyte | 0.55 | 0.74 | 0.79 | 0.69 | 0.50 | 0.59 | 43 |
+</details>
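To make two of the table columns concrete, the sketch below computes an F1-score between two sets of differentially expressed features and the mean absolute error between two lists of log-fold changes. It is a simplified illustration, not the procedure scvi-tools uses to fill the table: how the feature sets are selected is not stated in this card, and the protein names and numbers are hypothetical.

```python
def f1_score(predicted, reference):
    """F1-score of a predicted feature set against a reference feature set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(x, y):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical top features for one cell type: from the observed data
# (reference) and from the model-generated data (model).
reference_top = ["CD4", "CD8a", "CD3", "TCRb"]
model_top = ["CD4", "CD8a", "CD3", "CD69"]

# Hypothetical log-fold changes for the same four features in both analyses.
reference_lfc = [1.2, -0.8, 0.5, 2.0]
model_lfc = [1.0, -0.9, 0.7, 1.8]

gene_f1 = f1_score(model_top, reference_top)  # 3 of 4 features overlap -> 0.75
lfc_mae = mean_absolute_error(reference_lfc, model_lfc)
print(gene_f1, lfc_mae)
```

When both sets have the same size, precision and recall coincide, so the F1-score reduces to the fraction of shared features.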
+# Model Properties
+We provide here key parameters used to set up and train the model.
+<details>
+<summary><strong>Model Parameters</strong></summary>
+These are the settings used to set up the original model:
+```json
+{
+    "n_latent": 20,
+    "gene_dispersion": "gene",
+    "protein_dispersion": "protein",
+    "gene_likelihood": "nb",
+    "latent_distribution": "normal",
+    "empirical_protein_background_prior": null,
+    "override_missing_proteins": false
+}
+```
+</details>
+<details>
+<summary><strong>Setup Data Arguments</strong></summary>
+Arguments passed to setup_anndata of the original model:
+```json
+{
+    "rna_layer": "counts",
+    "protein_layer": null,
+    "batch_key": "sample_id",
+    "size_factor_key": null,
+    "categorical_covariate_keys": null,
+    "continuous_covariate_keys": null,
+    "modalities": {
+        "rna_layer": "rna",
+        "protein_layer": "protein",
+        "batch_key": "rna"
+    }
+}
+```
+</details>
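A hedged sketch of how the two JSON blocks above could be used to register a dataset and construct an untrained model with the same configuration. The `modalities` entry suggests the data were registered as a MuData object, so the sketch assumes `TOTALVI.setup_mudata`, although the card's wording mentions `setup_anndata`. The helper `build_totalvi` is hypothetical, and `mdata` stands for your own MuData object with `rna` and `protein` modalities, a `counts` layer in `rna`, and a `sample_id` column.

```python
import json

# Model parameters copied from the "Model Parameters" block of this card.
MODEL_PARAMS = json.loads("""
{
    "n_latent": 20,
    "gene_dispersion": "gene",
    "protein_dispersion": "protein",
    "gene_likelihood": "nb",
    "latent_distribution": "normal",
    "empirical_protein_background_prior": null,
    "override_missing_proteins": false
}
""")

# Setup arguments copied from the "Setup Data Arguments" block of this card.
SETUP_ARGS = json.loads("""
{
    "rna_layer": "counts",
    "protein_layer": null,
    "batch_key": "sample_id",
    "size_factor_key": null,
    "categorical_covariate_keys": null,
    "continuous_covariate_keys": null,
    "modalities": {
        "rna_layer": "rna",
        "protein_layer": "protein",
        "batch_key": "rna"
    }
}
""")


def build_totalvi(mdata):
    """Register `mdata` and return an untrained TOTALVI model (hypothetical helper)."""
    # Imported lazily so this snippet can be loaded without scvi-tools installed.
    from scvi.model import TOTALVI

    # Assumption: MuData registration, because the arguments include `modalities`.
    TOTALVI.setup_mudata(mdata, **SETUP_ARGS)
    return TOTALVI(mdata, **MODEL_PARAMS)
```

Parsing the JSON turns `null` into `None` and `false` into `False`, so both dictionaries can be unpacked directly as keyword arguments.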
+<details>
+<summary><strong>Data Registry</strong></summary>
+Registry elements for AnnData manager:
+| Registry Key | scvi-tools Location |
+|-------------------|--------------------------------------|
+| X | adata.mod['rna'].layers['counts'] |
+| batch | adata.mod['rna'].obs['_scvi_batch'] |
+| labels | adata.obs['_scvi_labels'] |
+| latent_qzm | adata.obsm['totalvi_latent_qzm'] |
+| latent_qzv | adata.obsm['totalvi_latent_qzv'] |
+| minify_type | adata.uns['_scvi_adata_minify_type'] |
+| observed_lib_size | adata.obs['observed_lib_size'] |
+| proteins | adata.mod['protein'].X |
+- **Data is Minified**: False
+</details>
+<details>
+<summary><strong>Summary Statistics</strong></summary>
+| Summary Stat Key | Value |
+|--------------------------|-------|
+| n_batch | 17 |
+| n_cells | 72042 |
+| n_extra_categorical_covs | 0 |
+| n_extra_continuous_covs | 0 |
+| n_labels | 1 |
+| n_latent_qzm | 20 |
+| n_latent_qzv | 20 |
+| n_proteins | 111 |
+| n_vars | 4000 |
+</details>
+<details>
+<summary><strong>Training</strong></summary>
+<!-- If your model is not uploaded with any data (e.g., minified data) on the Model Hub, then make
+sure to provide this field if you want users to be able to access your training data. See the
+scvi-tools documentation for details. -->
+**Training data url**: Not provided by uploader
+For those interested in understanding or replicating the training process, the training code, if
+provided by the original uploader, is available at the link below.
+**Training Code URL**: https://github.com/YosefLab/Thymus_CITE-seq/blob/main/totalVI_AllData/totalVI_thymus111.ipynb
+</details>
+# References
+Steier, Z., Aylard, D.A., McIntyre, L.L. et al. Single-cell multiomic analysis of thymocyte development reveals drivers of CD4+ T cell and CD8+ T cell lineage commitment. Nat Immunol 24, 1579–1590 (2023). https://doi.org/10.1038/s41590-023-01584-0.