Update README.md

README.md CHANGED
---
title: README
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: static
pinned: true
license: bsd-3-clause
short_description: Ensemble of experts for cell-type annotation
---

# **popV**

Welcome to the **popV** framework. popV provides state-of-the-art cell-type label transfer using an ensemble-of-experts approach, and we provide pre-trained models here to transfer cell-type labels to your own query dataset. Cell-type annotation is a tedious process; using reference data can accelerate it significantly. Because popV combines several label-transfer tools, it reports a well-calibrated certainty score that flags cell types for which automatic annotation is highly uncertain. We recommend manually checking transferred cell-type labels by plotting marker or differentially expressed genes rather than trusting them blindly. This is an open-science initiative: to let the single-cell community leverage your reference dataset, please ask in our [GitHub repository](https://github.com/YosefLab/popV) to have your dataset added.

---

## **Model Overview**

popV trains up to nine different algorithms for automatic label transfer and computes a consensus score across their predictions, together with an automatic report. To learn how to apply popV to your own dataset, please refer to our [tutorial]().
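As an illustration of the consensus idea (a simple majority-vote sketch, not popV's actual scoring code; the function name and data layout are hypothetical), the fraction of classifiers that agree on the winning label can serve as a certainty score:

```python
from collections import Counter

def consensus(predictions):
    """predictions: one list of per-cell labels per classifier.

    Returns, for each cell, the majority label and the fraction of
    classifiers that voted for it (a simple certainty score)."""
    results = []
    for per_cell in zip(*predictions):  # labels assigned to one cell
        label, votes = Counter(per_cell).most_common(1)[0]
        results.append((label, votes / len(per_cell)))
    return results

# Three toy classifiers annotating the same two cells.
preds = [
    ["T cell", "B cell"],
    ["T cell", "NK cell"],
    ["T cell", "B cell"],
]
for label, score in consensus(preds):
    print(label, round(score, 2))  # T cell 1.0, then B cell 0.67
```

Cells where the agreement fraction is low are exactly the ones worth inspecting manually with marker genes.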

### Algorithms

Currently implemented algorithms are:

- K-nearest neighbor classification after dataset integration with [BBKNN](https://github.com/Teichlab/bbknn)
- K-nearest neighbor classification after dataset integration with [SCANORAMA](https://github.com/brianhie/scanorama)
- K-nearest neighbor classification after dataset integration with [scVI](https://github.com/scverse/scvi-tools)
- K-nearest neighbor classification after dataset integration with [Harmony](https://github.com/lilab-bcb/harmony-pytorch)
- Random forest classification
- Support vector machine classification
- [OnClass](https://github.com/wangshenguiuc/OnClass) cell-type classification
- [scANVI](https://github.com/scverse/scvi-tools) label transfer
- [Celltypist](https://www.celltypist.org) cell-type classification

All algorithms are implemented as classes in [popv/algorithms](popv/algorithms/__init__.py).
To implement a new method, a class has to provide several methods:

- `algorithm.compute_integration`: computes dataset integration to yield an integrated latent space.
- `algorithm.predict`: predicts cell-type labels based on the specific classifier.
- `algorithm.compute_embedding`: computes a UMAP embedding of the previously computed integrated latent space.

Adding a new class with these methods automatically registers it with popV, which will include it among its classifiers and use it as another expert.
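A minimal, self-contained sketch of this interface (the three method names come from the list above; everything else, including using a plain dict in place of the AnnData object that popV actually passes around, is a simplifying assumption for illustration):

```python
class NearestCentroid:
    """Toy expert following the three-method interface sketched above.

    A real popV algorithm would operate on an AnnData object; here a plain
    dict with "X" (features) and "labels" (reference labels, None for
    query cells) stands in for it.
    """

    def compute_integration(self, adata: dict) -> None:
        # Real integration (e.g. scVI, Harmony) would go here; this toy
        # version simply reuses the raw features as the "latent" space.
        adata["latent"] = adata["X"]

    def predict(self, adata: dict) -> None:
        # Compute one centroid per reference label, then assign every
        # cell to the label of its nearest centroid.
        groups = {}
        for x, lab in zip(adata["latent"], adata["labels"]):
            if lab is not None:
                groups.setdefault(lab, []).append(x)
        centroids = {
            lab: [sum(col) / len(col) for col in zip(*pts)]
            for lab, pts in groups.items()
        }

        def dist2(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        adata["predictions"] = [
            min(centroids, key=lambda lab: dist2(x, centroids[lab]))
            for x in adata["latent"]
        ]

    def compute_embedding(self, adata: dict) -> None:
        # A real implementation would run UMAP; keep the first two latent
        # dimensions as a stand-in embedding.
        adata["embedding"] = [x[:2] for x in adata["latent"]]

# Two labeled reference cells and one unlabeled query cell.
adata = {
    "X": [[0.0, 0.0], [10.0, 10.0], [0.5, 0.2]],
    "labels": ["B cell", "T cell", None],
}
alg = NearestCentroid()
alg.compute_integration(adata)
alg.predict(adata)
alg.compute_embedding(adata)
print(adata["predictions"][-1])  # the query cell is nearest the B cell centroid
```

The real classes additionally deal with AnnData fields, batch keys, and loading pretrained models, so treat this only as a shape for the three required methods.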

---

## **Key Applications**

The purpose of these models is to perform cell-type label transfer. We provide models [with CUML support](collection) for large-scale reference mapping and [without CUML support](collection) if no GPU is available. Without a GPU, popV scales well to 100k cells. popV has three levels of prediction complexity:

- `retrain` trains all classifiers from scratch. For 50k cells this takes up to an hour of computing time on a GPU.
- `inference` uses pretrained classifiers to annotate query as well as reference cells and constructs a joint embedding using all of the integration methods above. For 50k cells this takes, in our hands, up to half an hour of computing time on a GPU.
- `fast` uses only methods with pretrained classifiers and annotates only query cells. For 50k cells this takes five minutes without a GPU (without UMAP embedding).
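The work each mode performs can be distilled into a small lookup (an illustration of the bullet points above, not popV's API; in particular, we assume here that `retrain` also annotates reference cells and computes the joint embedding):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModePlan:
    train_from_scratch: bool   # classifiers trained from scratch?
    annotate_reference: bool   # reference cells annotated alongside the query?
    joint_embedding: bool      # joint UMAP embedding computed?

# The three prediction modes described above, most to least expensive.
PLANS = {
    "retrain":   ModePlan(train_from_scratch=True,  annotate_reference=True,  joint_embedding=True),
    "inference": ModePlan(train_from_scratch=False, annotate_reference=True,  joint_embedding=True),
    "fast":      ModePlan(train_from_scratch=False, annotate_reference=False, joint_embedding=False),
}

print(PLANS["fast"])
```

Picking a mode is thus a trade-off between runtime and how much of the pipeline (training, reference annotation, embedding) is redone for your query.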

---

## **Publications**

- **[Original popV paper](https://www.nature.com/articles/s41588-024-01993-3)**:
  - Published in *Nature Genetics*, this paper introduces popV and benchmarks it.

## **Contact**

- GitHub: [https://github.com/YosefLab/popV](https://github.com/YosefLab/popV)
- User questions: [Discourse](https://discourse.scverse.org)

<!---
- **[MultiVI](https://docs.scvi-tools.org/en/stable/user_guide/models/multivi.html)**:
  - A multi-modal model for joint analysis of RNA, ATAC and protein data, enabling integrative insights from diverse omics data.
-->