Commit 169b2fa (verified) by canergen · 1 parent: a38995c

Update README.md

Files changed (1): README.md +69 -5
README.md CHANGED
@@ -1,10 +1,74 @@
  ---
  title: README
- emoji: 📚
- colorFrom: pink
- colorTo: green
  sdk: static
- pinned: false
  ---

- Edit this `README.md` markdown file to author your organization card.

---
title: README
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: static
pinned: true
license: bsd-3-clause
short_description: Ensemble of experts for cell-type annotation
---

# **popV**

Welcome to the **popV** framework. popV provides state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pre-trained models to transfer cell-type labels to your own query dataset. Defining cell types by hand is tedious, and reference data can significantly accelerate it. Because popV combines several label-transfer tools, it provides a well-calibrated certainty score that flags cell types for which automatic annotation is highly uncertain. We recommend checking transferred labels manually, for example by plotting marker genes or differentially expressed genes, rather than trusting them blindly.

This is an open-science initiative: please contribute your own models so that the single-cell community can leverage your reference datasets. To add your dataset, ask in our [GitHub repository](https://github.com/YosefLab/popV).
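
A manual check of transferred labels can start from something as simple as the mean expression of known marker genes within each predicted label. The sketch below is a minimal, dependency-free illustration; the toy expression matrix, gene names, labels and the `mean_marker_expression` helper are all hypothetical, and in practice you would compute and plot these values (for example as a dot plot) from your own data.

```python
from collections import defaultdict


def mean_marker_expression(expression, genes, labels, markers):
    """Mean expression of each marker gene within each transferred label.

    expression: list of per-cell rows, one value per gene in `genes`
    genes: gene names matching the columns of `expression`
    labels: transferred cell-type label for each cell
    markers: marker genes to summarize
    """
    col = {gene: i for i, gene in enumerate(genes)}
    sums = defaultdict(lambda: [0.0] * len(markers))
    counts = defaultdict(int)
    for row, label in zip(expression, labels):
        counts[label] += 1
        for j, gene in enumerate(markers):
            sums[label][j] += row[col[gene]]
    return {
        label: {gene: sums[label][j] / counts[label] for j, gene in enumerate(markers)}
        for label in counts
    }


# Hypothetical toy data: 4 cells x 3 genes.
genes = ["CD3E", "MS4A1", "LYZ"]
expression = [
    [5.0, 0.0, 0.1],
    [4.0, 0.2, 0.0],
    [0.1, 6.0, 0.0],
    [3.0, 0.0, 0.2],  # labelled "B cell" below, yet expresses the T-cell marker CD3E
]
labels = ["T cell", "T cell", "B cell", "B cell"]

summary = mean_marker_expression(expression, genes, labels, ["CD3E", "MS4A1"])
print(summary["B cell"])  # unexpectedly high CD3E: worth a closer look
```

A label whose cells show high expression of another population's markers is a candidate for manual re-annotation.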
19
+
20
+ ---
21
+
22
+ ## **Model Overview**
23
+ popV trains up to 9 different algorithms for automatic label transfer and computes a consensus score. We provide an automatic report. To learn how to apply popV to your
24
+ own dataset, please refer to our [tutorial]()
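
To illustrate the idea of a consensus score, the sketch below takes one prediction per expert for a single cell, picks the majority label and counts how many experts agree with it. This is a simplified majority vote written for illustration only; popV's actual scoring may differ, and the expert names, the `consensus`/`needs_review` helpers and the review threshold are hypothetical.

```python
from collections import Counter


def consensus(predictions):
    """Return the majority label for one cell and how many experts voted for it.

    predictions: dict mapping an expert's name to the label it predicted.
    Ties go to the label seen first (Counter.most_common ordering).
    """
    label, n_agree = Counter(predictions.values()).most_common(1)[0]
    return label, n_agree


def needs_review(predictions, min_agree):
    """Flag a cell for manual review when too few experts agree (hypothetical threshold)."""
    _, n_agree = consensus(predictions)
    return n_agree < min_agree


# Hypothetical predictions for a single query cell.
cell = {
    "knn_scvi": "T cell",
    "knn_harmony": "T cell",
    "random_forest": "T cell",
    "svm": "NK cell",
    "celltypist": "T cell",
}
label, score = consensus(cell)
print(label, f"{score}/{len(cell)} experts agree")  # T cell 4/5 experts agree
print(needs_review(cell, min_agree=5))  # True
```

Low agreement between independent methods is what makes such a score useful for spotting cells whose automatic annotation is uncertain.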
25
+
26
+ ### Algorithms
27
+
28
+ Currently implemented algorithms are:
29
+
30
+ - K-nearest neighbor classification after dataset integration with [BBKNN](https://github.com/Teichlab/bbknn)
31
+ - K-nearest neighbor classification after dataset integration with [SCANORAMA](https://github.com/brianhie/scanorama)
32
+ - K-nearest neighbor classification after dataset integration with [scVI](https://github.com/scverse/scvi-tools)
33
+ - K-nearest neighbor classification after dataset integration with [Harmony](https://github.com/lilab-bcb/harmony-pytorch)
34
+ - Random forest classification
35
+ - Support vector machine classification
36
+ - [OnClass](https://github.com/wangshenguiuc/OnClass) cell type classification
37
+ - [scANVI](https://github.com/scverse/scvi-tools) label transfer
38
+ - [Celltypist](https://www.celltypist.org) cell type classification
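
The first four experts share one pattern: integrate reference and query cells into a common latent space, then label each query cell by a majority vote over its nearest reference cells. The sketch below shows only that second step, assuming the latent coordinates were already produced by an integration method; the toy 2-D points and the `knn_label` helper are hypothetical, and a brute-force neighbor search like this one would be replaced by an efficient neighbor index on real, high-dimensional latent spaces.

```python
import math
from collections import Counter


def knn_label(query_point, ref_points, ref_labels, k=3):
    """Label one query cell by majority vote among its k nearest reference cells."""
    nearest = sorted(
        range(len(ref_points)),
        key=lambda i: math.dist(query_point, ref_points[i]),
    )[:k]
    votes = Counter(ref_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy integrated latent space (2-D for readability).
ref_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell"]

print(knn_label((0.05, 0.1), ref_points, ref_labels))  # T cell
print(knn_label((4.8, 5.0), ref_points, ref_labels))   # B cell
```

Because the classifier is the same for every integration method, differences between these experts come only from how each method builds the latent space.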

Each algorithm is implemented as a class in [popv/algorithms](popv/algorithms/__init__.py).
To add a new method, implement a class with the following methods:

- `algorithm.compute_integration`: computes dataset integration to yield an integrated latent space.
- `algorithm.predict`: computes cell-type labels using the method's classifier.
- `algorithm.compute_embedding`: computes a UMAP embedding of the previously computed integrated latent space.

Once a class with these methods is added, popV automatically includes it among its classifiers and uses it as an additional expert.
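
A skeleton of a new expert following the interface described above might look like this. Only the three method names come from this README; the constructor, the method arguments, the toy nearest-centroid logic and the `REGISTRY`/`register` helper are hypothetical stand-ins and do not reproduce popV's actual signatures or how it discovers classes (see `popv/algorithms` for those).

```python
import math

# Hypothetical registry standing in for popV's automatic discovery of experts.
REGISTRY = []


def register(cls):
    REGISTRY.append(cls)
    return cls


@register
class NearestCentroid:
    """Toy expert that labels cells by the closest reference centroid."""

    def __init__(self, ref_points, ref_labels):
        self.ref_points = ref_points
        self.ref_labels = ref_labels
        self.latent = None

    def compute_integration(self, query_points):
        # A real expert would integrate reference and query into a shared
        # latent space here; this toy keeps the query coordinates unchanged.
        self.latent = list(query_points)

    def predict(self):
        # One centroid per reference label, then nearest-centroid assignment.
        centroids = {}
        for label in set(self.ref_labels):
            pts = [p for p, lab in zip(self.ref_points, self.ref_labels) if lab == label]
            centroids[label] = tuple(sum(axis) / len(pts) for axis in zip(*pts))
        return [
            min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
            for x in self.latent
        ]

    def compute_embedding(self):
        # A real expert would return a UMAP of the latent space; the toy
        # latent space is already 2-D, so it is returned as-is.
        return self.latent


expert = NearestCentroid([(0, 0), (0, 1), (5, 5)], ["T cell", "T cell", "B cell"])
expert.compute_integration([(0.2, 0.4), (4.0, 4.5)])
print(expert.predict())  # ['T cell', 'B cell']
```

Keeping integration, prediction and embedding as separate methods lets the framework run the expensive integration once and reuse its latent space for both labels and visualization.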

---

## **Key Applications**

The purpose of these models is cell-type label transfer.
We provide models [with cuML support](collection) for large-scale reference mapping and [without cuML support](collection) for when no GPU is available. Without a GPU, popV scales well to 100k cells. popV offers three prediction modes of decreasing complexity:

- `retrain` trains all classifiers from scratch. For 50k cells, this takes up to an hour of compute time on a GPU.
- `inference` uses pretrained classifiers to annotate both query and reference cells and builds a joint embedding with all of the integration methods listed above. In our hands, this takes up to half an hour on a GPU for 50k cells.
- `fast` uses only the methods that have pretrained classifiers and annotates query cells only. For 50k cells, this takes 5 minutes without a GPU (excluding the UMAP embedding).
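
The choice between the three modes can be summarized as a small decision rule. The sketch below only restates the descriptions above; the mode names come from this README, while the `choose_mode` function and its arguments are hypothetical and not part of popV's API.

```python
def choose_mode(retrain_needed, annotate_reference_and_embed):
    """Pick one of the three prediction modes described above (hypothetical helper)."""
    if retrain_needed:
        # Train all classifiers from scratch
        # (up to an hour for 50k cells on a GPU).
        return "retrain"
    if annotate_reference_and_embed:
        # Pretrained classifiers; annotate query and reference cells and
        # build a joint embedding (up to half an hour for 50k cells on a GPU).
        return "inference"
    # Only methods with pretrained classifiers, query cells only
    # (5 minutes for 50k cells without a GPU, excluding UMAP).
    return "fast"


print(choose_mode(retrain_needed=False, annotate_reference_and_embed=False))  # fast
```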

---

## **Publications**

- **[Original popV paper](https://www.nature.com/articles/s41588-024-01993-3)**:
  - Published in *Nature Genetics*, this paper introduces popV and benchmarks it.

## **Contact**

- GitHub: [https://github.com/YosefLab/popV](https://github.com/YosefLab/popV)
- User questions: [Discourse](https://discourse.scverse.org)

<!---
- **[MultiVI](https://docs.scvi-tools.org/en/stable/user_guide/models/multivi.html)**:
  - A multi-modal model for joint analysis of RNA, ATAC and protein data, enabling integrative insights from diverse omics data.
-->