popV
popularVoting for cell-type annotation in single-cell genomics
Welcome to the popV framework. popV provides state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach. Here we provide pre-trained models for transferring cell-type labels to your own query dataset. Defining cell types is a tedious process, and using reference data can significantly accelerate it. Because popV combines several label-transfer tools, it provides a well-calibrated certainty score that makes it possible to detect cell types for which automatic annotation has high uncertainty. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes rather than trusting them blindly.
This is an open science initiative. Please contribute your own models so that the single-cell community can leverage your reference datasets; to do so, ask in our GitHub repository for your dataset to be added.
Model Overview
popV trains up to nine different algorithms for automatic label transfer, computes a consensus score from their predictions, and generates an automatic report. To learn how to apply popV to your own dataset, please refer to our tutorial.
Algorithms
Currently implemented algorithms are:
- K-nearest neighbor classification after dataset integration with BBKNN
- K-nearest neighbor classification after dataset integration with Scanorama
- K-nearest neighbor classification after dataset integration with scVI
- K-nearest neighbor classification after dataset integration with Harmony
- Random forest classification
- Support vector machine classification
- OnClass cell type classification
- scANVI label transfer
- CellTypist cell type classification
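To illustrate the idea of a consensus score over several classifiers, here is a minimal sketch of plain majority voting in Python. It is not popV's implementation: the method keys, the example labels, and the `MIN_AGREEMENT` threshold are hypothetical, and popV's actual consensus may be computed differently. The score here is simply the number of methods that agree with the winning label, and cells with low agreement are flagged for manual review.

```python
from collections import Counter


def consensus_vote(predictions):
    """Return (label, score) for one cell.

    `predictions` maps a method name to its predicted label.
    `score` is the number of methods that voted for the winning label.
    """
    counts = Counter(predictions.values())
    # most_common(1) returns the label with the highest count; ties are
    # broken in favor of the label that was encountered first.
    label, score = counts.most_common(1)[0]
    return label, score


# Hypothetical per-method predictions for two query cells.
cells = {
    "cell_1": {"knn_bbknn": "T cell", "knn_scvi": "T cell", "rf": "T cell",
               "svm": "T cell", "scanvi": "T cell"},
    "cell_2": {"knn_bbknn": "B cell", "knn_scvi": "T cell", "rf": "B cell",
               "svm": "NK cell", "scanvi": "T cell"},
}

MIN_AGREEMENT = 4  # hypothetical threshold, chosen only for this example

results = {cell: consensus_vote(preds) for cell, preds in cells.items()}
needs_review = [cell for cell, (_, score) in results.items()
                if score < MIN_AGREEMENT]

for cell, (label, score) in results.items():
    print(f"{cell}: {label} ({score}/{len(cells[cell])} methods agree)")
print("Check marker genes for:", needs_review)
```

In this toy example the second cell is flagged because no label is supported by at least four methods, which is the kind of case where plotting marker genes is most useful.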
Key Applications
The purpose of these models is to perform cell-type label transfer. We provide a collection of models with CUML support for large-scale reference mapping and a collection of models without CUML support for use when no GPU is available. Without a GPU, popV scales well to 100k cells. popV offers three prediction modes of decreasing computational cost:
- retrain trains all classifiers from scratch. For 50k cells, this takes up to an hour of computing time on a GPU.
- inference uses pretrained classifiers to annotate both query and reference cells and constructs a joint embedding using all of the integration methods listed above. For 50k cells, this takes up to half an hour of computing time on a GPU in our experience.
- fast uses only the methods that have pretrained classifiers and annotates only the query cells. For 50k cells, this takes 5 minutes without a GPU (excluding UMAP embedding).
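The choice between the three modes can be summarized as a small decision helper. This is only a sketch: `choose_mode` and its arguments are hypothetical names that are not part of popV, and the runtime notes simply restate the approximate figures for 50k cells given above.

```python
# Approximate runtimes for 50k cells, as quoted in the list above.
MODE_NOTES = {
    "retrain": "up to 1 hour on a GPU",
    "inference": "up to 30 minutes on a GPU",
    "fast": "about 5 minutes without a GPU (no UMAP embedding)",
}


def choose_mode(has_pretrained_model, need_joint_embedding):
    """Pick a prediction mode (hypothetical helper, not a popV function).

    has_pretrained_model: True if pretrained classifiers exist for the reference.
    need_joint_embedding: True if reference cells should also be annotated
        and embedded together with the query cells.
    """
    if not has_pretrained_model:
        # Without pretrained classifiers, everything is trained from scratch.
        return "retrain"
    if need_joint_embedding:
        # Annotates query and reference cells and builds a joint embedding.
        return "inference"
    # Query-only annotation with the pretrained classifiers.
    return "fast"


mode = choose_mode(has_pretrained_model=True, need_joint_embedding=False)
print(mode, "->", MODE_NOTES[mode])
```

The helper encodes only the distinctions stated in the list: whether pretrained classifiers are available and whether a joint embedding with the reference is needed.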
Publications
- Original popV paper: published in Nature Genetics, this paper introduces popV and benchmarks it.
Contact
- GitHub: https://github.com/YosefLab/popV
- User questions: Discourse