metadata

title: README
emoji: 🐨
colorFrom: purple
colorTo: blue
sdk: static
pinned: true
license: bsd-3-clause
short_description: Ensemble of experts for cell-type annotation

popV

Welcome to the popV framework. popV delivers state-of-the-art performance in cell-type label transfer using an ensemble-of-experts approach, and we provide pre-trained models here for transferring cell-type labels to your own query dataset. Defining cell types is a tedious process, and reference data can accelerate it significantly. By combining several label-transfer tools, popV produces a well-calibrated certainty score that helps detect cell types for which automatic annotation is highly uncertain. We recommend manually checking transferred cell-type labels by plotting marker genes or differentially expressed genes before trusting them.

This is an open science initiative. Please contribute your own models so that the single-cell community can leverage your reference datasets: ask in our GitHub repository to add your dataset.

Model Overview

popV trains up to 9 different algorithms for automatic label transfer and computes a consensus score across their predictions. We also provide an automatic report. To learn how to apply popV to your own dataset, please refer to our tutorial.
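To make the idea of a consensus across experts concrete, here is a minimal sketch of a plain majority vote for a single cell. It is an illustration only: the function name `consensus_vote`, the expert names, and the dictionary format are hypothetical, and popV's actual consensus and certainty score may be computed differently.

```python
from collections import Counter


def consensus_vote(predictions):
    """Return (label, votes) for one cell.

    predictions: dict mapping expert name -> predicted label
                 (hypothetical format, not popV's data structure).
    votes: number of experts that agreed on the winning label.
    Ties are broken in favor of the label that was seen first.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    return label, votes


# Hypothetical predictions from four experts for a single cell.
cell_predictions = {
    "knn_bbknn": "T cell",
    "knn_scvi": "T cell",
    "random_forest": "B cell",
    "svm": "T cell",
}

label, votes = consensus_vote(cell_predictions)
print(f"{label}: {votes} of {len(cell_predictions)} experts agree")
```

In this toy setting, a cell on which few experts agree is a natural candidate for the manual marker-gene check recommended above.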

Algorithms

Currently implemented algorithms are:

K-nearest neighbor classification after dataset integration with BBKNN
K-nearest neighbor classification after dataset integration with Scanorama
K-nearest neighbor classification after dataset integration with scVI
K-nearest neighbor classification after dataset integration with Harmony
Random forest classification
Support vector machine classification
OnClass cell type classification
scANVI label transfer
CellTypist cell type classification

All algorithms are implemented as classes in popv/algorithms. To add a new method, a class must implement the following methods:

algorithm.compute_integration: Computes dataset integration to yield an integrated latent space.
algorithm.predict: Computes cell-type labels based on the specific classifier.
algorithm.compute_embedding: Computes UMAP embedding of previously computed integrated latent space.

Adding a new class with these methods automatically tells popV to include it among its classifiers and to use it as another expert.
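The shape of such a class can be sketched as follows. This is a toy example under stated assumptions, not popV's implementation: only the three method names come from the list above, while the class name `NearestCentroidExpert`, the method arguments, and the use of plain Python lists instead of single-cell data objects are hypothetical. The integration step is replaced by simple feature centering and the UMAP step by taking the first two latent dimensions, so that the example runs with the standard library alone. Automatic registration with popV is not shown.

```python
import math


class NearestCentroidExpert:
    """Toy expert exposing the three method names listed above.

    Assumption: cells are plain lists of floats, reference cells first.
    """

    def __init__(self):
        self.latent = []

    def compute_integration(self, cells):
        # Stand-in for dataset integration: center every feature.
        n_cells = len(cells)
        means = [sum(column) / n_cells for column in zip(*cells)]
        self.latent = [[v - m for v, m in zip(cell, means)] for cell in cells]
        return self.latent

    def predict(self, reference_labels):
        # The first len(reference_labels) latent rows are reference cells;
        # the remaining rows are query cells to be labeled.
        n_ref = len(reference_labels)
        grouped = {}
        for point, label in zip(self.latent[:n_ref], reference_labels):
            grouped.setdefault(label, []).append(point)
        centroids = {
            label: [sum(column) / len(points) for column in zip(*points)]
            for label, points in grouped.items()
        }
        # Assign each query cell the label of its nearest centroid.
        return [
            min(centroids, key=lambda label: math.dist(point, centroids[label]))
            for point in self.latent[n_ref:]
        ]

    def compute_embedding(self):
        # Stand-in for UMAP: keep the first two latent dimensions.
        return [point[:2] for point in self.latent]


# Hypothetical toy data: four labeled reference cells, two query cells.
reference = [[0.0, 0.0, 1.0], [0.2, 0.1, 1.0], [5.0, 5.0, 0.0], [5.1, 4.9, 0.0]]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]
query = [[0.1, 0.0, 1.0], [4.8, 5.2, 0.0]]

expert = NearestCentroidExpert()
expert.compute_integration(reference + query)
query_labels = expert.predict(reference_labels)
embedding = expert.compute_embedding()
print(query_labels)
```

The three methods are called in the order integration, prediction, embedding, because both later steps read the latent space stored by the first one.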

Key Applications

The purpose of these models is to perform cell-type label transfer. We provide a collection of models with CUML support for large-scale reference mapping and a collection of models without CUML support for use when no GPU is available. Without a GPU, popV scales well to 100k cells. popV has three prediction modes of differing complexity:

retrain trains all classifiers from scratch. For 50k cells, this takes up to an hour of computing time on a GPU.
inference uses pretrained classifiers to annotate both query and reference cells and constructs a joint embedding using all of the integration methods listed above. For 50k cells, this takes up to half an hour of computing time on a GPU in our experience.
fast uses only the methods that have pretrained classifiers and annotates only query cells. For 50k cells, this takes 5 minutes without a GPU (without computing the UMAP embedding).

Publications

Original popV paper:
- Published in Nature Genetics, this paper introduces popV and benchmarks it.

Contact

GitHub: https://github.com/YosefLab/popV
User questions: Discourse