---
license: cc-by-4.0
---

# Pancancer tissue classifier

This model classifies among 32 cancers from TCGA. It was trained by Jakub Kaczmarzyk using CLAM.

Output classes: ACC, BLCA, BRCA, CESC, CHOL, COAD, DLBC, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, MESO, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC, UCS, UVM.

Please see the [TCGA study abbreviations](https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations) to map these class names to the TCGA study names.

## Data

Diagnostic slides in TCGA (e.g., `DX`) were used to train the model. The whole slide images were tiled into 128x128 µm patches, and each patch was encoded using CTransPath (this produces 768-dimensional embeddings). Train, validation, and test splits were stratified by TCGA study, and patients did not cross split boundaries.

Sample sizes:

- Train: 9,257 slides (7,633 patients)
- Validation: 1,186 slides (955 patients)
- Test: 1,163 slides (955 patients)

## Model performance

The model achieved a weighted average AUROC of 0.99 (one-vs-rest). Here are the one-vs-rest AUROC values for each TCGA study.

- ACC: 0.9993
- BLCA: 0.9814
- BRCA: 0.9908
- CESC: 0.9868
- CHOL: 0.9972
- COAD: 0.9927
- DLBC: 0.9996
- ESCA: 0.9571
- GBM: 0.9984
- HNSC: 0.9974
- KICH: 0.9998
- KIRC: 0.9993
- KIRP: 0.9952
- LGG: 0.9984
- LIHC: 0.9988
- LUAD: 0.9879
- LUSC: 0.9868
- MESO: 0.9961
- OV: 0.9900
- PAAD: 0.9897
- PCPG: 0.9944
- PRAD: 1.0000
- READ: 0.9752
- SARC: 0.9946
- SKCM: 0.9957
- STAD: 0.9932
- TGCT: 0.9957
- THCA: 1.0000
- THYM: 0.9991
- UCEC: 0.9971
- UCS: 0.9863
- UVM: 0.9997

### Renal cell carcinoma (RCC) subtyping

RCC subtyping is a relatively common benchmark task for slide-level classification. We evaluate this model on RCC subtyping. When tested on a set of 52 KIRC slides and 28 KIRP slides (from the overall test set), the model achieved a balanced accuracy of 0.88.
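
Balanced accuracy is the mean of per-class recall, so the smaller KIRP group counts as much as the larger KIRC group. The sketch below shows one way such a subset evaluation could be computed. It is not the author's evaluation code: the helper names are hypothetical, the scores are toy values rather than outputs of this model, and restricting predictions to the two subtypes of interest is an assumption, since this card does not say whether that was done.

```python
from collections import defaultdict


def predict_within_subset(scores, subset):
    """Pick the highest-scoring class among `subset`, ignoring all other classes.

    `scores` maps a class name (e.g. "KIRC") to that class's score for one slide.
    Restricting to a subset is an assumption, not a documented step.
    """
    return max(subset, key=lambda name: scores[name])


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[name] / total[name] for name in total]
    return sum(recalls) / len(recalls)


# Toy scores for three hypothetical slides (NOT outputs of this model).
toy_scores = [
    {"KIRC": 0.6, "KIRP": 0.3, "KICH": 0.1},
    {"KIRC": 0.2, "KIRP": 0.3, "KICH": 0.5},
    {"KIRC": 0.1, "KIRP": 0.8, "KICH": 0.1},
]
toy_labels = ["KIRC", "KIRC", "KIRP"]

subset = ("KIRC", "KIRP")
toy_preds = [predict_within_subset(s, subset) for s in toy_scores]
print(balanced_accuracy(toy_labels, toy_preds))
```

The same two helpers apply to any pair of classes by changing `subset`.
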
### Non-small cell lung cancer (NSCLC) subtyping

NSCLC subtyping is a relatively common benchmark task for slide-level classification. We evaluate this model on NSCLC subtyping. When tested on a set of 55 LUAD slides and 58 LUSC slides (from the overall test set), the model achieved a balanced accuracy of 0.76.

# Intended uses

This model is ONLY intended for research purposes.

**This model may not be used for clinical purposes.** This model is distributed without warranties, either express or implied.