Cancer Transcriptomics ML

Machine learning classification of tumour vs. normal tissue across 5 TCGA cancer types, filtered through two-scale evolutionary analysis to identify candidate cancer-maintaining gene dependencies.

163 candidate genes identified → 15 cross-validated across cancers → Established drivers confirmed (TP53, PIK3CA, PTEN)
💡 Central hypothesis: ML-predictive genes under both strong germline purifying selection (dN/dS < 0.3) AND somatic positive selection (dN/dS ≥ 1.5, FDR < 0.05) are candidate cancer-maintaining dependencies.
🧬 Cancer Type Overview
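The two-scale filter in the hypothesis can be read as a simple conjunction of thresholds. The sketch below is only an illustration of that reading: the `GeneStats` record, its field names, and the helper functions are hypothetical and are not taken from the project's code; only the three threshold values come from the text above.

```python
from dataclasses import dataclass

# Thresholds quoted in the central hypothesis.
GERMLINE_MAX = 0.3   # germline dN/dS must be below this (purifying selection)
SOMATIC_MIN = 1.5    # somatic dN/dS must be at least this (positive selection)
FDR_MAX = 0.05       # FDR-adjusted somatic p-value must be below this


@dataclass(frozen=True)
class GeneStats:
    # Hypothetical record; the field names are illustrative assumptions.
    symbol: str
    ml_predictive: bool
    germline_dnds: float
    somatic_dnds: float
    somatic_fdr: float


def is_candidate(g: GeneStats) -> bool:
    """True when a gene passes the ML filter and both selection filters."""
    return (
        g.ml_predictive
        and g.germline_dnds < GERMLINE_MAX
        and g.somatic_dnds >= SOMATIC_MIN
        and g.somatic_fdr < FDR_MAX
    )


def select_candidates(genes):
    """Return the symbols of all genes that satisfy the hypothesis."""
    return [g.symbol for g in genes if is_candidate(g)]
```

Note that the somatic threshold is inclusive (≥ 1.5) while the germline and FDR thresholds are strict, matching the inequalities as written in the hypothesis.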
  • BRCA: Breast
  • BLCA: Bladder
  • PRAD: Prostate
  • LUAD: Lung Adeno.
  • UCEC: Uterine

Metrics reported per cancer type: Bal. Accuracy, Specificity, AUC, Samples, Genes Tested (values not shown).
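Balanced accuracy and specificity, two of the metrics reported for each cancer type, can be computed from confusion-matrix counts. The following is a minimal sketch that assumes tumour is the positive class and normal the negative class; the function names and the example counts are illustrative and are not project results.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of tumour samples correctly called tumour: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of normal samples correctly called normal: TN / (TN + FP)."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity, so each class counts equally."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy, dominated by whichever class has more samples."""
    return (tp + tn) / (tp + fn + tn + fp)
```

With made-up counts of 95 true positives, 5 false negatives, 30 true negatives and 10 false positives, plain accuracy is about 0.89 while balanced accuracy is 0.85. The gap comes from the smaller normal class, which is why balanced accuracy is the more informative headline when normal samples are scarce.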
⚠️ Limitations & Caveats
  • Somatic dN/dS method: Uses a per-gene binomial exact test, not the site-level dNdScv model. Results should be interpreted as exploratory.
  • UCEC candidate count: Elevated candidate numbers in uterine cancer likely reflect microsatellite-instability-driven hypermutation rather than a proportionally larger set of true dependencies.
  • PRAD statistical power: Prostate cancer has the lowest specificity (73.5%), driven by smaller normal-tissue sample size and adjacent-normal heterogeneity.
  • Near-perfect AUC values: High AUC scores reflect the intrinsic separability of tumour vs. normal transcriptomes (thousands of DE genes) rather than the specificity of the final gene signatures.
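To make the first caveat concrete, here is one way a per-gene binomial exact test for somatic positive selection could look. The source does not give the exact formulation, so this is a sketch under stated assumptions: mutations in a gene are split into nonsynonymous and synonymous counts, and `p_nonsyn_neutral` (a hypothetical parameter) is the fraction of mutations expected to be nonsynonymous under neutrality.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def somatic_dnds(n_nonsyn: int, n_syn: int, p_nonsyn_neutral: float) -> float:
    """Observed N:S ratio divided by the N:S ratio expected under neutrality."""
    expected_ratio = p_nonsyn_neutral / (1 - p_nonsyn_neutral)
    return (n_nonsyn / n_syn) / expected_ratio


def positive_selection_pvalue(n_nonsyn: int, n_syn: int,
                              p_nonsyn_neutral: float) -> float:
    """One-sided exact test: is the nonsynonymous count higher than neutral?"""
    return binom_upper_tail(n_nonsyn, n_nonsyn + n_syn, p_nonsyn_neutral)
```

Unlike dNdScv, a test of this kind uses a single neutral expectation per gene and does not model mutational context, which is one reason the results are best treated as exploratory. The per-gene p-values would still need a multiple-testing correction before being compared with the FDR < 0.05 threshold.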

🆕 v2 Statistical Hardening (new)

Every headline metric now ships with 95% confidence intervals; somatic dN/dS uses Wilson CIs with a minimum-synonymous filter; feature rankings are stable across folds; specificity is reported in percentage points.
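The Wilson interval is defined for a binomial proportion, so applying it to dN/dS needs some mapping from counts to a ratio. The sketch below assumes one plausible mapping: compute a Wilson interval on the nonsynonymous fraction of a gene's mutations, then convert both bounds to the dN/dS scale. The `min_syn` default, the `p_nonsyn_neutral` parameter, and the function names are hypothetical and are not taken from the project.

```python
from math import sqrt

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def wilson_interval(successes: int, total: int, z: float = Z_95):
    """Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / total
                    + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def dnds_with_ci(n_nonsyn: int, n_syn: int, p_nonsyn_neutral: float,
                 min_syn: int = 3):
    """Return (dN/dS, lower, upper), or None if too few synonymous mutations."""
    if n_syn < min_syn:
        return None  # minimum-synonymous filter: the ratio would be unstable
    neutral_odds = p_nonsyn_neutral / (1 - p_nonsyn_neutral)

    def to_dnds(q: float) -> float:
        # Convert a nonsynonymous fraction to dN/dS via its odds.
        return float("inf") if q >= 1.0 else (q / (1 - q)) / neutral_odds

    lo, hi = wilson_interval(n_nonsyn, n_nonsyn + n_syn)
    return (n_nonsyn / n_syn) / neutral_odds, to_dnds(lo), to_dnds(hi)
```

The filter exists because a gene with very few synonymous mutations gives a ratio with a tiny denominator, so its point estimate can exceed any threshold by chance alone.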
