Three ML models (Logistic Regression, Random Forest, MLP) trained on TCGA RNA-seq data with 5-fold stratified cross-validation.
Best Accuracy (bal. accuracy): [value not loaded]
Best AUC (across cancers): [value not loaded]
Avg Specificity Gain (percentage points): [value not loaded]
Cancer Types (TCGA cohorts): 5
Models: 3 (LR · RF · MLP)
Specificity Improvements by Cancer Type
MLP Performance Dashboard
Cancer | Bal. Accuracy | Specificity | Sensitivity | AUC | MCC | Architecture | Samples (T/N)
Task × Model Results
Task | Model | Accuracy | Precision | Recall | ROC AUC
AUC = 1.000 reflects the profound transcriptomic difference between breast tumour and solid tissue normal samples (~5,000 differentially expressed genes). Gene features were selected using training data only (external samples were excluded from the DESeq2 filtering step). The value is high, but it is interpreted here as evidence of model generalization rather than data leakage.
⚠️ Limitations
Near-perfect v1 AUC reflects the intrinsic separability of tumor vs. normal transcriptomes on the full DESeq2-filtered feature set (~5,000 genes), not signature-specific discriminatory power. The v2 Reliability Hardening table below retrains each cohort on its final candidate gene set and reports honest bootstrap-CI metrics.
PRAD v1 specificity (73.5%) is the lowest across cancers due to adjacent-normal tumor contamination; v2 Youden-optimal specificity is [value not loaded].
UCEC has only 201 samples (smallest dataset) and reaches v1 AUC ≈ 1.000 on the full feature set; v2 signature-only AUC = [value not loaded].
SMOTE oversampling is applied for PRAD and BLCA within CV folds. Class weighting may be more appropriate for high-dimensional data.
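The class-weighting alternative mentioned in the last limitation can be sketched as follows. This is a minimal illustration of inverse-frequency ("balanced") weights, where each class c gets weight n / (k · n_c); the function name and the toy label counts are hypothetical and are not taken from the PRAD or BLCA cohorts.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced cohort: 90 tumor samples, 10 normal samples.
labels = ["tumor"] * 90 + ["normal"] * 10
weights = balanced_class_weights(labels)

# The minority class receives the larger weight (5.0 vs. about 0.556).
print(weights)
```

Unlike SMOTE, this approach creates no synthetic samples; it only rescales each sample's contribution to the loss, so nothing has to be regenerated inside each CV fold.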
ℹ️ All metrics are averaged over 5-fold stratified cross-validation. Architecture is selected dynamically: 512→256→128 for datasets with n > 600 samples, 256→128 for smaller datasets.
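The architecture rule in the note above amounts to a single comparison on cohort size. A minimal sketch, assuming the layer widths are passed on as a tuple of hidden-layer sizes; the function and constant names are hypothetical:

```python
from typing import Tuple

# Threshold and layer widths come from the note above; the names are illustrative.
SAMPLE_THRESHOLD = 600


def select_hidden_layers(n_samples: int) -> Tuple[int, ...]:
    """Return MLP hidden-layer widths for a cohort with n_samples samples."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if n_samples > SAMPLE_THRESHOLD:
        return (512, 256, 128)
    return (256, 128)


# UCEC has 201 samples, so it falls into the smaller architecture.
print(select_hidden_layers(201))  # (256, 128)
```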
The headline metrics above are computed on the full DESeq2-filtered feature set (≈ 5,000 genes), which lets the MLP memorise tumor-vs-normal patterns and produces AUCs ≥ 0.99 in four of the five cohorts. The v2 hardening stage re-evaluates every cohort on its final candidate gene set (or on the top-50 signature genes when too few candidates pass), using logistic regression, out-of-fold (OOF) predictions, Youden-optimal thresholding, and 1,000-iteration bootstrap 95 % CIs.
Default = probability ≥ 0.5. Youden = threshold maximising sensitivity + specificity − 1 on held-out OOF predictions. Source: results/v2/<cohort>/reliability_hardened.json.
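The Youden rule described above can be sketched in a few lines. This is an illustrative pure-Python version that scans every observed probability as a candidate threshold; the function name and the toy labels and probabilities are hypothetical, and the actual pipeline may compute the threshold differently.

```python
from typing import Sequence, Tuple


def youden_threshold(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A sample is called positive when its probability is >= threshold.
    Ties keep the lowest threshold that reaches the maximum J.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")

    best_t, best_j = 0.5, float("-inf")
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy out-of-fold predictions (1 = tumor, 0 = normal); values are made up.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90]

threshold, j_stat = youden_threshold(y_true, y_prob)
print(threshold, j_stat)  # 0.4 0.75
```

On this toy data the default 0.5 cut-off gives J = 0.5 (sensitivity 0.75, specificity 0.75), while the Youden threshold of 0.4 recovers the fourth tumor sample and raises J to 0.75.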
v2: 95 % Confidence Intervals on every metric
Metrics below are point estimates. The v2 layer computes bootstrap (or percentile-over-5-folds) 95 % CIs for AUC, balanced accuracy, MCC, sensitivity, and specificity per cohort.
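One common way to obtain such intervals is a percentile bootstrap over samples, sketched below for balanced accuracy. The percentile method, the function names, the seed, and the toy labels are assumptions made for illustration; the v2 layer's actual resampling scheme may differ.

```python
import random
from typing import Callable, List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = tumor)."""
    pos = [p for y, p in zip(y_true, y_pred) if y == 1]
    neg = [p for y, p in zip(y_true, y_pred) if y == 0]
    sensitivity = sum(1 for p in pos if p == 1) / len(pos)
    specificity = sum(1 for p in neg if p == 0) / len(neg)
    return (sensitivity + specificity) / 2


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample with replacement, take the tail quantiles."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # skip resamples that lost a class; the metric is undefined
        yp = [y_pred[i] for i in idx]
        stats.append(metric(yt, yp))
    stats.sort()
    k = int(n_boot * alpha / 2)  # number of values trimmed from each tail
    return stats[k], stats[n_boot - 1 - k]


# Toy predictions: 10 tumor and 10 normal samples with three errors in total.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1

point = balanced_accuracy(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, balanced_accuracy)
print(f"balanced accuracy = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Fixing the seed makes the interval reproducible across runs; with a cohort as small as this toy example the interval will be wide, which is the kind of uncertainty a point estimate alone hides.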