Executive Summary
Baseline (All Columns)
Best Drop
Best Keep
Experiments
Model Comparison: All Columns + 14 Drop-One Experiments
Classification accuracy for the baseline (all columns) and for each run where one feature is dropped. Survival Months is excluded in all runs.
Accuracy When Dropping Each Feature
• Drop-one-column: Train the same model once per feature, each time excluding that feature. Compare accuracy, ROC AUC, etc., to the baseline (all columns).
• Why it matters: If dropping a feature improves performance, that feature may add noise or cause overfitting. If dropping it hurts performance substantially, that feature is likely important.
• Baseline: "All columns" uses every feature except Survival Months (to avoid data leakage). Each row in the table is one experiment where we dropped one additional feature.
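The drop-one procedure above can be sketched in a few lines. This is a minimal illustration using scikit-learn on synthetic stand-in data; the feature names, model settings, and split are hypothetical, not the dashboard's actual pipeline.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
feature_names = ["Tumor Size", "Grade", "Progesterone Status", "Marital Status"]
X = rng.normal(size=(500, len(feature_names)))          # stand-in features
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

def accuracy_of(X, y):
    """Train one tree on a fixed split and return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

baseline = accuracy_of(X, y)                            # "All Columns" run
delta = {}
for i, name in enumerate(feature_names):
    # Retrain with one column removed and record the change vs baseline.
    delta[name] = accuracy_of(np.delete(X, i, axis=1), y) - baseline
# delta[name] < 0: accuracy fell when dropped -> candidate "best keep"
# delta[name] > 0: accuracy rose when dropped -> candidate "best drop"
```

The same loop generalizes to ROC AUC or any other metric by swapping the scoring function.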
Why We Use Progesterone vs Estrogen (ER vs PR)
Estrogen Receptor (ER) and Progesterone Receptor (PR) status are key biomarkers in breast cancer. This tab explains how they relate to outcomes in this dataset and why the model may emphasize one over the other.
ER+ (Estrogen Positive)
ER− (Estrogen Negative)
PR+ (Progesterone Positive)
PR− (Progesterone Negative)
ER/PR Combined: Outcomes in This Dataset
P = Positive, N = Negative. ER−/PR− has the highest risk; ER+/PR+ the best baseline prognosis.
Why the Tree Kept Progesterone but Not Estrogen (In This Run)
- Estrogen status is highly imbalanced (~93% positive), so it offers little splitting power in a shallow tree.
- Progesterone status has more variation (~83% positive / 17% negative), so it can separate risk groups better.
- ER and PR are correlated; when two columns overlap, tree models often keep the one with cleaner incremental signal and treat the other as redundant.
- This ablation result is model-specific (split seed, depth, preprocessing). It does not mean estrogen is biologically unimportant.
Right interpretation: Keep both for clinical completeness. For this tree configuration, PR contributed more unique predictive signal than ER. For robust feature retention, run repeated cross-validation ablation (not a single split).
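The repeated cross-validation ablation recommended above can be sketched as follows. This is an illustrative setup on synthetic data, assuming scikit-learn; column 4 merely plays the role of the ER column.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))                     # column 4 stands in for "ER"
y = (X[:, 0] + rng.normal(scale=0.7, size=400) > 0).astype(int)

# 5-fold CV repeated 10 times = 50 scores per configuration.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0)

full = cross_val_score(clf, X, y, cv=cv)                           # all features
ablated = cross_val_score(clf, np.delete(X, 4, axis=1), y, cv=cv)  # drop "ER"
delta = full.mean() - ablated.mean()
# If |delta| is small relative to full.std(), a single-split "drop ER"
# result should not be treated as evidence the feature is useless.
```

Comparing the mean delta against the fold-to-fold spread is what distinguishes a robust ablation finding from a split-seed artifact.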
Five areas where progesterone receptor (PR) status shapes biology, risk, and treatment. Use the section titles to jump to what you need.
1 Mechanism of Immune System Evasion
- “Cloaking” effect: Progesterone suppresses “danger signals” on the surface of breast tumor cells, allowing them to bypass immune surveillance.
- STAT1 pathway: PR downregulates STAT1-mediated interferon-alpha signaling, preventing the innate immune system from recognizing and destroying developing tumors.
- Immune-cold tumors: High progesterone activity is a primary reason breast tumors are often “immune-cold,” with weak T-cell response and resistance to standard immunotherapy.
2 PR as a Clinical Biomarker (Subtyping & Risk)
- Luminal A vs B: High PR (≥20%) is the primary differentiator for Luminal A (better prognosis) vs Luminal B (more aggressive, higher proliferation).
- ER+/PR−: Estrogen-positive but progesterone-negative tumors are associated with a non-functional ER pathway, higher recurrence, and worse overall survival.
- Isoform ratio (PR-A:PR-B): The balance between these receptor isoforms affects treatment outcome; high PR-A is linked to tamoxifen resistance and disease progression.
3 Therapeutic Synergy (PIONEER Trial)
- Genomic reprogramming: When bound to a ligand (e.g. Megestrol), PR physically interacts with ER, pulling it away from DNA sites that drive cancer growth.
- Antiproliferative synergy: Combining progestogens with Aromatase Inhibitors (AIs) can yield ~80% reduction in tumor proliferation (Ki67, AURKA), significantly higher than AI alone.
- Dose efficiency: Low-dose megestrol (40 mg) can be as effective as high-dose (160 mg) at suppressing growth while reducing side effects (e.g. hot flashes).
4 Drivers of Resistance & Poor Prognosis
- RANKL pathway: Progesterone increases RANKL expression; PR+ cells can signal PR− neighbors to divide, complicating local control.
- Resistance markers: FGFR1 amplification (intrinsic resistance to AIs); high TMB (≥9 mutations/Mb) with poor response to combined endocrine therapy.
- ER−/PR+ rarity: This rare subgroup (<2%) is clinically distinct, often younger, with worse outcomes than typical ER+/PR+.
5 Key Clinical Correlation Points
- Ki67: Post-treatment Ki67 ≤2.7% indicates complete cell-cycle arrest; >10% signals high recurrence risk.
- PR repression: Successful antiestrogen treatment often leads to loss of PR markers; if PR stays high during treatment, the tumor may be evading therapy.
Binary Classification (Alive vs. Dead)
All runs: Survival Months excluded. Baseline = All Columns (14 features); each other row = drop that one feature (13 features).
What “Best drop” and “Best keep” mean: Best drop = accuracy was highest when that feature was removed, so keeping it may add noise. Best keep = accuracy was lowest when that feature was removed, so the model relies on it and it is important to keep.
Baseline (All Columns)
Best Drop-One
Best Keep
Runs
Accuracy by Experiment
↓ Lower accuracy when dropped = more important to keep. Green = best to keep, red = best to drop.
ROC AUC by Experiment
📉 Predicting Exact Survival Duration (Months)
Drop-one ablation for regression: each row shows MAE, RMSE, and R² when one feature is excluded. Survival Months is not used as a feature (to avoid data leakage); the model predicts it from diagnosis-time features only.
Baseline MAE
Best R² (Drop)
Overall
Better Alternative
Model Performance Comparison
All approaches use diagnosis-time features only (Survival Months excluded). Higher is better. Regression (R² × 100) is on a different scale than accuracy.
MAE by Experiment (Lower is Better)
Δ = change vs baseline (All Columns). For MAE/RMSE, negative Δ is better; for R², positive Δ is better.
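For reference, the three regression metrics in the table can be computed directly from predictions. A pure-Python sketch with made-up numbers (not results from this dataset):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the miss, in months."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE but punishes large misses more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """R-squared: 1 = perfect, 0 = no better than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy survival-months predictions (illustrative numbers only).
y_true = [10, 30, 50, 70]
y_pred = [12, 28, 55, 66]
# mae -> 3.25, rmse -> 3.5, r2 -> approximately 0.9755
```

Under the table's delta convention, a drop-one run with lower MAE/RMSE (negative Δ) or higher R² (positive Δ) than baseline beats the all-columns model.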
1. The Censored Data Problem
Many patients are still alive when the study ends. We only know they survived at least X months, not their total lifespan. Regression treats "still alive at 60 months" the same as "died at 60 months" - wrong.
2. Wrong Tool for the Job
Survival data needs Cox Proportional Hazards, Kaplan-Meier, or Survival Random Forests. Decision tree regression ignores censoring.
3. Extreme Variability
Survival months range from 0 to 100+ with substantial noise. A regression tree predicts the average of each leaf region - a poor fit for data this noisy and skewed.
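To make the censoring point concrete, here is a minimal Kaplan-Meier product-limit estimator in pure Python with toy numbers. A censored patient reduces the at-risk count without counting as a death, which is exactly the distinction plain regression cannot express.

```python
def kaplan_meier(durations, events):
    """Product-limit estimator. events: 1 = died, 0 = censored (still alive)."""
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = {}                     # event time -> survival probability after it
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve[t] = surv
        at_risk -= removed          # censored patients leave the risk set too
    return curve

# Two deaths (months 10, 30) and two patients censored alive at month 60.
curve = kaplan_meier([10, 30, 60, 60], [1, 1, 0, 0])
# curve is approximately {10: 0.75, 30: 0.5}; the censored pair never counts as deaths.
```

Production survival analysis would use a dedicated library (e.g. lifelines or scikit-survival) for Cox models and confidence intervals; this sketch only shows the censoring mechanics.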
Survival Horizon Analysis
Predicts whether the patient survives past 24 or 60 months. Drop-one ablation: each row shows performance when one feature is excluded.
Note on sample sizes: 24‑month horizon uses 3,976 patients (with 24mo+ follow‑up); 60‑month horizon uses 3,270 patients (with 60mo+ follow‑up). Fewer patients have 60‑month data.
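The horizon filtering described above (why the 24-month cohort has 3,976 patients but the 60-month cohort only 3,270) can be sketched as a labeling rule. The function and numbers below are illustrative, not the dashboard's actual preprocessing code.

```python
def horizon_labels(durations, events, horizon):
    """Keep only patients whose status at `horizon` months is actually known."""
    keep, labels = [], []
    for i, (months, died) in enumerate(zip(durations, events)):
        if died and months <= horizon:
            keep.append(i)
            labels.append(0)        # died on or before the horizon
        elif months >= horizon:
            keep.append(i)
            labels.append(1)        # followed up past the horizon, alive at it
        # else: censored before the horizon -> outcome unknown -> excluded
    return keep, labels

# Toy cohort: (follow-up months, died?) evaluated at a 24-month horizon.
keep, labels = horizon_labels([12, 40, 70, 20], [1, 0, 0, 0], horizon=24)
# keep == [0, 1, 2], labels == [0, 1, 1]; patient 3 (censored at 20) is excluded.
```

Raising the horizon excludes more censored patients, which is why the 60-month cohort is smaller.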
Best 24-Month Accuracy
Baseline 60-Month
Demographics Impact
Clinical Utility
Horizon Accuracy Visualization
Accuracy vs ROC AUC
All numbers below use only diagnosis-time features (Survival Months excluded).
Baseline ROC AUC
Best ROC AUC (Drop)
Metrics
Accuracy and ROC AUC by Experiment
Green = Accuracy went down when this feature was dropped → this feature is important to keep (best keep).
Red = Accuracy went up when this feature was dropped → this feature may add noise (consider dropping).
Full Experiments Table
Delta = change vs baseline (All Columns).
Green = Metric went down when this feature was dropped → feature is important to keep (best keep).
Red = Metric went up when this feature was dropped → feature may add noise (consider dropping).
All Experiments: Accuracy vs ROC AUC
Each point = one experiment. Blue = baseline (All Columns), Red = best to drop (accuracy went up when removed), Green = best to keep.
Key Insights
The "Best to Keep" and "Best to Drop" below are based on classification accuracy (Alive vs Dead). Other tabs use different metrics:
- Overview / Classification / Key Insights: Best drop = highest classification accuracy when removed → -
- Horizon tab: "Best 24-Month Accuracy" = which drop gives the highest 24‑month survival prediction accuracy → - (may differ from classification)
- Regression tab: "Best R² (Drop)" = which drop improves regression R² → - (yet another metric)
So if you see different features called "best" in different tabs, that's expected - each tab optimizes a different outcome.
🏆 Best to Keep
-: Important for Accuracy
When we removed this feature, accuracy dropped to -. So the model relies on it - we should keep it.
With This Feature (Baseline)
| Accuracy | - |
| ROC AUC | - |
📉 When We Dropped It
| Accuracy | - |
| Change | - |
Why This Feature Is Best to Keep (Easy to Understand)
1️⃣ The model uses it to separate patients
When this feature is in the model, it helps distinguish who is likely alive vs dead. Removing it takes away real information the tree was using to make better splits.
2️⃣ It carries signal, not noise
Accuracy dropped when we removed it - that means the feature was contributing to correct predictions. If it were just noise, dropping it would have left accuracy the same or improved it.
3️⃣ Clinically it makes sense
This aligns with clinical guidance: progesterone receptor (PR) is part of hormone receptor status (ER/PR), and hormone receptors are important for treatment decisions and prognosis in invasive breast cancer. The American Cancer Society notes these receptors are routinely tested, many breast cancers are receptor-positive, and receptor status meaningfully affects management. That supports why dropping progesterone hurts model quality - this feature carries real clinical signal, not noise.
Source: American Cancer Society - Breast Cancer Hormone Receptor Status
4️⃣ Bottom line
Don't drop this feature if you want the best accuracy. The model is better when this information is included.
📌 Recommendation
Keep - in your model. Accuracy drops when it's removed, so it's important for predicting survival status.
🔴 Best to Drop
-: Accuracy Improves When We Remove It
When we removed this feature, accuracy went up to -. So the model may be better off without it.
With This Feature (Baseline)
| Accuracy | - |
When We Dropped It
| Accuracy | - |
| Change | - |
Why This Feature Is Best to Drop (Easy to Understand)
1️⃣ Accuracy went up when we removed it
The model got more accurate without this feature. That suggests the feature wasn't helping - or was hurting - predictions, e.g. by adding noise or overfitting to this dataset.
2️⃣ SENOMAC Trial
The SENOMAC trial found that skipping more extensive axillary lymph node surgery in patients with one or two positive sentinel nodes led to the same 5-year recurrence-free survival rates (~90%). This suggests that counting how many nodes were examined does not always add the decision-making signal you would expect.
Source: Breastcancer.org - Some People With Early-Stage Breast Cancer Don't Need Axillary Lymph Node Surgery
3️⃣ BOOG 2013-08 Phase III Trial (SABCS 2025)
A 2025 BOOG 2013-08 phase III trial (presented at SABCS 2025) found comparable recurrence and regional recurrence-free survival rates at 5 years whether or not a sentinel lymph node biopsy was performed at all - especially in patients over 50 with hormone receptor-positive, T1 grade 1-2 breast cancer. The absolute difference in recurrence rate was only 0.7%.
4️⃣ Bottom line
Consider building the model without this feature. In this ablation, accuracy was higher when we dropped it, so it may not be needed for predicting survival status.
📌 Recommendation
Consider dropping - from your model. Accuracy improved when we removed it, so it may add noise or redundancy rather than useful signal.
All models shown exclude "Survival Months" as a feature. Initial experiments from the V01 homework (Decision Tree 2026), which included Survival Months as a feature, showed 83.1% accuracy - suspiciously high! Analysis revealed this was data leakage: Survival Months is correlated with the outcome, making the model useless for real predictions.
The models here use only features available at diagnosis time, making them clinically valid. See the sections below for detailed evidence of why excluding survival months is essential.
Critical data leakage: Using "Survival Months" as a predictor creates a model that looks excellent in testing but is clinically useless.
- Circular logic: Survival Months is directly tied to the outcome. Dead patients have shorter survival by definition - the model is "cheating" by using the answer to predict the answer.
- Not available at diagnosis: When seeing a new patient, you do not know how many months they will survive. Any feature that encodes that future information must be excluded.
- It dominates other features: When included, Survival Months gets ~75% feature importance; real clinical features (tumor size, grade, stage) shrink to a few percent. The model then ignores actual medical indicators.
- This dashboard proves it: With Survival Months properly excluded, accuracy is ~70% and the model relies on diagnosis-time features only - the only setup valid for real-world use.
Rule: Never include variables that contain information from the future or that reveal the outcome you're predicting. Survival Months is the outcome's timeline - including it gives a model that fails in practice.
DATA LEAKAGE: What It Looks Like
This is what happens when you include "Survival Months" as a feature
WITH Survival Months (BAD)
| Accuracy | 83.1% |
| ROC AUC | 0.810 |
| Features Used | 15 |
Cannot be used in real world
WITHOUT Survival Months (GOOD)
| Accuracy | 69.9% |
| ROC AUC | 0.671 |
| Features Used | 14 |
Can be used in clinical practice
🔍 The Smoking Gun: Feature Importance
When "Survival Months" is included, it dominates all other features:
- "Survival Months" has ~0.75 (75%) importance - completely dominates!
- All clinical features (Tumor Size, Grade, Stage) are tiny - only ~0.05 (5%) each.
- The model ignores actual medical indicators and just uses survival duration.
- This is circular logic: using "how long they lived" to predict "whether they're alive".
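The dominance pattern is easy to reproduce on synthetic data: add one feature built from the label (a stand-in for Survival Months) and it swallows nearly all the importance. Numbers and settings here are illustrative, not the dashboard's actual run.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X_clin = rng.normal(size=(600, 3))            # stand-ins for clinical features
y = (X_clin[:, 0] + rng.normal(scale=1.0, size=600) > 0).astype(int)

# A "Survival Months"-like leak: a feature derived from the outcome itself.
leak = y * 60 + rng.normal(scale=5.0, size=600)
X = np.column_stack([X_clin, leak])

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
imp = clf.feature_importances_
# imp[3] (the leaky column) takes almost all the importance; the three
# clinical stand-ins shrink toward zero - the "smoking gun" pattern above.
```

A quick sanity check on any new feature set is to inspect `feature_importances_` for a single column with importance near 1.0; it is often a leak.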
The Bottom Line
~83% accuracy is meaningless because you can't know "Survival Months" until after the patient has already survived that long! (V01 homework run with Survival Months included.)
It's like predicting tomorrow's weather by checking tomorrow's temperature - impossible in practice.
ROC AUC (Area Under the ROC Curve): How well the model separates classes; 1.0 = perfect, 0.5 = random.
Sensitivity (Recall Alive): Of all truly alive patients, how many we correctly predict as alive.
Specificity: Of all truly dead patients, how many we correctly predict as dead.
F1 (Alive): Balance between precision and recall for the "Alive" class.
Deltas show the change when we drop one feature compared to the baseline (all columns).
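The definitions above can be computed from scratch. A pure-Python sketch with toy labels ("Alive" coded as 1; the numbers are illustrative, not this dataset's results):

```python
def confusion(y_true, y_pred):
    """Raw confusion counts, with class 1 ('Alive') as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def roc_auc(y_true, scores):
    """Rank-based AUC: probability a random Alive outranks a random Dead."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
tp, tn, fp, fn = confusion(y_true, y_pred)
sensitivity = tp / (tp + fn)                  # recall for "Alive"
specificity = tn / (tn + fp)                  # recall for "Dead"
precision = tp / (tp + fp)
f1_alive = 2 * precision * sensitivity / (precision + sensitivity)
```

Because AUC is computed from the ranking of scores while accuracy depends on a 0.5 threshold, the two can move in opposite directions between experiments, as noted in the insights below.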
What the model predicts: Alive or Dead (survival status). “Best drop” = removing that feature gave the highest accuracy, so keeping it may add noise. “Best keep” = removing that feature gave the lowest accuracy, so the model relies on it and it is important to keep.
+0.029 Accuracy went up when we dropped that feature.
+0.018 ROC AUC improved.
-0.058 Accuracy dropped when we removed that feature.
Feature is likely important for the model.
What the data suggests
- Dropping Marital Status or Regional Node Examined improves accuracy - the model may rely less on these or they add noise in this setup.
- Dropping Progesterone Status or 6th Stage hurts accuracy the most - these features appear important for predicting survival.
- ROC AUC can sometimes increase when accuracy drops (the model ranks predicted probabilities better but makes more misclassifications at the decision threshold). Always look at both.