Analysis Recipe: Prediction Baselines (LR + GBDT)¶
Goal¶
Estimate classification performance for each modality (genes, sMRI, fMRI) and their late-fusion concatenation using strong, interpretable baselines with rigorous evaluation.
Inputs¶
- Fold-specific train/test splits shared across modalities.
- Residualized, z-scored, 512-D embeddings per modality (plus covariates for logging).
- Binary or multi-class clinical targets (e.g., MDD, eoMDD vs loMDD).
Models¶
Logistic Regression¶
- Solver:
sagaorlbfgs. - Penalty: L2; tune
Cin{0.01, 0.1, 1, 10}. class_weight='balanced'.- Max iter ≥ 5,000; tolerance
1e-4.
LightGBM¶
num_leaves=31,max_depth=-1,learning_rate=0.05.n_estimators=1000with early stopping (patience 50) on validation splits.feature_fraction=0.9,bagging_fraction=0.8,bagging_freq=5.scale_pos_weight = N_neg / N_posif not using class weights.
CatBoost (optional)¶
depth=6-8,learning_rate=0.05,iterations=2000,l2_leaf_reg=3.loss_function='Logloss',eval_metric='AUC',auto_class_weights='Balanced'.
Procedure¶
- Fit models on train split per modality; tune hyperparameters via inner CV or validation fold.
- Evaluate on held-out fold; store probabilities and logits.
- Repeat for concatenated late-fusion features (stacked 512-D per modality).
- Record metrics: AUROC, AUPRC, accuracy, calibration (Brier/ECE optional).
- Compare modalities: run DeLong test (or bootstrap) on AUROC differences; report mean ± SD across folds.
- For LOGO analyses, compute ΔAUC per fold and apply Wilcoxon + FDR.
Logging¶
- Persist per-fold predictions in
artifacts/generated/predictions/<experiment_id>/. - Store trained hyperparameters, random seeds, and config YAML snapshot.
- Capture ROC/PR curves and metric tables; include class prevalence.
Reporting Template¶
- Table with AUROC/AUPRC (mean ± SD) for Gene, Brain, Fusion.
- DeLong/Bootstrap ΔAUROC with 95% CI and p-value.
- Calibration plot or summary if clinically relevant.
- Notes on covariates, residualization, and projector settings.
References¶
- Ensemble Integration paper (late fusion motivation).
- Oncology multimodal review (evaluation discipline).