Integration Hub
Everything in this section supports the phased escalation strategy documented in the Integration Plan (Nov 2025). Use it as the connective tissue between per-modality preprocessing, harmonization, and experiment execution.
Overview
This hub provides end-to-end guidance for integrating genetics, brain, and behavioral data using foundation models.
Key Resources
- Integration Strategy: High-level playbook covering covariates to regress, projection dimensions, and escalation triggers
- Design Patterns: Late fusion → two-tower → MoT → BOM escalation logic
- Multimodal Architectures: Detailed patterns from BAGEL, MoT, M3FM, Me-LLaMA, TITAN
- Embedding Policies: Naming conventions and PCA dimensionality guidelines
- Benchmarks: Prior benchmark targets to compare against
🔬 Analysis Recipes
Copy-ready runbooks for common integration tasks:
- CCA + Permutation: Test gene-brain associations before heavy fusion
- Prediction Baselines: Gene-only vs Brain-only vs Late fusion
- Partial Correlations: Control for covariates with logistic regression
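As a rough illustration of the CCA + Permutation recipe, the sketch below computes the first canonical correlation between a gene feature matrix and a brain feature matrix, then builds a null distribution by shuffling subject rows of one modality. It is a minimal sketch, not the recipe itself: the function names, the synthetic data, and the choice of the first canonical correlation as the test statistic are all assumptions, and it presumes more subjects than features with full-rank inputs.

```python
import numpy as np


def first_canonical_corr(X, Y):
    """Largest canonical correlation between the columns of X and Y.

    Assumes n_samples > n_features and full column rank in both matrices.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two centered column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny numerical overshoot


def cca_permutation_test(X, Y, n_perm=1000, seed=0):
    """Permutation p-value for the first canonical correlation."""
    rng = np.random.default_rng(seed)
    observed = first_canonical_corr(X, Y)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling subject order in Y breaks any X-Y pairing.
        perm = rng.permutation(Y.shape[0])
        null[i] = first_canonical_corr(X, Y[perm])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)


# Synthetic stand-ins for gene and brain embeddings (hypothetical data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
genes = latent @ rng.normal(size=(1, 5)) + rng.normal(size=(200, 5))
brain = latent @ rng.normal(size=(1, 4)) + rng.normal(size=(200, 4))
rho, p_value = cca_permutation_test(genes, brain, n_perm=200)
print(f"first canonical correlation={rho:.3f}, permutation p={p_value:.4f}")
```

The smallest p-value this test can return is 1 / (n_perm + 1), so checking a threshold such as p<0.001 needs at least 1000 permutations.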
🧬🧠 Modality Features
Concrete instructions for extracting and harmonizing features:
- Genomics: Genetics embeddings, RC-equivariance, gene attribution
- sMRI: FreeSurfer ROIs, PCA compression, site harmonization
- fMRI: Functional connectivity, BrainLM/SwiFT embeddings, preprocessing
🎨 Integration Cards
Comprehensive multimodal fusion guidance:
- Ensemble Integration: Model stacking, averaging, meta-learning
- Oncology Multimodal Review: Early/intermediate/late fusion taxonomy
- Multimodal FM Patterns: Architectural patterns from state-of-the-art FMs
Quick Start
Before running any analysis, grab the relevant strategy IDs and log them with your experiment configs:
# Show sMRI baseline recipe
python scripts/manage_kb.py ops strategy smri_free_surfer_pca512_v1
# Inspect harmonization metadata (e.g., MURD)
python scripts/manage_kb.py ops harmonization murd_t1_t2
# Show rs-fMRI preprocessing stack
python scripts/manage_kb.py ops strategy rsfmri_swift_segments_v1
This keeps downstream reports auditable even when raw datasets (e.g., UKB) cannot be shared.
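One way to log those IDs alongside an experiment config is sketched below. The record schema, the `build_run_record` helper, and the experiment name are hypothetical; only the three IDs come from the commands above.

```python
import json
from datetime import datetime, timezone


def build_run_record(experiment, strategy_ids, harmonization_ids, cv_seed):
    """Bundle knowledge-base IDs with experiment settings (hypothetical schema)."""
    return {
        "experiment": experiment,
        "strategy_ids": sorted(strategy_ids),
        "harmonization_ids": sorted(harmonization_ids),
        "cv_seed": cv_seed,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


record = build_run_record(
    experiment="late_fusion_baseline",  # hypothetical experiment name
    strategy_ids=["smri_free_surfer_pca512_v1", "rsfmri_swift_segments_v1"],
    harmonization_ids=["murd_t1_t2"],
    cv_seed=42,
)
# Store this JSON next to the experiment config so reports can cite it.
print(json.dumps(record, indent=2))
```

Sorting the ID lists keeps records comparable across runs regardless of the order in which the IDs were collected.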
Integration Phases
We follow a phased escalation strategy to avoid premature complexity:
| Phase | Status | Pattern | Trigger | Documentation |
|---|---|---|---|---|
| Phase 1 | ✅ Active | Late Fusion | Baseline | Integration Plan |
| Phase 2 | 🚧 Prep | Two-Tower Contrastive | CCA p<0.001, ΔAUROC>5% | Integration Plan |
| Phase 3 | ⏳ Future | Unified Multimodal (MoT/BAGEL/LLM-Bridge) | ΔAUROC>10%, cross-modal reasoning | Integration Plan |
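
The triggers in the table can be read as a simple decision rule. The sketch below is one possible reading, not the Integration Plan's own logic: it assumes the triggers in each row must all hold, that Phase 3 also requires the Phase 2 triggers, and that ΔAUROC is passed as a fraction (0.05 for 5%). The function name and the boolean flag for cross-modal reasoning are hypothetical.

```python
def recommend_phase(cca_p, delta_auroc, needs_cross_modal_reasoning=False):
    """Suggest an integration phase from the trigger table.

    cca_p: permutation p-value from the CCA association test.
    delta_auroc: AUROC gain as a fraction (0.05 means 5%).
    Combining the triggers with AND is an assumption of this sketch.
    """
    phase2_ready = cca_p < 0.001 and delta_auroc > 0.05
    if not phase2_ready:
        return 1  # stay with late fusion
    if delta_auroc > 0.10 and needs_cross_modal_reasoning:
        return 3  # unified multimodal
    return 2  # two-tower contrastive


print(recommend_phase(cca_p=0.2, delta_auroc=0.08))
print(recommend_phase(cca_p=0.0005, delta_auroc=0.06))
print(recommend_phase(cca_p=0.0005, delta_auroc=0.12, needs_cross_modal_reasoning=True))
```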
Navigation Guide
For Late Fusion Workflows (Phase 1)
- Read Integration Strategy
- Pick analysis recipe: CCA, Prediction, or Partial Correlations
- Extract features: Genomics, sMRI, fMRI
- Review Ensemble Integration card for stacking strategies
- Run analysis with logged strategy IDs
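For the Phase 1 workflow above, a late-fusion comparison could look like the following sketch. It assumes each modality already has its own classifier and that `p_gene` and `p_brain` are out-of-fold predicted probabilities for the same subjects; fusion here is a plain average, which is only one of the options the Ensemble Integration card covers. Defining ΔAUROC as the fused score minus the best unimodal score is also an assumption, and all names are hypothetical.

```python
import numpy as np


def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formula with average ranks for ties."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # Replace ranks of tied scores with their average rank.
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    u_stat = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2
    return float(u_stat / (n_pos * n_neg))


def late_fusion_report(y_true, p_gene, p_brain):
    """Compare gene-only, brain-only, and averaged (late-fused) predictions."""
    fused = (np.asarray(p_gene, dtype=float) + np.asarray(p_brain, dtype=float)) / 2.0
    report = {
        "gene_only": auroc(y_true, p_gene),
        "brain_only": auroc(y_true, p_brain),
        "late_fusion": auroc(y_true, fused),
    }
    # Gain of fusion over the stronger unimodal baseline (assumed definition).
    report["delta_auroc"] = report["late_fusion"] - max(
        report["gene_only"], report["brain_only"]
    )
    return report


# Toy labels and probabilities (hypothetical values).
labels = [0, 0, 1, 1]
print(late_fusion_report(labels, [0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.6, 0.7]))
```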
For Multimodal Architecture Design (Phase 2+)
- Read Design Patterns for escalation logic
- Study Multimodal Architectures for BAGEL/MoT/M3FM/Me-LLaMA/TITAN patterns
- Review Multimodal FM Patterns integration card
- Consult Oncology Multimodal Review for fusion taxonomy
- Check Integration Plan decision table for recommended pattern
For Adding New Integration Strategies
- Start from Integration card template
- Review existing cards for structure and style
- Document mechanics, use cases, caveats, and BOM integration
- Add to `models/integrations/` directory
- Update `mkdocs.yml` navigation
Key Principles
- ✅ Late fusion first: Preserve modality-specific signal under heterogeneous semantics
- ✅ Unimodal baselines: Establish gene-only and brain-only performance before multimodal claims
- ✅ Covariate control: Z-score + residualize vs age/sex/site before interpreting effects
- ✅ Reproducibility: Log embedding strategy IDs, harmonization methods, CV folds
- ✅ Phased escalation: Only escalate when data and compute justify the complexity
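The covariate-control principle can be sketched with ordinary least squares. This is a minimal illustration under several assumptions: covariates enter linearly, sex is coded 0/1, site is one-hot encoded, and the whole dataset is processed at once. In a cross-validated analysis the z-scoring and regression coefficients would normally be estimated on training folds only. The function names and synthetic data are hypothetical.

```python
import numpy as np


def zscore(X):
    """Column-wise z-score; constant columns are left centered at zero."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


def residualize(X, age, sex, site):
    """Remove linear effects of age, sex, and site from each column of X."""
    site = np.asarray(site)
    levels = np.unique(site)
    # One-hot encode site, dropping the first level to avoid
    # collinearity with the intercept column.
    site_dummies = (site[:, None] == levels[None, 1:]).astype(float)
    design = np.column_stack([np.ones(len(site)), age, sex, site_dummies])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - design @ beta


# Synthetic features with an injected age effect (hypothetical data).
rng = np.random.default_rng(0)
n = 150
age = rng.uniform(40, 70, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
site = rng.choice(["siteA", "siteB", "siteC"], size=n)
features = rng.normal(size=(n, 3)) + 0.1 * age[:, None]

cleaned = residualize(zscore(features), age, sex, site)
print(np.corrcoef(cleaned[:, 0], age)[0, 1])  # close to zero after residualizing
```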