Brain Foundation Models
Neuroimaging foundation models for brain representation learning
This section documents the brain imaging foundation models that extract embeddings from structural MRI (sMRI), functional MRI (fMRI), and other brain imaging modalities for downstream integration with genomic data, behavioral phenotypes, and clinical outcomes.
Overview
All brain FMs documented here:
- Operate on neuroimaging data (volumetric MRI, parcel time series, or raw BOLD signals)
- Support subject-level embeddings via aggregation across spatial regions or temporal windows
- Are pretrained on large multi-site datasets (UK Biobank, HCP, ABCD, etc.)
- Enable cross-modal alignment with genomic and behavioral representations
Model Registry
| Model | Modality | Architecture | Key Feature | Documentation |
|---|---|---|---|---|
| BrainLM | fMRI | ViT-MAE | Masked autoencoding; site-robust | Walkthrough |
| Brain-JEPA | fMRI | JEPA | Joint-embedding prediction; lower-latency | Walkthrough |
| Brain Harmony | sMRI + fMRI | ViT + TAPE | Multi-modal fusion via TAPE | Walkthrough |
| BrainMT | sMRI + fMRI | Mamba-Transformer | Efficient long-range dependencies | Walkthrough |
| SwiFT | fMRI | Swin Transformer | Hierarchical spatiotemporal modeling | Walkthrough |
Usage Workflow
For fMRI models (BrainLM, Brain-JEPA, SwiFT)
- Preprocess rs-fMRI: parcellation (Schaefer/AAL), bandpass filter, motion scrubbing
- Tokenize parcel time series (or 4D volumes for SwiFT)
- Embed via pretrained encoder
- Pool to subject-level representation (mean over tokens/time)
- Project to 512-D for cross-modal alignment
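The last two steps (pooling and projection) can be sketched as follows. This is a minimal illustration, not any model's actual API: the encoder output is replaced by random numbers, the hidden size of 768 and token count are placeholders, the projection matrix `weight` stands in for a learned projection head, and the L2 normalization is an assumption that is convenient for cosine-similarity alignment.

```python
import numpy as np


def pool_subject_embedding(token_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool encoder outputs of shape (n_tokens, hidden_dim) to (hidden_dim,)."""
    if token_embeddings.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_tokens, hidden_dim)")
    return token_embeddings.mean(axis=0)


def project(embedding: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Linearly project a pooled embedding and L2-normalize the result."""
    projected = embedding @ weight
    norm = np.linalg.norm(projected)
    return projected / norm if norm > 0 else projected


rng = np.random.default_rng(0)

# Stand-in for a pretrained encoder's output for one subject (hypothetical sizes).
tokens = rng.standard_normal((1200, 768))

# Hypothetical projection head; in practice these weights would be learned.
weight = rng.standard_normal((768, 512)) / np.sqrt(768)

subject_vec = project(pool_subject_embedding(tokens), weight)
print(subject_vec.shape)
```

Mean pooling is the simplest aggregation; a class token or attention pooling may be preferable if a given encoder provides one.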
For sMRI models (BrainMT, Brain Harmony)
- Run FreeSurfer or FSL FAST for tissue segmentation
- Extract IDPs (cortical thickness, subcortical volumes) or feed raw T1w volumes
- Embed via pretrained encoder
- Pool to subject-level representation
- Project to 512-D for fusion
Key Considerations
Site/scanner harmonization
Multi-site pretraining (e.g., BrainLM on UKB+HCP) improves site robustness, but residual scanner/site effects should still be removed from the embeddings before fusion:
- Regress site dummy variables from embeddings
- Use ComBat or similar harmonization if needed (see Integration Strategy)
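As a rough sketch of the first option, site dummy variables can be regressed out of the embeddings with ordinary least squares. The function name `residualize_site`, the synthetic data, and the choice to add the grand mean back are illustrative assumptions. This removes only additive (mean) site shifts; ComBat additionally models site-specific scale.

```python
import numpy as np


def residualize_site(embeddings: np.ndarray, site_labels: np.ndarray) -> np.ndarray:
    """Remove per-site mean shifts from (n_subjects, dim) embeddings via OLS."""
    sites, site_index = np.unique(site_labels, return_inverse=True)
    # One-hot design matrix (n_subjects, n_sites). Without an intercept,
    # each fitted coefficient row is simply that site's mean embedding.
    design = np.eye(len(sites))[site_index]
    beta, *_ = np.linalg.lstsq(design, embeddings, rcond=None)
    residuals = embeddings - design @ beta
    # Add the grand mean back so the overall location is preserved.
    return residuals + embeddings.mean(axis=0)


rng = np.random.default_rng(0)

# Synthetic example: three sites with different additive offsets.
site_labels = np.array(["A"] * 50 + ["B"] * 30 + ["C"] * 20)
site_shift = {"A": 0.0, "B": 2.0, "C": -1.5}
offsets = np.array([site_shift[s] for s in site_labels])[:, None]
embeddings = rng.standard_normal((100, 512)) + offsets

cleaned = residualize_site(embeddings, site_labels)
print(cleaned.shape)
```

In a prediction setting, the regression would normally be fit on training subjects only and then applied to held-out subjects, to avoid leaking test information.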
Motion artifacts
fMRI embeddings are sensitive to head motion. Quality control:
- Exclude high-motion frames (FD > 0.5 mm)
- Regress mean FD as confound in downstream prediction
- Report motion distributions stratified by diagnosis (e.g., ADHD vs TD)
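The first two checks can be sketched as below. The sketch assumes a Power-style framewise displacement (FD) computed from six rigid-body motion parameters, with three translations in mm followed by three rotations in radians, and rotations converted to arc length on a 50 mm sphere. The column order, units, and function names are assumptions that depend on the preprocessing pipeline in use.

```python
import numpy as np


def framewise_displacement(motion_params: np.ndarray, radius_mm: float = 50.0) -> np.ndarray:
    """Power-style FD from (n_frames, 6) motion parameters.

    Assumes columns 0-2 are translations in mm and columns 3-5 are
    rotations in radians (an assumption; check your pipeline's output).
    """
    params = motion_params.astype(float).copy()
    params[:, 3:] *= radius_mm  # rotation -> arc length on a sphere, in mm
    diffs = np.abs(np.diff(params, axis=0)).sum(axis=1)
    return np.concatenate([[0.0], diffs])  # first frame has no predecessor


def scrub(timeseries: np.ndarray, fd: np.ndarray, threshold_mm: float = 0.5):
    """Drop frames whose FD exceeds the threshold; return kept data and mask."""
    keep = fd <= threshold_mm
    return timeseries[keep], keep


# Toy example: a single 1 mm shift in x that occurs at frame 2 and persists.
motion = np.zeros((5, 6))
motion[2:, 0] = 1.0
parcel_ts = np.arange(20, dtype=float).reshape(5, 4)  # (n_frames, n_parcels)

fd = framewise_displacement(motion)
scrubbed_ts, keep_mask = scrub(parcel_ts, fd)
mean_fd = fd.mean()  # per-subject confound for downstream models
print(fd, keep_mask, mean_fd)
```

The per-subject `mean_fd` value is what would be entered as a confound in downstream prediction and summarized per diagnostic group.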
Multimodal fusion
Brain Harmony natively fuses sMRI and fMRI via TAPE (Temporal Adaptive Patch Embedding). For other models, use late fusion (concatenate embeddings) or two-tower contrastive alignment (see Design Patterns).
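The two fallback strategies can be illustrated with a small NumPy sketch. It assumes per-subject embeddings already projected to 512-D, normalizes each modality before concatenation (a design choice, not something the source prescribes), and uses a symmetric InfoNCE loss as one common form of two-tower contrastive objective. The function names and the temperature of 0.07 are illustrative and not taken from the Design Patterns page.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Row-wise L2 normalization."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def late_fusion(smri_emb: np.ndarray, fmri_emb: np.ndarray) -> np.ndarray:
    """Concatenate per-subject embeddings along the feature axis."""
    if smri_emb.shape[0] != fmri_emb.shape[0]:
        raise ValueError("both modalities must cover the same subjects, in the same order")
    return np.concatenate([l2_normalize(smri_emb), l2_normalize(fmri_emb)], axis=1)


def info_nce(tower_a: np.ndarray, tower_b: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of each tower belongs to the same subject."""
    logits = l2_normalize(tower_a) @ l2_normalize(tower_b).T / temperature

    def cross_entropy_on_diagonal(scores: np.ndarray) -> float:
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))


rng = np.random.default_rng(0)
smri = rng.standard_normal((8, 512))  # placeholder sMRI embeddings
fmri = rng.standard_normal((8, 512))  # placeholder fMRI embeddings

fused = late_fusion(smri, fmri)
aligned_loss = info_nce(smri, smri)                          # perfectly matched pairs
mismatched_loss = info_nce(smri, np.roll(smri, 1, axis=0))   # every pair misassigned
print(fused.shape, aligned_loss < mismatched_loss)
```

In a real two-tower setup the loss would be minimized with respect to the parameters of each tower's projection head; here it is only evaluated to show that matched pairs score lower than mismatched ones.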
Integration Targets
Brain embeddings are integrated with:
- Genetics embeddings (Caduceus, DNABERT-2) for gene-brain association discovery
- Behavioral phenotypes (cognitive scores, psychiatric diagnoses) via multimodal prediction
- Clinical data (longitudinal assessments, EHR records) for developmental trajectories
Learn more:
- Integration Strategy (fusion protocols)
- Modality Features: sMRI (sMRI preprocessing)
- Modality Features: fMRI (fMRI preprocessing)
Source Repositories
All brain FM source code lives in `external_repos/`:

| Model | Local Path | GitHub Repository |
|-------|------------|-------------------|
| BrainLM | `external_repos/brainlm/` | [vandijklab/BrainLM](https://github.com/vandijklab/BrainLM) |
| Brain-JEPA | `external_repos/brainjepa/` | [janklees/brainjepa](https://github.com/janklees/brainjepa) |
| Brain Harmony | `external_repos/brainharmony/` | [hzlab/Brain-Harmony](https://github.com/hzlab/Brain-Harmony) |
| BrainMT | `external_repos/brainmt/` | [arunkumar-kannan/brainmt-fmri](https://github.com/arunkumar-kannan/brainmt-fmri) |
| SwiFT | `external_repos/swift/` | [Transconnectome/SwiFT](https://github.com/Transconnectome/SwiFT) |

Each model page includes:
- Detailed code walkthrough in `docs/code_walkthroughs/`
- Structured YAML card in `kb/model_cards/`
- Integration recipes and preprocessing specs

Next Steps
- Validate brain embedding reproducibility across cohorts (UK Biobank, Cha Hospital developmental cohort)
- Benchmark fMRI encoder stability across different parcellation schemes (Schaefer 100/200/400, AAL)
- Explore EEG/EPhys foundation models for pediatric/clinical settings (e.g., LaBraM, TBD)
- Integrate diffusion MRI embeddings for white matter microstructure (exploratory)