Skip to content

🧠 Brain Foundation Models

Neuroimaging foundation models for brain representation learning

This section documents the brain imaging foundation models that extract embeddings from structural MRI (sMRI), functional MRI (fMRI), and other brain imaging modalities for downstream integration with genomic data, behavioral phenotypes, and clinical outcomes.


πŸ“‹ Overview

All brain FMs documented here:

  • Operate on neuroimaging data (volumetric MRI, parcel time series, or raw BOLD signals)
  • Support subject-level embeddings via aggregation across spatial regions or temporal windows
  • Are pretrained on large multi-site datasets (UK Biobank, HCP, ABCD, etc.)
  • Enable cross-modal alignment with genomic and behavioral representations

🎯 Model Registry

Model Modality Architecture Key Feature Documentation
🧠 BrainLM fMRI ViT-MAE Masked autoencoding; site-robust Walkthrough
🧠 Brain-JEPA fMRI JEPA Joint-embedding prediction; lower-latency Walkthrough
🧠 Brain Harmony sMRI + fMRI ViT + TAPE Multi-modal fusion via TAPE Walkthrough
🧠 BrainMT sMRI + fMRI Mamba-Transformer Efficient long-range dependencies Walkthrough
🧠 SwiFT fMRI Swin Transformer Hierarchical spatiotemporal modeling Walkthrough

πŸ”„ Usage Workflow

For fMRI models (BrainLM, Brain-JEPA, SwiFT)

  1. Preprocess rs-fMRI: parcellation (Schaefer/AAL), bandpass filter, motion scrubbing
  2. Tokenize parcel time series (or 4D volumes for SwiFT)
  3. Embed via pretrained encoder
  4. Pool to subject-level representation (mean over tokens/time)
  5. Project to 512-D for cross-modal alignment

For sMRI models (BrainMT, Brain Harmony)

  1. Run FreeSurfer or FSL FAST for tissue segmentation
  2. Extract IDPs (cortical thickness, subcortical volumes) or feed raw T1w volumes
  3. Embed via pretrained encoder
  4. Pool to subject-level representation
  5. Project to 512-D for fusion

πŸ”‘ Key Considerations

Site/scanner harmonization

Multi-site pretraining (e.g., BrainLM on UKB+HCP) improves site robustness, but residualize scanner/site effects before fusion:

  • Regress site dummy variables from embeddings
  • Use ComBat or similar harmonization if needed (see Integration Strategy)

Motion artifacts

fMRI embeddings are sensitive to head motion. Quality control:

  • Exclude high-motion frames (FD > 0.5 mm)
  • Regress mean FD as confound in downstream prediction
  • Report motion distributions stratified by diagnosis (e.g., ADHD vs TD)

Multimodal fusion

Brain Harmony natively fuses sMRI and fMRI via TAPE (Target-Aware Projection Ensemble). For other models, use late fusion (concatenate embeddings) or two-tower contrastive alignment (see Design Patterns).


πŸ”— Integration Targets

Brain embeddings are integrated with:

  • Genetics embeddings (Caduceus, DNABERT-2) for gene–brain association discovery
  • Behavioral phenotypes (cognitive scores, psychiatric diagnoses) via multimodal prediction
  • Clinical data (longitudinal assessments, EHR records) for developmental trajectories

Learn more: - Integration Strategy - Fusion protocols - Modality Features: sMRI - sMRI preprocessing - Modality Features: fMRI - fMRI preprocessing


πŸ“¦ Source Repositories

Click to view all source repositories All brain FM source code lives in `external_repos/`: | Model | Local Path | GitHub Repository | |-------|------------|-------------------| | BrainLM | `external_repos/brainlm/` | [vandijklab/BrainLM](https://github.com/vandijklab/BrainLM) | | Brain-JEPA | `external_repos/brainjepa/` | [janklees/brainjepa](https://github.com/janklees/brainjepa) | | Brain Harmony | `external_repos/brainharmony/` | [hzlab/Brain-Harmony](https://github.com/hzlab/Brain-Harmony) | | BrainMT | `external_repos/brainmt/` | [arunkumar-kannan/brainmt-fmri](https://github.com/arunkumar-kannan/brainmt-fmri) | | SwiFT | `external_repos/swift/` | [Transconnectome/SwiFT](https://github.com/Transconnectome/SwiFT) | Each model page includes: - βœ… Detailed code walkthrough in `docs/code_walkthroughs/` - βœ… Structured YAML card in `kb/model_cards/` - βœ… Integration recipes and preprocessing specs

πŸš€ Next Steps

  • βœ… Validate brain embedding reproducibility across cohorts (UK Biobank, Cha Hospital developmental cohort)
  • βœ… Benchmark fMRI encoder stability across different parcellation schemes (Schaefer 100/200/400, AAL)
  • πŸ”¬ Explore EEG/EPhys foundation models for pediatric/clinical settings (e.g., LaBraM, TBD)
  • πŸ”¬ Integrate diffusion MRI embeddings for white matter microstructure (exploratory)