SwiFT¶
Overview¶
Type: Spatiotemporal foundation model for fMRI
Architecture: Swin Transformer (hierarchical windows)
Modality: Functional MRI (4D volumes)
Primary use: Direct 4D volume encoding without explicit parcellation
Purpose & Design Philosophy¶
SwiFT (Swin 4D fMRI Transformer) applies hierarchical windowed attention to 4D fMRI volumes, eliminating the need for explicit parcellation while capturing spatiotemporal patterns across multiple scales. The model processes raw BOLD signals through cascaded Swin blocks, enabling direct learning from volumetric data.
Key innovation: Parcellation-free 4D modeling with hierarchical attention windows preserves fine-grained spatial structure while capturing temporal dynamics.
Architecture Highlights¶
- Backbone: 4D Swin Transformer with shifted windows
- Input: Raw BOLD volumes (X × Y × Z × T)
- Windowing: Hierarchical 4D patches with local/global attention
- No parcellation: Learns spatial structure end-to-end
- Output: Subject-level embeddings via global pooling or CLS token
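To make the hierarchical-window idea concrete, the sketch below counts how many tokens a 4D volume yields when partitioned into non-overlapping patches before attention. The volume and patch sizes are illustrative assumptions, not SwiFT's actual configuration:

```python
# Hypothetical shapes: SwiFT's real patch sizes may differ. This only
# illustrates how a (X, Y, Z, T) volume becomes a grid of 4D tokens.
def count_patches(volume_shape, patch_size):
    """Patches per axis and total token count for a (X, Y, Z, T) volume."""
    per_axis = [dim // p for dim, p in zip(volume_shape, patch_size)]
    total = 1
    for n in per_axis:
        total *= n
    return per_axis, total

# Example: 96x96x96 volume, 20 frames, 6x6x6 spatial patch, 1-frame temporal patch
per_axis, total = count_patches((96, 96, 96, 20), (6, 6, 6, 1))
print(per_axis, total)  # [16, 16, 16, 20] 81920
```

The token count grows multiplicatively with resolution, which is why windowed (rather than global) attention is essential at this scale.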
Integration Strategy¶
For Neuro-Omics KB¶
Embedding recipe: rsfmri_swift_segments_v1
- Process 4D volumes through Swin blocks (typically 20-frame segments)
- Extract final layer representations
- Pool across spatial-temporal dimensions → subject vector
- Project to 512-D for cross-modal alignment
- Residualize: age, sex, site, mean FD
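The residualization step above can be sketched as an ordinary least-squares regression of each embedding dimension on the covariates, keeping the residuals. This is a minimal illustration, assuming embeddings of shape (n_subjects, 512) and a numeric covariate matrix (age, sex, site dummies, mean FD):

```python
import numpy as np

def residualize(embeddings, covariates):
    """Remove linear covariate effects from each embedding dimension via OLS.

    embeddings: (n_subjects, d) array; covariates: (n_subjects, k) array.
    Returns residuals orthogonal to the covariates (and mean-centered,
    because an intercept column is included).
    """
    X = np.column_stack([np.ones(len(covariates)), covariates])  # add intercept
    beta, *_ = np.linalg.lstsq(X, embeddings, rcond=None)
    return embeddings - X @ beta
```

Fitting the regression on the training split only (and applying it to held-out subjects) avoids leaking covariate structure into downstream evaluations.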
Fusion targets:
- Gene-brain associations: When fine-grained spatial patterns matter
- Atlas-free analysis: Avoid parcellation scheme dependence
- Multi-resolution modeling: Capture both local and global brain dynamics
For ARPA-H Brain-Omics Models¶
SwiFT's hierarchical 4D processing offers advantages for Brain-Omics systems:
- No parcellation bias → better cross-site generalization
- Multi-scale attention aligns with hierarchical biological organization
- 4D paradigm extensible to other volumetric time series (perfusion imaging, DCE-MRI)
- Can serve as blueprint for spatiotemporal EEG source reconstruction
Embedding Extraction Workflow¶
# 1. Preprocess fMRI → motion correction, normalization (no parcellation)
# 2. Segment into overlapping 4D windows (e.g., 20-frame chunks)
# 3. Load pretrained SwiFT checkpoint
# 4. Forward pass through Swin blocks
# 5. Extract global representation (CLS token or spatial average)
# 6. Aggregate across segments → subject embedding
# 7. Log: window_size, stride, preprocessing_pipeline_id
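Step 2 of the workflow above (overlapping 4D windows) can be sketched in a few lines. The window size and stride are the illustrative defaults mentioned in this document, not fixed SwiFT parameters, and the toy spatial resolution keeps the example lightweight:

```python
import numpy as np

def segment_4d(scan, window=20, stride=10):
    """Slice a (X, Y, Z, T) scan into overlapping fixed-length segments.

    Returns an array of shape (n_segments, X, Y, Z, window); trailing frames
    that do not fill a full window are dropped.
    """
    starts = range(0, scan.shape[-1] - window + 1, stride)
    return np.stack([scan[..., s:s + window] for s in starts])

scan = np.zeros((8, 8, 8, 100))  # toy spatial size; real volumes are larger
segs = segment_4d(scan)
print(segs.shape)  # (9, 8, 8, 8, 20)
```

Logging `window`, `stride`, and the number of dropped trailing frames alongside the preprocessing pipeline ID (step 7) keeps segment-level embeddings reproducible.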
Strengths & Limitations¶
Strengths¶
- No parcellation required: Learns spatial structure end-to-end
- Multi-scale processing: Hierarchical windows capture local and global patterns
- Strong performance: Reported competitive results vs. parcellation-based methods
- Parcellation-agnostic: No bias from atlas choice (Schaefer vs. AAL vs. Gordon)
Limitations¶
- Computational cost: 4D patch embedding and windowed attention are memory-intensive
- Longer training: Hierarchical architecture requires more epochs to converge
- Preprocessing critical: Motion and spatial normalization quality directly impact performance
- GPU memory: Full 4D volumes with fine temporal resolution may exceed typical GPU limits
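A back-of-envelope calculation makes the GPU-memory limitation concrete. The numbers below are illustrative (a single fp32 input segment only); SwiFT's real footprint also includes activations, attention buffers, and optimizer state:

```python
# Raw input size of one 4D segment in fp32; not SwiFT's full memory footprint.
def volume_bytes(x, y, z, t, bytes_per_voxel=4):
    return x * y * z * t * bytes_per_voxel

mb = volume_bytes(96, 96, 96, 20) / 1024**2
print(round(mb, 1))  # 67.5 MB for a single 20-frame 96^3 segment
```

Since activations typically dwarf the input by an order of magnitude or more, even modest batch sizes can approach common GPU limits, which motivates the segmenting strategy above.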
When to Use SwiFT¶
✅ Use when:
- Want to avoid parcellation scheme dependence
- Need fine-grained spatial analysis (subcortical structures, small nuclei)
- Have sufficient compute for 4D volume processing
- Exploring multi-resolution spatiotemporal patterns
⚠️ Consider alternatives:
- BrainLM/Brain-JEPA: If parcellation is acceptable and faster baselines are wanted
- BrainMT: For longer temporal contexts with a lower memory footprint
- Brain Harmony: Multi-modal sMRI+fMRI fusion with TAPE
Reference Materials¶
Knowledge Base Resources¶
Curated materials in this KB:
- Paper Summary (PDF Notes): SwiFT (2023)
- Paper card (YAML): kb/paper_cards/swift_2023.yaml
- Code walkthrough: SwiFT walkthrough
- Model card (YAML): kb/model_cards/swift.yaml
Integration recipes:
- Modality Features: fMRI
- Integration Strategy
- Preprocessing Pipelines
Original Sources¶
Source code repositories:
- Local copy: external_repos/swift/
- Official GitHub: Transconnectome/SwiFT
Original paper:
- Title: "SwiFT: Swin 4D fMRI Transformer"
- Authors: Kim, Peter Yongho; Kwon, Junbeom; Joo, Sunghwan; Bae, Sangyoon; Lee, Donggyu; Jung, Yoonho; Yoo, Shinjae; Cha, Jiook; Moon, Taesup
- Published: NeurIPS 2023
- Link: arXiv:2307.05916
- DOI: 10.48550/arXiv.2307.05916
- PDF Notes: swift_2023.pdf
Next Steps in Our Pipeline¶
- Parcellation comparison: SwiFT vs. BrainLM (Schaefer-400) on same UKB cognitive tasks
- Memory profiling: Document GPU requirements across different volume resolutions
- Preprocessing sensitivity: Test robustness to motion correction/spatial normalization choices
- Gene-brain fusion: Evaluate whether 4D embeddings improve genetics alignment
- Developmental adaptation: Assess performance on pediatric datasets with smaller brain volumes
Engineering Notes¶
- Segment long scans into overlapping windows to fit GPU memory
- Log window size, stride, and overlap for reproducibility
- Spatial normalization quality critical — consider using MURD/ComBat preprocessing
- When comparing to parcellation-based models, ensure fair preprocessing parity
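The logging note above can be made concrete with a small provenance record. The field names and pipeline ID here are illustrative, not a fixed schema for this KB:

```python
import json

# Hypothetical reproducibility record for one embedding-extraction run;
# field names are illustrative, not a mandated schema.
run_log = {
    "model": "swift",
    "window_size": 20,
    "stride": 10,
    "overlap": 10,
    "preprocessing_pipeline_id": "example-pipeline-v1",  # hypothetical ID
}
print(json.dumps(run_log, indent=2))
```

Writing this record next to each subject embedding makes parcellation-parity comparisons (e.g. vs. BrainLM) auditable later.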