fMRI Features¶
Reference preprocessing stacks¶
- Cataloged in
kb/integration_cards/rsfmri_preprocessing_pipelines.yaml. Default:hcp_like_minimal(motion + distortion correction, nuisance regression, 0.01–0.1 Hz filter, Schaefer-400 parcellation). - Document which pipeline ID you used per run; experiments reference it alongside the embedding strategy ID.
Subject-level embedding strategies¶
rsfmri_swift_segments_v1(SwiFT):- 20-frame segments → mean-pool tokens from the last hidden state → mean over segments per run → mean over runs per subject.
- Fold-wise z-score + residualize(age, sex, site/scanner, mean FD, DVARS) → PCA→512.
- Source references:
docs/code_walkthroughs/swift_walkthrough.md. rsfmri_brainlm_segments_v1(BrainLM ViT-MAE):- 32-frame windows with stride 16 → CLS pooling per window → attention pooling across windows with inverse-FD weights → mean over runs.
- Fold-wise z-score + residualize(age, sex, site, FD, DVARS) → PCA→512.
rsfmri_brainjepa_roi_v1(Brain-JEPA):- ROI tokens (Schaefer-400 + Tian-50) → mean pooling across unmasked tokens → option to average per functional network → subject mean.
- Residualize age/sex/site/motion (optionally GSR flag) → PCA→512.
rsfmri_brainmt_segments_v1(BrainMT):- 32-frame conv/Mamba segments → mean patch tokens per segment → run mean → subject mean.
- Residualize age/sex/site/motion → PCA→512.
All recipes live in kb/integration_cards/embedding_strategies.yaml; call python scripts/manage_kb.py ops strategy <id> to print the full pipeline (including preprocessing notes and sources) before launching extraction.
Classical atlas baseline (optional)¶
- For quick sanity checks, you can still run atlas-based functional connectivity:
- Schaefer-400 time courses → Pearson FC matrix → Fisher z → vectorize upper triangle → z-score/residualize (include motion/site covariates) → PCA to 100–256 → optionally pad to 512.
- Track this by creating its own
embedding_strategiesentry if you plan to use it beyond ad-hoc QA.
ROI mean baseline (UKB; HCP MMP1 180)¶
If you want non‑FM fMRI features (for CCA comparisons across modalities and gene FMs), the cleanest baseline is ROI-wise time‑mean features:
- Input: UKB ROI time series
.npyper subject (HCP MMP1 parcellation variants observed: 180 or 360) - Transform: take time mean → ROI feature vector
- Output:
fmri_X_180.npywith alignedfmri_eids_180.npy(one row per subject)
180 vs 360 handling (to avoid dropping subjects):
- Prefer native 180‑d when available.
- If only 360‑d exists: reshape (360,) → (2,180) and average halves → 180 (controlled fallback).
This baseline is intentionally simple, linearly readable, and pairs well with CCA/SCCA when you want “MRI preprocessed data” rather than brain FM embeddings.
Symlink note (lab server)
- A symlink is a “shortcut file” that points to the real build output path.
- Check where a symlink points with:
ls -l /storage/bigdata/UKB/fMRI/fmri_X_180.npy /storage/bigdata/UKB/fMRI/fmri_eids_180.npy