fMRI Data Specifications

Overview

Functional MRI (fMRI) measures brain activity by detecting changes in blood oxygenation (BOLD signal). This page defines the standardized data formats and preprocessing requirements for fMRI data in this benchmark hub.

Data Format Requirements

1. Raw fMRI Time Series

Format: NIfTI (.nii or .nii.gz) or NumPy array
Shape: (n_samples, n_timepoints, n_voxels) or (n_samples, n_voxels, n_timepoints)
Data Type: float32 or float64

import numpy as np
import nibabel as nib

# Load NIfTI file
img = nib.load('subject_001_bold.nii.gz')
data = img.get_fdata()  # Shape: (x, y, z, time)

# Reshape to 2D: (time, voxels)
n_timepoints = data.shape[-1]
timeseries = data.reshape(-1, n_timepoints).T  # (time, voxels)

2. Preprocessed Time Series (Recommended)

Minimum preprocessing steps: 1. ✅ Motion correction 2. ✅ Slice timing correction 3. ✅ Spatial normalization to standard space (MNI152) 4. ✅ Nuisance regression (motion parameters, CSF, white matter) 5. ✅ Bandpass filtering (0.01 - 0.1 Hz typical for resting-state)

Optional: - Spatial smoothing (6-8mm FWHM) - Global signal regression (controversial) - Denoising (ICA-AROMA, CompCor)

3. Parcellated/ROI Time Series

Format: NumPy array or CSV
Shape: (n_samples, n_timepoints, n_regions)

# Example: 400 parcels (Schaefer atlas), 200 timepoints
roi_timeseries = np.load('subject_timeseries.npy')  
# Shape: (1, 200, 400)

Recommended atlases: - Schaefer 2018 (100-1000 parcels, 7 or 17 networks) - AAL3 (170 regions) - Gordon (333 parcels) - Harvard-Oxford (cortical + subcortical) - Glasser MMP (360 parcels)

4. Connectivity Matrices

Format: NumPy array
Shape: (n_samples, n_regions, n_regions) or (n_samples, n_features) (vectorized)

from nilearn.connectome import ConnectivityMeasure

# Compute functional connectivity
conn_measure = ConnectivityMeasure(kind='correlation')
connectivity = conn_measure.fit_transform([roi_timeseries])
# Shape: (1, n_regions, n_regions)

# Vectorize upper triangle (for ML)
from sklearn.utils import check_array
import numpy as np

def vectorize_connectivity(conn_mat):
    """Extract upper triangle as feature vector."""
    n_regions = conn_mat.shape[0]
    triu_idx = np.triu_indices(n_regions, k=1)
    return conn_mat[triu_idx]

features = vectorize_connectivity(connectivity[0])
# Shape: (n_regions * (n_regions-1) / 2,)

Metadata Requirements

Each fMRI dataset should include metadata:

dataset_id: ukb_fmri_tensor
name: UK Biobank fMRI Tensors
modality: fmri
task: resting_state  # or 'task_based'
n_subjects: 40000
n_timepoints: 490
n_voxels: 91282  # or n_regions if parcellated
tr: 0.735  # Repetition time in seconds
preprocessing: fmriprep_20.2.0
atlas: Schaefer2018_400  # if parcellated
bandpass: [0.01, 0.1]  # Hz
smoothing: 6mm  # FWHM, or null
standard_space: MNI152NLin6Asym

Quality Control Metrics

1. Motion Parameters

Framewise Displacement (FD): Measure of head motion

def framewise_displacement(motion_params):
    """
    Calculate framewise displacement (Power et al. 2012).

    Args:
        motion_params: (n_timepoints, 6) - 3 translations + 3 rotations

    Returns:
        fd: (n_timepoints-1,) - framewise displacement
    """
    # Translations in mm
    trans = motion_params[:, :3]

    # Rotations converted to mm (50mm sphere radius)
    rot = motion_params[:, 3:] * 50  # radians to mm

    # Absolute derivatives
    dtrans = np.abs(np.diff(trans, axis=0))
    drot = np.abs(np.diff(rot, axis=0))

    # Sum
    fd = np.sum(dtrans, axis=1) + np.sum(drot, axis=1)

    return fd

# Recommended threshold: FD < 0.5mm for resting-state
mean_fd = fd.mean()
print(f"Mean FD: {mean_fd:.3f} mm")

# Exclude high-motion timepoints (scrubbing)
low_motion_frames = fd < 0.5
clean_timeseries = timeseries[low_motion_frames]

2. Temporal SNR (tSNR)

def temporal_snr(timeseries):
    """
    Temporal signal-to-noise ratio.

    Args:
        timeseries: (n_timepoints, n_voxels)

    Returns:
        tsnr: (n_voxels,) - temporal SNR per voxel
    """
    mean_signal = timeseries.mean(axis=0)
    std_signal = timeseries.std(axis=0)

    tsnr = mean_signal / (std_signal + 1e-8)

    return tsnr

tsnr = temporal_snr(timeseries)
print(f"Mean tSNR: {tsnr.mean():.2f}")

# Typical values: 50-100 for 3T, higher for 7T

3. Data Completeness

# Check for missing data
n_nan = np.isnan(timeseries).sum()
completeness = 1 - (n_nan / timeseries.size)
print(f"Data completeness: {completeness*100:.1f}%")

# Minimum recommended: 95% completeness
assert completeness > 0.95, "Too much missing data"

Data Augmentation for Robustness Testing

See our robustness testing documentation for details on perturbations:

from fmbench.robustness import (
    ChannelDropout,
    GaussianNoise,
    LineNoise,
    TemporalShift
)

# Example: Add Gaussian noise
noise_probe = GaussianNoise(snr_db=10)
noisy_data = noise_probe.apply(timeseries)

Example Data Loading

From NIfTI

import nibabel as nib
from nilearn.maskers import NiftiMasker

# Load functional image
func_img = nib.load('subject_001_bold.nii.gz')

# Apply brain mask and extract timeseries
masker = NiftiMasker(
    standardize=True,
    detrend=True,
    low_pass=0.1,
    high_pass=0.01,
    t_r=2.0
)
timeseries = masker.fit_transform(func_img)
# Shape: (n_timepoints, n_voxels)

From Parcellated CSV

import pandas as pd

# Load ROI timeseries
df = pd.read_csv('subject_001_schaefer400.csv')
# Columns: timepoint, region_1, region_2, ..., region_400

timeseries = df.iloc[:, 1:].values  # Exclude timepoint column
# Shape: (n_timepoints, 400)

From HCP-style Data

# HCP stores timeseries as CIFTI (cortical surface + subcortical)
from nibabel import cifti2

cifti_img = cifti2.load('subject_REST_LR.dtseries.nii')
timeseries = cifti_img.get_fdata()
# Shape: (n_timepoints, 91282) - 91k grayordinates

Benchmark Tasks

1. Classification

Typical tasks: - Disease vs. Control (AD, ADHD, ASD, schizophrenia) - Cognitive state classification - Task decoding

Input: (n_samples, n_timepoints, n_regions) or connectivity matrices
Output: Class labels (n_samples,)

2. Reconstruction

Typical tasks: - Masked autoencoding (predict masked timepoints/regions) - Denoising - Super-resolution (spatial or temporal)

Input: Masked/noisy timeseries
Output: Clean timeseries

3. Regression

Typical tasks: - Cognitive score prediction - Age prediction - Symptom severity prediction

Input: Timeseries
Output: Continuous values (n_samples,)

ITU AI4H Alignment

This specification aligns with:

DEL10.8 Section 3.1: Input data specifications for neurology
DEL3 Section 4.2: Data format requirements
DEL0.1: Standardized terminology (BOLD, fMRI, parcellation)

Tools & Libraries

Preprocessing

fMRIPrep: Robust preprocessing pipeline
CONN Toolbox: Connectivity preprocessing
DPARSF: Data Processing Assistant for Resting-State fMRI

Analysis

Nilearn: Machine learning for neuroimaging
NiBabel: Read/write neuroimaging formats
BrainIAK: Brain Imaging Analysis Kit

Parcellation

Schaefer2018: nilearn.datasets.fetch_atlas_schaefer_2018()
AAL3: nilearn.datasets.fetch_atlas_aal()

References

Esteban, O., et al. (2019). fMRIPrep: a robust preprocessing pipeline for fMRI data. Nature Methods, 16(1), 111-116.
Power, J. D., et al. (2012). Spurious but systematic correlations in functional connectivity MRI. NeuroImage, 59(3), 2142-2154.
Schaefer, A., et al. (2018). Local-Global Parcellation of the Human Cerebral Cortex. Cerebral Cortex, 28(9), 3095-3114.