
Start Here: Researcher Workflow

This page connects the docs into one simple flow:

Domain → Suite → Run → Report → Submit → Leaderboard

flowchart LR
  A["Pick a suite"] --> B["Wrap model"]
  B --> C["Run locally"]
  C --> D["Inspect report.md"]
  C --> E["Submit eval.yaml"]
  E --> F["Leaderboard entry"]

Run a toy benchmark

Use the toy benchmark to verify your wrapper + evaluation pipeline end-to-end.

pip install -e .
python -m fmbench generate-toy-data
python -m fmbench run --suite SUITE-TOY-CLASS --model configs/model_dummy_classifier.yaml --out results/toy_run

Outputs:

  • results/toy_run/report.md
  • results/toy_run/eval.yaml
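
If you want a quick programmatic check that the run produced both artifacts, here is a minimal sketch in plain Python (the path matches the --out directory used above; this check is not part of fmbench itself):

from pathlib import Path

# Output directory passed via --out in the command above.
out_dir = Path("results/toy_run")

# The toy run is expected to write both of these files.
for name in ("report.md", "eval.yaml"):
    path = out_dir / name
    status = "ok" if path.is_file() else "MISSING"
    print(f"{status}: {path}")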

Submit results (the fastest path)

  1. Run locally to produce eval.yaml.
  2. Open a submission issue and attach your eval.yaml (and optionally report.md).

  • Open a submission issue
  • Submission guide


Scenarios (end-to-end examples)

Suite IDs you can rely on

These suite IDs are guaranteed to exist in this repo (see python -m fmbench list-suites):

| Suite ID | What it evaluates | How you run it |
| --- | --- | --- |
| SUITE-TOY-CLASS | Toy fMRI-like classification (pipeline sanity check) | python -m fmbench run --suite SUITE-TOY-CLASS ... |
| SUITE-ROBUSTNESS-NEURO | Neuro robustness probes (dropout/noise/line noise/permutation/shift) | python -m fmbench run-robustness ... |
| SUITE-GEN-CLASS-001 | Genomics classification suite | python -m fmbench run --suite SUITE-GEN-CLASS-001 ... |
| SUITE-NEURO-CLASS-001 | Neurology fMRI classification suite | python -m fmbench run --suite SUITE-NEURO-CLASS-001 ... |
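
To run the classification suites from the table back to back against one model config, a small driver sketch built only from the documented CLI (my_model_config.yaml is a placeholder for your own config file; run-robustness has its own command and is not included):

import subprocess

# Classification suites from the table above.
suites = ["SUITE-TOY-CLASS", "SUITE-GEN-CLASS-001", "SUITE-NEURO-CLASS-001"]

for suite in suites:
    subprocess.run(
        [
            "python", "-m", "fmbench", "run",
            "--suite", suite,
            "--model", "my_model_config.yaml",  # placeholder: your model config
            "--out", f"results/{suite.lower()}",
        ],
        check=True,  # stop on the first failing suite
    )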

Scenario A: Evaluate my fMRI(-like) classifier on toy data

  • Step 1 — Generate toy neuro data:
python -m fmbench generate-toy-data
  • Step 2 — Point fmbench at your model via a YAML config (a minimal wrapper sketch follows this list):
# my_model_config.yaml
model_id: my_fmri_model
name: "My fMRI model"
version: "0.1.0"

type: python_class
import_path: "my_model:MyModelWrapper"
  • Step 3 — Run the suite:
python -m fmbench run --suite SUITE-TOY-CLASS --model my_model_config.yaml --out results/my_fmri_model_toy
  • Step 4 — Inspect outputs: open report.md, then submit eval.yaml.
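
A minimal sketch of what my_model:MyModelWrapper from the config above could look like. The method name predict and its signature are assumptions for illustration, not the confirmed fmbench interface; check the model-wrapper docs for the exact contract:

# my_model.py -- the module referenced by import_path: "my_model:MyModelWrapper"
import numpy as np

class MyModelWrapper:
    """Adapter between fmbench and your trained fMRI classifier."""

    def __init__(self):
        # Load weights, preprocessing, etc. here (omitted in this sketch).
        self.n_classes = 2  # assumption: binary toy task

    def predict(self, X: np.ndarray) -> np.ndarray:
        # ASSUMED interface: fmbench passes a batch of inputs and expects
        # one predicted label per sample. Replace with your real model call.
        return np.zeros(len(X), dtype=int)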

Scenario B: Test robustness on neuro time-series (SUITE-ROBUSTNESS-NEURO)

python -m fmbench run-robustness \
  --model my_model_config.yaml \
  --data toy_data/neuro/robustness \
  --out results/my_fmri_model_robustness \
  --probes dropout,noise,line_noise,permutation,shift

The resulting eval.yaml includes robustness metrics like:

  • dropout_rAUC, noise_rAUC, line_noise_rAUC
  • perm_equivariance
  • shift_rAUC
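
To pull those metrics out of the run's eval.yaml programmatically, a sketch using PyYAML. The exact nesting of metrics inside eval.yaml is an assumption here; adjust the lookup to match the file your run actually produces:

import yaml  # pip install pyyaml

with open("results/my_fmri_model_robustness/eval.yaml") as f:
    report = yaml.safe_load(f)

# Metric names documented above; their position in the YAML tree is assumed.
wanted = ["dropout_rAUC", "noise_rAUC", "line_noise_rAUC",
          "perm_equivariance", "shift_rAUC"]

metrics = report.get("metrics", report)  # fall back to the top level if flat
for name in wanted:
    print(f"{name}: {metrics.get(name, 'not found')}")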

Scenario C: Sanity-check a genomics model (SUITE-GEN-CLASS-001)

python -m fmbench generate-toy-data
python -m fmbench run --suite SUITE-GEN-CLASS-001 --model my_model_config.yaml --out results/my_genomics_toy

If you don’t see the suite IDs above in the list-suites output, you are likely not running from the repo root. From the repo root, this should work:

python -m fmbench list-suites

Scenario D (optional): Neurology fMRI classification (SUITE-NEURO-CLASS-001)

This suite evaluates fMRI foundation models on classification tasks using toy data.

python -m fmbench run --suite SUITE-NEURO-CLASS-001 --model my_model_config.yaml --out results/my_fmri_run

For full benchmarking, point to your own institutional data (UK Biobank, HCP, etc.).