Foundation Models Catalog
This page provides an overview of the foundation models evaluated in this benchmark hub. Each model is defined by a YAML configuration file in the models/ directory.
🧠 Neurology / Brain Imaging Models
BrainLM
Model ID: brainlm
Modality: fMRI (Brain functional imaging)
Architecture: ViT-MAE + Nystromformer encoder/decoder
Parameters: 111M / 650M
Repository: github.com/vandijklab/BrainLM
Masked-autoencoding model for fMRI voxel time series. Uses a ViT-MAE scaffold with custom BrainLM embeddings that combine voxel coordinates with patched time windows to learn denoised cortical dynamics.
BrainJEPA
Model ID: brainjepa
Modality: fMRI, EEG
Repository: Brain-JEPA Repository
Joint-Embedding Predictive Architecture (JEPA) for brain signals. Self-supervised approach that learns representations by predicting targets in latent space rather than reconstructing raw pixels or signals.
BrainMT
Model ID: brainmt
Modality: Multi-modal brain imaging
Repository: BrainMT Repository
Multi-modal brain transformer for integrating structural and functional brain imaging data.
BrainHarmony
Model ID: brainharmony
Modality: Multi-site neuroimaging
Repository: BrainHarmony Repository
Harmonization framework for multi-site neuroimaging studies, addressing scanner and acquisition protocol variability.
SwiFT
Model ID: swift
Modality: fMRI (4D volumes / time series)
Repository: github.com/Transconnectome/SwiFT
Swin 4D fMRI Transformer for learning representations from spatiotemporal fMRI sequences.
NeuroClips
Model ID: neuroclips
Modality: fMRI → video reconstruction
Repository: github.com/gongzix/NeuroClips
Framework for fMRI-to-video reconstruction; included for neuro representation work (not genomics).
🧬 Genomics / Single-Cell Models
Geneformer
Model ID: MOD-GENEFORMER
Modality: scRNA-seq (Single-cell transcriptomics)
Repository: huggingface.co/ctheodoris/Geneformer
Transformer model pretrained on roughly 30 million single-cell transcriptomes. Learns context-aware gene embeddings for cell type annotation, gene regulatory network inference, and therapeutic target discovery.
Caduceus
Model ID: caduceus
Modality: DNA sequences
Repository: Caduceus Repository
Bi-directional, reverse-complement-equivariant long-range DNA sequence model built on the Mamba state-space architecture, suited to tasks such as genomic variant interpretation.
DNABERT-2
Model ID: dnabert2
Modality: DNA sequences
Repository: DNABERT-2 Repository
BERT-based model for DNA sequence understanding, supporting tasks like promoter prediction, splice site detection, and variant effect prediction.
Evo2
Model ID: evo2
Modality: DNA/RNA sequences
Repository: Evo2 Repository
Genomic foundation model trained on sequences from across the tree of life, supporting long-context DNA/RNA sequence analysis and generation.
HyenaDNA
Model ID: hyenadna
Modality: Long DNA sequences
Repository: HyenaDNA Repository
Efficient long-range sequence model that replaces attention with Hyena operators (implicit long convolutions), enabling genomic analysis at single-nucleotide resolution over very long contexts.
LLM Semantic Bridge
Model ID: llm_semantic_bridge
Modality: Multi-modal semantic alignment
Repository: Semantic Bridge Development
Model for bridging semantic representations across different medical data modalities.
📝 Model Configuration Format
Each model is defined in a YAML file with the following structure:
```yaml
model_id: unique_identifier
name: Human-Readable Name
modality: primary_modality
upstream_repo: https://github.com/org/repo
notes: Description of the model architecture and capabilities
arch: Architecture details (optional)
params: Parameter count (optional)
```
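For instance, a configuration using these fields might look like the following. The values are taken from the BrainLM catalog entry above; the actual file in models/ may differ:

```yaml
model_id: brainlm
name: BrainLM
modality: fMRI
upstream_repo: https://github.com/vandijklab/BrainLM
notes: Masked-autoencoding model for fMRI voxel time series
arch: ViT-MAE + Nystromformer encoder/decoder
params: 111M
```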
🎯 Adding Your Model
To add your foundation model to the benchmark:
- Create a model configuration YAML in models/
- Implement the model interface (see fmbench/models.py)
- Run the benchmark suite(s) relevant to your model's modality
- Submit results for leaderboard inclusion
See the contributing guide for more details.
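The actual model interface is defined in fmbench/models.py and is not reproduced here. As a purely illustrative sketch (the class and method names below are assumptions, not the real API), a wrapper for your model might look like:

```python
# Hypothetical sketch only -- the real interface in fmbench/models.py
# may use different method names and signatures.
from abc import ABC, abstractmethod
from typing import Sequence


class FoundationModel(ABC):
    """Assumed shape of a benchmarked model wrapper."""

    @abstractmethod
    def load(self, checkpoint_path: str) -> None:
        """Load pretrained weights from disk."""

    @abstractmethod
    def embed(self, inputs: Sequence) -> list:
        """Return one embedding vector per input sample."""


class MyModel(FoundationModel):
    def load(self, checkpoint_path: str) -> None:
        # Real code would deserialize weights here.
        self.checkpoint_path = checkpoint_path

    def embed(self, inputs: Sequence) -> list:
        # Placeholder: one fixed-size zero vector per sample.
        return [[0.0] * 8 for _ in inputs]


model = MyModel()
model.load("checkpoints/mymodel.pt")
embeddings = model.embed(["sample_a", "sample_b"])
print(len(embeddings), len(embeddings[0]))  # → 2 8
```

The benchmark runner would then instantiate the wrapper, call load with the checkpoint referenced by the model's YAML config, and feed each task's inputs through embed.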
📊 Model Performance
For detailed performance metrics and rankings, see the Leaderboards.