Merlin

Enabling Machine Learning HPC Workflows

Last updated: 13 Jun 2025     |     Release: Version 1.13.0b1

The Merlin workflow framework targets large-scale scientific machine learning (ML) workflows in High Performance Computing (HPC) environments. Merlin uses a producer-consumer workflow model that enables workflows spanning multiple machines and multiple batch jobs; these workflows are dynamically allocated yet persistent, and they can take advantage of surge-compute resources. Key features are a flexible and intuitive HPC-centric interface, low per-task overhead, multi-tiered fault recovery, and a hierarchical sampling algorithm that allows task queuing and execution to scale to ensembles of millions of tasks.
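The producer-consumer model and the idea of hierarchical task expansion can be illustrated with a small standard-library sketch. This is not Merlin's API or its actual algorithm; the function name run_ensemble and the parameters branch, leaf_size, and n_workers are hypothetical, chosen only for illustration. The sketch assumes that a producer enqueues a single root task covering the whole sample range, and that consumers either split a range into sub-ranges or, once a range is small enough, process its samples directly. Under that assumption the producer never has to enqueue every sample individually.

```python
import queue
import threading


def run_ensemble(n_samples, branch=4, leaf_size=2, n_workers=3):
    """Process sample indices 0..n_samples-1 using hierarchical range tasks.

    Illustrative only: names and parameters are hypothetical, not Merlin's.
    """
    if branch < 2 or leaf_size < 1:
        raise ValueError("branch must be >= 2 and leaf_size must be >= 1")

    tasks = queue.Queue()      # shared task queue (stands in for a broker)
    done = []                  # indices of samples that were "simulated"
    lock = threading.Lock()

    def worker():
        # Consumer loop: pull a task, then either expand it or execute it.
        while True:
            item = tasks.get()
            if item is None:   # sentinel: shut this worker down
                tasks.task_done()
                return
            lo, hi = item
            if hi - lo <= leaf_size:
                # Leaf task: run the samples in this range.
                with lock:
                    done.extend(range(lo, hi))
            else:
                # Interior task: split the range into up to `branch` children.
                step = -(-(hi - lo) // branch)  # ceiling division
                for start in range(lo, hi, step):
                    tasks.put((start, min(start + step, hi)))
            # Children are enqueued before the parent is marked done,
            # so tasks.join() cannot return early.
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    tasks.put((0, n_samples))  # producer: a single root task
    tasks.join()               # wait until every range task is processed

    for _ in threads:          # stop the consumers
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(done)
```

Calling run_ensemble(100) returns every index from 0 to 99 exactly once, regardless of how the workers interleave. In a real deployment the in-process queue would be replaced by a message broker shared across machines and batch allocations.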

Terminology
The terms below follow the definitions established by the Workflows Community Terminology.

Characteristics
Flow: Task, Iterative
Granularity: Sub-workflows
Coupling: Loose
Domain: Agnostic

Composition
Description: Ad-hoc Schema
Abstraction: Intermediate
Modularity: Hierarchical

Orchestration
Planning: Static
Execution: Runner

Data Management
Transport: File-based
Storage: Shared, Distributed, Replicated

Metadata Capture
Anomaly Detection
Monitoring
Provenance

Extensions
Cloud-native Support