WfBG: Workflow Benchmarking Group

Status: Establishing (4 members)


In this working group, we seek to:

  • Define a shared vocabulary of performance metrics of interest for diverse workflow domains
  • Design, maintain and share benchmarking suites of real-world workflows to help evaluate those metrics
  • Develop community agreement on features that such benchmarks should possess
  • Collaboratively maintain a catalogue of state-of-art implementations of these real-world workflows for various workflow languages/frameworks.
  • Define a reproducible and agnostic methodology to collect and report benchmark results


We do NOT aim to do any of the following:

  • Test workflows on reduced datasets that are not representative of real-world workflow usage
  • Define which performance metrics are essential for the workflow research domain. Instead, we want to define several benchmarking suites to evaluate different metrics
  • Determine the universally best tool for running workflows. Instead, we want to help users compare the performance of different Workflow Management Systems for their specific needs

Why is benchmarking important?

The intrinsic generality of the workflow paradigm makes it a powerful abstraction for designing complex applications and executing them on large-scale distributed infrastructures, such as HPC centres, Grid environments, and Cloud providers. However, this generality becomes an obstacle when evaluating workflow implementations or Workflow Management Systems (WMSs), because the state-of-the-art computer science literature offers no consistent, commonly agreed set of key performance metrics.

Instead, different application domains tend to prioritise different aspects of the workflow execution process when designing their ideal workflow system. For example, minimising control-plane overhead is fundamental when running compute-intensive workflows with billions of fine-grained tasks. For data-intensive workflows made of a few huge steps, overlapping computation and communication matters far more.
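The trade-off between these two regimes can be made concrete with a toy cost model. The sketch below is purely illustrative and makes strong simplifying assumptions. In the fine-grained case, independent tasks are perfectly balanced over a pool of workers, and each task pays a fixed control-plane overhead. In the coarse-grained case, each large step stages its input over a single data link and then computes on a single compute resource, and step i+1 can only compute after step i. The function names and all numbers are hypothetical; none of them come from a real workflow or WMS.

```python
from typing import Sequence


def fine_grained_makespan(n_tasks: int, task_time: float, overhead: float, workers: int) -> float:
    """Idealised makespan of n_tasks independent tasks spread evenly over `workers`.

    Every task pays a fixed control-plane `overhead` (e.g. scheduling and dispatch)
    on top of its own `task_time`; load balancing is assumed to be perfect.
    """
    return n_tasks * (task_time + overhead) / workers


def coarse_grained_makespan(compute: Sequence[float], transfer: Sequence[float], overlap: bool) -> float:
    """Makespan of a few large steps, each staging its input and then computing.

    Without overlap, transfers and computation are fully serialised. With overlap,
    the input of step i+1 is transferred while step i computes (a two-stage
    pipeline with one data link and one compute resource).
    """
    if len(compute) != len(transfer):
        raise ValueError("compute and transfer need one entry per step")
    if not overlap:
        return sum(compute) + sum(transfer)
    transfer_done = 0.0  # time at which the link has finished staging the current step's input
    compute_done = 0.0   # time at which the compute resource has finished the current step
    for c, m in zip(compute, transfer):
        transfer_done += m  # inputs are staged one step at a time over a single link
        # a step starts once its input has arrived and the previous step has finished
        compute_done = max(compute_done, transfer_done) + c
    return compute_done


if __name__ == "__main__":
    # One million 1 ms tasks on 100 workers, with and without a 10 ms per-task overhead
    with_overhead = fine_grained_makespan(1_000_000, 0.001, 0.010, 100)
    no_overhead = fine_grained_makespan(1_000_000, 0.001, 0.0, 100)
    print(f"fine-grained: {with_overhead:.0f} s with overhead vs {no_overhead:.0f} s without")

    # Three large steps: 100 s of compute and 80 s of data transfer each
    compute = [100.0, 100.0, 100.0]
    transfer = [80.0, 80.0, 80.0]
    serial = coarse_grained_makespan(compute, transfer, overlap=False)
    overlapped = coarse_grained_makespan(compute, transfer, overlap=True)
    print(f"coarse-grained: {serial:.0f} s serialised vs {overlapped:.0f} s overlapped")
```

With these made-up numbers, the per-task overhead inflates the fine-grained makespan from about 10 s to about 110 s. For the three large steps, the same overhead would be negligible, but overlapping transfers with computation cuts the makespan from 540 s to 380 s. A benchmark suite tuned to one regime can therefore say little about the other.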

Consequently, different workflow systems excel at handling different kinds of workflows. However, the lack of community consensus on workflow benchmarking suites is a major obstacle for domain experts who want to compare WMSs against their own needs. A direct and fair comparison is only possible by running multiple state-of-the-art implementations of the same application in the same execution environment.
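Such a side-by-side comparison can be sketched as a small harness. It runs each implementation repeatedly on the same host and records the wall-clock time of each run. It also stores a minimal description of the host next to the timings, so that results are reported together with the environment that produced them. This is a minimal sketch under several assumptions. Each implementation must be launchable as a single command. End-to-end wall-clock time (makespan) is the only metric collected, and metrics such as control-plane overhead would need instrumentation inside the WMS. The names `compare`, `impl_a` and `impl_b` and the placeholder commands are hypothetical.

```python
import json
import os
import platform
import statistics
import subprocess
import sys
import time
from typing import Dict, List, Sequence


def describe_environment() -> Dict[str, object]:
    """Minimal description of the host, stored next to the timings for reproducibility."""
    return {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }


def time_command(cmd: Sequence[str], repetitions: int) -> List[float]:
    """Run `cmd` `repetitions` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        # check=True makes a failed run raise instead of silently producing a bogus timing
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def compare(implementations: Dict[str, Sequence[str]], repetitions: int = 5) -> Dict[str, object]:
    """Time every implementation on the current host and return a JSON-serialisable report."""
    results = {}
    for name, cmd in implementations.items():
        runs = time_command(cmd, repetitions)
        results[name] = {
            "median_s": statistics.median(runs),
            # the sample standard deviation needs at least two runs
            "stdev_s": statistics.stdev(runs) if len(runs) > 1 else 0.0,
            "runs_s": runs,
        }
    return {"environment": describe_environment(), "results": results}


if __name__ == "__main__":
    # Hypothetical stand-ins: replace each command with the launcher of a real
    # implementation of the same workflow (same inputs, same configuration).
    implementations = {
        "impl_a": [sys.executable, "-c", "sum(range(10**5))"],
        "impl_b": [sys.executable, "-c", "sum(range(10**6))"],
    }
    print(json.dumps(compare(implementations, repetitions=3), indent=2))
```

The harness reports the median rather than the mean, which limits the influence of occasional outliers such as a cold file-system cache on the first run. The JSON output could later be mapped onto one of the reporting formats listed below.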

Defining several benchmarking suites to evaluate different metrics of interest would be a crucial improvement for the workflow research community. However, benchmarks have no value unless the community builds consensus around them. Conversely, history shows that widely recognised benchmarks can have a tremendous impact on research communities, fostering a continuous improvement process that lasts for years. Consider, for example, the role of HPLinpack in the High-Performance Computing community, or the ongoing efforts to improve the training of Deep Neural Networks (DNNs) on the ImageNet dataset.


Potential performance reporting formats:

  1. Workflow Trace Archive
  2. Workflow Run RO-Crate
  3. WfFormat


The articles below are published as Open Access or, where gold Open Access is not possible, are available as green Open Access preprints. Please let us know if you are unable to access any of these publications. To add to this list, please suggest a change.

Benchmark suites

Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick S. McCormick, Alex Aiken (2020): Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 62, pp. 1-15 (arXiv:1908.05790)

E. Larsonneur, J. Mercier, N. Wiart, E. L. Floch, O. Delhomme and V. Meyer (2018): Evaluating Workflow Management Systems: A Bioinformatics Use Case. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2773-2775

Patterns for scientific workflows

Tainã Coleman, Henri Casanova, Rafael Ferreira da Silva (2021): WfChef: Automated Generation of Accurate Scientific Workflow Generators. 17th IEEE eScience Conference, pp. 159-168 (arXiv:2105.00129)

Daniel S. Katz, Andre Merzky, Zhao Zhang, Shantenu Jha (2016): Application skeletons: Construction and use in eScience. Future Generation Computer Systems, 59, pp. 114-124

Daniel Garijo, Pinar Alper, Khalid Belhajjame, Óscar Corcho, Yolanda Gil, Carole A. Goble (2014): Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, 36, pp. 338-351

Sara Migliorini, Mauro Gambini, Marcello La Rosa, Arthur H.M. ter Hofstede (2011): Pattern-Based Evaluation of Scientific Workflow Management Systems. Unpublished

Ustun Yildiz, Adnene Guabtni, Anne H. H. Ngu (2009): Towards scientific workflow patterns. 4th Workshop on Workflows in Support of Large-Scale Science (WORKS)

Robert Stevens, Carole A. Goble, Patricia G. Baker, Andy Brass (2001): A classification of tasks in bioinformatics. Bioinformatics, 17:1, pp. 180-188

Performance metrics formats

Rafael Ferreira da Silva (2021): wfcommons/wfformat. Zenodo

L. Versluis, Roland Mathá, Sacheendra Talluri, Tim Hegeman, Radu Prodan, Ewa Deelman, Alexandru Iosup (2020): The Workflow Trace Archive: Open-Access Data From Public and Private Computing Infrastructures. IEEE Transactions on Parallel and Distributed Systems, 31:9, pp. 2170-2184 (arXiv:1906.07471)

Tazro Ohta, Tomoya Tanjo, Osamu Ogasawara (2019): Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection. GigaScience, 8:4, giz052 (bioRxiv:456756)

Workflow benchmarking tools

Tainã Coleman, Henri Casanova, Loïc Pottier, Manav Kaushik, Ewa Deelman, Rafael Ferreira da Silva (2022): WfCommons: A framework for enabling scientific workflow research and development. Future Generation Computer Systems, 128, pp. 16-27 (arXiv:2105.14352)

Salvador Capella-Gutierrez, Diana de la Iglesia, Juergen Haas, Analia Lourenco, José María Fernández, Dmitry Repchevsky, Christophe Dessimoz, Torsten Schwede, Cedric Notredame, Josep Ll Gelpi, Alfonso Valencia (2017): Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking. bioRxiv:181677