Up-scaling Python functions for HPC with executorlib

Jan Janssen (Max Planck Institute for Sustainable Materials)

January 14, 2026
11:00-11:30 PST / 14:00-14:30 EST / 20:00-20:30 CET

Up-scaling Python workflows from execution on a local workstation to parallel execution on an HPC system typically faces three challenges: (1) the management of inter-process communication, (2) data storage and (3) the management of task dependencies during execution. These challenges commonly force a rewrite of major parts of the reference serial Python workflow to improve computational efficiency. Executorlib addresses these challenges by extending Python's ProcessPoolExecutor interface to distribute Python functions on HPC systems. It interfaces with the job scheduler directly, without the need for a database or daemon process, leading to seamless up-scaling.

The presentation introduces the challenge of up-scaling Python workflows. It highlights how executorlib extends the ProcessPoolExecutor interface of the Python standard library to provide the user with a familiar interface, while the executorlib backend connects directly to the HPC job scheduler to distribute Python functions, either from the login node to individual compute nodes or within an HPC allocation spanning a number of compute nodes. This flexibility is enabled by supporting both file-based and socket-based communication.
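For readers unfamiliar with the Executor/Future pattern, the minimal sketch below submits the same function first with the standard library's ProcessPoolExecutor and then with an executorlib executor; the executorlib class name used here (SingleNodeExecutor) is an assumption that may differ between releases, and the point is only that the submit/result workflow stays the same when moving to an HPC backend.

```python
from concurrent.futures import ProcessPoolExecutor


def add(a, b):
    """Toy stand-in for an expensive calculation."""
    return a + b


if __name__ == "__main__":
    # Standard library: parallel execution on a local workstation.
    with ProcessPoolExecutor(max_workers=2) as exe:
        future = exe.submit(add, 1, 2)
        print(future.result())  # 3

    # executorlib: the same Executor/Future pattern, but the backend talks
    # to the HPC job scheduler instead of spawning local worker processes.
    # NOTE: the class name below is an assumption and may vary by version.
    from executorlib import SingleNodeExecutor

    with SingleNodeExecutor() as exe:
        future = exe.submit(add, 1, 2)
        print(future.result())  # 3
```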

The setup of executorlib on different HPC systems is introduced, based on the current support for the SLURM job scheduler as well as the Flux framework, which enables hierarchical scheduling within the large HPC job allocations common on exascale computers. Application examples then demonstrate how executorlib supports the assignment of computational resources such as CPU cores, number of threads and GPUs on a per-function basis, including support for MPI, which drastically simplifies the process of up-scaling Python workflows.
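As a hedged sketch of what per-function resource assignment might look like, the snippet below requests a different number of cores for each submitted function; the executor class name (FluxJobExecutor) and the resource_dict keys are assumptions based on the description above and should be checked against the documentation of the installed executorlib version.

```python
from executorlib import FluxJobExecutor  # class name is an assumption


def calc_mpi(i):
    # Executed once per MPI rank when more than one core is requested.
    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()
    return i, rank, size


if __name__ == "__main__":
    with FluxJobExecutor() as exe:
        # Serial function on a single core of the allocation.
        f_serial = exe.submit(sum, [1, 2, 3], resource_dict={"cores": 1})
        # MPI-parallel function on two cores of the same allocation.
        f_mpi = exe.submit(calc_mpi, 3, resource_dict={"cores": 2})
        print(f_serial.result())  # 6
        print(f_mpi.result())     # one (i, rank, size) tuple per rank
```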

In this context, the focus of the presentation is the user journey during the up-scaling of a Python workflow and how features like caching and the integrated debugging capabilities for the distributed execution of Python functions accelerate the development cycle. The presentation concludes by returning to challenges identified as part of the DOE Exascale Computing Project's EXAALT effort, demonstrating how the development process was drastically simplified by using executorlib, with a specific focus on dynamic dependencies that are only resolved at runtime of the Python workflow.
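The abstract mentions caching and runtime-resolved dependencies; a minimal sketch of how such a dependency might be expressed is shown below, where the Future returned by one submission is passed directly as input to the next. The class name and the cache_directory parameter are assumptions used for illustration only.

```python
from executorlib import SingleNodeExecutor  # class name is an assumption


def add(a, b):
    return a + b


if __name__ == "__main__":
    # cache_directory is an assumed parameter name for the caching feature.
    with SingleNodeExecutor(cache_directory="./cache") as exe:
        f1 = exe.submit(add, 1, 2)
        # The Future f1 is handed over as an argument; the dependency is
        # only resolved at runtime, once f1 has completed.
        f2 = exe.submit(add, f1, 4)
        print(f2.result())  # 7
```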

Register

About the Authors

Jan Janssen

Group Leader for Materials Informatics

Jan Janssen is the group leader for Materials Informatics at the Max Planck Institute for Sustainable Materials. His group focuses on applying methods from computer science, including machine learning, to discover novel sustainable materials, with applications ranging from machine-learned interatomic potentials to large language model agents for atomistic simulation. Previously, Jan was a director's postdoctoral fellow in the Theoretical Division at Los Alamos National Laboratory as part of the Exascale Computing Project, as well as an invited postdoctoral fellow at the University of Chicago and the University of California, Los Angeles. Besides his research work, Jan is the lead developer of the pyiron atomistic simulation suite, maintains over 1000 open-source materials informatics software packages for the conda-forge community and is a regular contributor to open-source software on GitHub.