Covalent: Workflow Orchestration on Highly Heterogeneous Infrastructure

Workflow orchestration is a common operations framework in the field of distributed computing. As high performance computing (HPC) becomes more distributed and incorporates a greater degree of heterogeneous technologies, workflow orchestration will no doubt become a common practice in this space, as it has in adjacent spaces such as Machine Learning Operations (MLOps) and more recently in quantum computing. Users with repeatable sets of interdependent tasks (workflows) may wish to run them on a time-based schedule or, more commonly in research and development, they may wish to iterate on the design of one or more experimental workflows. This iteration can include different algorithms, different model hyperparameters, different compute backends, or even different compute frameworks entirely. With these options, it is increasingly possible to construct and optimize workflows which span across multiple on-premises supercomputers, multiple clouds, and even emerging technologies such as quantum computers.

While so many options are a delight to workflow architects, they pose real challenges for operations. How are credentials managed and conveyed across such systems? How do researchers ensure workflow data pipelines are efficient and secure? How are logical tasks broken apart and rejoined (“packed”) to optimize information transfer across physically distant systems? When something fails, how easy is it to trace back errors across frameworks and to re-run portions of a workflow? There are certainly many other such challenges that workflow practitioners are familiar with.

Covalent
Covalent unifies tasks written in different languages and with heterogeneous compute requirements.

Covalent is Agnostiq’s answer to these challenges. With roots in quantum computing applications, Agnostiq researchers quickly learned that the operational challenges in quantum research today consume the majority of a project’s time. Covalent was developed primarily to manage experiments and facilitate access to cloud compute resources for researchers with little or no cloud engineering background. Today, users can interact with a large variety of cloud resources – including Batch, Kubernetes, and Braket Hybrid Jobs – using Covalent’s AWS Plugins package.

The release of a Slurm plugin further enabled our team to interact with partners’ supercomputing resources, in particular in federated or hybrid-cloud configurations. Such flexibility is increasingly in demand as scalability begins to hit a power-consumption barrier, and as more HPC vendors and users begin to incorporate cloud and other distributed compute platforms. As we realized these tools can benefit more than our organization, we decided to transform Covalent into a product in its own right, and to open-source the software on GitHub in January 2022.

Covalent
Covalent’s user interface enables users to easily visualize and interact with distributed workflows.

Quantum computing, of course, introduces its own set of unique needs. Firmly rooted as a heterogeneous technology from the outset, quantum computers today mainly run Noisy Intermediate-Scale Quantum (NISQ) algorithms which are variational, or hybrid, in nature. These algorithms require classical and quantum resources working in tandem. Whereas traditional HPC workflows often involve loosely coupled tasks spanning many hours or days in a single location, variational quantum workflows are tightly coupled, with timescales sometimes under a single second per task. Since classical and quantum resources may not necessarily be physically located in the same data center, most of the execution time may in fact arise due to data transfer and requeuing – often thousands of times per workflow. A remedy to this requires collaboration among many parties in order to optimize the user experience and maximize hardware occupancy. With Covalent at the center of this conversation, Agnostiq hopes to stimulate a new discussion around how we think about workflows in highly heterogeneous environments.

Users and providers interested in learning more about Covalent can view the source code on GitHub, join the Covalent Slack channel, follow us on Twitter, or reach out to the team for more information at contact@agnostiq.ai. We also welcome open-source contributions and discussions on GitHub. If you like what we are up to, please show your support by starring our repository.