SC24 Data Streaming BoF


Real-Time Scientific Data Streaming to HPC Nodes: Challenges and Innovations

Tuesday - Nov 19, 2024
12:15pm-1:15pm ET
Room B213

The most common way scientific data arrives at HPC facilities today is through a set of border gateway nodes connected to some form of shared file system. Dataflow orchestration tools like Globus have made this approach popular by being programmable, easy to use, and efficient. The approach works well when the subsequent data analysis job can afford to wait in the scheduler’s queue, so that the overall completion time is dominated by that wait.

However, as HPC centers and experimental and observational facilities become more integrated, workflows will demand immediate, real-time feedback, and the latency and performance variability that come with a shared file system will no longer be acceptable. While a few streaming workflows have emerged recently, many HPC, data, and network user facilities are not set up to support them out of the gate, for policy, scheduler, or hardware reasons. Yet these workflows are the cornerstone of the DOE Integrated Research Infrastructure (IRI) program, because their stringent timing requirements benchmark the ultimate integration of HPC, data, and networks into seamless compute-in-the-loop workflows for experimental or observational user facilities.

This BoF explores this alternative way of using HPC, opening with a science use case that exemplifies this new class of emerging workflows. Through a set of lightning talks from user facilities, the BoF will survey how HPC centers address this need today and what they have planned for the near future. These presentations aim to seed the ensuing discussion, in which audience members can put questions to key staff of HPC facilities or bring their specific streaming workflows to the attention of the workflow community.

Agenda

  • 12:15pm-12:20pm — Welcome and set up of polling infrastructure

    Bjoern Enders — National Energy Research Scientific Computing Center (NERSC)
    Rafael Ferreira da Silva — Oak Ridge Leadership Computing Facility (OLCF)

  • 12:20pm-12:30pm — Science use case

    Peter Ercius — Lawrence Berkeley National Laboratory (LBNL)
    Sam Welborn — Lawrence Berkeley National Laboratory (LBNL)

  • 12:30pm-12:50pm — Lightning talks by user facilities on streaming needs and solutions

    Network Streaming vs File Transfer
    Eli Dart, Energy Sciences Network (ESnet)

    Data Streaming at ALCF: Future Directions and Use Cases
    Christine Simpson, Argonne Leadership Computing Facility (ALCF)

    The OLCF Data Streaming to HPC Capability
    Michael Brim, Oak Ridge Leadership Computing Facility (OLCF)

    AIsB — High Performance Access to ECMWF Weather & Climate Data
    Alex Upton, Swiss National Supercomputing Centre (CSCS)

  • 12:50pm-1:10pm — Panel discussion and open Q&A

    Session leaders and presenters will answer questions from the poll and seeded questions

  • 1:10pm-1:15pm — Closing remarks

    Planning of a Full-Day Workshop on Streaming at an ASCR User Facility in the Near Future

Supporters