SmartSim
Last updated: 24 Mar 2025
Version 0.8.0, released on 25 Sep 2024
https://github.com/CrayLabs/SmartSim
SmartSim is a workflow library that makes it easier to use common Machine Learning (ML) libraries, such as PyTorch and TensorFlow, in combination with High Performance Computing (HPC) simulations and applications. SmartSim launches ML infrastructure on HPC systems alongside user workloads and supports most HPC workload managers (e.g., Slurm, PBSPro, SGE). SmartSim also provides client libraries in Python, C++, C, and Fortran, which allow users to send and receive data between user applications and the ML infrastructure. The client APIs also enable machine learning tasks, such as inference and online training, to be executed from within user code. The exchange of data and the execution of machine learning tasks are orchestrated by a high-performance in-memory database that SmartSim launches and manages.
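The exchange pattern described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the SmartSim or client-library API: `InMemoryStore`, `simulation`, `inference_worker`, and `run_demo` are hypothetical names chosen for illustration, the "model" is a plain sum, and a thread stands in for the ML side. In a real deployment the store would be a separate database process reached through the client libraries.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for the in-memory database (not the SmartSim API)."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def put(self, key, values):
        # Store a copy of the values and wake up any waiting readers.
        with self._cond:
            self._data[key] = list(values)
            self._cond.notify_all()

    def get(self, key, timeout=5.0):
        # Block until the key has been written, or fail after `timeout` seconds.
        with self._cond:
            found = self._cond.wait_for(lambda: key in self._data, timeout=timeout)
            if not found:
                raise TimeoutError(f"key {key!r} was not written within {timeout}s")
            return list(self._data[key])


def simulation(store, steps=3):
    """Stand-in for a user application: publish a field, read back a prediction."""
    results = []
    for step in range(steps):
        field = [float(step), float(step) + 1.0]
        store.put(f"field_{step}", field)
        results.append(store.get(f"prediction_{step}")[0])
    return results


def inference_worker(store, steps=3):
    """Stand-in for the ML side: the 'model' simply sums each field."""
    for step in range(steps):
        field = store.get(f"field_{step}")
        store.put(f"prediction_{step}", [sum(field)])


def run_demo(steps=3):
    store = InMemoryStore()
    worker = threading.Thread(
        target=inference_worker, args=(store, steps), daemon=True
    )
    worker.start()
    results = simulation(store, steps)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the application and the ML side never call each other directly. They only agree on key names in the shared store, which is the decoupling that the in-memory database provides.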
Execution Environment
User Interfaces
- Python API
- Python Client API
- C++ Client API
- C Client API
- Fortran Client API
Resource Managers
- Slurm
- PBSPro
- SGE
- Linux/macOS
Transfer Protocols
- TCP/IP
- Unix Domain Sockets (UDS)