Adaptive MPI: Providing Dynamic RTS Support for Large-scale MPI Applications

Period of Performance: 02/21/2017 - 02/20/2018


Phase 1 SBIR

Recipient Firm

Charmworks, Inc.
60 Hazelwood Dr
Champaign, IL 61820
Firm POC, Principal Investigator


Problem Statement: High-performance computing (HPC) technology has advanced significantly, and advanced parallel computers are being built under DOE leadership. Yet the American engineering and manufacturing industry has not started leveraging parallel computing at a significant level, partly because of the challenging nature of modern simulation software. US competitiveness in manufacturing may suffer from this ineffective utilization of parallel computers.
Objective: The Message Passing Interface (MPI) standard specifies the ubiquitous communication libraries used by computational scientists, engineers, and tool vendors to develop applications that can run in parallel across multiple computer nodes. MPI provides high communication performance with portability from the smallest clusters to the world's largest supercomputers. However, these parallel applications have common higher-level needs that MPI does not address, such as checkpointing to recover from hardware faults, correcting computational load imbalance, job malleability, and communication locality. The proposed work addresses those needs in an advanced MPI implementation.
Phase I Goals: Currently, adopting this advanced MPI implementation for new projects requires assistance from expert developers. This is reasonable for research projects, but not for a turn-key product that a business must be able to support as it scales. This project will complete the MPI implementation's conformance to the most recent MPI standard, eliminating a major barrier to entry. The project will develop more automated tools that enable users to benefit from the new implementation's advanced features. It will also validate popular MPI libraries to give customers confidence that the libraries they need will be available.
Commercial Applications and Other Benefits: The SBIR project effort is aimed at producing a commercially viable implementation of MPI that incorporates innovative research on parallel programming with adaptive runtimes. This implementation will be the main commercial application to result directly from this effort. In addition, libraries developed using this new MPI implementation are expected to form an additional category of commercial applications. These may be developed by the grantee or by third parties. End-user applications are expected to be the largest category of commercial applications enabled by this project. The project aims to achieve a broad impact on HPC adoption in the manufacturing and engineering industries.