An Efficient Architecture for the Implementation of Message Passing Programming Model on Massive Multiprocessor SoC
Abstract
The BlueGene/L supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of MPI that leverages the hardware features of BlueGene/L. MPI for BlueGene/L is implemented on top of a more basic message-passing infrastructure called the message layer. This message layer can be used both to implement other higher-level libraries and directly by applications. MPI and the message layer are used in the two modes of operation of BlueGene/L: coprocessor mode and virtual node mode. Performance measurements show that our message-passing services deliver performance close to the hardware limits of the machine. They also show that dedicating one of the processors of a node to communication functions (coprocessor mode) greatly improves the message-passing bandwidth, whereas running two processes per compute node (virtual node mode) can have a positive impact on application performance.