Collective calls in MPI applications allow all processes within the same communicator to cooperate, but if any process fails to make a collective call that the others make, the application may exhibit unexpected behavior such as deadlock. This thesis presents a method to detect unbalanced MPI collective calls across processes. The main idea is to track the possible execution paths in the control flow graph. If two paths taken by different processes have different histories of MPI collective calls, the corresponding function is reported as incorrect.
We have built a tool called Libra on Linux to implement the detection. Libra has been evaluated using three real-world applications, with one real-world bug and three injected bugs. The experimental results show that Libra correctly detects the bugs and pinpoints their root causes without reporting any false positives.
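The path-comparison idea can be illustrated with a minimal sketch. It is not Libra's implementation: the graph encoding, the function names `collective_histories` and `is_balanced`, and the example graph are all hypothetical, and the sketch assumes that any branch may be taken differently by different processes and that only acyclic paths are considered.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: each node maps to its successors, and a separate
# table maps a node to the MPI collective it calls (absent means none).
CFG = Dict[str, List[str]]
Calls = Dict[str, Optional[str]]


def collective_histories(cfg: CFG, calls: Calls,
                         entry: str, exit_node: str) -> Set[Tuple[str, ...]]:
    """Collect the distinct collective-call sequences over all acyclic
    paths from entry to exit_node."""
    histories: Set[Tuple[str, ...]] = set()

    def walk(node: str, visited: frozenset, history: Tuple[str, ...]) -> None:
        call = calls.get(node)
        if call is not None:
            history = history + (call,)
        if node == exit_node:
            histories.add(history)
            return
        for succ in cfg.get(node, []):
            # Never revisit a node on the current path, so only acyclic
            # paths are explored (loops are not modeled in this sketch).
            if succ not in visited:
                walk(succ, visited | {succ}, history)

    walk(entry, frozenset({entry}), ())
    return histories


def is_balanced(cfg: CFG, calls: Calls, entry: str, exit_node: str) -> bool:
    """A function is treated as balanced when every path yields the
    same sequence of collective calls."""
    return len(collective_histories(cfg, calls, entry, exit_node)) <= 1


# Example: an if/else where only the "then" branch calls MPI_Barrier.
cfg = {"entry": ["then", "else"], "then": ["join"],
       "else": ["join"], "join": ["exit"], "exit": []}
buggy_calls = {"then": "MPI_Barrier", "join": "MPI_Allreduce"}
fixed_calls = {"then": "MPI_Barrier", "else": "MPI_Barrier",
               "join": "MPI_Allreduce"}

print(is_balanced(cfg, buggy_calls, "entry", "exit"))  # False
print(is_balanced(cfg, fixed_calls, "entry", "exit"))  # True
```

In the buggy variant, a process taking the "else" branch reaches `MPI_Allreduce` while a process on the "then" branch is still waiting in `MPI_Barrier`, which is the kind of mismatch that can deadlock. A real analysis would also need to decide which branches can actually diverge across processes; this sketch conservatively treats all of them as divergent.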