Clusters of Symmetric Multiprocessing (SMP) nodes with multi-core Chip Multiprocessors (CMP), also known as SMP-CMP clusters, are ubiquitous today. The Message Passing Interface (MPI) is the de facto standard for developing message passing applications for such clusters. Most modern SMP-CMP clusters support Remote Direct Memory Access (RDMA), which allows for flexible and efficient communication schemes but introduces a new model that can be challenging to exploit. This dissertation research explores leveraging the flexibility provided by RDMA to optimize MPI point-to-point communications for both small messages and medium-to-large messages on SMP-CMP clusters. For small messages, two contributions are made. First, a scheme is devised that improves the buffer memory management in the existing RDMA-based small message channel design. Second, a novel shared small message channel design is developed that reduces both the resource requirements of each channel and the number of channels needed by an MPI application, greatly improving small message channel resource utilization and scalability without adding significant overheads or sacrificing the performance benefits of RDMA. MPI medium and large messages are realized by various rendezvous protocols whose performance is very sensitive to the relative timing of the critical events in the communication, referred to as protocol invocation scenarios. As such, existing MPI implementations that use a fixed protocol across all communications suffer from various performance problems such as unnecessary synchronization and communication progress issues. In my research, I explore the idea of protocol customization, which allows different protocols to be used for different situations. First, a repository of protocols that can collectively provide near-optimal performance for all protocol invocation scenarios is developed. This repository provides the foundation for profile-driven and compiler-assisted protocol customization for performance improvement.
Furthermore, a communication system with dynamic protocol selection is developed that integrates four protocols into a single system and is able to choose an optimized protocol to suit the run-time characteristics of a particular communication while fully supporting MPI semantics. These techniques reduce unnecessary synchronizations, decrease the number of control messages on the critical path of communications, and improve communication progress, resulting in significantly better communication-computation overlap.
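To make the idea of choosing a protocol from run-time characteristics more concrete, the following is a minimal sketch and not the system developed in the dissertation. It assumes a simplified model in which each side, at the moment it posts its send or receive, checks only whether a control message from its peer has already arrived. The names `Action`, `choose_action`, RTS (request to send), and CTS (clear to send) are illustrative assumptions; they are not the four protocols the abstract refers to, and the sketch ignores the case where both control messages cross in flight.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical actions for illustration only; not the dissertation's protocols.
    SEND_RTS = "sender announces its buffer and continues"
    RDMA_WRITE = "sender writes directly into the receiver's buffer"
    SEND_CTS = "receiver announces its buffer and continues"
    RDMA_READ = "receiver reads directly from the sender's buffer"


def choose_action(role: str, peer_control_arrived: bool) -> Action:
    """Pick the next step from local information only.

    role: "sender" or "receiver" (assumed labels).
    peer_control_arrived: True if the peer's control message (its buffer
    address) was already received when this side posted its operation.
    """
    if role == "sender":
        # If the receiver got there first, its buffer is known: move data now.
        return Action.RDMA_WRITE if peer_control_arrived else Action.SEND_RTS
    if role == "receiver":
        # If the sender got there first, its buffer is known: pull data now.
        return Action.RDMA_READ if peer_control_arrived else Action.SEND_CTS
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    for role in ("sender", "receiver"):
        for arrived in (False, True):
            print(role, arrived, choose_action(role, arrived).name)
```

In this toy model, whichever side arrives late can move the data immediately instead of waiting for a further handshake, which is one way a fixed protocol's extra synchronization could be avoided.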