Half-Day Tutorials


Title : Advanced MPI: New Features of MPI-3
Presenter : Torsten Hoefler, ETH Zurich, Switzerland

The recent MPI standards (MPI-2.2 and MPI-3.0) respond to several developments in HPC systems and applications that happened during the last decade. The modernized standard adds several key concepts for programming massively parallel modern hardware systems. In this tutorial, we will cover the three major concepts: (1) nonblocking collectives and flexible communicator creation, (2) greatly improved remote memory access (RMA) programming, and (3) topology mapping to improve locality and neighborhood ("build your own") collective operations. Nonblocking collectives make it possible to write applications that are resilient to small time variations (noise), overlap communication and computation, and enable new complex communication protocols. The new remote memory access semantics allow applications to efficiently exploit modern computing systems that offer RDMA, but they require a new way of thinking about and developing applications. Topology mapping lets programmers specify the application's communication requirements and enables the MPI implementation to optimize the process-to-node mapping. Last but not least, neighborhood collectives form a powerful mechanism with which programmers can specify their own collective operations and allow the MPI implementation to apply additional optimizations.

Content Level
Introductory: 25%, Intermediate: 50%, Advanced: 25%

Audience Prerequisites
We generally assume a basic familiarity with MPI, i.e., attendees should be able to write and execute simple MPI programs. We also assume familiarity with general HPC concepts (i.e., a simple understanding of batch systems, communication and computation tradeoffs, and networks).

Targeted Audience

Everybody interested in the newest developments in distributed-memory programming, especially:
  • practitioners who use MPI to parallelize or optimize their codes
  • students who strive to understand advanced parallel programming
  • researchers working on distributed-memory programming systems (e.g., MPI)
  • teachers who teach basic and advanced MPI concepts
  • support staff who advise users and developers
  • vendors and OEMs who develop faster MPI implementations and codes


Title : Practical Hybrid Parallel Application Performance Engineering
Presenters : Marc-Andre Hermanns, RWTH Aachen University, JARA-HPC
Allen Malony, University of Oregon
Matthias Weber, TU Dresden, Germany

This tutorial presents state-of-the-art performance tools for leading-edge HPC systems, founded on the community-developed Score-P instrumentation and measurement infrastructure, and demonstrates how they can be used for performance engineering of scientific applications based on standard MPI, OpenMP, hybrid combinations of both, and the increasingly common use of accelerators. Parallel performance tools from the Virtual Institute - High Productivity Supercomputing (VI-HPS) are introduced and featured in demonstrations with Scalasca, Vampir, Periscope, and TAU. We present the complete performance-engineering workflow, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, and visualization. Emphasis is placed on how the tools are used in combination to identify performance problems and investigate optimization alternatives. The knowledge gained in this tutorial will help participants locate and diagnose performance bottlenecks in their own parallel programs.
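The instrument-measure-analyze workflow described above can be sketched roughly as follows, assuming Score-P, Scalasca, and Vampir are installed; `app.c` and the run configuration are placeholders, and exact options vary between tool versions.

```shell
# 1. Instrumentation: prefix the usual compile/link command with scorep.
scorep mpicc -O2 -o app app.c

# 2. Measurement: first a profiling run ...
export SCOREP_EXPERIMENT_DIRECTORY=scorep_profile
mpirun -np 4 ./app

# ... then, typically after scoring and filtering, a tracing run.
export SCOREP_ENABLE_TRACING=true
export SCOREP_EXPERIMENT_DIRECTORY=scorep_trace
mpirun -np 4 ./app

# 3. Analysis and visualization.
scalasca -examine scorep_profile   # explore the call-path profile (Cube)
vampir scorep_trace/traces.otf2    # timeline visualization of the trace
```

The point of the shared Score-P infrastructure is that one instrumented binary feeds all of these analysis tools.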

Content Level
Introductory: 50%, Intermediate: 35%, Advanced: 15%

Audience Prerequisites
The level of the presentations and particularly the hands-on exercises requires a general understanding of HPC applications and parallel programming with MPI and/or OpenMP. Familiarity with any form of mixed-mode parallel programming is advantageous but not necessary.

Targeted Audience

  • Application developers, striving for best application performance on HPC systems
  • HPC support staff who assist application developers with performance tuning
  • System managers and administrators, responsible for operational aspects of HPC systems and concerned about usability and scalability of optimization tools
  • Computer system manufacturers interested in state-of-the-art software tools
  • Others interested in programming tool environments and application tuning


Title : Resilient Applications Using MPI-Level Constructs
Presenters : George Bosilca, Innovative Computing Laboratory, University of Tennessee
Aurelien Bouteiller, Innovative Computing Laboratory, University of Tennessee

As supercomputers enter an era of massive parallelism, the frequency of failures, and the costs incurred to prevent such failures from impacting applications, are expected to grow significantly. Unlike more traditional fault-management methods, user-level fault-tolerance techniques have the potential to avoid full-scale application restarts and therefore lower the cost incurred for each failure, but they demand from the communication middleware the capability to detect and notify failures and to resume communication afterward. In the context of MPI, the Fault Tolerance Working Group has been working on extensions to MPI that allow communication capabilities to be restored while maintaining the extreme level of performance to which MPI users are accustomed. This led to the design of User Level Failure Mitigation (ULFM), a minimal extension of the MPI specification that aims to provide users with the basic building blocks and tools to construct higher-level abstractions and introduce resilience into their applications.

In this tutorial, we present a holistic approach to fault tolerance, introducing multiple fault-management techniques while maintaining the focus on ULFM. The comprehensive presentation of ULFM will use a large set of recovery constructs and apply them to a variety of applications to make them resilient. We will consider the software infrastructures and practical techniques that allow designing and deploying production fault-tolerant applications. We will engage participants in implementing a range of common fault-tolerant application patterns, starting with the simplest case, master-worker applications, and then growing in difficulty toward complex applications. We will introduce a small linear-algebra-based example application and demonstrate, by example, how to transform it into a resilient application, starting with Checkpoint/Restart and then a mixture of application-specific techniques. We will then summarize by comparing the effective performance of these techniques. This tutorial is targeted toward users with advanced MPI skills, and it does not require any mathematical knowledge, as all examples are self-contained. However, novice MPI users are welcome to test their understanding in the context of more advanced MPI material.
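To give a flavor of the recovery constructs involved, here is a minimal, hedged sketch (not the tutorial's material) of the canonical ULFM pattern: detect a peer failure, revoke the communicator, and shrink it to the survivors. It requires an MPI implementation with the ULFM extensions (the `MPIX_` symbols below are ULFM-specific, not standard MPI), and details vary across ULFM versions.

```c
/* Sketch only: needs an ULFM-enabled MPI (see http://fault-tolerance.org). */
#include <mpi.h>
#include <mpi-ext.h>   /* ULFM: MPIX_Comm_revoke, MPIX_Comm_shrink, ... */
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm world = MPI_COMM_WORLD;
    /* Make MPI calls return error codes instead of aborting the job. */
    MPI_Comm_set_errhandler(world, MPI_ERRORS_RETURN);

    int buf = 0;
    int rc = MPI_Bcast(&buf, 1, MPI_INT, 0, world);
    if (rc == MPIX_ERR_PROC_FAILED || rc == MPIX_ERR_REVOKED) {
        /* Interrupt any pending communication on this communicator ... */
        MPIX_Comm_revoke(world);
        /* ... then build a smaller communicator containing the survivors. */
        MPI_Comm survivors;
        MPIX_Comm_shrink(world, &survivors);
        /* The application recovers (e.g., respawns workers) on 'survivors'. */
        MPI_Comm_free(&survivors);
    }

    MPI_Finalize();
    return 0;
}
```

In a master-worker code, the master would typically reassign the failed worker's tasks after the shrink rather than restarting the whole run.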

Content Level
25% Introductory, 75% Intermediate

Attendee Requirements
Some of the exercises require computer access, so a laptop will be necessary. A working version of ULFM will also be needed during the tutorial. Attendees can install their own version from http://fault-tolerance.org, or use the VirtualBox image we will provide (in which case VirtualBox is required).

Audience Prerequisites
A solid understanding of MPI concepts, mainly point-to-point and collective communication. Any other necessary background will be provided during the tutorial.

Targeted Audience
All attendees who are curious about the current status of resilience in MPI and/or interested in the expected promise of MPI-enabled fault-tolerant approaches for scientific applications.