Rise of the Pre-Exascale Machines: Heterogeneous Hardware, Programming and Application Challenges
The deployment of the first two pre-exascale supercomputers in the Department of Energy's arsenal is starting this fall. Both are in the 100-200 PFlop/s category and are the first machines to allow fine-grained sharing of memory between CPUs and GPUs within a node. We will present the important characteristics of the hardware and system software. We will concentrate on the various programming approaches available to users, our experience with them, and some of the lessons we have already learned from working with early-access applications. We will specifically review the portability challenge in the context of directive-based programming of CPUs and GPUs using the OpenMP 4.5 standard. The changing landscape of scientific computing architecture has a direct impact on data structures and algorithms, on choices about memory and data management, and, more generally, on understanding the performance of any computational method. The days of being content to measure computational complexity in O(f(n)) flops are gone, and even accounting for data motion is not enough. We would like to engage the community in a discussion of these new challenges.