Programming Abstractions for Data Locality
April 28 – 29, 2014, Swiss National Supercomputing Center (CSCS), Lugano, Switzerland
Co-Chairs
Didem Unat (Koç University)
Thomas Schulthess (CSCS)
Torsten Hoefler (ETH Zürich)
Anshu Dubey (LBNL)
John Shalf (LBNL)
Workshop Participants/Co-Authors
Adrian Tate (Cray)
Amir Kamil (Lawrence Berkeley National Laboratory)
Anshu Dubey (Lawrence Berkeley National Laboratory)
Armin Größlinger (University of Passau)
Brad Chamberlain (Cray)
Brice Goglin (INRIA)
Carter Edwards (Sandia National Laboratories¹)
Chris J. Newburn (Intel)
David Padua (UIUC)
Didem Unat (Koç University)
Emmanuel Jeannot (INRIA)
Frank Hannig (University of Erlangen-Nuremberg)
Tobias Gysi (ETH Zürich)
Hatem Ltaief (KAUST)
James Sexton (IBM)
Jesus Labarta (Barcelona Supercomputing Center)
John Shalf (Lawrence Berkeley National Laboratory)
Karl Fuerlinger (Ludwig-Maximilians-University Munich)
Kathryn O’Brien (IBM)
Leonidas Linardakis (Max Planck Inst. for Meteorology)
Maciej Besta (ETH Zürich)
Marie-Christine Sawley (Intel, Europe)
Mark Abraham (KTH)
Mauro Bianco (CSCS)
Miquel Pericàs (Chalmers University of Technology)
Naoya Maruyama (RIKEN)
Paul Kelly (Imperial College)
Peter Messmer (Nvidia)
Robert B. Ross (Argonne National Laboratory)
Romain Cledat (Intel)
Satoshi Matsuoka (Tokyo Institute of Technology)
Thomas Schulthess (CSCS)
Torsten Hoefler (ETH Zürich)
Vitus Leung (Sandia National Laboratories)
Executive Summary
The goal of the workshop and this report is to identify common themes and standardize concepts for
locality-preserving abstractions for exascale programming models. Current software tools are built on the
premise that computation is the most expensive component, but we are rapidly moving to an era in which
computation is cheap and massively parallel while data movement dominates energy and performance costs. To
prepare for exascale systems (the next generation of high-performance computing systems), the scientific
computing community must refactor its applications to align with the emerging data-centric paradigm:
applications must evolve to express information about data locality. Unfortunately, current programming
environments offer few ways to do so. They ignore the cost of communication and simply rely on hardware
cache coherence to virtualize data movement. With the increasing importance of task-level parallelism on
future systems, task models must support constructs that express data locality and affinity. At the system
level, communication libraries implicitly assume that all processing elements are equidistant from one
another. To take advantage of emerging technologies, application developers need a set of programming
abstractions that describe data locality for the new computing ecosystem. The new programming paradigm
should be more data-centric and should allow developers to describe how to decompose data and how to lay it
out in memory.
Fortunately, many relevant concepts are already emerging, including constructs for tiling, data layout,
array views, and task and thread affinity, as well as topology-aware communication libraries for managing
data locality. There is an opportunity to identify commonalities among these strategies and to combine the
best of these concepts into a comprehensive approach to expressing and managing data locality in exascale
programming systems. Such programming-model abstractions can expose crucial information about data locality
to the compiler and runtime system, enabling performance-portable code. The open research question is to
identify the right level of abstraction for achieving this goal, with candidate techniques ranging from
template libraries all the way to completely new languages.
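To make the idea of a data-layout abstraction concrete, the following is a minimal sketch of the kind of template-library approach mentioned above: a 2D array view whose indexing is parameterized by a layout policy, so the same computation can be retargeted to row-major or column-major storage at compile time. The names (`View2D`, `RowMajor`, `ColMajor`) are illustrative only and do not come from any specific library discussed in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout policies map a logical (i, j) index to a linear memory offset.
struct RowMajor {
  static std::size_t index(std::size_t i, std::size_t j,
                           std::size_t /*rows*/, std::size_t cols) {
    return i * cols + j;  // rows are contiguous
  }
};

struct ColMajor {
  static std::size_t index(std::size_t i, std::size_t j,
                           std::size_t rows, std::size_t /*cols*/) {
    return j * rows + i;  // columns are contiguous
  }
};

// A 2D view written once against (i, j); the layout policy chosen at
// compile time decides the actual in-memory order.
template <typename Layout>
class View2D {
  std::vector<double> data_;
  std::size_t rows_, cols_;

 public:
  View2D(std::size_t rows, std::size_t cols)
      : data_(rows * cols, 0.0), rows_(rows), cols_(cols) {}

  double& operator()(std::size_t i, std::size_t j) {
    return data_[Layout::index(i, j, rows_, cols_)];
  }

  const double* raw() const { return data_.data(); }
};
```

Application code parameterized this way can be tuned for a target memory hierarchy by swapping the layout policy, without touching the loop nests that traverse the data; this is one instance of the layout abstractions surveyed in Chapter 4.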
¹Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
  1.1 Workshop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
  1.2 Organization of this Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
  1.3 Summary of Findings and Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . 3
    1.3.1 Motivating Applications and Their Requirements . . . . . . . . . . . . . . . . . . . 3
    1.3.2 Data Structures and Layout Abstractions . . . . . . . . . . . . . . . . . . . . . . . 4
    1.3.3 Language and Compiler Support for Data Locality . . . . . . . . . . . . . . . . . . . 4
    1.3.4 Data Locality in Runtimes for Task Models . . . . . . . . . . . . . . . . . . . . . . 4
    1.3.5 System-Scale Data Locality Management . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
  2.1 Hardware Trends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
    2.1.1 The End of Classical Performance Scaling . . . . . . . . . . . . . . . . . . . . . . 6
    2.1.2 Data Movement Dominates Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
    2.1.3 Increasingly Hierarchical Machine Model . . . . . . . . . . . . . . . . . . . . . . . 7
  2.2 Limitations of Current Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Motivating Applications and Their Requirements . . . . . . . . . . . . . . . . . . . . . . . 10
  3.1 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
  3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
  3.3 Application Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
  3.4 The Wish List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
  3.5 Research Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 Data Structures and Layout Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
  4.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
  4.2 Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
  4.3 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
  4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
  4.5 Research Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Language and Compiler Support for Data Locality . . . . . . . . . . . . . . . . . . . . . . . 24
  5.1 Key Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
  5.2 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
  5.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
    5.3.1 Multiresolution Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
    5.3.2 Partition Data vs. Partition Computation . . . . . . . . . . . . . . . . . . . . . . 29
    5.3.3 Compositional Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
  5.4 Research Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29