6 Data Locality in Runtimes for Task Models
6.1 Key Points
6.1.1 Concerns for Task-Based Programming Model
a) Runtime Scheduling Mode
b) Task Granularity
6.1.2 Expressing Data Locality with Task-Based systems
6.2 State of the art
6.3 Discussion
6.4 Research Plan
6.4.1 Performance of task-based runtimes
6.4.2 Debugging tools
6.4.3 Hint framework
7 System-Scale Data Locality Management
7.1 Key points
7.2 State of the Art
7.3 Discussion
7.4 Research Plan
8 Conclusion
8.1 Priorities
8.2 Research Areas
8.3 Next Steps
References
Programming Abstractions for Data Locality
Chapter 1
Introduction
With the end of classical technology scaling, the clock rates of high-end server chips are no longer increasing,
and all future gains in performance must be derived from explicit parallelism [58, 84]. Industry has adapted
by moving from exponentially increasing clock rates to exponentially increasing parallelism in an effort
to continue improving computational capability at historical rates [10, 11, 1]. By the end of this decade,
the number of cores on a leading-edge HPC chip is expected to be on the order of thousands, suggesting
that programs will have to expose 100x more parallelism on a chip than they do today. Furthermore, the energy
cost of data movement is rapidly becoming a dominant factor because the energy cost for computation is
improving at a faster rate than the energy cost of moving data on-chip [59]. By 2018, further improvements
to compute efficiency will be hidden by the energy required to move data to the computational cores on
a chip. Whereas current programming environments are built on the premise that computing is the most
expensive component, HPC is rapidly moving to an era where computing is cheap and ubiquitous and data
movement dominates energy costs. These developments overturn basic assumptions about programming
and portend a move from a computation-centric paradigm for programming computer systems to a more
data-centric paradigm. Current compute-centric programming models fundamentally assume an abstract
machine model where processing elements within a node are equidistant. Data-centric models, on the other
hand, provide programming abstractions that describe how the data is laid out on the system and apply
the computation to the data where it resides (in-situ). Therefore, programming abstractions for massive
concurrency and data locality are required to make future systems more usable.
In order to align with emerging exascale hardware constraints, the scientific computing community will
need to refactor its applications to adopt this emerging data-centric paradigm, but modern programming
environments offer few abstractions for managing data locality. Absent these facilities, application
programmers and algorithm developers must manually manage data locality using ad-hoc techniques such
as loop-blocking. It is untenable for application programmers to continue along the current path, because
these transformations are labor-intensive and existing compilers and runtime systems offer little
automation for them.
There is a critical need for abstractions for expressing data locality so that the
programming environment can automatically adapt to optimize data movement for the underlying physical
layout of the machine.
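To make the loop-blocking technique mentioned above concrete, the following is a minimal sketch of a hand-tiled matrix multiplication next to its untiled reference. It is an illustration only, not code from any workshop participant: the function names and the `block` parameter are hypothetical, and Python is used for brevity rather than performance. The point is that the blocked version computes the same result but requires the programmer to restructure the loop nest and pick a tile size by hand for a particular memory hierarchy.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A x B, with matrices as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=32):
    """Same product, with the loop nest tiled into block-sized sub-problems.

    `block` is a hypothetical tuning parameter. In a compiled language it
    would be chosen by hand so that the tiles being worked on fit in cache.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer three loops walk over tiles; inner three loops work inside one tile.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # min() clips the tile at the matrix edge when sizes
                # are not multiples of the block size.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

The tile size and loop order are tied to one machine's cache sizes, which is part of why such transformations are labor-intensive to maintain by hand across architectures.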
1.1 Workshop
Fortunately, there are a number of emerging concepts for managing data locality that address this critical
need. In order to organize this research community and create an identity as an emerging research field, a
two-day workshop was held at CSCS on April 28-29 to bring together researchers from around the world to discuss
their technologies and research directions. The purpose of the Workshop on Programming Abstractions
for Data Locality (PADAL) was to identify common themes and standardize concepts for locality-preserving
abstractions for exascale programming models (http://www.padalworkshop.org). This report is a compilation
of the workshop findings organized so that they can be shared with the rest of the HPC community to define
the scope of this field of research, identify emerging opportunities, and promote a roadmap for future research
investments in emerging data-centric programming environments.