2.2. LIMITATIONS OF CURRENT APPROACHES
[Figure: two panels contrasting Vertical Locality Management (spatio-temporal optimization) with Horizontal Locality Management (topology optimization).]
Figure 2.4: Vertical data locality concerns the management of data motion up and down the memory hierarchy, whereas horizontal locality concerns communication between peer processing elements.
locality because of their tendency to virtualize data movement. In the future, software-managed memory and incoherent caches or scratchpad memories may become more prevalent, but we have yet to offer powerful abstractions that make such software-managed memories productive for the typical application programmer.
Thus, application developers need a set of programming abstractions to describe data locality on the new
computing ecosystems.
Programming Abstractions for Data Locality
9
Chapter 3
Motivating Applications and Their Requirements
Discussion of data locality from the perspective of applications requires consideration of the range of modeling methods used for exploring scientific phenomena. Even if we restrict ourselves to only a small group of scientific applications, there is still a broad spectrum to be considered. We can loosely map the applications along two dimensions, spatial connectivity and componentization, as shown in Figure 3.1. Spatial connectivity has direct implications for locality: the bottom end of this axis represents zero-connectivity applications, which are embarrassingly parallel; at the top end are applications with dynamic connectivity, such as adaptive meshing; and static meshes fall somewhere in between. The componentization axis is
concerned with software engineering where the bottom end represents small static codes, while at the top
end are the large-multicomponent codes where the components are swapped in and out of active state, and
there is constant development and debugging. The HPC applications space mostly occupies the top right
quadrant, and was the primary concern in this workshop.
While the application space itself is very large, the participating applications and experts provided a
good representation of the numerical algorithms and techniques used in the majority of state-of-the-art ap-
plication codes (i.e. COSMO[6, 61], GROMACS[42, 77, 74], Hydra & OP2/PyOP2[15, 79], Chombo[25]).
Additionally, because several of them model multi-physics phenomena with several different numerical and
algorithmic technologies, they highlight the challenges of characterizing the behavior of individual solvers
when embedded in a code base with heterogeneous solvers. These applications also demonstrate the im-
portance of interoperability among solvers and libraries. The science domains which rely on multiphysics
modeling include many physical, biological and chemical systems, e.g. climate modeling, combustion, star
formation, cosmology, blood flow, and protein folding, to name only a few. The numerical algorithms and solver
technologies on which these very diverse fields rely include structured and unstructured mesh based methods,
particle methods, combined particle and mesh methods, molecular dynamics, and many specialized pointwise
(or 0-dimensional) solvers specific to the domain.
Algorithms for scientific computing vary in their degree of arithmetic intensity and inherent potential for
exploiting data locality.
• For example, GROMACS short-ranged non-bonded kernels treat all pairwise interactions within a
group of particles, performing large quantities of floating-point operations on small amounts of heavily-
reused data, normally remaining within lowest-level cache. Exploiting data locality is almost free here,
yet the higher-level operation of constructing and distributing the particle groups to execution sites,
and the lower-level operation of scheduling the re-used data into registers, require tremendous care
and quantities of code in order to benefit from data locality at those levels. The long-ranged global
component can work at a low resolution, such that a judicious decomposition and mapping confines
a fairly small amount of computation to a small subset of processes. This retains reasonable data
locality, which greatly reduces communication cost.
[Figure: applications plotted along a vertical "spatial connectivity" axis and a horizontal "interoperating components" axis, with regions labeled "embarrassingly parallel, single component", "nearest neighbor, few components", "dynamic connectivity, single component", "embarrassingly parallel, multi-component", and "dynamic connectivity, multi-component"; most HPC codes fall in the dynamic-connectivity, multi-component quadrant.]
Figure 3.1: The distribution of applications with respect to data locality challenges.
• By contrast, lower-order solvers for partial differential equations have few operations per data item even with static meshing, and therefore struggle to achieve a high degree of data reuse. Adaptivity in structured AMR (e.g., Chombo) further reduces the ratio of arithmetic operations to data movement by moving the application up the connectivity axis. Unstructured meshes have an additional layer of indirection that exacerbates this problem.
• Multiphysics applications add a further dimension to the data locality challenge: the different data access patterns of different solvers. For example, in applications where particle and mesh methods co-exist, the distribution of particles relative to the mesh needs to balance spatial proximity against load balance, especially if the particles tend to cluster in some regions.
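The arithmetic-intensity contrast drawn in the bullets above can be made concrete. The following is a minimal sketch (not the GROMACS kernel; the function name and parameters are illustrative) of an all-pairs Lennard-Jones evaluation between two small particle clusters: it performs O(n²) floating-point work on O(n) coordinate data, so every coordinate is reused roughly n times from cache.

```python
import numpy as np

def cluster_pair_lj_energy(xi, xj, epsilon=1.0, sigma=1.0):
    """All-pairs 12-6 Lennard-Jones energy between two particle clusters.

    For clusters of n particles each, this does O(n^2) arithmetic on
    O(n) coordinates: the high-reuse, cache-resident regime described
    for the short-ranged non-bonded kernels."""
    # Pairwise displacement vectors, shape (n_i, n_j, 3)
    d = xi[:, None, :] - xj[None, :, :]
    r2 = np.sum(d * d, axis=-1)
    inv_r6 = (sigma * sigma / r2) ** 3
    # Sum the 12-6 potential over all n_i * n_j pairs
    return float(np.sum(4.0 * epsilon * (inv_r6 * inv_r6 - inv_r6)))

rng = np.random.default_rng(0)
xi = rng.random((8, 3)) + 2.0   # offset one cluster so no distance approaches 0
xj = rng.random((8, 3))
e = cluster_pair_lj_energy(xi, xj)
```

Counting flops against bytes moved for such a kernel shows why exploiting locality here is "almost free": the working set is a few hundred bytes regardless of how much arithmetic is performed on it.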
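At the opposite end, the low reuse of the lower-order PDE solvers can be seen in a minimal 5-point Jacobi sweep (an illustrative sketch, not drawn from Chombo or any of the cited codes): each interior point performs roughly one floating-point operation per array value it loads.

```python
import numpy as np

def jacobi_sweep(u):
    """One 5-point Jacobi relaxation sweep on a 2-D grid.

    Each interior point does ~4 adds and 1 multiply while touching 5
    distinct values of u, so there are few operations per data item
    and little opportunity for reuse."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.zeros((16, 16))
u[0, :] = 1.0            # hold one boundary edge at 1.0
v = jacobi_sweep(u)
```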
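The particle/mesh tension in the last bullet can likewise be sketched. In this hypothetical 1-D illustration (the function and parameters are invented for exposition), particles are assigned to mesh tiles purely by spatial position; when the particles cluster, the per-tile counts become highly uneven, which is exactly the load imbalance that a purely locality-driven distribution produces.

```python
import numpy as np

def bin_particles(positions, n_tiles):
    """Assign particles in [0, 1) to equal-width mesh tiles by position.

    Returns each particle's tile index and the per-tile particle
    counts; uneven counts signal load imbalance across the ranks
    that own the tiles."""
    tiles = np.minimum((positions * n_tiles).astype(int), n_tiles - 1)
    counts = np.bincount(tiles, minlength=n_tiles)
    return tiles, counts

rng = np.random.default_rng(1)
# Particles clustered near x = 0.1; most of the domain is nearly empty
pos = np.clip(rng.normal(0.1, 0.05, size=1000), 0.0, 0.999)
tiles, counts = bin_particles(pos, 8)
```

A production multiphysics code would trade some spatial proximity away, e.g. by shifting tile boundaries or migrating particles, to even out these counts.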
There is a genuine concern in the applications communities about protecting the investment already made in the mature production codes of today, and about wise utilization of the scarcest of resources: the developers' time. Therefore, the time-scale of paradigm changes in platform architecture that might require major code rewrites is perhaps the most important consideration for the applications community. A stable
programming paradigm with a life-cycle that is several times the development cycle of the code must emerge
for sustainable science. The programming paradigm itself can take any of the forms under consideration,
such as domain-specific languages, abstraction libraries or full languages, or some combination of these. The
critical aspects are the longevity, robustness and the reliability of tools available to make the transition.
3.1 State of the Art
Among the domain science communities relying on modeling and simulation to obtain results, there is
huge variation in awareness and preparedness for the ongoing hardware platform architecture revolution.
The applications component of the workshop focused on computation-based science and engineering research efforts that rely on multi-component codes with many moving parts and that require HPC resources.
For such applications, efficient, sustainable and portable scientific software is an absolute
necessity, though not all practitioners in these communities are cognizant of either the extent or the urgency
of the need to rethink their approach to software. Even those who are fully aware of the challenges facing them have been hampered in their efforts to find solutions by the lack of a stable paradigm