5.3. DISCUSSIONS
model built around ZPL-style multidimensional arrays. Its type system distinguishes between data guaran-
teed to be local and data that may be remote using annotations on variable declarations. On the other hand,
access to local and remote data is provided by the same syntax. Thus, Titanium strikes a balance between
the HPF and ZPL approaches, making communication explicit in declarations but allowing the same code
fragments to operate on local and remote data.
UPC, CAF, and Titanium differ from HPF, ZPL, Chapel, and X10 in that programs are written using
an explicit bulk-synchronous Single-Program, Multiple Data (SPMD) execution model. These copies of the
executing binary form the units of locality within these languages, and remote variable instances are refer-
enced based on the symmetric namespaces inherent in the SPMD model. While the flat SPMD models used
in these languages (and commonly in MPI) do allow programmers to colocate data and execution, they will
likely be insufficient for leveraging the hierarchical architectural locality present in emerging architectures.
Currently, such models are often forced into hybrid programming models, in which distinct locality
abstractions are used to express finer-grained locality concerns.
Recent work in Titanium has replaced the flat SPMD model with the more hierarchical Recursive Single-
Program, Multiple-Data (RSPMD) model [53]. This model groups together data and execution contexts into
teams that are arranged in hierarchical structures, which match the structure of recursive and compositional
algorithms and emerging hierarchical architectures. While the total set of threads is fixed at startup as in
SPMD, hierarchical teams can be created dynamically, and threads can enter and exit teams as necessary.
Titanium provides a mechanism for querying the machine structure at runtime, allowing the same program
to target different platforms by building the appropriate team structure during execution.
Other work has been done to address the limitations of the flat SPMD model in the context of Phalanx
[36] and UPC++ [96], both active libraries for C++. The Phalanx library uses the Hierarchical Single-
Program, Multiple-Data (HSPMD) model, which is a hybrid of SPMD and dynamic tasking. The HSPMD
model retains the cooperative nature of SPMD through thread teams, as in RSPMD, but it also allows
new teams of threads to be spawned dynamically. Unlike in SPMD and RSPMD, the total set of executing threads
is not fixed at startup. Both RSPMD and HSPMD allow expression of locality and concurrency at multiple
levels, though through slightly different mechanisms, allowing the user to take advantage of hierarchical
architectures. The UPC++ library uses RSPMD as its basic execution model but additionally allows X10-
style asynchronous tasks to be spawned at remote locations. This allows execution to be moved dynamically
to where data are located and adds a further degree of adaptability to the basic bulk-synchronous SPMD
model.
5.3 Discussions
AGREEMENTS
• PRINCIPLES
– Avoid losing information through premature “lowering” of the program representation. In particular,
many locality-oriented analyses and optimizations are most naturally effective when applied
to multidimensional index spaces and data structures. Consequently, languages lacking multidimensional
data structures, or compilers that aggressively normalize to 1D representations, undermine
such optimization strategies. Expressing computations in their natural dimensionality, and
maintaining that dimensionality during compilation, are key.
– A common theme in many promising language- and library-oriented approaches to locality is
to express distribution- and/or locality-oriented specifications in a program’s variable and type
declarations rather than scattering them throughout the computation. Since locality is a cross-cutting
concern, this minimizes the amount of code that must change when the mapping of the
program’s constructs to the hardware changes. The compiler and runtime can then leverage the
locality properties exposed in these declarations to customize and optimize code based on that
information.
– Isolate cross-cutting locality concerns. Locality — data layout and distribution — is fundamen-
tally more difficult than parallelization because it affects all the code that touches the data.
Programming Abstractions for Data Locality
27
• SOLUTIONS
– There is broad consensus around multidimensional data, partitioning, and slicing; around how to
iterate over them (generators, iterators); and around parameterization of layouts.
– There is potential to expose polyhedral concepts to the programmer and the IR, and to pass them
down to the back end (e.g., Chapel’s domains and mappings).
– A back-end model of the code-generation target, for implementing these models and compilers,
is still missing; it is needed both for portable implementation and for interoperability.
– While much can be done with libraries and template metaprogramming, there is a compelling case
for a language-based approach.
DISAGREEMENTS
• There is no consensus on the requirements for intermediate representations and runtime systems.
• There was disagreement among the workshop attendees about the extent to which a language’s locality-specification
mechanisms should be explicit (“allocate/run this here”) vs. suggestive (“allocate/run
this somewhere near-ish to here, please”) vs. locality-oblivious or automatic (“I don’t want to worry
about this; someone else [the compiler / the parallel expert] should figure it out for me”). This
disagreement is arguably an indication that pursuing multiresolution features would be attractive. In
such a model, a programmer could be more or less explicit as the situation warrants, and/or distinct
concerns (algorithm vs. mapping) could be expressed in different parts of the program by programmers
with differing levels of expertise.
• Language-based abstractions are central to tackling locality management, but two fundamentally different
design philosophies have emerged that lead to very different implementation strategies: an
explicit “do what I say” approach, covered in subsection 5.3.2, and an implicit “do what I mean”
approach, described in subsection 5.3.3. More research is required to tease out the benefits and
challenges of implementing each of these approaches in practice, because the strategies differ at a very
fundamental level.
• Explicit programmer control over locality is essential, but the expression of locality needs to be portable
across machines. The naive interpretation is that “explicit control” is synonymous with explicitly
mapping a particular task to a specific core; in that case, however, machine portability would be
compromised. The loose interpretation of “explicit control” led to misunderstandings and requires
a more specific definition. For the purpose of these discussions, “explicit control” refers primarily to the
interface provided to the performance layer (the “do what I say” layer for expert programmers), and
is not intended to be exposed at the “do what I mean” layer for non-expert programmers.
5.3.1 Multiresolution Tools
The power of a language-based attack on locality is that it allows us to support abstractions that genuinely help.
A key distinction to be understood and confronted is the tension between tools that control how code is
executed (perhaps with the help of powerful abstractions), and tools that automatically make good choices,
based on information provided by the programmer:
“do what I say, but tell me only once” vs. “do what I mean, and let me take it from here”
Vigorous debate on this question led to the conclusion that there is a clear need for both; in fact, there is a need
for a spectrum of levels of control. A key feature of any automated solution is that it can interoperate with
explicit control.