• Directive-based language extensions: tools like OpenMP [1], OpenACC [2], and OpenStream [3] decorate a host
language, such as C++ or Fortran, with annotations. In OpenMP and its derivatives, the objective is
that these “pragmas” can be ignored to yield purely sequential code with the same semantics. The
directive language is separate from the host language. Directives are similar to annotations in Java
and C#, which are user-extensible and often used to drive aspect-oriented transformation tools. Both
directives and annotations share problems with integration with the host language. Directive-based
tools like OpenMP suffer from compositionality issues, for example when calling a parallel function from
within a parallel loop.
• Global-view vs. Local-view Languages: Global-view languages are those in which data structures, such
as multidimensional arrays, are declared and accessed in terms of their global problem size and indices,
as in shared-memory programming. In contrast, local-view languages are those in which such data
structures are accessed in terms of local indices and node IDs.
• Multiresolution Language Philosophy: This is a concept in which programmers can move from language
features that are more declarative, abstract, and higher-level to those that are more imperative, control-
oriented, and low-level, as required by their algorithm or performance goals. The goal of this approach
is to support higher-level abstractions for convenience and productivity without removing the fine-
grained control that HPC programmers often require in practice. Ideally, the high-level features are
implemented in terms of the lower-level ones in a way that permits programmers to supply their own
implementations. Such an approach supports a separation of roles in which computational scientists
can write algorithms at high levels while parallel computing experts can tune the mappings of those
algorithms to the hardware platform(s) in distinct portions of the program text.
5.1 Key Points
During the PADAL workshop, we identified the following key points to be considered when designing languages
that address data locality.
• Communication and locality should be clearly evident in the source code, so that programmers have a
clear model of data movement and its associated costs. At the same time, the programming language
should make it easy to port flat-memory code to locality-aware code, or to write code that can execute
efficiently on both local and remote data. One mechanism to accomplish this is to encode locality in
the type system, so that modifying the locality characteristics of a piece of code only requires changing
type declarations. In languages that support generic programming, this also enables a programmer to
write the same code for both local and remote data, with the compiler producing efficient translations
for both cases.
• In addition to providing primitives for moving data to where computation is located, a programming
language should also enable a user to move computation to where data are located. This is particularly
important for irregular applications in which the distribution of data is not known until runtime or
changes over the course of the computation. For large data sets, code movement is likely to be
significantly cheaper than moving data.
• A program should not require rewriting when moving to a different machine architecture. Instead, the
language should provide a machine model that does not have to be hard-coded into an application.
In particular, the machine model should be represented separately from user code, using a runtime
data structure. The language should either automatically map user code to the machine structure at
compile or launch time or provide the user with mechanisms for adapting to the machine structure
during execution.
• A unified machine model should be provided that encompasses all elements of a parallel program,
including placement of execution, load balancing, data distribution, and resilience.
[1] http://openmp.org/
[2] http://www.openacc-standard.org/
[3] http://openstream.info/
Programming Abstractions for Data Locality
25
• Seamless composition of algorithms and libraries should be supported by the language; composition
should not require a code rewrite. The machine model can facilitate composition by allowing a subset
of the machine structure to be provided to an algorithm or library.
• The language should provide features at multiple levels of abstraction, following the multiresolution
design philosophy. For example, it may provide data-parallel operations over distributed data struc-
tures, with the compiler and runtime responsible for scheduling and balancing the computation. At
the same time, the language might also allow explicit operations over the local portions of the data
structure. Such a language would be a combination of global and local view, providing default global-
view declarations and operations while also allowing the user to build and access data structures in a
local-view manner.
• Higher-level features in the language and runtime should be built on top of the same lower-level
features that the user has access to. This enables a user to replace the built-in, default operations with
customized mechanisms that are more suitable to the user’s application. The compiler and runtime
should perform optimizations at multiple levels of abstraction, enabling such custom implementations
to reap the advantages of lower-level optimizations.
5.2 State of the Art
HPF and ZPL are two languages from the 1990s that support high-level locality specifications through the
distribution of multidimensional arrays and index sets to rectilinear views of the target processors. Both
can be considered global-view languages, and as a result all communication was managed by the compiler
and runtime. A key distinction between the languages was that all communication in ZPL was syntactically
evident, while in HPF it was invisible. While ZPL’s approach made locality simpler for a programmer
to reason about, it also required code to be rewritten whenever a local/non-distributed data structure or
algorithm was converted to a distributed one. HPF’s lack of syntactic communication cues saved it from this
problem, but it fell afoul of others: it did not provide a clear semantic model for how locality would be
implemented for a given program, requiring programmers to wrestle with a compiler to optimize for locality,
and then to rewrite their code when moving to a second compiler that took a different approach.
As we consider current and next-generation architectures, we can expect the locality model for a compute
node to differ from one vendor or machine generation to the next. For this reason, the ZPL and HPF
approaches are not viable as-is. Instead, we advocate pursuing languages that make communication
syntactically invisible (to avoid ZPL’s pitfall) while supporting a strong semantic model as a contract between
the compiler and the programmer (to avoid HPF’s). Ideally, this model would be reinforced by execution-time
queries to support introspection about the placement of data and tasks on the target architecture.
Chapel is an emerging language that takes this prescribed approach, using a first-class language-level
feature, the locale, to represent regions of locality in the target architecture. Programmers can reason about
the placement of data and tasks on the target architecture using Chapel’s semantic model, or via runtime
queries. Chapel follows the Partitioned Global Address Space (PGAS) philosophy, supporting direct access
to variables stored on remote locales based on traditional lexical scoping rules. Chapel also follows the
multiresolution philosophy by supporting low-level mechanisms for placing data or tasks on specific locales,
as well as high-level mechanisms for mapping global-view data structures or parallel loops to the locales.
Advanced users may implement these data distributions and loop decompositions within Chapel itself, and
can even define the model used to describe a machine’s architecture in terms of locales.
X10 [22] is another PGAS language that uses places as analogues to Chapel’s locales. In X10, execution
must be colocated with data. Operating on remote data requires spawning a task at the place that owns the
data. The user can specify that the new task run asynchronously, in which case it can be explicitly synchro-
nized later and any return value accessed through a future. Thus, X10 makes communication explicit in the
form of remote tasks. Hierarchical Place Trees [92] extend X10’s model of places to arbitrary hierarchies,
allowing places to describe every location in a hierarchical machine.
Unified Parallel C (UPC), Co-Array Fortran (CAF), and Titanium [93] are three of the founding PGAS
languages. UPC supports global-view data structures and syntactically-invisible communication while CAF
has local-view data structures and syntactically-evident communication. Titanium has a local-view data