• Source-to-source transformations for mapping from a high-level language to a low-level language. With
appropriately designed transformation tools that allow for language extensions, it might even become
possible for the science communities to develop their own domain-specific languages without having to
build the underlying infrastructure.
3.5 Research Areas
Several research areas that can directly benefit the applications communities emerged from the workshop
discussions. Some involve research to be undertaken by application developers, while others are more
applicable to the compilers and tools communities. From the applications perspective, it is important to
understand the impact of different locality models. For instance, as the available memory options become
more heterogeneous, it is worth evaluating whether scratch-pads are more useful to applications or whether
the additional technology should simply be used to deepen the cache hierarchy. Similarly, within the caching
model it is important to know whether adding horizontal caching is a valuable proposition for applications.
These concerns tie into what could be the most urgent area of research for the applications community:
what should a high-level multi-component framework look like in order to maximize data locality in the
presence of diverse and conflicting demands of data movement? The questions to be addressed include:
• what methodology should be used to determine what constitutes a component,
• what degree of locality awareness is appropriate in a component,
• what is the optimum level of abstraction in the component-based design, i.e. who is aware of spatial
decomposition, and who is aware of functional decomposition, if it exists,
• how to architect various layers in the framework of the code so that numerical complexity does not
interleave with the complexity arising out of locality management, and
• how to account for concerns other than data locality such as runtime management within the framework
so that they do not collide with one another.
Tools for aiding the development of scientific software often either hide all complexity of interaction
with the system from the application, or leave it entirely to the application. The former try to cover
too many corner cases and become unwieldy, often providing the wrong capability, while the latter all
but eliminate portability. True success in achieving the scientific and performance goals of applications is
more likely to come from cooperation between the application and the programming models and tools. In
an ideal world, applications should be able to express the locality guidelines best suited to them, and the
code translators, compilers, and runtimes should be able to translate them into performant executables
without facing too many optimization blockers.
Programming Abstractions for Data Locality
Chapter 4
Data Structures and Layout Abstractions
In this chapter, we discuss the key considerations when designing data structure and layout abstractions,
emerging approaches with mature or immature solutions, and potential research areas. We focus on locality
management for data-parallel algorithms and leave the discussion of task-oriented abstractions to Chapter 6.
Our goals are to enable a range of abstractions, to converge on ways to specify these abstractions and
the mapping between them, and to enable these abstractions to be freely and effectively layered without
undue restrictions. For several reasons, the abstractions must be flexible enough to accommodate the
diversity of ways in which data structures can be organized and represented. First, users are diverse in what
data representations they find convenient, intuitive, and expressive. Second, there are differences between
what is natural to users and what leads to efficient performance on target architectures. Third, efficient
mapping of data structure organization and representations to a diverse collection of target architectures
must be accommodated in a general, portable framework. A single abstraction is unlikely to span these three
kinds of diversity effectively.
Recently, a number of programming interfaces such as Kokkos [31], TiDA [87], proposed OpenMP4
extensions [26], GridTools [34], hStreams (footnote 1), DASH [35], and Array Extensions have arisen to give developers
more control over data layout and to abstract the data layout itself from the application. One goal of this
workshop was to normalize the terminology used to describe the implementation of each of these libraries,
and to identify areas of commonality and differentiation between the different approaches. Commonalities in
the underlying implementation may pave the way to standardization of an underlying software infrastructure
that could support these numerous emerging programming interfaces.
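To make the idea of abstracting the data layout from the application concrete, the following is a minimal sketch and not the API of any of the libraries listed above. The names `View2D`, `row_major`, `column_major`, and `fill` are hypothetical and introduced only for illustration. The application code indexes a logical 2-D array, while a separate layout function decides where each element lives in flat storage.

```python
# Hypothetical layout functions: map a logical (i, j) index to a flat offset.
def row_major(shape, i, j):
    rows, cols = shape
    return i * cols + j


def column_major(shape, i, j):
    rows, cols = shape
    return j * rows + i


class View2D:
    """Logical 2-D array whose storage order is chosen by a layout function."""

    def __init__(self, rows, cols, layout=row_major):
        self.shape = (rows, cols)
        self.layout = layout
        self.data = [0] * (rows * cols)  # flat backing storage

    def __getitem__(self, idx):
        i, j = idx
        return self.data[self.layout(self.shape, i, j)]

    def __setitem__(self, idx, value):
        i, j = idx
        self.data[self.layout(self.shape, i, j)] = value


def fill(view):
    """Application code: written once, independent of the layout."""
    rows, cols = view.shape
    for i in range(rows):
        for j in range(cols):
            view[i, j] = 10 * i + j


a = View2D(2, 3, row_major)
b = View2D(2, 3, column_major)
fill(a)
fill(b)

# Same logical contents, different physical order in memory.
print(a.data)
print(b.data)
```

The `fill` routine is unchanged between the two views; only the layout argument differs. In this sketch the layout is chosen at construction time, which is one possible design among several.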
4.1 Terminology
Before going into further discussions, we define relevant terminology.
Memory Space is an address range of memory with unique memory access characteristics. Examples
include different explicitly addressable levels of the memory hierarchy (scratchpads), NUMA nodes of
different latencies, different coherence domains associated with subsets of CPUs, GPUs or co-processors,
or different types of memories (e.g. cached, software-managed or persistent).
Iteration Space defines the space of indices generated by a loop nest (the space scoped out by the iteration
variables of the loops) irrespective of traversal order. The dimensionality of the iteration space is
typically defined by the nesting depth of the loop nest (e.g. an N-deep loop nest defines an N-dimensional
iteration space). Traversal order indicates the order in which the loop nest visits these indices.
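The distinction between an iteration space and a traversal order can be illustrated with a small sketch. The helper names `iteration_space`, `traverse_i_outer`, and `traverse_j_outer` are hypothetical and chosen only for this example. Two loop nests with interchanged loops generate the same set of indices but visit them in different orders.

```python
from itertools import product


def iteration_space(extents):
    """Set of index tuples generated by a loop nest with the given extents."""
    return set(product(*(range(n) for n in extents)))


def traverse_i_outer(n_i, n_j):
    """Visit order of a 2-deep loop nest with i as the outer loop."""
    return [(i, j) for i in range(n_i) for j in range(n_j)]


def traverse_j_outer(n_i, n_j):
    """Visit order of the same loop nest after loop interchange."""
    return [(i, j) for j in range(n_j) for i in range(n_i)]


space = iteration_space((2, 3))    # 2-dimensional iteration space
order_a = traverse_i_outer(2, 3)
order_b = traverse_j_outer(2, 3)

# Both traversals cover exactly the same iteration space ...
print(set(order_a) == space and set(order_b) == space)
# ... but they visit its indices in a different order.
print(order_a != order_b)
```

Because the iteration space is a set, it carries no ordering information; the ordering lives entirely in the traversal.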
Tiling The purpose of tiling is to shorten the distance between successive references to the same memory
location, so that it is more probable that the memory word resides in the memory levels near to the
Footnote 1: https://software.intel.com/en-us/articles/prominent-features-of-the-intel-manycore-platform-software-stack-intel-mpss-version-34