4.3. STATE OF THE ART
• Employ standard language features, e.g. in library-based solutions. Languages vary in their support
for library-based data layout abstractions.
– C++ seems to provide an opportunity for enabling data layout abstractions because its template metaprogramming facilities can extend the base language syntax. C++ metaprogramming
can cover many of the desired capabilities, e.g. polymorphic data layout and execution policy, where
the specialization can be hidden in the template implementation and controlled by template parameters.
Although C++ template metaprogramming offers more syntactic elegance for expressing solutions,
the result is ultimately a library-based approach: the code generated by the template
metaprogram is not understood by the baseline compiler, so the compiler cannot provide
optimizations that take advantage of the higher-level abstractions implemented by the templates.
The primary opportunity in the C++ approach is the hope that the C++ standards committee
would adopt such abstractions as part of the standard. The committee has been aggressive about adoption, and
the language already supports advanced features such as lambdas and the C++ Standard Library. However, even if
these syntactic extensions are adopted, compiler writers would still need to explicitly target those
templates and make use of the higher-level semantics they represent. At present it is not clear
how well this strategy will work out.
– In contrast to C++, Fortran is relatively limited and inflexible in its ability to extend the
syntax, but having multidimensional arrays as first-class objects gives it an advantage in expressing
data locality. A huge number of applications are implemented in Fortran, and computations
with regular data addressing are common. Library-based approaches to extending locality-aware
constructs in Fortran can exploit the explicit support for multidimensional arrays in the
base language. However, these library-based approaches may seem less elegant in Fortran because
of the inability to perform syntactic extensions in the base language.
– Dynamically typed scripting languages like Python, Perl, and MATLAB provide a great deal of flexibility to
users, and enable some forms of metaprogramming, but some of that flexibility can make optimization
difficult. Approaches to overcoming the performance challenges of scripting languages use
the integrated language introspection capabilities of these languages (particularly Python),
which enable the scripting system to intercept known motifs in the code and apply just-in-time (JIT)
rewriting or specialization. Examples include Copperhead [19] and PyCUDA [57], which recognize
data-parallel constructs and rewrite and recompile them as CUDA code for GPUs. SEJITS and
the Asp framework [18] are other examples that use specializers to recognize particular algorithmic
motifs and invoke specialized code-rewriting rules to optimize those constructs. This same
machinery can be used to recognize and rewrite code that uses metaprogramming constructs to
express data locality information.
• Augment base languages with directives or embedded domain-specific languages (DSLs). Examples
include OpenMP, OpenACC, Threading Building Blocks (https://www.threadingbuildingblocks.org), and Thrust (http://docs.nvidia.com/cuda/thrust).
Most contributors to this report worked within the confines of existing language standards, thereby maximizing impact and leveraging the market breadth of the supporting tool chain (e.g., compilers, debuggers,
profilers). Wherever profitable, the research plan is to “redeem” existing languages by amending or extending
them, e.g. via changes to the specifications or by introducing new ABIs.
The interfaces the authors of this report are developing are Kokkos [31], TiDA [87], C++ type support,
OpenMP extensions to support SIMD, GridTools [34], hStreams (https://software.intel.com/en-us/articles/prominent-features-of-the-intel-manycore-platform-software-stack-intel-mpss-version-34), and DASH [35]. The Kokkos library
supports expressing multidimensional arrays in C++, in which the polymorphic layout can be decided at
compile time. An algorithm written with Kokkos uses the AM of C++, with data specification and access
provided by the interface of Kokkos arrays. Locality is managed explicitly by matching the data layout with
the algorithm's logical locality.
TiDA allows the programmer to express data locality and layout at array construction. Under TiDA,
each array is extended with metadata that describes its layout, tiling policy, and topological affinity for
an intelligent mapping onto cores. This metadata follows the array through the program, so a different
configuration of layout or tiling strategy does not require any of the loop nests to be modified. Various layout
scenarios are supported to enable multidimensional decomposition of data across NUMA and cache-coherence
domains. As in Kokkos, the metadata describing the layout of each array is carried throughout the program
and into libraries, thereby offering a pathway toward better library composability.
TiDA is currently packaged as a Fortran library and is minimally invasive to Fortran codes. It provides a tiling traversal
interface, which can hide complicated loop traversals, parallelization, and execution strategies. Extensions are
being considered for the C++ type system to express semantics related to the consistency (varying or
uniform) of values in SIMD lanes. This is potentially complementary to ongoing investigations into
introducing new OpenMP-compatible ABIs that define a scope within which more relaxed language rules
may allow greater layout optimization, e.g. for physical storage layouts that are more amenable to SIMD.
GridTools provides a set of libraries for expressing distributed-memory implementations of regular-grid
applications, such as stencils. It is not meant to be universal: non-regular-grid applications
should not be expressed using GridTools libraries, even though that is possible in principle, for performance
reasons. Since the constructs provided by GridTools are high level and semi-functional, locality issues are
handled by performance tuners rather than by application programmers; at the semantic
level, locality is taken into consideration only implicitly. The hStreams library provides mechanisms
for expressing and implementing data decomposition, distribution, data binding, data layout, data reference
characteristics, and execution policy on heterogeneous platforms. DASH is built on a one-sided communication
substrate and provides a PGAS abstraction in C++ using operator overloading. The DASH AM is
essentially a distributed parallel machine with a concept of hierarchical locality.
As can be seen, there is no single way of treating locality concerns, and there is no consensus on which
approach is best. Each is appealing in different scenarios that depend on the scope of the
particular application domain. There is also an opportunity to naturally build higher-level interfaces on top of
lower-level ones. For instance, TiDA or DASH multidimensional arrays could be implemented using Kokkos
arrays, and GridTools parallel algorithms could use the DASH library, with Kokkos arrays for storage.
This is a potential benefit of interoperability that arises from using a common language with
generic programming capabilities.
Ultimately, the use of lambdas to abstract the iteration space and of metadata to carry information about
the abstracted data layouts are common themes across all of these implementations. This points to the
potential for a lower-level standardization of data structures and APIs that can be used under the covers by
all of these APIs (a common abstraction layer that could be used by each library solution). One outcome of
the workshop is to initiate efforts to explicitly define the requirements for a common runtime infrastructure
that could be used interoperably across these library solutions.
4.4 Discussion
This chapter presents and begins to resolve several key challenges:
• Defining the abstraction layers. The logical layer is where the domain expert provides a semantic
specification and offers hints about the program’s execution patterns. The physical layer is where
the performance tuner specifies data and execution policy controls that are designed to provide best
performance on target machines. These layers are illustrated in Figure 4.1.
• Enumerating the data and execution policy controls of interest. These are listed below in this section
and are highlighted in Figure 4.1.
• Suggesting some mechanisms that enable a flexible and effective mapping from the logical layer down to the
physical layer, while maintaining a clean separation of controls and without limiting the freedom of
expression and efficiency at each layer. One class of mechanisms is oriented around individual data
objects, e.g. with data types, and another is oriented around control structures, e.g. with ABIs that
enable a relaxation of language rules. The choice between these two orientations is illustrated in Figure
3.1.
Programming Abstractions for Data Locality