4.4. DISCUSSION
[Figure 4.1: Separation of concerns in data structure abstractions. The diagram separates a logical layer, at which a semantic specification is given, from a physical layer, at which a performant implementation is produced, spanning both functional and object orientation. Key challenges: (1) define and discover the mapping between layers; (2) circumvent limitations posed by each application domain; (3) choose between functional and object orientation; (4) expose opportunity and promote productivity. Semantic control takes the form of descriptive annotation by domain experts; performance control takes the form of execution policy set by performance programmers.]
Separation of concerns. High performance computing presents scientists and performance tuners with
two key challenges: exposing parallelism and effectively harvesting that parallelism. A natural separation of
concerns arises from these two efforts: the scope of effort for domain experts and for performance tuning
experts can each be limited, and the two can be decoupled without overly restricting each other. Charting
a solid path toward a clean separation of concerns, defining appropriate levels of abstraction, and
highlighting properties of language interfaces that will be effective at managing data abstractions are the
subjects of this effort.
Domain experts specify the work to be accomplished. They want the freedom to use a representation of
data that is natural to them, and most prefer not to be forced to specify performance-related details.
Performance tuning experts want full control over performance without having to become domain experts,
so they want domain experts to fully expose opportunities for parallelism without over-specifying how that
parallelism is to be harvested. This leads to a natural separation of concerns between a logical abstraction
layer, at which semantics are specified (the upper box in Figure 4.1), and a lower, physical abstraction layer,
at which performance is tuned and the harvesting of parallelism on a particular target machine is controlled
(the lower box in the figure). This separation allows code modifications to be localized within each of
the semantic and performance control layers. The use of abstraction allows high-level expressions to be
polymorphic across a variety of low-level trade-offs at the physical implementation layer. Several interfaces
are now emerging that maintain the discipline of this separation of concerns and that offer alternative ways
of mapping between the logical and physical abstraction layers.
Performance-related controls pertain to (1) data and (2) execution policy.
1. Data controls may be used to manage:
• Decomposition, which tends to be either trivial (parameterizable and automatic; perhaps along
multiple dimensions) or not (explicit, and perhaps hierarchical)
• Mechanisms for, and timing of, distribution to data space/locality bindings
• Data layout, the arrangement of data according to the addressing scheme, mapping logical ab-
stractions to arrangement in physical storage
• Binding of storage: to memories that support a particular access pattern (e.g. read-only, stream-
ing), to a phase-dependent depth in the memory hierarchy (prefetching, marking non-temporal),
to memory structures that support different kinds of sharing (software-managed or hardware-
managed cache), or to certain kinds of near storage (e.g. scalar or SIMD registers)
2. Execution policy controls may be used to manage:
• Decomposition of work, e.g. iterations, nested iterations, hierarchical tasks
• Ordering of work, e.g. recursive subdivision, work stealing
• Association of work with data, e.g. moving work to data, binding to hierarchical domains like a
node, an OpenMP place, a thread or a SIMD lane
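The decomposition and binding controls above can be made concrete with a small sketch. The following C++ fragment (all names are hypothetical, not drawn from any particular runtime) performs a block decomposition of an iteration space; in a real runtime each contiguous block would be bound to a thread, an OpenMP place, or a group of SIMD lanes, moving the work to where its data lives:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: block decomposition of an iteration space [0, n).
// Each worker receives one contiguous [lo, hi) range. Binding each range
// to a hierarchical domain (node, place, thread, lane group) is left to
// the runtime and is not shown here.
std::vector<std::pair<std::size_t, std::size_t>>
decompose_blocked(std::size_t n, std::size_t workers) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    for (std::size_t w = 0; w < workers; ++w) {
        std::size_t lo = w * chunk;
        if (lo >= n) break;                           // no empty trailing blocks
        std::size_t hi = lo + chunk < n ? lo + chunk : n;
        blocks.emplace_back(lo, hi);
    }
    return blocks;
}
```

Because the decomposition is parameterized (by `workers` and implicitly by the chunking rule), a tuner can change it without touching the semantic code that runs inside each block.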
Mechanisms. These controls may be applied at different scopes and granularities through a variety
of mechanisms:
• Data types - these are fine-grained, applying to one variable or parameter at a time; they apply across
the lexical scope of the variable
• Function or construct modifiers - instead of applying to individual variables, these apply a policy to
everything in a function or control construct
• Environmental variable controls - these global policies apply across a whole program
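The three mechanism scopes can be illustrated side by side. In this C++ sketch, the policy names, the `process` function, and the `LAYOUT_POLICY` environment variable are all hypothetical placeholders for whatever controls a real interface would expose:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical policy that a tuner might want to control at three scopes.
enum class Policy { Blocked, Interleaved };

// 1. Per-variable scope: the policy is carried in the data type, so it
//    applies across the lexical scope of each variable individually.
template <Policy P>
struct Buffer {
    static constexpr Policy policy = P;
};

// 2. Function scope: a modifier-like template parameter applies one
//    policy to everything the function touches.
template <Policy P, typename A, typename B>
Policy process(const A&, const B&) { return P; }

// 3. Whole-program scope: an environment variable (assumed name) supplies
//    a global default policy.
Policy global_policy() {
    const char* v = std::getenv("LAYOUT_POLICY");
    return (v && std::string(v) == "interleaved") ? Policy::Interleaved
                                                  : Policy::Blocked;
}
```

The finer-grained mechanisms naturally override the coarser ones: a type-level policy on one variable can differ from the function-level policy, which can in turn differ from the program-wide default.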
Note that, through modularity, the scope to which data types and function or construct modifiers apply
may be refined. For example, when a variable is passed through many functions, its data type may vary by call site.
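A type-level layout control of this kind might look as follows. This C++ sketch (class and policy names are illustrative assumptions) maps a logical (row, column) index to physical storage through a layout policy carried in the type, so the same high-level code is polymorphic across row-major and column-major arrangements, and the choice can differ at each declaration or call site:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout policies: each maps a logical (row, col) index to a
// physical offset in a flat buffer.
struct RowMajor {
    static std::size_t offset(std::size_t r, std::size_t c,
                              std::size_t /*rows*/, std::size_t cols) {
        return r * cols + c;   // elements of a row are contiguous
    }
};

struct ColMajor {
    static std::size_t offset(std::size_t r, std::size_t c,
                              std::size_t rows, std::size_t /*cols*/) {
        return c * rows + r;   // elements of a column are contiguous
    }
};

// Semantic code indexes logically; the physical arrangement is fixed only
// by the Layout type argument.
template <typename T, typename Layout>
class Matrix {
    std::size_t rows_, cols_;
    std::vector<T> data_;
public:
    Matrix(std::size_t r, std::size_t c) : rows_(r), cols_(c), data_(r * c) {}
    T& operator()(std::size_t r, std::size_t c) {
        return data_[Layout::offset(r, c, rows_, cols_)];
    }
    const std::vector<T>& raw() const { return data_; }
};
```

A tuner can switch the arrangement by changing only the type argument at the declaration site; the semantic code using `operator()` is unchanged.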
The interactions and interdependencies of data controls, execution controls, and the management of granularity
or scope can become quite complex. We use SIMD as a motivating example to illustrate some of the issues.
• The order among dimensions in a multi-dimensional array obviously impacts which dimension’s ele-
ments are contiguous. This, in turn, can affect which dimension is best to vectorize with a unit stride, to
maximize memory access efficiency. Relevant compiler transformations may include loop interchange,
data re-layout, and the selection of which loop nesting level to vectorize. The dimension order for an
array can be specified with a data type, or a new ABI at a function boundary can be used to create
a smaller scope from which indicators of the physical layout cannot escape, so that the compiler
is left free to re-lay out data within that scope.
• A complicating factor is that the best data layout may vary within a scope. For example,
one loop nest may perform best with one ordering of dimensions, or one AoSoA (array of structures of
arrays) arrangement, while the next loop nest may favor a different layout.
This can lead to a complex set of trade-offs that are not optimally solved by greedy schemes. It
is possible to isolate the different nests in different functions, and to use an ABI to hide where data
re-layout occurs.
• The number of elements that can be packed into a fixed-width SIMD register depends on the
precision of each element, for some target architectures. Consider a loop iteration that contains a mix
of double- and single-precision operands: vlen single-precision operations may be packed into
a single SIMD instruction, whereas only vlen/2 double-precision operations fit in one instruction, so
two instructions are required. Matching the number of available operations may
require unrolling the loop by an additional factor of 2. For situations such as this, hard-encoding
the number of elements in each SIMD register using types may inhibit the compiler's freedom to make tuning
trade-offs.
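The packing arithmetic in the last bullet can be made concrete. This minimal C++ sketch assumes an illustrative 256-bit register (AVX-style); the function names are hypothetical:

```cpp
#include <cstddef>

// Assumed register width for illustration; real targets vary.
constexpr std::size_t register_bits = 256;

// Number of lanes (packed elements) for an element of the given byte width.
constexpr std::size_t lanes(std::size_t elem_bytes) {
    return register_bits / (8 * elem_bytes);
}

// Extra unroll factor needed so the wider type issues enough instructions
// to cover the same number of logical iterations as the narrower type.
constexpr std::size_t extra_unroll(std::size_t narrow_bytes,
                                   std::size_t wide_bytes) {
    return lanes(narrow_bytes) / lanes(wide_bytes);
}
```

With 4-byte single precision and 8-byte double precision this yields vlen = 8 and vlen/2 = 4, and an extra unroll factor of 2, matching the mixed-precision scenario described above. Baking `lanes(4)` into a fixed vector type would lock in exactly the trade-off the text warns about.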
We have many areas of agreement. Freeing the domain programmer from having to specify performance
controls makes them more productive, and allowing the tuner to add controls without introducing bugs
through inadvertent changes to semantics makes the tuner more productive. If the tuner is more productive,
has a clear set of priorities for which controls to apply, and can apply them effectively, better performance
can be achieved more quickly. Finally, isolating the performance tuning controls and presenting them in a
fashion that allows target-specific implementations to be applied easily makes performance more portable.
Programming Abstractions for Data Locality