Regent: a high-Productivity Programming Language for hpc with Logical Regions

Yüklə 254,9 Kb.

Pdf görüntüsü

səhifə	3/8
tarix	23.02.2018
ölçüsü	254,9 Kb.
	#27490

1 2 3 4 5 6 7 8

either allow parallel execution or require that the tasks be

serialized in the order they were issued.

Besides managing the tasks, the Legion runtime also man-

ages the regions.

Tasks are written using logical regions,

which simply name collections of objects. During execution

each logical region may correspond to any number of physi-

cal instances, which are actual allocated copies of the data.

Separating the logical and physical levels allows important

patterns, such as having multiple copies of read-only data,

to be expressed directly.

The Legion abstractions allow programmers to write eﬃ-

cient task-based programs that run out-of-order, asynchro-

nously, and in a distributed fashion. However, because Le-

gion is embedded in C++, which does not understand the

semantics of tasks and regions, the Legion API is forced to

expose functionality beyond the logical layer of the program-

ming model. Programmers must generally write Legion pro-

grams with some awareness of both the logical and physical

levels.

Regent exposes only the logical level.

The lower-level,

physical details of the Legion execution model are hidden

from the programmer.

For a naive implementation, this

would be disastrous for performance.

However, with the

support of static analysis and compiler optimizations, Re-

gent is able to close the gap between logical and physical

constructs and provide a seamless abstraction to the pro-

grammer.

In the rest of this section we present the Regent and Le-

gion programming models in more detail, examine the dif-

ferences between the two, and demonstrate how a Regent

compiler is able to translate between them. In Section 4 we

consider the optimizations performed by the compiler that

enable this translation to be eﬃcient.

3.1

Tasks

Tasks are the fundamental unit of control in both Regent

and Legion. Tasks are issued in program order, exactly as

they are written in the text, and every possible program ex-

ecution is guaranteed to be indistinguishable from serial ex-

ecution. As discussed in Section 2, tasks specify the regions

they use and their permissions for those regions (whether the

task performs reads, writes, or reductions to each region).

In addition, tasks declare which ﬁelds of the objects in the

region the task accesses. Together, the declaration that spe-

ciﬁc ﬁelds of a region are accessed with certain privileges is

called a region requirement.

Whenever two tasks are non-interfering, accessing either

disjoint regions, diﬀerent ﬁelds of the same region, or the

same ﬁelds with compatible permissions (e.g., both tasks

only read the ﬁeld or only perform the same reduction to the

ﬁeld), Regent allows those tasks to run in parallel. Wherever

two tasks interfere, Regent inserts the appropriate synchro-

nization and copy operations to ensure that the data depen-

dence is handled properly. In addition to regions, tasks can

also take partition arguments and specify partition require-

ments.

When writing (or compiling) to the Legion C++ inter-

face, several additional aspects of the Legion runtime im-

plementation are exposed, requiring additional user eﬀort.

Legion’s dynamic dependence analysis imposes a cost with

every task launched. To ensure that this overhead stays oﬀ

the critical path, the Legion runtime is itself asynchronous

and parallel [9]. The goal is for the runtime to run ahead of

the application, issuing tasks and analyzing task interference

in advance of when those tasks can actually run. Pipeline

stalls, blocking operations, and excessive analysis costs can

all cause the runtime to fall behind and hurt the perfor-

mance of the application. Legion mitigates these issues by

providing more sophisticated abstractions which can result

in higher performance, but also have more complex seman-

tics.

3.1.1

Avoiding Pipeline Stalls

Task execution in Legion is pipelined. In general, a task

must complete a pipeline stage before it passes to the next

stage. If a given stage stalls for any reason, that task and

any task that depends on it also stalls. Mapping, described

in greater detail in Section 3.3, is one pipeline stage. When

a task is mapped, a processor is selected to execute the task

and memories are chosen to hold the physical instances of

each of its region arguments.

Because tasks can execute subtasks, Legion must wait for

all subtasks to map before it can consider a parent task to

have completed mapping. In general the only way to know

that a parent task cannot issue more subtasks is that the

parent task has terminated, which can result in unnecessary

pipeline stalls when the task in question never intended to

launch any subtasks.

Legion allows users to annotate tasks as leaf tasks if they

launch no subtasks, a mechanism inherited from Sequoia [22].

In Legion, the runtime considers the mapping of a leaf task

to be complete once the task itself is mapped, avoiding un-

necessary pipeline stalls for dependent operations.

3.1.2

Avoiding Blocking Operations

Tasks can produce results in one of two ways: direct return

values, or as a side-eﬀect on a region argument. In Legion,

operations can block whenever a parent task consumes a

result produced by one of its child tasks. The Legion runtime

provides ways of avoiding blocking on both kinds of task

results.

When tasks produce direct return values, Legion wraps

those values in futures. Users can block to obtain the value

of a future, but Legion also supports passing futures as in-

puts to other tasks without blocking. In this way, the pro-

grammer can describe the ﬂow of values between tasks with-

out blocking, allowing the runtime to run further ahead and

hide runtime analysis costs. Futures are visible in the C++

sample codes in Listing 3 lines 18-19 and Listing 4 line 23.

When a parent task needs to read the results of a region

written by a child task, unless the parent task has explicitly

indicated otherwise, the Legion runtime must conservatively

assume that the parent task may attempt to access the re-

gions used by the child task as soon as the child returns. To

preserve sequential semantics, the runtime blocks the parent

while the child task is in ﬂight to ensure the child’s results

are available before the parent continues. To avoid blocking,

parents must declare to the runtime that the region data is

not required by unmapping (releasing) the physical instance

of the region prior to calling the child.

3.1.3

Reducing Analysis Costs

Even when execution does not stall in the runtime or block

in the application, the cost of dynamic analysis itself can

cause the runtime to fall behind. One approach for reduc-

ing runtime overheads is to use index space task launches.

Yüklə 254,9 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8