Regent: a high-Productivity Programming Language for hpc with Logical Regions

Partitioning in PENNANT: zones (top left), sides

Yüklə 254,9 Kb.

Pdf görüntüsü

səhifə	2/8
tarix	23.02.2018
ölçüsü	254,9 Kb.
	#27490

1 2 3 4 5 6 7 8

Figure 1: Partitioning in PENNANT: zones (top left), sides

(top right), and points (bottom).

in the memory hierarchy—for example, because they may

be striped across nodes—and no ﬁxed layout in memory—

for example, because diﬀerent tasks or processors may prefer

array-of-structs or struct-of-arrays layouts. Regions can be

recursively partitioned to match the hierarchical structure

of memory, and to facilitate parallel execution on subsets of

data. Regions can also be partitioned multiple times to ex-

press sophisticated communications patterns involving mul-

tiple views of the data, including ones with aliasing between

views.

We describe an optimizing compiler for Regent that trans-

lates Regent programs into eﬃcient implementations for Le-

gion, a dynamic, task-based asynchronous runtime system

with native support for tasks and logical regions [9]. Regent

simpliﬁes the Legion programming model. Many details of

programming to the Legion runtime system can be managed

statically by the Regent compiler, resulting in Regent pro-

grams that are both written at a higher level and with fewer

lines of code than the corresponding Legion programs. Sev-

eral novel optimizations allow the Regent compiler to achieve

performance equivalent to hand-tuned codes written directly

to the Legion C++ API.

To motivate the Regent programming model, we present

excerpts from a Regent implementation of PENNANT [23],

a Lagrangian hydrodynamics code. Each of the following

sections discusses one of our contributions:

• Section 3 presents the Regent programming model and

provides a detailed comparison with Legion.

• Section 4 presents compiler optimizations that are im-

portant for achieving high performance in Regent pro-

grams.

• Section 5 discusses the implementation of the Regent

compiler.

• Section 6 evaluates the performance of three applica-

tions written in Regent.

Section 7 discusses related work, and Section 8 concludes.

MOTIVATING EXAMPLE

To motivate our presentation of Regent, we begin by in-

troducing the language through a series of excerpts from a

task adv pos full(points : region(point), dt : double) where

reads(points.{px0, pu0, pf, pmaswt}), writes(points.{px, pu})

do

4

var fuzz = 1e−99

var dth = 0.5

∗ dt

for p in points do

var pap = (1.0 / max(p.pmaswt, fuzz))

∗p.pf

8

var pu = p.pu0 + dt

∗pap

p.pu = pu

p.px = p.px0 + dth

∗(pu + p.pu0)

end

Listing 1: PENNANT kernel task in Regent.

task simulate(zones all : region(zone),

zones all p : partition(disjoint, zones all),

points all : region(point),

points all private : region(point),

points all private p : partition(disjoint, points all private),

conf : conﬁg)

where

reads(zones all, points all private),

writes(zones all, points all private)

var dt = conf.dtmax

var time = 0.0

var tstop = conf.tstop

while time < tstop do

dt = calc global dt(dt, dtmax, dthydro, time, tstop)

for i = 0, conf.npieces do

adv pos full(points all private p[i], dt)

end

−− ...

for i = 0, conf.npieces do

dthydro min= calc dt hydro(zones all p[i], dt, dtmax)

end

time += dt

end

Listing 2: PENNANT main task excerpt in Regent.

Regent implementation of the PENNANT mini-app. PEN-

NANT [23] implements Lagrangian hydrodynamics for a

subset of the functionality provided by FLAG [13], a pro-

duction code at Los Alamos National Laboratory (LANL).

PENNANT operates on 2D unstructured meshes, with data

structures representing the fundamental elements in 0, 1,

and 2 dimensions called points, edges and zones. To deal

with zones with an arbitrary number of edges, PENNANT

adds an intermediate data structure called a side, represent-

ing the triangular area between an edge and the center of

the zone (see Figure 1 top right).

Simulation in PENNANT proceeds in alternating phases,

reading values stored on zones and scattering to the points,

and gathering values on points and writing to the zones.

This structure is illustrated in an excerpt from the PEN-

NANT main loop in Listing 2, where two phases are visible:

the ﬁrst phase advances points using forces previously com-

puted (lines 16-18), while the second phase computes dt for

the next timestep from the geometry of the updated mesh

(lines 22-24). The rest of the PENNANT timestep loop fol-

lows in a straightforward way from these examples, calling

each phase’s kernels in turn.

To allow Regent programs to express that tasks execute

on subsets of the data, regions can be partitioned into sub-

regions. Partitioning a region is a primitive operation in

Regent, and the resulting partition, which is the collection

of subregions of the parent region, is a ﬁrst-class value. In

void adv pos full(const Task

∗task,

2

const std::vector

®ions,

Context ctx, HighLevelRuntime

∗runtime)

{

PhysicalRegion points0 = regions[0];

Accessor points px0 x(points0, PX0 X);

Accessor points px0 y(points0, PX0 Y);

Accessor points pu0 x(points0, PU0 X);

Accessor points pu0 y(points0, PU0 Y);

Accessor points pf x(points0, PF X);

Accessor points pf y(points0, PF Y);

Accessor points pmaswt(points0, PMASWT);

PhysicalRegion points1 = regions[1];

Accessor points px x(points1, PX X);

Accessor points px y(points1, PX Y);

Accessor points pu x(points1, PU X);

Accessor points pu y(points1, PU Y);

Future f0 = task−>futures[0];

double dt = f0.get result();

double fuzz = 1e−99;

double dth = 0.5

∗ dt;

22

IndexIterator it(points0.get logical region().get index space());

while (it.has next()) {

size t count;

ptr t start = it.next span(count);

ptr t end(start.value + count);

for (ptr t p = start; p < end; p++) {

double frac = (1.0 / max(points pmaswt.read(p), fuzz));

double pap x = frac

∗ points pf x.read(p);

double pap y = frac

∗ points pf y.read(p);

double pu x = points pu0 x.read(p) + dt

∗ pap x;

double pu y = points pu0 y.read(p) + dt

∗ pap y;

points pu x.write(p, pu x);

points pu y.write(p, pu y);

points px x.write(p, points px0 x.read(p) +

dth

∗(pu x + points pu0 x.read(p)));

points px y.write(p, points px0 y.read(p) +

dth

∗(pu y + points pu0 y.read(p)));

}

Listing 3: PENNANT kernel task in Legion C++ API.

runtime−>unmap region(ctx, pr points all private);

Domain domain = Domain::from rect<1>(

Rect<1>(Point<1>(0), Point<1>(conf.npieces − 1)));

IndexLauncher launcher(ADV POS FULL, domain,

TaskArgument(), ArgumentMap());

launcher.add region requirement(

RegionRequirement(points all private p, 0 /

∗ projection ∗/,

READ ONLY, EXCLUSIVE, points all private));

launcher.add ﬁeld(0, PX0 X);

launcher.add ﬁeld(0, PX0 Y);

launcher.add ﬁeld(0, PU0 X);

launcher.add ﬁeld(0, PU0 Y);

launcher.add ﬁeld(0, PF X);

launcher.add ﬁeld(0, PF Y);

launcher.add ﬁeld(0, PMASWT);

launcher.add region requirement(

RegionRequirement(points all private p, 0 /

∗ projection ∗/,

READ WRITE, EXCLUSIVE, points all private));

launcher.add ﬁeld(1, PX X);

launcher.add ﬁeld(1, PX Y);

launcher.add ﬁeld(1, PU X);

launcher.add ﬁeld(1, PU Y);

launcher.add future(dt);

runtime−>execute index space(ctx, launcher);

Listing 4: PENNANT task launch in Legion C++ API.

PENNANT, partitions are created during program initial-

ization (not shown) and passed as arguments to the main

simulation task. Two such partitions are shown in Listing 2

on lines 2 and 5.

Partitioning a region assigns one or more colors (small in-

tegers) to the region’s elements; there is one subregion per

color containing all the elements with that color. Zones are

an example of a disjoint partition where none of the subre-

gions overlap. For performance, it is advantageous for the

colored subregions to be compact, though the application

will run correctly with any coloring. Figure 1 shows one

possible coloring of zones.

Partitions of sides and points are computed from the par-

tition of zones using the topology of the mesh. Sides sim-

ply take on the color of their corresponding zone. Points

are partitioned in a more sophisticated way, to account for

aliasing at the boundaries of the subregions of zones. Points

are colored by every zone they are adjacent to, leaving each

point with one or more colors. Points are ﬁrst partitioned

according to the number of colors assigned; points with only

one color are placed in a subregion of private points while

points with multiple colors are put into a subregion of ghost

points. Each subregion is then further partitioned accord-

ing to the colors of the points, as shown in Figure 1. This

partitioning scheme allows Regent to limit data movement

when reducing the forces applied by the zones on the points.

Because the private points are known to be disjoint from all

ghost points, and the private points are further partitioned

into disjoint pieces for each submesh, those partitions of the

mesh are isolated from any data movement in the system.

Regent tracks these relationships between regions, and en-

sures that tasks only access data for which they have de-

clared the appropriate privileges: Regent tasks must say

whether they plan to access a region with read, read/write,

or reduction privileges (and, in the case of reduction privi-

leges, the reduction operator must also be speciﬁed). Tasks

may only call other tasks with subsets of their own priv-

ileges. The call to

adv pos full

in Listing 2 line 17 is safe

because

simulate

holds a superset of the privileges required

adv pos full

, but also because Regent understands that

the partition access at

points all private p[i]

is a subregion of

points all private

, for which

simulate

has privileges. Similarly,

all pointer accesses (such as those in Listing 1 lines 7-10) are

associated with a particular region, ensuring that the access

stays safely within the bounds of the regions for which the

task has privileges.

For comparison to the Regent code above, Listings 3 and

4 show implementations of the same tasks written with the

Legion C++ API. Listing 3 corresponds to Listing 1 while

Listing 4 corresponds to the three lines of Listing 2 lines

16-18. Clearly the C++ API is more verbose. However, be-

yond the length, these code samples also illustrate that Le-

gion exposes a programming model with more moving parts

than Regent. Regent is able to handle the additional as-

pects (which we discuss in Section 3) automatically within

the compiler, and so hides them from the programmer and

provides a higher-level interface while maintaining the per-

formance of hand-tuned Legion.

PROGRAMMING MODEL

Before we discuss the Regent programming model in more

detail, we explain more about Legion so that the reader can

appreciate the diﬀerences between the two systems. Legion,

the runtime which Regent targets, is implemented as a soft-

ware out-of-order processor [9]. Tasks are issued to the Le-

gion runtime in program order, but the underlying Legion

scheduler may reorder tasks and execute them in parallel if

it can prove that it is safe to do so. The Legion runtime

analyzes each task’s privileges for its region arguments to

identify when tasks are using the same regions in ways that

Yüklə 254,9 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8