Figure 1: Partitioning in PENNANT: zones (top left), sides
(top right), and points (bottom).
in the memory hierarchy—for example, because they may
be striped across nodes—and no fixed layout in memory—
for example, because different tasks or processors may prefer
array-of-structs or struct-of-arrays layouts. Regions can be
recursively partitioned to match the hierarchical structure
of memory, and to facilitate parallel execution on subsets of
data. Regions can also be partitioned multiple times to ex-
press sophisticated communications patterns involving mul-
tiple views of the data, including ones with aliasing between
views.
We describe an optimizing compiler for Regent that trans-
lates Regent programs into efficient implementations for Le-
gion, a dynamic, task-based asynchronous runtime system
with native support for tasks and logical regions [9]. Regent
simplifies the Legion programming model. Many details of
programming to the Legion runtime system can be managed
statically by the Regent compiler, resulting in Regent pro-
grams that are both written at a higher level and with fewer
lines of code than the corresponding Legion programs. Sev-
eral novel optimizations allow the Regent compiler to achieve
performance equivalent to hand-tuned codes written directly
to the Legion C++ API.
To motivate the Regent programming model, we present
excerpts from a Regent implementation of PENNANT [23],
a Lagrangian hydrodynamics code. Each of the following
sections discusses one of our contributions:
• Section 3 presents the Regent programming model and
provides a detailed comparison with Legion.
• Section 4 presents compiler optimizations that are im-
portant for achieving high performance in Regent pro-
grams.
• Section 5 discusses the implementation of the Regent
compiler.
• Section 6 evaluates the performance of three applica-
tions written in Regent.
Section 7 discusses related work, and Section 8 concludes.
2.
MOTIVATING EXAMPLE
To motivate our presentation of Regent, we begin by in-
troducing the language through a series of excerpts from a
1
task adv pos full(points : region(point), dt : double) where
2
reads(points.{px0, pu0, pf, pmaswt}), writes(points.{px, pu})
3
do
4
var fuzz = 1e−99
5
var dth = 0.5
∗ dt
6
for p in points do
7
var pap = (1.0 / max(p.pmaswt, fuzz))
∗p.pf
8
var pu = p.pu0 + dt
∗pap
9
p.pu = pu
10
p.px = p.px0 + dth
∗(pu + p.pu0)
11
end
12
end
Listing 1: PENNANT kernel task in Regent.
1
task simulate(zones all : region(zone),
2
zones all p : partition(disjoint, zones all),
3
points all : region(point),
4
points all private : region(point),
5
points all private p : partition(disjoint, points all private),
6
conf : config)
7
where
8
reads(zones all, points all private),
9
writes(zones all, points all private)
10
do
11
var dt = conf.dtmax
12
var time = 0.0
13
var tstop = conf.tstop
14
while time < tstop do
15
dt = calc global dt(dt, dtmax, dthydro, time, tstop)
16
for i = 0, conf.npieces do
17
adv pos full(points all private p[i], dt)
18
end
19
20
−− ...
21
22
for i = 0, conf.npieces do
23
dthydro min= calc dt hydro(zones all p[i], dt, dtmax)
24
end
25
time += dt
26
end
27
end
Listing 2: PENNANT main task excerpt in Regent.
Regent implementation of the PENNANT mini-app. PEN-
NANT [23] implements Lagrangian hydrodynamics for a
subset of the functionality provided by FLAG [13], a pro-
duction code at Los Alamos National Laboratory (LANL).
PENNANT operates on 2D unstructured meshes, with data
structures representing the fundamental elements in 0, 1,
and 2 dimensions called points, edges and zones. To deal
with zones with an arbitrary number of edges, PENNANT
adds an intermediate data structure called a side, represent-
ing the triangular area between an edge and the center of
the zone (see Figure 1 top right).
Simulation in PENNANT proceeds in alternating phases,
reading values stored on zones and scattering to the points,
and gathering values on points and writing to the zones.
This structure is illustrated in an excerpt from the PEN-
NANT main loop in Listing 2, where two phases are visible:
the first phase advances points using forces previously com-
puted (lines 16-18), while the second phase computes dt for
the next timestep from the geometry of the updated mesh
(lines 22-24). The rest of the PENNANT timestep loop fol-
lows in a straightforward way from these examples, calling
each phase’s kernels in turn.
To allow Regent programs to express that tasks execute
on subsets of the data, regions can be partitioned into sub-
regions. Partitioning a region is a primitive operation in
Regent, and the resulting partition, which is the collection
of subregions of the parent region, is a first-class value. In
1
void adv pos full(const Task
∗task,
2
const std::vector
®ions,
3
Context ctx, HighLevelRuntime
∗runtime)
4
{
5
PhysicalRegion points0 = regions[0];
6
Accessor points px0 x(points0, PX0 X);
7
Accessor points px0 y(points0, PX0 Y);
8
Accessor points pu0 x(points0, PU0 X);
9
Accessor points pu0 y(points0, PU0 Y);
10
Accessor points pf x(points0, PF X);
11
Accessor points pf y(points0, PF Y);
12
Accessor points pmaswt(points0, PMASWT);
13
PhysicalRegion points1 = regions[1];
14
Accessor points px x(points1, PX X);
15
Accessor points px y(points1, PX Y);
16
Accessor points pu x(points1, PU X);
17
Accessor points pu y(points1, PU Y);
18
Future f0 = task−>futures[0];
19
double dt = f0.get result();
20
double fuzz = 1e−99;
21
double dth = 0.5
∗ dt;
22
IndexIterator it(points0.get logical region().get index space());
23
while (it.has next()) {
24
size t count;
25
ptr t start = it.next span(count);
26
ptr t end(start.value + count);
27
for (ptr t p = start; p < end; p++) {
28
double frac = (1.0 / max(points pmaswt.read(p), fuzz));
29
double pap x = frac
∗ points pf x.read(p);
30
double pap y = frac
∗ points pf y.read(p);
31
double pu x = points pu0 x.read(p) + dt
∗ pap x;
32
double pu y = points pu0 y.read(p) + dt
∗ pap y;
33
points pu x.write(p, pu x);
34
points pu y.write(p, pu y);
35
points px x.write(p, points px0 x.read(p) +
36
dth
∗(pu x + points pu0 x.read(p)));
37
points px y.write(p, points px0 y.read(p) +
38
dth
∗(pu y + points pu0 y.read(p)));
39
}
40
}
41
}
Listing 3: PENNANT kernel task in Legion C++ API.
1
runtime−>unmap region(ctx, pr points all private);
2
Domain domain = Domain::from rect<1>(
3
Rect<1>(Point<1>(0), Point<1>(conf.npieces − 1)));
4
IndexLauncher launcher(ADV POS FULL, domain,
5
TaskArgument(), ArgumentMap());
6
launcher.add region requirement(
7
RegionRequirement(points all private p, 0 /
∗ projection ∗/,
8
READ ONLY, EXCLUSIVE, points all private));
9
launcher.add field(0, PX0 X);
10
launcher.add field(0, PX0 Y);
11
launcher.add field(0, PU0 X);
12
launcher.add field(0, PU0 Y);
13
launcher.add field(0, PF X);
14
launcher.add field(0, PF Y);
15
launcher.add field(0, PMASWT);
16
launcher.add region requirement(
17
RegionRequirement(points all private p, 0 /
∗ projection ∗/,
18
READ WRITE, EXCLUSIVE, points all private));
19
launcher.add field(1, PX X);
20
launcher.add field(1, PX Y);
21
launcher.add field(1, PU X);
22
launcher.add field(1, PU Y);
23
launcher.add future(dt);
24
runtime−>execute index space(ctx, launcher);
Listing 4: PENNANT task launch in Legion C++ API.
PENNANT, partitions are created during program initial-
ization (not shown) and passed as arguments to the main
simulation task. Two such partitions are shown in Listing 2
on lines 2 and 5.
Partitioning a region assigns one or more colors (small in-
tegers) to the region’s elements; there is one subregion per
color containing all the elements with that color. Zones are
an example of a disjoint partition where none of the subre-
gions overlap. For performance, it is advantageous for the
colored subregions to be compact, though the application
will run correctly with any coloring. Figure 1 shows one
possible coloring of zones.
Partitions of sides and points are computed from the par-
tition of zones using the topology of the mesh. Sides sim-
ply take on the color of their corresponding zone. Points
are partitioned in a more sophisticated way, to account for
aliasing at the boundaries of the subregions of zones. Points
are colored by every zone they are adjacent to, leaving each
point with one or more colors. Points are first partitioned
according to the number of colors assigned; points with only
one color are placed in a subregion of private points while
points with multiple colors are put into a subregion of ghost
points. Each subregion is then further partitioned accord-
ing to the colors of the points, as shown in Figure 1. This
partitioning scheme allows Regent to limit data movement
when reducing the forces applied by the zones on the points.
Because the private points are known to be disjoint from all
ghost points, and the private points are further partitioned
into disjoint pieces for each submesh, those partitions of the
mesh are isolated from any data movement in the system.
Regent tracks these relationships between regions, and en-
sures that tasks only access data for which they have de-
clared the appropriate privileges: Regent tasks must say
whether they plan to access a region with read, read/write,
or reduction privileges (and, in the case of reduction privi-
leges, the reduction operator must also be specified). Tasks
may only call other tasks with subsets of their own priv-
ileges. The call to
adv pos full
in Listing 2 line 17 is safe
because
simulate
holds a superset of the privileges required
by
adv pos full
, but also because Regent understands that
the partition access at
points all private p[i]
is a subregion of
points all private
, for which
simulate
has privileges. Similarly,
all pointer accesses (such as those in Listing 1 lines 7-10) are
associated with a particular region, ensuring that the access
stays safely within the bounds of the regions for which the
task has privileges.
For comparison to the Regent code above, Listings 3 and
4 show implementations of the same tasks written with the
Legion C++ API. Listing 3 corresponds to Listing 1 while
Listing 4 corresponds to the three lines of Listing 2 lines
16-18. Clearly the C++ API is more verbose. However, be-
yond the length, these code samples also illustrate that Le-
gion exposes a programming model with more moving parts
than Regent. Regent is able to handle the additional as-
pects (which we discuss in Section 3) automatically within
the compiler, and so hides them from the programmer and
provides a higher-level interface while maintaining the per-
formance of hand-tuned Legion.
3.
PROGRAMMING MODEL
Before we discuss the Regent programming model in more
detail, we explain more about Legion so that the reader can
appreciate the differences between the two systems. Legion,
the runtime which Regent targets, is implemented as a soft-
ware out-of-order processor [9]. Tasks are issued to the Le-
gion runtime in program order, but the underlying Legion
scheduler may reorder tasks and execute them in parallel if
it can prove that it is safe to do so. The Legion runtime
analyzes each task’s privileges for its region arguments to
identify when tasks are using the same regions in ways that