either allow parallel execution or require that the tasks be
serialized in the order they were issued.
Besides managing the tasks, the Legion runtime also man-
ages the regions.
Tasks are written using logical regions,
which simply name collections of objects. During execution
each logical region may correspond to any number of physi-
cal instances, which are actual allocated copies of the data.
Separating the logical and physical levels allows important
patterns, such as having multiple copies of read-only data,
to be expressed directly.
The Legion abstractions allow programmers to write effi-
cient task-based programs that run out-of-order, asynchro-
nously, and in a distributed fashion. However, because Le-
gion is embedded in C++, which does not understand the
semantics of tasks and regions, the Legion API is forced to
expose functionality beyond the logical layer of the program-
ming model. Programmers must generally write Legion pro-
grams with some awareness of both the logical and physical
levels.
Regent exposes only the logical level.
The lower-level,
physical details of the Legion execution model are hidden
from the programmer.
For a naive implementation, this
would be disastrous for performance.
However, with the
support of static analysis and compiler optimizations, Re-
gent is able to close the gap between logical and physical
constructs and provide a seamless abstraction to the pro-
grammer.
In the rest of this section we present the Regent and Le-
gion programming models in more detail, examine the dif-
ferences between the two, and demonstrate how a Regent
compiler is able to translate between them. In Section 4 we
consider the optimizations performed by the compiler that
enable this translation to be efficient.
3.1
Tasks
Tasks are the fundamental unit of control in both Regent
and Legion. Tasks are issued in program order, exactly as
they are written in the text, and every possible program ex-
ecution is guaranteed to be indistinguishable from serial ex-
ecution. As discussed in Section 2, tasks specify the regions
they use and their permissions for those regions (whether the
task performs reads, writes, or reductions to each region).
In addition, tasks declare which fields of the objects in the
region the task accesses. Together, the declaration that spe-
cific fields of a region are accessed with certain privileges is
called a region requirement.
Whenever two tasks are non-interfering, accessing either
disjoint regions, different fields of the same region, or the
same fields with compatible permissions (e.g., both tasks
only read the field or only perform the same reduction to the
field), Regent allows those tasks to run in parallel. Wherever
two tasks interfere, Regent inserts the appropriate synchro-
nization and copy operations to ensure that the data depen-
dence is handled properly. In addition to regions, tasks can
also take partition arguments and specify partition require-
ments.
When writing (or compiling) to the Legion C++ inter-
face, several additional aspects of the Legion runtime im-
plementation are exposed, requiring additional user effort.
Legion’s dynamic dependence analysis imposes a cost with
every task launched. To ensure that this overhead stays off
the critical path, the Legion runtime is itself asynchronous
and parallel [9]. The goal is for the runtime to run ahead of
the application, issuing tasks and analyzing task interference
in advance of when those tasks can actually run. Pipeline
stalls, blocking operations, and excessive analysis costs can
all cause the runtime to fall behind and hurt the perfor-
mance of the application. Legion mitigates these issues by
providing more sophisticated abstractions which can result
in higher performance, but also have more complex seman-
tics.
3.1.1
Avoiding Pipeline Stalls
Task execution in Legion is pipelined. In general, a task
must complete a pipeline stage before it passes to the next
stage. If a given stage stalls for any reason, that task and
any task that depends on it also stalls. Mapping, described
in greater detail in Section 3.3, is one pipeline stage. When
a task is mapped, a processor is selected to execute the task
and memories are chosen to hold the physical instances of
each of its region arguments.
Because tasks can execute subtasks, Legion must wait for
all subtasks to map before it can consider a parent task to
have completed mapping. In general the only way to know
that a parent task cannot issue more subtasks is that the
parent task has terminated, which can result in unnecessary
pipeline stalls when the task in question never intended to
launch any subtasks.
Legion allows users to annotate tasks as leaf tasks if they
launch no subtasks, a mechanism inherited from Sequoia [22].
In Legion, the runtime considers the mapping of a leaf task
to be complete once the task itself is mapped, avoiding un-
necessary pipeline stalls for dependent operations.
3.1.2
Avoiding Blocking Operations
Tasks can produce results in one of two ways: direct return
values, or as a side-effect on a region argument. In Legion,
operations can block whenever a parent task consumes a
result produced by one of its child tasks. The Legion runtime
provides ways of avoiding blocking on both kinds of task
results.
When tasks produce direct return values, Legion wraps
those values in futures. Users can block to obtain the value
of a future, but Legion also supports passing futures as in-
puts to other tasks without blocking. In this way, the pro-
grammer can describe the flow of values between tasks with-
out blocking, allowing the runtime to run further ahead and
hide runtime analysis costs. Futures are visible in the C++
sample codes in Listing 3 lines 18-19 and Listing 4 line 23.
When a parent task needs to read the results of a region
written by a child task, unless the parent task has explicitly
indicated otherwise, the Legion runtime must conservatively
assume that the parent task may attempt to access the re-
gions used by the child task as soon as the child returns. To
preserve sequential semantics, the runtime blocks the parent
while the child task is in flight to ensure the child’s results
are available before the parent continues. To avoid blocking,
parents must declare to the runtime that the region data is
not required by unmapping (releasing) the physical instance
of the region prior to calling the child.
3.1.3
Reducing Analysis Costs
Even when execution does not stall in the runtime or block
in the application, the cost of dynamic analysis itself can
cause the runtime to fall behind. One approach for reduc-
ing runtime overheads is to use index space task launches.