Regent: A High-Productivity Programming Language for HPC with Logical Regions
Elliott Slaughter
Stanford University
slaughter@cs.stanford.edu
Wonchan Lee
Stanford University
wonchan@cs.stanford.edu
Sean Treichler
Stanford University
sjt@cs.stanford.edu
Michael Bauer
NVIDIA Research
mbauer@nvidia.com
Alex Aiken
Stanford University
aiken@cs.stanford.edu
ABSTRACT
We present Regent, a high-productivity programming language for high performance computing with logical regions. Regent users compose programs with tasks (functions eligible for parallel execution) and logical regions (hierarchical collections of structured objects). Regent programs appear to execute sequentially, require no explicit synchronization, and are trivially deadlock-free. Regent’s type system catches many common classes of mistakes and guarantees that a program with correct serial execution produces identical results on parallel and distributed machines.
We present an optimizing compiler for Regent that translates Regent programs into efficient implementations for Legion, an asynchronous task-based model. Regent employs several novel compiler optimizations to minimize the dynamic overhead of the runtime system and enable efficient operation. We evaluate Regent on three benchmark applications and demonstrate that Regent achieves performance comparable to hand-tuned Legion.
Categories and Subject Descriptors
D.3.2 [Programming Languages]: Language Classifications—Concurrent, distributed, and parallel languages; D.3.4 [Programming Languages]: Processors—Compilers, Optimization
Keywords
Regent; Legion; logical regions; task-based runtimes
1. INTRODUCTION
Modern supercomputers feature distributed memory architectures with deep memory hierarchies. Currently, the state of the art in programming this class of machines is the MPI+X hybrid programming model. While MPI+X codes achieve good performance, they do so at a cost to programmer productivity and performance portability. Users of MPI+X must explicitly manage data movement and synchronization within and between nodes. Furthermore, they must explicitly overlap communication with computation for optimal performance, a task made difficult by the need to interface with two disparate programming models. In addition, the degree to which communication and computation must be overlapped to achieve good performance depends on machine-specific factors, resulting in poor performance portability in aggressively hand-tuned codes.

SC '15, November 15 - 20, 2015, Austin, TX, USA
© 2015 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-3723-6/15/11...$15.00
DOI: http://dx.doi.org/10.1145/2807591.2807629
An alternative that is receiving considerable attention is writing programs for task-based runtimes. While there is significant variation among the current approaches [9, 4, 6, 19, 27], the common element is a graph of tasks to be executed, where the graph’s edges capture ordering dependencies between tasks. The advantage of the task-based approach is that the computation is expressed at a higher level than MPI+X, which allows for both more aggressive optimization by the programming model’s implementation and correspondingly less effort by programmers to express the same optimizations by hand (as well as better portability).
A disadvantage of all the current task-based models is that they are runtime systems (i.e., libraries) embedded in a host language that does not understand the task-based model’s higher-level semantics. Programmers must do additional work to maintain important invariants across calls to the runtime system, resulting in a programming interface that is more complex and verbose than a true programming language implementation could provide. Furthermore, important optimizations that require static analysis to be feasible are simply beyond the scope of dynamic runtime systems.
To address these challenges we present Regent, a high-productivity programming language for high performance computing. Regent features two key abstractions: tasks and logical regions. Regent programs look like ordinary sequential programs with calls to tasks, which are functions that the programmer has marked as eligible for parallel execution. Regent guarantees that any parallel execution is consistent with the sequential execution of a Regent program. Internally, dependencies between tasks are inferred automatically, freeing the user from the need to explicitly synchronize or manage data movement around the machine. Regent programs are also trivially deadlock-free and avoid a number of classes of mistakes possible in lower-level distributed programming.
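As a brief illustration of these two abstractions (a sketch in Regent's published syntax; the task name, region names, and DAXPY-style computation are illustrative and do not appear in this paper), a task declares the regions it accesses and the privileges it requires, and the runtime uses these declarations to infer dependencies between task calls:

```
-- Illustrative sketch: a task operating on two logical regions.
-- The privilege annotations after "where" tell the runtime which
-- regions the task reads and writes, so dependencies between
-- successive task calls can be inferred automatically.
task daxpy(is : ispace(int1d),
           x  : region(is, double),
           y  : region(is, double),
           a  : double)
where reads(x, y), writes(y) do
  for i in is do
    y[i] += a * x[i]
  end
end
```

A caller simply invokes `daxpy` like an ordinary function; any two calls whose region privileges conflict (e.g., both write `y`) are ordered as in the sequential program, while non-conflicting calls may run in parallel.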
Logical regions [9, 33, 10], or simply regions, are collections of structured objects. Regions have no fixed location