int i = 10, a;
while (i-- > 0) {
if (b[i] == 0) [R1 => i]
/* add: R2 => a */
break;
a = 1;
} [ R1 => i, R2 => a ]
Figure 2. To ensure proper semantics, any values altered during a
loop iteration have to be written back at all side exists, because the
value could still be pending from a previous loop iteration. The map
recorded for the if instruction, for example, initially only contains
the necessary information to update the loop counter (i). The tail
map at the end of the loop in contrast indicates that a has to be
updated as well. This information is subsequently merged into all
side exits to ensure that the if instruction not only updates i but
also a in case of a side exit. This is necessary, because as we will
discuss in the next section we will perform register coalescing on
the tail map, ensuring that the next loop iteration finds all values
in the corresponding register without actually writing back any
values into the Java local variable locations. Thus each side exit
is responsible for this step, even for values that were carried over
from the previous iteration and have not been redefined yet in this
iteration.
unconditionally branch back to the loop header once an iteration
has been completed.
As we record traces across method calls, the input of our com-
piler is also free from method invocations, which significantly sim-
plifies code generation.
The work of our trace compiler can be divided into three phases:
a) stack deconstruction and transformation into SSA form, b) anal-
ysis, and c) code generation. Each phase is accomplished in a single
sweep over the trace and thus executes roughly in linear time.
3.1
Stack Deconstruction
During the initial phase, we deconstruct the Java stack and replace
references to stack locations and local variables with the index
number of the instruction that defined the corresponding value. By
doing so, we effectively also transform the trace into SSA form
(Figure 3). For this, we track for each stack location and local
variable the index of the instruction that defined the corresponding
value, and update on the fly each operand reference. To import the
initial context which consists of values stored on the Java stack
and in local variables when the trace starting point is reached, the
virtual machine automatically reports read pseudo instructions for
all occupied local variable slots.
At the end of the loop, a renaming table and the tail map are
used to decide which incoming context values are loop invariant.
The renaming table maps local variables to their SSA names and is
updated as each instruction is recorded and transformed on-the-fly
into SSA. Each time a value is written into a local variable, a new
SSA name is assigned and the renaming table is updated to reflect
the new SSA name. For each use the renaming table is consulted to
obtain the current SSA name for the corresponding local variable,
which is recorded instead of the actual local variable index.
Each mapping in the tail map consists of the location where
the value was defined and the local variable index where the value
has to be written back to. A mapping that refers to a definition
other than the read pseudo instruction we initially filled in for
a given local variable slot indicates that the local variable context
was redefined in the loop body. For these instructions the read
pseudo instruction is transformed into a φ pseudo instruction.
φ pseudo instructions are similar to read pseudo instructions
in that they read a value from the VM stack or a local variable be-
fore entering the loop. Along the loop edge, however, φ pseudo
iconst_0
istore_2
iconst_0
istore_1
A: iload_1
sipush 1000
if_icmpge B
iinc 2,1
iinc 1,1
goto A
B: getstatic System.out
iload 2
invokevirtual println(int)
return
0: read L1
(phi L1,4)
1: read L2
(phi L2,5)
2: sipush 1000
3: if_icmpge bail-out
4: iinc 1,1
5: iinc 0,1
6: goto 3
bail-out:
return to VM
Figure 3. Phase 1: Stack deconstruction and transformation into
SSA form. In our SSA-based IR, operands refer to the defining in-
struction in the trace instead of stack locations or local variables.
Incoming values are represented through explicit read instructions,
which in this case read local variables 1 and 2. In a single forward
scan through the code, all operand references are updated by track-
ing the definition point for each stack location and local variable in
a renaming table. The table initially contains the mappings (L1:0,
L2:1), which causes the two iinc instructions to be updated with
new operands. At the same time, the renaming table is updated to
(L1:4, L2:5) because the iinc instructions redefined the values in lo-
cal variables 1 and 2. At the end of the loop, we check which entries
in the renaming table have been modified. All modified entries are
loop variables
, and their corresponding read instructions are each
replaced with a φ instruction that reads the updated value along
the loop edge, instead of the initial one. The remaining entries are
flagged as loop invariant.
instructions return the value produced by a downstream instruction
in the previous iteration. As the virtual machine has already in-
serted read instructions for all context values, we do not actually
have to insert any φ pseudo instructions into the trace code. We
merely transform the corresponding read pseudo instruction into
a φ pseudo instruction.
It should be noted that we do not have to calculate a dominator
tree to place φ nodes, because our traces contain at most one merge
point, which is the trace entry where the loop edge and the entry
edge meet. Furthermore, our initial phase also performs a loop
invariant analysis—essentially for free.
3.2
Code Analysis in SSA form
Once we have obtained the SSA-based intermediate representation,
we perform a series of data-flow analyses on the code (phase 2).
Also these analyses occur during a single forward sweep over the
trace. Among others, we propagate the loop invariant information
to dependent instructions. In other words, all instructions that only
depend on loop-invariant inputs are flagged as loop invariant and
will be hoisted out of the loop body.
Initially, only the read and φ pseudo instructions inserted at the
beginning of the trace are considered loop-invariant. read pseudo
instructions are invariant, because they merely read values from
the Java stack and local variables into temporary registers and the