Figure 6. An example of a loop with more than one frequently executed path. If we supported just a single hotpath per loop and first encountered A, F, G and recorded a trace for it, all other combinations (A, B, C, E, G and A, B, D, E, G) would result in expensive bail-out operations in which the VM has to recover from failed guard instructions. Our secondary trace approach avoids this problem.
tion. If a loop calls into the memory allocator on every iteration,
the overhead of running the memory allocator often by far out-
weighs any speedup that compilation of the rest of the loop code
could achieve. Thus, not compiling such loops in the first place is an acceptable initial heuristic.3 Since the runtime of the loop is dominated by the runtime of the memory allocator, the performance loss will be small.
There is, however, a class of loops that would benefit from com-
pilation despite the presence of memory allocation instructions.
Since Java does not allow explicit stack allocation of temporary
objects, regular heap allocation instructions are used. As these tem-
porary objects are only live for a single iteration, we could actu-
ally bypass the allocator altogether and allocate temporary stack
space instead. This would enable us to speed up the loop body without building up any unused heap blocks that the garbage collector would have to reclaim (which it cannot, since we block garbage collection while executing compiled code).
We are currently working on a specialized linear-time escape
analysis that will allow us to identify such temporary objects at
runtime.
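As a concrete illustration (our own example, not taken from the paper), the following Java loop allocates a temporary object on every iteration that is live for that iteration only; this is exactly the allocation pattern that a linear-time escape analysis could convert from heap allocation to stack allocation:

```java
// Hypothetical example: each iteration allocates a temporary Point that
// never escapes the loop body, so an escape analysis could replace the
// heap allocation with a stack slot reused across iterations.
final class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
    double dist() { return Math.sqrt(x * x + y * y); }
}

public final class EscapeDemo {
    static double sumDistances(double[] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            // 'p' is only live within this single iteration and is never
            // stored anywhere that outlives the iteration.
            Point p = new Point(xs[i], ys[i]);
            sum += p.dist();
        }
        return sum;
    }

    public static void main(String[] args) {
        double s = sumDistances(new double[]{3.0, 0.0}, new double[]{4.0, 0.0});
        System.out.println(s); // prints 5.0
    }
}
```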
4. Trace Merging
Trace-based JIT compilation as presented in the previous two sec-
tions works well for very regular code. Unfortunately, code is rarely
purely regular. Loops often do not contain a single dominant path,
but several frequently executed paths (Figure 6). To achieve good
performance in this more general case, we have to support multiple
hot paths through a loop.
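The situation in Figure 6 can be sketched with a small Java loop (our own example; the branch conditions are arbitrary) in which no single path through the body dominates, so a single recorded hotpath would fail its guards frequently:

```java
// Hypothetical loop mirroring Figure 6: each iteration takes one of three
// paths through the body, and none of them dominates the others.
public final class MultiPath {
    static int run(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {   // A: loop header
            if (i % 3 == 0) {           // path A, F, G
                acc += 1;
            } else if (i % 3 == 1) {    // path A, B, C, E, G
                acc += 2;
            } else {                    // path A, B, D, E, G
                acc += 3;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // Each of the three paths is taken 3 times in 9 iterations.
        System.out.println(run(9)); // prints 18
    }
}
```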
For this, similarly to Dynamo, we record secondary (child) traces every time we exit a trace along a non-exception edge (Figure 7).
The guard instruction is updated to jump to the child trace, instead
of returning to the VM. The child trace shares the upstream code
with its parent trace, and is compiled bottom-up just as its parent
trace, with the register allocator initialized to the final state of the
3
Our compiler currently also rejects such loops, since memory allocation requires a callback into the virtual machine and is thus effectively a native method invocation, which in our prototype aborts trace compilation.
Figure 7. Recording secondary traces. Every time we exit a trace
along a non-exception edge, we immediately start recording a new
trace. The guard instruction that triggered the recording of the new
trace is then patched to point to the newly recorded and compiled
trace.
parent trace. This is necessary in order for the child trace to know
in which register the parent trace stored values that were computed
upstream.
If a child trace is left through a side exit, we resume recording; the resulting trace is also associated directly with the parent trace.
In essence we lazily record all actually executed traces through the
loop as we encounter them, and connect them to the single loop
header of the parent trace.
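The lazy recording scheme described above might be sketched roughly as follows. This is our own simplified model in Java, not the paper's actual data structures; the names Trace, TraceTree, and onSideExit are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical): when a guard in a compiled trace fails along a
// non-exception edge, a new trace is recorded and the guard is patched to
// jump to it, so future guard failures no longer bail out to the VM. All
// traces belong to one tree anchored at the parent trace's loop header.
final class Trace {
    final String path;                                   // e.g. "A-F-G"
    final List<Trace> guardTargets = new ArrayList<>();  // patched guard exits
    Trace(String path) { this.path = path; }
}

final class TraceTree {
    final Trace primary;           // anchored at the single loop header
    final List<Trace> allTraces = new ArrayList<>();

    TraceTree(String primaryPath) {
        this.primary = new Trace(primaryPath);
        allTraces.add(primary);
    }

    // Called on a side exit from 'parent': record a child trace and patch
    // the failing guard to point at it.
    Trace onSideExit(Trace parent, String recordedPath) {
        Trace child = new Trace(recordedPath);
        parent.guardTargets.add(child);  // "patch" the guard instruction
        allTraces.add(child);            // still part of the same loop's tree
        return child;
    }
}

public final class TraceTreeDemo {
    public static void main(String[] args) {
        TraceTree tree = new TraceTree("A-F-G");
        Trace c1 = tree.onSideExit(tree.primary, "B-C-E-G");
        tree.onSideExit(c1, "D-E-G");    // side exits off children attach too
        System.out.println(tree.allTraces.size()); // prints 3
    }
}
```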
In certain cases we also have to insert compensation code along
the new edge to the child trace, for example in order to compute a
value that was dead in the parent trace, but is now active in the child
trace. These cases are easily recognizable, because a use refers to a
definition in the parent trace that has no register allocation assigned
to it.
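A hypothetical Java loop (our example) shows when such compensation code arises: a value is dead along the hot path that forms the parent trace, but live on the rare path that becomes a child trace, so it must be computed along the patched guard edge:

```java
// Hypothetical example: 'twice' is used only on the rarely taken branch.
// A trace through the common path treats it as dead and never computes it,
// so entry into the child trace for the rare path needs compensation code
// that computes 'twice' along the guard edge.
public final class Compensation {
    static int run(int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) {
            int twice = i * 2;       // dead on the hot path below
            if (i % 100 == 99) {     // rare path -> child trace
                acc += twice;        // first real use of 'twice'
            } else {
                acc += 1;            // hot path: 'twice' is unused
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // 198 hot iterations contribute 1 each; i = 99 and i = 199
        // contribute 198 and 398: 198 + 198 + 398 = 794.
        System.out.println(run(200)); // prints 794
    }
}
```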
It should be noted that this approach leads to a certain amount of code duplication; however, it also permits specialized code optimization along each path (Figure 8).
Invariant code motion in the presence of secondary traces also
warrants special consideration. Aggressively hoisting computations out of the initially recorded primary trace would block a large number of registers from further use by secondary traces. To
circumvent this problem, we selectively spill certain invariant val-
ues into memory when switching from parent to child traces.
An interesting effect can be observed in the presence of nested
loops. As shown in Figure 9, in a nested loop construct the inner
part of the loop tends to be "hotter" than the outer parts of the
nested loop. Thus, the first loop header to be detected as such is
likely B, because it is part of the inner loop. Once the trace B − C
has been recorded and is executed, a new trace is recorded as soon as C exits to D instead of following the loop edge. We record a new
child trace D − A and reconnect it to B, which we consider the
loop header. The fact that we only consider B as a loop header
as far as optimization is concerned means that the primary trace
B − C will get more resources allocated than C − D − A. However,
considering that the inner loop is likely the hotter path, this is
actually exactly what we want. Effectively, we have turned the
loop inside out, giving maximum consideration to the hottest loop
region, and slightly disadvantaging the outer loop parts.
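A minimal Java sketch (our own example) of why the inner loop header is detected first: it simply executes far more often than the outer one, so its iteration counter crosses the hotness threshold long before the outer header's does:

```java
// Hypothetical nested loop: the inner header (B) runs outer*inner times,
// while the outer header runs only 'outer' times, so trace recording
// discovers B first and optimizes the inner path B-C preferentially.
public final class NestedLoops {
    static long run(int outer, int inner) {
        long innerIterations = 0;
        for (int i = 0; i < outer; i++) {        // outer header (D-A region)
            for (int j = 0; j < inner; j++) {    // inner header B, body C
                innerIterations++;
            }
        }
        return innerIterations;
    }

    public static void main(String[] args) {
        // Inner header executes 10 * 100 = 1000 times vs. 10 for the outer.
        System.out.println(run(10, 100)); // prints 1000
    }
}
```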