trace has been identified: it is then dynamically compiled into na-
tive code via a nontraditional application of SSA form, which we
call Trace SSA (TSSA).
In the classical use of SSA form, a control-flow graph is trans-
lated into SSA form in its entirety, and φ nodes are placed at
control-flow join points. Our approach differentiates between the
values used in a trace being compiled, which are in SSA form, and
values in the rest of the VM, which are not. The VM explicitly
moves data from the stack and local variables into dedicated SSA
variables before any generated native code is called, and explic-
itly moves non-dead SSA results back onto the stack and into local
variables on every exit from such an optimized trace (including side
exits). This approach enables the just-in-time compiler to perform
aggressive optimizations on the trace, including moving operations
on SSA values across side exit points. Because instruction traces
are essentially linear (they may contain only internal back edges),
liveness analysis and placement of φ nodes are straightforward. Our
system supports fairly sophisticated merging of multiple traces that
have a common ancestor.
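The entry/exit convention described above can be illustrated with a minimal C sketch. All names here (Frame, run_trace) are hypothetical and not the paper's actual API; the point is only the data movement: interpreter state is copied into dedicated SSA variables on trace entry, and live SSA results are written back on every exit, including side exits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch, not the paper's implementation. Before a
 * compiled trace runs, the VM moves local-variable values into
 * dedicated SSA variables; on every exit (including side exits),
 * non-dead SSA results are written back to the interpreter state. */

typedef struct {
    int32_t locals[4];               /* interpreter local-variable slots */
} Frame;

/* Compiled trace for the loop body "++k; ++i", with a side exit taken
 * when i reaches the limit. Returns the number of the exit taken. */
static int run_trace(Frame *f, int32_t limit) {
    /* trace entry: copy VM state into SSA variables */
    int32_t ssa_i = f->locals[1];
    int32_t ssa_k = f->locals[2];
    for (;;) {
        int32_t ssa_k2 = ssa_k + 1;  /* each definition gets a fresh name */
        int32_t ssa_i2 = ssa_i + 1;
        if (ssa_i2 >= limit) {
            /* side exit: write live SSA results back into the frame */
            f->locals[1] = ssa_i2;
            f->locals[2] = ssa_k2;
            return 1;
        }
        ssa_i = ssa_i2;              /* back edge: φ-like renaming */
        ssa_k = ssa_k2;
    }
}
```

Because the trace body operates only on the SSA variables, the compiler is free to reorder or eliminate operations on them; the frame is guaranteed consistent only at exits.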
The remainder of this paper is organized as follows. In Section 2,
we describe our approach to identifying and extracting linear code
sequences from non-sequential programs. Section 3 describes our
trace compiler, which is a specialized JIT compiler that translates
bytecode traces to PowerPC machine code. In Section 4 we detail
our approach to extending primary traces with secondary traces to
cover irregular control-flow scenarios. Then, in Section 5, we give
a brief overview of the current state of our prototype implementa-
tion. Related work is discussed in Section 6. Our paper concludes
in Section 7.
2. Trace Selection and Recording
When a bytecode program is loaded into memory, traces are not read-
ily visible in the code. In fact, most general-purpose code is not
even purely sequential. The Java bytecode format groups code into
methods that are associated with classes, with all code for a class
stored in individual class files.
To transform such non-sequential code into linear sequences
of instructions, we use an approach called software trace schedul-
ing [9]. Software trace scheduling is based on the observation that
a physical CPU does not actually execute a code graph, but merely
follows linear instruction sequences (traces) along the edges of that
graph. To obtain a trace from a code graph, we record instructions
at runtime as they are executed. As not all code is equally worthy of
compilation and optimization, we limit our efforts to loops and only
start recording bytecode traces once we have identified a frequently
executed loop header.
This section gives an overview of the main steps—identifying
loop headers, recording traces, and conditions for ending the
recording.
2.1 Identifying Loop Headers
To identify loop headers, we use a simple heuristic that first
appeared in Dynamo [3], a framework for dynamic runtime optimization
of binary code.
Initially, our virtual machine interprets the bytecode program in-
struction by instruction. Each time a backwards branch instruction
is executed, the destination of that jump is recorded as a potential
loop header. The rationale of this approach is that the general flow
of control in bytecode is forward, and thus each loop has to contain
at least one backward branch.
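A minimal sketch of this detection step, with illustrative names and a toy array size rather than the paper's actual data structures:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch: whenever the interpreter executes a branch whose
 * target lies at or before the branch itself (a backward branch), the
 * target offset is remembered as a potential loop header. */

static int potential_header[256];      /* flag per bytecode offset (toy size) */

static void on_branch(size_t branch_pc, size_t target_pc) {
    if (target_pc <= branch_pc)        /* control flows backwards: likely a loop */
        potential_header[target_pc] = 1;
}
```

For instance, a `goto` at offset 18 jumping back to offset 7 would mark offset 7 as a potential loop header, while a forward branch marks nothing.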
To filter actual loop headers from the superset of potential
loop headers, we track the invocation frequency of (potential) loop
headers. Only after the execution frequency of a potential loop
header exceeds a certain threshold does our VM start recording a
bytecode trace (Figure 1). To reduce the overhead for code that is
public static void main(String[] args) {
    int i, k = 0;
    for (i = 0; i < 1000; ++i)       // hotspot
        ++k;
    System.out.println(k);
}

   iconst_0
   istore_2
   iconst_0
   istore_1
A: iload_1                           ; hotpath (trace) begins
   sipush 1000
   if_icmpge B
   iinc 2,1
   iinc 1,1
   goto A                            ; hotpath (trace) ends
B: getstatic System.out
   iload_2
   invokevirtual println(int)
   return
Figure 1. Recording bytecode traces. The source program (upper
part of the figure) contains a frequently executed loop that we
want to compile (a hotspot). The other parts of the program are
executed infrequently, and compiling them would likely do little
for overall performance. The bottom part of the figure shows
the resulting bytecode program. A frequently executed backward
branch to A triggers the recording of a trace. The trace is complete
when execution returns to the loop header. We also refer to recorded
traces as "hotpaths", as they are the results of tracing hotspots.
not suitable for trace-based JIT compilation, the execution monitor
disconnects itself from the virtual machine if no usable traces can
be found and a second threshold is exceeded.
As branch instructions appear frequently in Java bytecode, it
is essential to minimize the housekeeping overhead of tracking
the invocation frequency of backward branches. In our
first KVM-based implementation we used a relatively small (977
entries) closed hash table, consisting of 16-bit integer counters, for
this purpose. Every time a backward-branch instruction is executed,
the corresponding counter in the hash table is incremented. Colli-
sions in the hash table are intentionally tolerated as their impact on
code generation is relatively limited. Collisions can merely lead to
overestimation of the "hotness" of a code region, triggering code
generation for "cold" code. In the worst case this overestimation
will cause a slight performance degradation as the VM might be
unable to recover the compilation cost in terms of cycles spent on
code optimization vs. cycles recovered from more efficient program
execution.
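The counter scheme can be sketched as follows. The table size (977) comes from the text; the threshold value and hash function are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the small closed hash table of 16-bit counters. Collisions
 * are tolerated: distinct branch targets may alias the same counter,
 * at worst overestimating "hotness". Threshold and hashing are
 * illustrative, not the paper's actual values. */

#define TABLE_SIZE 977
#define HOT_THRESHOLD 50               /* hypothetical threshold */

static uint16_t counters[TABLE_SIZE];

/* Called per backward branch; returns nonzero once the (possibly
 * aliased) counter crosses the threshold. */
static int record_backward_branch(uintptr_t target) {
    uint16_t *c = &counters[target % TABLE_SIZE];
    if (*c < UINT16_MAX)               /* saturate rather than wrap */
        ++*c;
    return *c >= HOT_THRESHOLD;
}
```

Note that two targets whose addresses differ by a multiple of the table size share a counter, which is exactly the tolerated overestimation described above.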
Using a hash table is well suited for interpreters such as KVM,
which interpret the native bytecode format as it exists in Java class
files. JamVM, in contrast, translates the bytecode format into its
own intermediate representation prior to execution. Among other
transformations, relative branch addresses are resolved to absolute
target addresses, and constant values are retrieved from the constant
pool and stored directly in the intermediate representation. To further
reduce the overhead of the branch monitoring approach described
above, we directly embed the profiling information in the interme-
diate representation. This allows us to avoid the additional hash
table lookup per branch instruction.
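A sketch of this embedding, with a struct layout that is illustrative only and not JamVM's actual intermediate representation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: each translated branch instruction carries its
 * own profiling counter, so no hash-table lookup is needed per
 * executed branch. */

typedef struct Insn {
    int opcode;
    struct Insn *target;   /* absolute target, resolved at translation time */
    uint16_t count;        /* embedded per-branch profiling counter */
} Insn;

static int profile_branch(Insn *br, uint16_t threshold) {
    if (br->count < UINT16_MAX)
        ++br->count;
    return br->count >= threshold;
}
```

The counter lives in the same cache line as the instruction the interpreter is already dispatching, so the profiling cost per branch drops to a single increment and compare.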