IADD:
    /* Both operands constant: fold and record the value. */
    if constant(son[0]) && constant(son[1]) {
        value = OP(son[0], son[1])
        return
    }
    /* Canonicalize: a single constant operand goes to the right. */
    if constant(son[0])
        swap(son[0], son[1])
    if int_const(son[1]) && imm16(son[1])
        emit(addi, reg(son[0]), value(son[1]))   /* immediate form */
    else
        emit(add, reg(son[0]), reg(son[1]))
Figure 4. The backend iterates over the code bottom-up, which
allows us to use simple pattern matching to emit specialized
instruction forms. The generator function for the IADD bytecode
instruction, for example, first checks whether its left operand is
constant and swaps the operands in that case. This ensures that
if exactly one operand is constant, it is always the right one, as
expected by the addi machine instruction. Only operands that cannot
be folded into a constant are assigned a register using the reg
function. The corresponding code for the definition is emitted later,
once the code generator arrives at the defining instruction.
on the bytecode interpreter to record the value calculated by each
instruction in the trace. The compiler merely tracks a bit indicating
whether each operand is constant, and if all operands of an instruction
are constant, the result previously recorded by the virtual machine
is assumed to be the correct result value of the instruction.
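To make this concrete, the following C sketch illustrates the
constant-tracking scheme; the Insn structure, its field names, and the
fold_constants function are illustrative assumptions, not JamVM's
actual data structures.

    /* Illustrative sketch: the interpreter records every result value
       during trace recording; the compiler merely propagates a
       constness bit and reuses the recorded value when all operands
       are constant. */
    typedef struct Insn {
        struct Insn *son[2];   /* operand definitions */
        int          is_const; /* are all operands constant? */
        long         value;    /* result recorded by the interpreter */
    } Insn;

    static void fold_constants(Insn *i)
    {
        if (i->son[0]->is_const && i->son[1]->is_const) {
            /* The value recorded at trace-recording time is taken to
               be the instruction's constant result; no machine code
               will be emitted for it. */
            i->is_const = 1;
        }
    }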
The only JVML instruction that warrants special treatment is
the IINC instruction. In contrast to all other JVML instructions, it
does not store the value it produces on top of the stack, so
our stack-based recording mechanism cannot capture the value. Instead,
it writes the new value directly into a local variable. Our recording
code is aware of this irregularity and, for IINC instructions, records
the value of the local variable instead of the top of the stack.
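A minimal sketch of this special case follows; OP_IINC is the real
JVML opcode number, but the function and its parameters are
hypothetical illustrations of the recording logic, not the actual
interpreter hook.

    #define OP_IINC 0x84  /* JVML iinc opcode */

    /* Record the value produced by the instruction just interpreted:
       IINC bypasses the operand stack, so its result is read from
       the affected local variable slot instead of the stack top. */
    static long record_result(int opcode, long *stack, int sp,
                              long *locals, int iinc_index)
    {
        if (opcode == OP_IINC)
            return locals[iinc_index];
        return stack[sp - 1];
    }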
Only if constant folding fails do we assign a register to the in-
struction defining the operand, which is the main reason why we
perform code generation in reverse order: it guarantees that we al-
ways encounter all uses of a value before its definition. The
code for the definition is only generated if not all uses could be
resolved through constant folding and specialized instruction pat-
terns. A common example is the addi instruction, which allows one
operand to be embedded directly as an immediate as long as the value
falls within a certain range. If all uses of a value can be embedded
in this fashion, the value never has to be materialized at its
definition site and no register has to be assigned to it. With
forward code generation, we would have to decide whether to emit the
code for a definition before knowing whether the value will actually
be used, which in turn would require analyzing each use of the value.
As code generation continues, we eventually reach the
instruction we just assigned a register to and generate the code that
produces its value into that register. Code is only generated for
instructions that were previously assigned a register. If we are able
to satisfy all uses statically through constant folding, the instruction
has no register assigned when we arrive at it and is
considered dead.
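The following sketch summarizes this lazy register assignment and the
resulting dead-instruction elimination. All names (NO_REG, next_reg,
emit_insn, has_side_effects) are illustrative assumptions about the
scheme described above, not the actual implementation; side-effecting
instructions such as guards are emitted unconditionally.

    #define NO_REG (-1)

    typedef struct Insn {
        struct Insn *son[2];
        int is_const; /* result known at compile time */
        int reg;      /* NO_REG until a use forces materialization */
    } Insn;

    extern int  has_side_effects(Insn *insn);
    extern void emit_insn(Insn *insn, int *next_reg);

    /* Called from the code emitted for a *use*: assign a register to
       the defining instruction on first demand. */
    static int reg(Insn *def, int *next_reg)
    {
        if (def->reg == NO_REG)
            def->reg = (*next_reg)++;
        return def->reg;
    }

    static void generate(Insn *trace, int n)
    {
        int next_reg = 3;                  /* first scratch register */
        for (int i = n - 1; i >= 0; i--) { /* bottom-up over the trace */
            Insn *insn = &trace[i];
            if (insn->is_const)
                continue;                  /* folded at record time */
            if (!has_side_effects(insn) && insn->reg == NO_REG)
                continue;                  /* no surviving use: dead */
            emit_insn(insn, &next_reg);    /* may call reg() on operands,
                                              forcing their definitions
                                              to be emitted later */
        }
    }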
Besides simplifying register allocation, bottom-up code genera-
tion also enables us to perform efficient pattern matching to emit
specialized machine instruction forms (Figure 4). As described
above, we try to fold all operands into constants, and if we find
that one operand of an add instruction is a constant, for exam-
ple, we emit the specialized addi machine instruction (in the case
of our PowerPC backend).
02801008: cmpwi r2,1000
0280100c: bgel- 0x2801034
02801010: addi r3,r3,1
02801014: addi r2,r2,1
02801018: b 0x2801008
Figure 5. The final code emitted by our backend for the example
introduced above. By assigning fixed registers to the φ instructions,
we avoid any trailing move instructions at the end of the loop that
would otherwise be needed to correct the state of the register
allocator before following the loop edge back to the loop header. The
branch instruction in the original bytecode was compiled as a guard
instruction, which exits the compiled code when execution diverges
from the predicted path. We use the link register of the PowerPC to
record the location where this bail-out happened, which allows the VM
to unwind the execution state accordingly.
The resulting machine code for the example introduced in the
previous section is shown in Figure 5.
3.4 Side Exits
The initial version of our compiler did not generate specific ma-
chine code for individual side exits. Guard instructions were trans-
lated to conditional branch-and-link instructions that all pointed to
the same compensation code. Every time a guard instruction is ex-
ecuted (whether it passes or fails), the processor updates the link
register with the current value of the program counter, which is es-
sentially the address of the conditional branch instruction that was
generated for the guard instruction. We can make such liberal use of
the link register because we do not invoke any methods from within
compiled traces. If a guard instruction fails, execution is transferred
to the side-exit compensation code shared by all guard instructions.
The compensation code first has to locate the appropriate register-
to-local-variable mapping information by translating the machine
address recorded by the branch instruction back to the guard
instruction in the intermediate representation, which ultimately
contains the pointer to the register mapping information.
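A sketch of that reverse lookup is shown below; the Guard structure,
its fields, and the simple search are assumptions chosen for
illustration, not JamVM's actual code.

    /* Illustrative sketch of the shared compensation path: map the
       machine address captured in the link register back to the IR
       guard that carries the register-to-local-variable mapping. */
    typedef struct Guard {
        void *branch_addr; /* machine address of the emitted branch */
        void *reg_mapping; /* register -> local variable mapping */
    } Guard;

    static void *find_mapping(Guard *guards, int n, void *link_reg)
    {
        for (int i = 0; i < n; i++)
            if (guards[i].branch_addr == link_reg)
                return guards[i].reg_mapping;
        return 0; /* unreachable if every guard was registered */
    }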
Benchmarks have shown that this approach creates significant
overhead for traces that are frequently exited through side exits
after only a few iterations (or no complete iteration at all). We have
thus changed our code generator to emit dedicated compensa-
tion code for each side exit.
Each side-exit stub consists of a series of memory stores that
transfer values from hardware registers back into the appropriate
local variable storage area.
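Sketched below is what the emission of such a stub might look like;
the Exit and Live structures, the emitter callbacks, and the
word-sized-slot assumption are hypothetical stand-ins for the
backend's actual emitter interface.

    typedef struct Live { int hw_reg; int slot; } Live;
    typedef struct Exit { int n_live; Live *live; } Exit;

    /* Emit one store (PowerPC stw) per live value, moving it from its
       hardware register back into the local variable area addressed
       by a dedicated base register, then branch to the VM's handler. */
    static void emit_exit_stub(Exit *e, int locals_base_reg,
                               void (*emit_stw)(int, int, int),
                               void (*emit_branch)(void *),
                               void *exit_handler)
    {
        for (int i = 0; i < e->n_live; i++)
            emit_stw(e->live[i].hw_reg, locals_base_reg,
                     e->live[i].slot * 4 /* word-sized slots */);
        emit_branch(exit_handler);
    }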
If the side exit occurred while executing an inlined method, the
compensation code also has to create additional stack frames to write
back the current state.
In our implementation we actually write back the values first
and then create the new stack frames "around" the already written-
back stack and local variable information, because most temporary
values are held in hardware registers, which would have to be
flushed to memory before we could call the high-level JamVM
functions responsible for creating new stack frames. As a
consequence, we have to ensure that the garbage collector is not
triggered while we are writing back the stack frames (or while we
execute compiled code). We do so by holding the garbage collector
lock at all times when compiled code is executed.
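The discipline can be summarized as follows; lock_gc, unlock_gc,
write_back_frames, and the Trace structure are hypothetical stand-ins
for the corresponding JamVM facilities.

    typedef struct Trace { long (*entry)(void); } Trace;

    extern void lock_gc(void);
    extern void unlock_gc(void);
    extern void write_back_frames(long exit_state);

    /* Run a compiled trace with the garbage collector locked out, so
       a collection can never observe the partially written-back
       frames produced on a side exit. */
    static long run_compiled_trace(Trace *t)
    {
        lock_gc();                     /* no collection from here on */
        long exit_state = t->entry();  /* execute native trace code */
        write_back_frames(exit_state); /* stores + frame creation */
        unlock_gc();                   /* heap state consistent again */
        return exit_state;
    }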
While in principle this blocks the garbage collector from re-
claiming unused memory while compiled code is executed, and
could thus cause a memory overflow, in practice this does not seem to
be a significant limitation: performance-critical code such as a loop
rarely performs any memory allocations.
The main reason for seeing so few memory allocations inside
loop bodies is the inherent cost of the memory allocation opera-