Thu Jun 26 14:43:04 CDT 2003
Information about BinInterface
------------------------------
BinInterface takes in a set of instructions with some particular
register allocation. It allows you to add, modify, or delete
instructions, in SSA form (kind of like LLVM's MachineInstrs), and
then re-allocates registers. It assumes that the transformations you
are doing are safe. It does not update the mapping information or the
LLVM representation for the modified trace, so it would not, for
instance, support multiple optimization passes; passes have to be
aware of the mapping information and update it manually.

The way you use it is: take the original code and provide it to
BinInterface, do your optimizations on it, then put the result in the
trace cache.

BinInterface tries to find the live-outs of each trace so that it can
do register allocation on just the trace and stitch the trace back
into the original code. Its register allocation has to preserve the
live-ins and live-outs. (On exits from the trace we have epilogs that
copy live-outs back into the right registers, but live-ins have to
already be in the right registers on entry.)
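
As a rough illustration of what live-ins and live-outs mean for a
straight-line trace, here is a minimal Python sketch. The Instr model,
the register names, and both helper functions are hypothetical; this
is not BinInterface's code. In particular, the live-out estimate
assumes that any register written before an exit might be live after
it, which is a conservative over-approximation.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Toy instruction (hypothetical model): registers written and read."""
    defs: Tuple[str, ...]
    uses: Tuple[str, ...]
    is_exit: bool = False  # True for a branch that may leave the trace


def trace_live_ins(trace):
    """Registers read in the trace before any trace instruction writes them."""
    defined = set()
    live_in = set()
    for ins in trace:
        live_in |= set(ins.uses) - defined
        defined.update(ins.defs)
    return live_in


def conservative_live_outs(trace):
    """Map each exit's index to every register written up to that exit."""
    defined = set()
    outs = {}
    for idx, ins in enumerate(trace):
        defined.update(ins.defs)
        if ins.is_exit:
            outs[idx] = set(defined)
    return outs


trace = [
    Instr(defs=("r3",), uses=("r1", "r2")),            # r3 = r1 + r2
    Instr(defs=("r1",), uses=("r3",)),                 # r1 = r3 * 2
    Instr(defs=(), uses=("r1", "r4"), is_exit=True),   # branch on r1, r4
]
print(sorted(trace_live_ins(trace)))
print(conservative_live_outs(trace))
```

In this toy trace r1, r2, and r4 are live-ins, because each is read
before the trace writes it, so a re-allocation of the trace would have
to leave them in their original registers.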
Limitations of BinInterface
---------------------------
It inserts copies for PHIs, which it infers from the machine code.
The mapping info inserted by LLC is not sufficient to determine the
PHIs.
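
To make "copy insertion for PHIs" concrete, here is a minimal Python
sketch of the textbook approach: each PHI is implemented by a copy
placed at the end of every predecessor block, just before its
terminator. The block representation, the instruction strings, and
the lower_phis helper are all hypothetical, and the sketch ignores
branch delay slots and the ordering problems that arise when several
PHIs read each other's destinations. It is not BinInterface's code.

```python
def lower_phis(blocks, phis):
    """Return a copy of `blocks` with a copy inserted for each PHI input.

    blocks: dict of block name -> list of instruction strings, where the
            last entry of each list is the block's terminator.
    phis:   list of (dest, {predecessor block name: source register}).
    """
    lowered = {name: list(body) for name, body in blocks.items()}
    for dest, incoming in phis:
        for pred, src in incoming.items():
            body = lowered[pred]
            # Insert the copy just before the terminator of the predecessor.
            body.insert(len(body) - 1, f"mov {src}, {dest}")
    return lowered


blocks = {
    "entry": ["set 0, i0", "ba loop"],
    "loop": ["add i1, 1, i2", "bne loop"],
}
# i1 = phi(i0 from entry, i2 from loop)
phis = [("i1", {"entry": "i0", "loop": "i2"})]

lowered = lower_phis(blocks, phis)
for name, body in lowered.items():
    print(name, body)
```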
It does not handle integer or floating-point condition codes, and it
does not handle floating-point register allocation.

It does not make aggressive use of the available registers.
There is a problem with alloca: we cannot find our spill space for
spilling registers, which is normally allocated on the stack, if the
trace follows an alloca(). An acceptable solution might be to disable
trace generation on functions that have variable-sized alloca()s.
Variable-sized allocas in the trace itself would probably also break
things.
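
The proposed workaround amounts to a simple per-function filter. The
following Python sketch shows the idea on a toy model; the Alloca
type, its size field, and both function names are hypothetical, and
nothing like this is claimed to exist in the reoptimizer.

```python
from typing import NamedTuple, Optional


class Alloca(NamedTuple):
    """Toy alloca: constant size in bytes, or None if computed at run time."""
    size: Optional[int]


def has_variable_sized_alloca(allocas):
    """True if any alloca in the function has a size unknown at compile time."""
    return any(a.size is None for a in allocas)


def may_generate_traces(function_allocas):
    """Allow trace generation only when every alloca has a constant size."""
    return not has_variable_sized_alloca(function_allocas)


fixed_only = [Alloca(16), Alloca(64)]
with_dynamic = [Alloca(16), Alloca(None)]
print(may_generate_traces(fixed_only), may_generate_traces(with_dynamic))
```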
Because of the FP and alloca limitations, the BinInterface is
completely disabled right now.
Demo
----
This is a demo of the Ball & Larus version that does NOT use 2-level
profiling.
1. Compile program with llvm-gcc.
2. Run opt -lowerswitch -paths -emitfuncs on the bytecode.
     -lowerswitch   change switch statements to branches
     -paths         Ball & Larus path-profiling algorithm
     -emitfuncs     emit the table of functions
3. Run llc to generate SPARC assembly code for the result of step 2.
4. Use g++ to link the (instrumented) assembly code.
We use a script to do all this:
------------------------------------------------------------------------------
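#!/bin/sh
# Argument: the program name without its .c extension.

# Step 1: compile with llvm-gcc (step 2 reads the resulting $1.bc).
llvm-gcc "$1.c" -o "$1"

# Step 2: lower switches, add path profiling, emit the function table.
opt -lowerswitch -paths -emitfuncs "$1.bc" > "$1.run.bc"

# Step 3: generate SPARC assembly for the instrumented bytecode.
llc -f "$1.run.bc"

# Step 4: link the instrumented assembly with the reoptimizer objects.
LIBS=$HOME/llvm_sparc/lib/Debug
GXX=/usr/dcs/software/evaluation/bin/g++
"$GXX" -g -L "$LIBS" "$1.run.s" -o "$1.run.llc" \
    "$LIBS/tracecache.o" \
    "$LIBS/mapinfo.o" \
    "$LIBS/trigger.o" \
    "$LIBS/profpaths.o" \
    "$LIBS/bininterface.o" \
    "$LIBS/support.o" \
    "$LIBS/vmcore.o" \
    "$LIBS/transformutils.o" \
    "$LIBS/bcreader.o" \
    -lscalaropts -lscalaropts -lanalysis \
    -lmalloc -lcpc -lm -ldl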
------------------------------------------------------------------------------
5. Run the resulting binary. You will see output from BinInterface
(described below) intermixed with the output from the program.
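
For readers unfamiliar with the Ball & Larus algorithm named in step
2, here is a minimal Python sketch of its core idea on an acyclic
control-flow graph: assign each edge a value so that summing the
values along any entry-to-exit path gives a unique path number. This
is the textbook numbering only, not the code of the -paths pass. The
graph, the node names, and both helper functions are hypothetical,
the sketch assumes at most one edge between any pair of nodes, and it
does not handle loops (the full algorithm first removes back edges).

```python
def ball_larus_edge_values(succs, entry):
    """Compute Ball-Larus edge values for an acyclic CFG.

    succs: dict of node -> ordered list of successor nodes.
    Returns (edge_val, num_paths), where edge_val maps (src, dst) to the
    edge's increment and num_paths maps node -> number of paths to an exit.
    """
    num_paths = {}
    edge_val = {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        if not succs.get(v):          # exit node: exactly one (empty) path
            num_paths[v] = 1
            return 1
        total = 0
        for w in succs[v]:
            edge_val[(v, w)] = total  # paths through earlier successors
            total += visit(w)
        num_paths[v] = total
        return total

    visit(entry)
    return edge_val, num_paths


def path_id(path, edge_val):
    """Sum the edge values along a path given as a list of nodes."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))


succs = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": [],
}
edge_val, num_paths = ball_larus_edge_values(succs, "A")
paths = [list("ABDEG"), list("ABDFG"), list("ACDEG"), list("ACDFG")]
print([path_id(p, edge_val) for p in paths])
```

The instrumented program only needs to add the edge value to a
counter on each edge with a nonzero value; the final sum identifies
which path was executed.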
Output from BinInterface
------------------------
BinInterface's debugging code prints out the following, in order:
1. The initial code provided to BinInterface, with its original
   register allocation.
2. Section 0, the trace prolog, consisting mainly of live-ins and
   register saves which will be restored in epilogs.
3. Section 1, the trace itself, in the SSA form used by BinInterface,
   along with the PHIs that are inserted.
   PHIs are followed by the copies that implement them.
   Each branch (i.e., out of the trace) is annotated with the
   number of the section that represents the epilog it branches to.
4. All the other sections, starting with Section 2, which are trace
   epilogs. Every branch from the trace has to go to some epilog.
5. After the last section, the register allocation output.