This is an optimizer created for cs 6810, compiler optimizations, and Western Michigan University. See below for the exact optimizations performed.
Run the following in the root directory of the repository (the same directory that this README is in):
make
Sometimes the build process will randomly fail... probably because of the fact that make
is configured to build with multiple threads. Just run make again and it should work out.
There are 3 components to this project, and calling make
will build each one in the following order:
- The antlr4 runtime library
This is needed for antlr4 to work and has definitions for core antlr4 functionality.
You can find the source code for this portion in ./antlr/lib/antlr4-runtime. I copied the code from the official antlr4 repository.
- The parser functions generated by antlr4
This generates the classes such as ilocBaseVisitor
which I extend.
You can find the grammar file and the generated library in ./antlr/src/parser. It uses the antlr4 jar file to generate c++ files. The jar file is in ./antlr/lib/.
- My driver and optimizer code
This uses the generated antlr4 code to parse an iloc file, extract basic blocks, apply LVN, global common subexpression elimination, dead code elimination, and register allocation.
The source code for this portion is in .antlr/src/driver.
Once you've built the project, you can use iloc.sh
, found at the root level of this repository (the same level as this README), to run every .il
file in ./input
through the optimizer. The optimized files are saved in ./output
as {filename}.ra.il
. The script first outputs the time it took to optimize each file, and then it outputs the difference between the outputs of the original .il
file and the optimized ra.il
file (which should only show a difference in instructions executed).
This script provides part of the functionality for the master iloc.sh
script.
It tests the output differences between an unoptimized and optimized .il
file.
You must be in the ./antlr
directory to use this script. Usage:
cd ./antlr # must be here!
quick-test.sh ../input/qs.il # tests qs.il
quick-test.sh # tests all files in ../input
In the current state of the optimizer, the following optimizations are applied in this order.
- Local value numbering
- Global common subexpression elimination
- Dead code elimination
- Global register allocation
You can pick and choose which optimizations to apply when you run the driver:
# in ./antlr
./driver ../input/qs.il ls # lvn and global common subexpression elimination
./driver ../input/qs.il sd # gcse and dead code elimination
./driver ../input/qs.il lsdr # all optimizations mentioned above.
./driver ../input/qs.il # same as lsdr
Dead code elimination and register allocation require the global common subexpression optimization (s
) to be run before them.
Register allocation is done with the Chaitin-Briggs bottom up algorithm. Live ranges are computed from SSA, an interference graph is built, and registers are allocated by attempting to color the graph. If the graph can't be colored, uncolored live ranges are spilled and another attempt is made at coloring the graph.
Live ranges are spilled by inserting store instructions after every definition and inserting load instructions before every use. This essentially changes the location of a live range from being in a register to being in a spot in memory. The live ranges are stored on the stack.
Since iloc is pass by reference, an additional step has to be taken when spilling function arguments. Within the function, all arguments are treated as if they are live at all times in live variable analysis until they are spilled. When they are spilled, before each return, they need to be loaded back into their allocated register so that they can be used outside of the function.
The register allocation process crashes when trying to allocate for ./input/dynamic.lvn.dbre.dce.ruc.il
and input/newdyn.lvn.dbre.dce.ruc.il
.
As mentioned before, since iloc is pass by reference, special cases have to be caught when doing register allocation.
.frame quicksort, 8, %vr4, %vr5, %vr6
...
icall partition, %vr4, %vr5, %vr6 => %vr11
...
ret
In this example, the quicksort
function takes three arguments. Inside of the function, another function, partition
, is called. Since iloc is pass by reference, partition
can (and does) change the contents of some of the arguments. This is fine in iloc, since each argument register is allocated a unique virtual register. But when it comes to register allocation, special care must be taken to make sure the right values are in the right registers upon return from a function.
To model pass by reference from outside of a function, I added lvalues to each call instruction to represent a new definition of the arguments. This made sure that live variable analysis correctly identified the affects of call by reference.
icall partition, %vr4, %vr5, %vr6 => %vr11
# can redefine %vr4, %vr5, and %vr6 as well
icall partition, %vr4, %vr5, %vr6 => %vr11, %vr4, %vr5, %vr6
From inside of the function, I added a special case to register allocation that considered arguments to always be live until they were spilled. When a function argument is spilled, I make sure that it is re-loaded into the register passed as an argument before any return.
.frame quicksort, 36, %vr4, %vr7, %vr6
storeAI %vr4 => %vr0, -36 # vr4 is spilled
...
loadAI %vr0, -36 => %vr4 # make sure vr4 has what it's supposed to have for call by reference before return
ret
To calculate live ranges, I convert the program to SSA form and create a live range for every unique SSA name. I then merge any live ranges that get merged at phi nodes.
This can have bad effects when trying to run register allocation with a very small number of available registers (such as 2). A large live range like this can be made of a number of SSA virtual registers. When spilling a live range like this, the store/load instructions are successfully inserted, but since the live range is made of multiple virtual registers throughout the program, each instance can still interfere with one other live range each. Each of these interferences go back into the original live range, meaning that the graph can't be colored.
I was not able to correct this issue, but I think that creating new live ranges upon spilling might fix it. Creating new live ranges would separate the spread out virtual registers into their own live ranges and reduce interference.
I also tried not counting interferences on live ranges that have already been spilled, but that had unintended consequences that created incorrect optimized code.
To implement global common subexpression elimination, we need to utilize the fact that in iloc, syntactically equivalent expressions always store into the same register. However, not all input provided actually followed this. For example, while_array.il:
...
testle %vr7 => %vr8 <--
cbr %vr8 -> .L1
.L0: nop
loadI 1 => %vr5
i2i %vr5 => %vr4
loadI 10 => %vr6
comp %vr4, %vr6 => %vr7
testgt %vr7 => %vr8 <--
...
These two test
instructions are actually different operations, but they store into the same register. To address this issue, instead of storing "what defined %vr8
" when I see it, I store something more like "what last defined %vr8
, and does it exactly match what I'm currently looking at?".
Another instance of instructions "breaking" the "syntactic equivalence" rule of iloc are i2i
instructions. I don't actually believe they're "breaking" the rule, because I remember talking about the distinction between memory and expression registers in class, but it still needed to be accounted for.
This was further complicated by the fact that local value numbering was changing some i2i
instructions into loadI
instructions, so they would appear to be expression registers, but later be used later as memory registers.
To address this issue, before any optimization pass got to touch the input code, I had another pass go through each instruction and mark it as a memory register or an expression register based on the opcode in used in the instruction that defined it. Then, in my optimizer code, I would never delete instructions that store to a memory register.
After that, I realized that some expressions used memory registers as operands and it would sometimes mean I couldn't delete them, even if I had already seen them. The reason I couldn't delete them was the following: because memory registers don't follow the "syntactic equivalence" rule of iloc, a syntactically equivalent expression that used a memory register could still be semantically different.
To address this, when marking registers as memory or expression registers, I introduced a third register type called a "mixed expression" register. A mixed expression register is an expression register that uses memory registers as operands and is not allowed to be deleted.
File | Original | Optimized - No Register Allocation | 6 registers | 8 registers | 12 registers | 16 registers |
---|---|---|---|---|---|---|
arrayparam.il |
841 | 474 | 876 | 528 | 474 | 474 |
bubble.il |
4374 | 2761 | 4208 | 3895 | 2832 | 2761 |
check.il |
140 | 5 | 5 | 5 | 5 | 5 |
dynamic.il |
39155 | 24335 | N/A | 31749 | 26107 | 24360 |
fib.il |
274 | 230 | 639 | 537 | 230 | 230 |
gcd.il |
103 | 84 | 170 | 108 | 84 | 84 |
newdyn.il |
136919 | 85151 | 141227 | 117812 | 90527 | 85151 |
qs.il |
4574 | 3464 | N/A | 4982 | 3654 | 3464 |
while_array.il |
377 | 278 | 537 | 339 | 278 | 278 |
File | No Register Allocation | 6 registers | 8 registers | 12 registers | 16 registers |
---|---|---|---|---|---|
arrayparam.il |
0.278s | 0.531s | 0.366s | 0.358s | 0.403s |
bubble.il |
1.013s | 1.943s | 2.024s | 1.904s | 1.634s |
check.il |
0.055s | 0.061s | 0.054s | 0.052s | 0.051s |
dynamic.il |
5.765s | N/A | 9.670s | 11.857s | 12.009s |
fib.il |
0.040s | 0.084s | 0.071s | 0.056s | 0.060s |
gcd.il |
0.096s | 0.277s | 0.217s | 0.163s | 0.143s |
newdyn.il |
2.155s | 4.807s | 4.450s | 4.871s | 3.223s |
qs.il |
0.568s | N/A | 1.471s | 1.346s | 0.959s |
while_array.il |
0.127s | 0.288s | 0.248s | 0.199s | 0.182s |