A set of simple LLVM passes for analyzing llvm bitcode generated by tipc
TIP is a "Tiny Imperative Programming" language developed by Anders Moeller and Michael I. Schwartzbach for the Static Program Analysis lecture notes that they developed for graduate instruction at Aarhus University.
Accompanying those notes is a Scala implementation that provides a number of static analysis implementations and interpreter-based evaluators.
This project processes the output tipc
a compiler from TIP programs to LLVM bitcode.
There are four passes in this project:
funvisitpass
: the simplest imaginableFunctionPass
printinstpass
: a function pass that identifies and prints a subset of instructions that are relevant for TIP programsuserspass
: an extension ofprintinstpass
that prints the users of an instructionintervalrangepass
: a skeletal interval range analysis pass
The first three are intended just to give a basic idea of how to write little passes so that you can experiment with getting up the LLVM pass writing learning curve.
The third serves as a basis for getting into pass writing a bit more deeply. It is an intentionally partial implementation of an interval range analysis for integer expressions (which are encoded as bitcode variables). This serves to illustrate:
- how SSA can be exploited to simplify the management of analysis state;
- how data flow analyses can exploit the use-def relationships that are explicit in SSA; and
- how abstract domains and associated transfer functions can be encoded to support an analysis.
The pass is not intended to reflect the architecture, coding style, or efficiency one finds in LLVM analysis passes. Rather the intention is to very clearly express how the analysis functions as relates to the above issues.
These passes were developed and tested on Ubuntu 18.04 LTS using llvm-9
.
To install the necessary packages on ubuntu run:
apt install libllvm-9-ocaml-dev libllvm9 llvm-9 llvm-9-dev llvm-9-doc llvm-9-examples llvm-9-runtime llvm-9-tools libclang-common-9-dev
This project uses CMake to manage the build process. There are a set of CMakeLists.txt
files that orchestrate the build process.
You need to create a build
directory within which you will build the passes. To get started you should create that, empty, build directory if it doesn't exist. Follow these steps:
mkdir build
cd build
cmake ..
make
During development you need only run steps 1 and 2 a single time, unless you modify the CmakeLists.txt file. Just run make in the build directory to rebuild after making changes to your tool source. If you run into trouble with cmake
then you can delete the file build/CMakeCache.txt
and rerun the cmake
command.
This will create a set a subdirectory for each pass in the build
directory.
These passes all emit output on the standard error stream.
To run a pass you use LLVM's opt
tool. Since the passes are built using llvm-9 you need to use that version of opt
-- if it is not the default you can invoke it as opt-9
. You run a pass using the following command:
opt -load <pathto>/passfile.so -passname < <pathto>/file.bc >/dev/null
where passfile.so
is the name of the shared object (dynamic) library for the pass and passname
is the name used to register the pass (see the declaration of the form static RegisterPass<...> X(...)
in the source of the pass, or just look below).
Since opt
writes the transformed bitcode file to output you need to either pipe the result to /dev/null
, as above, or use the -o <filename>
option to redirect it to a file.
There are four passes in this project:
funvisitpass
: the simplest imaginableFunctionPass
;passname
isfvpass
printinstpass
: a function pass that identifies a subset of instructions that are relevant for TIP programs;passname
ispipass
userspass
: an extension ofprintinstpass
that prints the users of an instruction;passname
isuserspass
intervalrangepass
: a skeletal interval range analysis pass that depends on the program being converted to SSA form, so you need to add the-mem2reg
option to theopt
command line;passname
isirpass
These are all intra-procedural FunctionPass
es and, as such, are limited in their precision. There are many other limitations as well, e.g.,
- only a limited set of instructions are supported
- only a limited set of binary opcodes is supported
- the interval computations are unsound with respect to computer arithmetic
- the interval computations for some operators are very imprecise
- there is no widening implemented for intervals These are intentional as the passes are provided just as a context for learning about LLVM, i.e., by students.
If you want to understand what real LLVM passes look like, then have a look at LLVM source code.
The directory src/intervalrangepass/test
contains a set of tests interval*.tip
which can be run using the script runirpass.sh
. The script outputs a file, which are stored here as interval*.irpass
, that record the results of running the pass.
The source files are annotated with expected results, but it can be tricky to compare the source level view of the analysis results with the view in terms of the bitcode file. This is due in part to the lowering of the program to bitcode, which makes temporary computations explicit and thereby tracked by the analysis, and by the application of optimizations associated with the -mem2reg
pass.
There is a debug flag that you can add to the command line, -intervalrange-debug
, which prints information like the updating of interval values, adding user instructions to the worklist, etc.
Note that due to the limitations stated above these expected results are not computed correctly. Also take care that if you have a loop in your example, like interval5.tip
, the irpass
may diverge.
Perhaps the most relevant documentation for this project is the tutorial Writing an LLVM Pass.
There is lots of great general advice about using LLVM available:
- https://www.cs.cornell.edu/~asampson/blog/llvm.html
- the [LLVM Programmer's Manual(http://llvm.org/docs/ProgrammersManual.html) is a key resource
- someone once told me to just use a search engine to find the LLVM APIs and its a standard use case for me, e.g., I don't remember where the docs are I just search for
llvm irbuilder