Coder Social home page Coder Social logo

bitey's Introduction

Bitey - Bitcode Import Tool

Import LLVM bitcode directly into Python and use it as an extension module.

Warning

THIS IS PROOF-OF-CONCEPT SOFTWARE THAT DESPITE ITS CUTE NAME, MIGHT ACTUALLY BITE YOUR ENTIRE LEG OFF. USE AT YOUR OWN RISK!

Requirements

You'll need to have a pretty complete LLVM development environment installed on your machine. Bitey has been developed using LLVM/Clang-3.1. You might need to install it yourself.

In addition, you need to install the llvm-py extension. Get it at at http://www.llvmpy.org.

Bitey is unlikely to work with any older version of LLVM or the llvm-py extension--especially preinstalled versions distributed with your operating system. You need to be using bleeding-edge modern versions of these libraries.

Example and Basic Tutorial

First, you need some C code. Something important like computing a fibonacci number:

/* fib.c */
int fib(int n) {
    if (n < 3) {
       return 1;
    } else {
       return fib(n-1) + fib(n-2);
    }
}

Now, compile it into LLVM bitcode using clang:

bash % clang -emit-llvm -c fib.c

This makes an object file fib.o as usual--only the .o file contains LLVM bitcode. Now, just import it into Python:

>>> import bitey
>>> import fib
>>> fib.fib(38)
39088169
>>>

Yes, that's it. Bitey does not use the C compiler, the linker, or the dynamic loader. You don't write wrapper functions either. Write normal C, compile it with clang, and import it. Done.

Bitey understands most basic C datatypes including integers, floats, void, pointers, arrays, and structures. Because it builds a ctypes based interface, you would access the code using the same techniques. Here is an example that mutates a value through a pointer:

/* mutate.c */

void mutate_int(int *x) {
     *x *= 2;
}

Here's how you would use this:

% clang -emit-llvm -c mutate.c
% python 
>>> import bitey
>>> import mutate
>>> import ctypes
>>> x = ctypes.c_int(2)
>>> mutate.mutate_int(x)
>>> x.value
4
>>> 

Here is an example involving a structure:

/* point.c */
#include <math.h>

struct Point {
    double x;
    double y;
};

double distance(struct Point *p1, struct Point *p2) {
    return sqrt((p1->x - p2->x)*(p1->x - p2->x) + 
                (p1->y - p2->y)*(p1->y - p2->y));
}

To run:

% clang -emit-llvm -c point.c
% python
>>> import bitey
>>> import point
>>> p1 = point.Point(3,4)
>>> p2 = point.Point(6,8)
>>> point.distance(p1,p2)
5.0
>>> 

One subtle issue with structure wrapping is that LLVM bitcode doesn't encode the names of structure fields. So, Bitey simply assigns them to an indexed element variable like this:

>>> p1.e0         # (Returns the .x component)
3
>>> p1.e1         # (Returns the .y component)
4

This can be fixed using a pre-load module as described in the "Advanced Topics" section later.

If you need to combine two LLVM object files together into a single importable module, use llvm-ld like this:

% llvm-ld point.o fib.o -b combined.o
% python
>>> import bitey
>>> import combined
>>> combined.fib(10)
55
>>> p1 = combined.Point(3,4)
>>> p2 = combined.Point(6,8)
>>> combined.distance(p1,p2)
5.0
>>> 

The C code you write can link with external libraries, but you might need to take special steps to load the library prior to import. For example, suppose you compiled the Fibonacci code into a shared library like this:

# OS-X
% gcc -bundle -export_dynamic fib.c -o libfib.so     

# Linux
% gcc -shared fib.c -o libfib.so

Now, suppose you had some C code that wanted to access this library:

/* sample.c */
#include <stdio.h>
extern int fib(int n);

void print_fib(int n) {
    while (n > 0) {
        printf("%d\n", fib(n));
        n--;
    }
}

If you try to build it normally, you'll get an error:

% clang -emit-llvm -c sample.c
% python
>>> import bitey
>>> import sample
LLVM ERROR: Program used external function 'fib' which could not be resolved!
% 

However, you can load the library yourself doing this:

% python
>>> import bitey
>>> bitey.load_library("./libfib.so")
<CDLL './libfib.so', handle 1003cfc60 at 10049d090>

>>> import sample >>> sample.print_fib(10) 55 34 21 13 8 5 3 2 1 1 >>>

It is important to note that Bitey is NOT a wrapper generator meant to access already-compiled C libraries. It only exposes functionality that has been explicitly compiled as LLVM bitcode. To access the contents of a library, you would need to compile and link it using clang and llvm-ld as shown in the examples.

How it works

Bitey extends Python with an import hook that looks for .o files containing LLVM bitcode. Type signatures and other information in the bitcode are then used to build a ctypes-based binding to the natively compiled functions contained within an LLVM execution engine. It's all a bit magical, but the LLVM JIT generates the executable code whereas Bitey makes the ctypes binding to it---all behind the scenes on import.

It's important to stress that Bitey does not use the C compiler, the linker, the dynamic loader, or make calls to subprocesses. It is completely self-contained and only uses the functionality of llvm-py and ctypes.

Performance

The performance profile of Bitey is going to be virtually identical that of using ctypes. LLVM bitcode is translated to native machine code and Bitey builds a ctypes-based interface to it in exactly the same manner as a normal C library.

As a performance experiment, here is a simple C function that checks if a number is prime or not:

int isprime(int n) {
    int factor = 3;
    /* Special case for 2 */
    if (n == 2) {
        return 1;
    }
    /* Check for even numbers */
    if ((n % 2) == 0) {
       return 0;
    }
    /* Check for everything else */
    while (factor*factor < n) {
        if ((n % factor) == 0) {
            return 0;
        }
        factor += 2;
    } 
    return 1;
 }

Try compiling this code into LLVM and a C shared library:

% clang -O3 -emit-llvm -c isprime.c

# OS-X
% gcc -O3 -bundle -undefined dynamic_lookup isprime.c -o isprime.so

# Linux
% gcc -O3 -shared isprime.c -o isprime.so

Now, let's put Bitey and ctypes in a head-to-head performance battle:

>>> import bitey
>>> from isprime import isprime as isprime1
>>> import ctypes
>>> ex = ctypes.cdll.LoadLibrary("./isprime.so")
>>> isprime2 = ex.isprime
>>> isprime2.argtypes=(ctypes.c_int,)
>>> isprime2.restype=ctypes.c_int
>>> 
>>> from timeit import timeit
>>> # Bitey
>>> timeit("isprime1(3)","from __main__ import isprime1")
1.1813910007476807
>>> # ctypes
>>> timeit("isprime2(3)", "from __main__ import isprime2")
1.2408909797668457
>>> 
>>> # Bitey
>>> timeit("isprime1(10143937)", "from __main__ import isprime1")
9.839216947555542
>>> # ctypes
>>> timeit("isprime2(10143937)", "from __main__ import isprime2")
9.663991212844849
>>> 

As you can see, the performance is just about the same. The main difference would come down to the efficiency of LLVM vs. gcc code optimization.

Advanced Usage

If you're up for a bit of adventure, the module creation process can be altered through the use of pre and post loading files.

A pre-load file provides Python code that executes within the newly created module prior to the LLVM-binding step. One use of this code is to specify the names of fields on data structures. For example, you can create the following pre-load file for the earlier Point example:

# point.pre.py

class Point:
    _fields_ = ['x','y']

If you do this, you'll find that the field-names get fixed:

>>> import point
>>> p = point.Point(3,4)
>>> p.x
3.0
>>> p.y
4.0
>>>

You could also use a pre-load file to load library dependencies:

# sample.pre.py
import bitey
bitey.load_library("./libfoo.so")

A post-load file allows you alter the contents of the module after LLVM-binding. You could use this to apply decorators or add additional support code. For example:

# point.post.py
#
# Example of decorating a function already wrapped

def decorate(func):
    def wrapper(*args, **kwargs):
        print "Calling", func.__name__
        return func(*args, **kwargs)
    wrapper.__name__ = func.__name__
    return wrapper

# Wrap the distance wrapper already created
distance = decorate(distance)

The combination of the pre/post loading files gives you almost unlimited opportunity for insane evil when loading the bitcode. It must be stressed that that these files are executed in the space of the module being created---they are not separate imports (i.e., the pre, post, and LLVM bindings all co-exist in the same module namespace).

Automatic Binding

In the examples, it is necessary to use import bitey for modules to be recognized and loaded. If you want to skip this step and make everything automatic, create a bitey.pth file that contains the following statement:

# bitey.pth
import bitey

Now, copy this file to the Python site-packages directory.

FAQ

Q: Will Bitey ever support C++?

A: No. C++ can bite me (*)

(*) I also wrote Swig and still have C++ scars.

Q: Why is it called "Bitey?"

A: Well, "Bitey" is so much more catchy than simply calling it something boring like "BIT (Bitcode Import Tool)". Plus, just like @johnderosa's pet Pomeranian of the same name, you're never quite sure whether "Bitey" is adorably cute or a viscious beast that will constantly nip your leg. Actually, I just like the ring of it--"Bitey" sort of rhymes with "Enterprisey".

Discussion Group

A discussion group for Bitey is available at http://groups.google.com/group/bitey

Authors

bitey's People

Contributors

dabeaz avatar takluyver avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bitey's Issues

Magic number checking code in loader.py seems wrong

In _check_magic you have:

magic = open(filename,"rb").read(4)

And then:

`elif magic == b'\x42\x43':``

It seems wrong to me. magic is 4 bytes long! You maybe meant:

elif magic[:2] == b'\x42\x43 ?

Without this change bitey examples don't run

Patch to make biteymagic work w/o spawning CLI window on Windows

While executing with biteymagic on Windows, command line window pops (calling clang action) which can be improved (disabled) by adding couple of lines:

--- biteymagic.bak  2012-11-10 22:34:43.406173200 +0100
+++ biteymagic.py   2012-11-10 22:30:38.421798200 +0100
@@ -75,10 +75,19 @@

         if not os.path.exists(o_path):
             try:
-                subprocess.check_output(['clang', '-c', '-emit-llvm'] +
-                                        line.split() + [c_name],
-                                        stderr=subprocess.STDOUT,
-                                        cwd=lib_dir)
+                if sys.platform == "win32":
+                    startupinfo = subprocess.STARTUPINFO()
+                    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
+                    subprocess.check_output(['clang', '-c', '-emit-llvm'] +
+                                            line.split() + [c_name],
+                                            stderr=subprocess.STDOUT,
+                                            cwd=lib_dir,
+                                            startupinfo=startupinfo)
+                else:
+                    subprocess.check_output(['clang', '-c', '-emit-llvm'] +
+                                            line.split() + [c_name],
+                                            stderr=subprocess.STDOUT,
+                                            cwd=lib_dir)
             except subprocess.CalledProcessError as e:
                 print(e.output, file=sys.stderr)
                 print("ERROR: command `%s` failed." %

python dumps core on "import fib" in "typical" install

The question is: which versions of the libs are required?

I've tried to install things in "dumb user" way (in Ubuntu 12.04):

sudo apt-get install clang llvm-dev python-llvm \  
&& git clone https://github.com/dabeaz/bitey.git \  
&& cd bitey \  
&& python ./setup.py build \  
&& sudo python ./setup.py install  \  
&& python  
Python 2.7.3 (default, Aug  1 2012, 05:16:07)   
[GCC 4.6.3] on linux2  
Type "help", "copyright", "credits" or "license" for more information.  
>>> import bitey  
>>> import fib  
Segmentation fault (core dumped)

where

$ cat fib.c
/* fib.c */
int fib(int n) {
    if (n < 3) {
       return 1;
    } else {
       return fib(n-1) + fib(n-2);
    }
}
$ file fib.*
fib.c: ASCII text
fib.o: LLVM bitcode
$ dpkg -l | egrep 'clang|llvm-dev|python-llvm'
ii  clang                                   3.0-6ubuntu3                            Low-Level Virtual Machine (LLVM), C language family frontend
ii  libclang-common-dev                     3.0-6ubuntu3                            clang library - Common development package
ii  llvm-dev                                2.9-7                                   Low-Level Virtual Machine (LLVM), libraries and headers
ii  python-llvm                             0.6+svn105-1                            Python bindings for LLVM

segfault if py_module gets garbage collected

I was working on an IPython extension which provides a cell magic for running C code via bitey, e.g.:

In [4]: %load_ext biteymagic

In [5]: %%bitey
   ...: int life() { return 42; }
   ...: 

In [6]: life()
Out[6]: 42

However I was running into random segfaults, often after running %timeit or similar, indicating that garbage collection was likely to blame. To cut to the chase, the issue was because I called bitey.loader.build_module to build a module, extracted the functions into the current namespace, but never saved the built module in sys.modules. When the module was garbage collected (and hence _llvm_engine/_llvm_module get deleted).... BOOM!

To fix the bug, either of the following would be sufficient:

  1. I could add the module to sys.modules. In this case, I'd suggest a short note in the docstring.
  2. Keep a reference in the python wrapper function object to the execution engine, i.e. set _llvm_engine and _llvm_module as attributes.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.