
decompiler's Introduction

decompiler

A multi-backend decompiler written in Python. It currently supports IDA and Capstone.

Usage with Capstone

Install Capstone's Python bindings as follows:

$ sudo pip install capstone

Then try out the decompiler:

from capstone import *
from decompiler import *
from host import dis
from output import c

# Create a Capstone object, which will be used as disassembler
md = Cs(CS_ARCH_X86, CS_MODE_32)

# Define a bunch of bytes to disassemble
code = "\x55\x89\xe5\x83\xec\x28\xc7\x45\xf4\x00\x00\x00\x00\x8b\x45\xf4\x8b\x00\x83\xf8\x0e\x75\x0c\xc7\x04\x24\x30\x87\x04\x08\xe8\xd3\xfe\xff\xff\xb8\x00\x00\x00\x00\xc9\xc3"

# Create the capstone-specific backend; it will yield expressions that the decompiler is able to use.
disasm = dis.available_disassemblers['capstone'].create(md, code, 0x1000)

# Create the decompiler
dec = decompiler_t(disasm, 0x1000)

# Transform the function until it is decompiled
dec.step_until(step_decompiled)

# Tokenize and output the function as string
print(''.join([str(o) for o in c.tokenizer(dec.function).tokens]))

The snippet of code above should output:

func() {
   s0 = 0;
   if (*s0 == 14) {
      s2 = 134514480;
      3830();
   }
   return 0;
}

Much like Capstone itself, the capstone backend does not know which addresses hold strings, and it has no concept of named locations. This is why the call target and the string address appear as the raw numbers 3830() and 134514480 in the decompiled code above. You can give this information to the disassembler backend for prettier output:

disasm.add_string(134514480, "string")
disasm.add_name(3830, "func_3830")
print(''.join([str(o) for o in c.tokenizer(dec.function).tokens]))

Now the decompiled output is:

func() {
   s0 = 0;
   if (*s0 == 14) {
      s2 = 'string';
      func_3830();
   }
   return 0;
}
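The two raw numbers in the first listing can be recovered by hand from the instruction bytes. The sketch below uses only Python's struct module and is independent of both Capstone and this project. The offsets 23 and 30 are assumptions read off manually from the byte string in the example above; they are not values the decompiler exposes.

```python
import struct

# Same bytes as in the example above, loaded at address 0x1000.
code = (b"\x55\x89\xe5\x83\xec\x28\xc7\x45\xf4\x00\x00\x00\x00\x8b\x45\xf4"
        b"\x8b\x00\x83\xf8\x0e\x75\x0c\xc7\x04\x24\x30\x87\x04\x08\xe8\xd3"
        b"\xfe\xff\xff\xb8\x00\x00\x00\x00\xc9\xc3")
BASE = 0x1000

# Offset 23: c7 04 24 <imm32> is "mov dword ptr [esp], imm32".
# The immediate is a little-endian unsigned 32-bit value after 3 opcode bytes.
MOV_OFFSET = 23
(string_addr,) = struct.unpack_from("<I", code, MOV_OFFSET + 3)

# Offset 30: e8 <rel32> is a near call. The displacement is signed and
# relative to the address of the next instruction (the call is 5 bytes long).
CALL_OFFSET = 30
(rel32,) = struct.unpack_from("<i", code, CALL_OFFSET + 1)
call_target = BASE + CALL_OFFSET + 5 + rel32

print(string_addr)   # 134514480, i.e. 0x08048730
print(call_target)   # 3830, i.e. 0xef6
```

These are the values passed to disasm.add_string and disasm.add_name above.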

Current status

It is currently capable of decompiling small functions with fairly simple control flow. It may also be able to decompile larger functions by pure luck. It shows what can be done in a few thousand lines of Python.

Test binaries are provided in tests/.

How does it work?

This project is based on Michael Van Emmerik's thesis, "Static Single Assignment for Decompilation".
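As a rough illustration of the core idea, static single assignment (SSA) form gives every assignment its own versioned name, so each use refers to exactly one definition. The sketch below renames straight-line code only; it is not this project's implementation, the function name to_ssa is hypothetical, and branches and loops (which need phi-functions) are deliberately left out.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def to_ssa(statements):
    """Rename (target, expression) pairs so every target is assigned once."""
    version = {}   # latest version number seen for each variable
    renamed = []
    for target, expr in statements:
        # Uses on the right-hand side refer to the latest definition so far.
        def latest(match):
            name = match.group(0)
            if name in version:
                return "%s_%d" % (name, version[name])
            return name  # never assigned here: leave it untouched
        new_expr = IDENTIFIER.sub(latest, expr)
        # The assignment itself creates a fresh version of the target.
        version[target] = version.get(target, -1) + 1
        renamed.append(("%s_%d" % (target, version[target]), new_expr))
    return renamed

program = [("eax", "0"), ("eax", "eax + 1"), ("ebx", "eax * 2")]
ssa = to_ssa(program)
for target, expr in ssa:
    print("%s = %s" % (target, expr))
# eax_0 = 0
# eax_1 = eax_0 + 1
# ebx_0 = eax_1 * 2
```

Once every name has a single definition, propagating eax_0 into eax_1 and then into ebx_0 is a plain substitution, which is what makes expression recovery tractable.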

Roadmap

This project could use some improvements in the following areas:

  • More instructions are needed. Currently this decompiler supports a very limited number of x86/x64 instructions.
  • There is currently no attempt at data type analysis, which would be necessary to produce recompilable output, or even more correct output.
  • Add support for other architectures (ARM, etc.).
  • Add support for more calling conventions. Currently, only the System V x86-64 ABI (x64 Linux, gcc) is supported; with other compilers, function calls are displayed without parameters.
  • Add a GUI for renaming variables, inverting if-else branches, and other easy things.
  • When possible, functions called from the one being decompiled should be analysed to determine function arguments and restored registers.
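To make the calling-convention item concrete: under the System V x86-64 ABI, the first six integer or pointer arguments are passed in rdi, rsi, rdx, rcx, r8 and r9, in that order. The sketch below is one possible heuristic for guessing call arguments from the registers defined before a call. It is not how this project does it; the function name guess_call_arguments and the dictionary input are hypothetical.

```python
# Integer/pointer argument registers of the System V x86-64 ABI, in order.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

def guess_call_arguments(defined_before_call):
    """Return the likely argument expressions for a call.

    defined_before_call maps register names to the expression last
    assigned to them before the call. Arguments are taken in ABI order
    until the first register that has no definition.
    """
    arguments = []
    for register in SYSV_INT_ARG_REGS:
        if register not in defined_before_call:
            break  # a gap in the sequence ends the argument list
        arguments.append(defined_before_call[register])
    return arguments

# rcx is defined but rdx is not, so rcx is not treated as an argument.
args = guess_call_arguments({"rdi": "'string'", "rsi": "14", "rcx": "7"})
print(args)   # ["'string'", '14']
```

A different convention, such as 32-bit cdecl with stack-passed arguments, would need its own rule, which is why calls compiled for other ABIs show no parameters.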

decompiler's People

Contributors

aquynh, einstein-, ekse, gabc


decompiler's Issues

"ir-parser" disasm error

I'm trying to run tests/common/ply/ir_parser.py (for the sample code it contains). I get:

$ PYTHONPATH=../../../src python ir_parser.py
Traceback (most recent call last):
  File "ir_parser.py", line 271, in <module>
    print parse(text)
  File "ir_parser.py", line 256, in parse
    return parser.parse(text, lexer=ir_lexer.lexer)
  File "/usr/lib/python2.7/dist-packages/ply/yacc.py", line 269, in parse
    return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc)
  File "/usr/lib/python2.7/dist-packages/ply/yacc.py", line 1051, in parseopt_notrack
    tok = self.errorfunc(errtoken)
  File "ir_parser.py", line 233, in p_error
    raise RuntimeError("Syntax error in input: %s" % (repr(p), ))
RuntimeError: Syntax error in input: LexToken(:,':',2,8)

Is there more info about this disassembler/syntax?

Call arguments not detected

Currently, no attempt is made to recognize arguments to function calls, which leads to incorrect decompiled output.

How to use assembly as input?

What should I do if I just want to decompile assembly directly, rather than supplying a binary that is disassembled and decompiled in one step?

Unit tests not working

I'm getting several errors while running the unit tests:
no module named ply.yacc
no module named expressions
..............................statements

Loop decompilation not supported?

I ran:

          i = 0;
    100:  if (i >= 100) goto 400;
          i = i + 1;
          goto 100;
    400:  return i;

through dec.step_until(step_decompiled), and I am not really getting any decompiled code. The output is SSA basic blocks, the same as for dec.step_until(step_decompiled).

Does that mean that loops are not supported yet? Note that SSA for non-looping constructs is a trivial matter (as is converting out of SSA). The real complications start with loops. I also wonder how sound your out-of-SSA algorithm is.

Proposing a PR to fix a few small typos

Issue Type

[x] Bug (Typo)

Steps to Replicate and Expected Behaviour

  • Examine src/ir/generic.py, tests/common/disassembler.py and observe yeilds, however expect to see yields.
  • Examine src/filters/simplify_expressions.py and observe substraction, however expect to see subtraction.
  • Examine src/ir/init.py and observe independant, however expect to see independent.
  • Examine src/host/ida/ui/graph_test.py and observe differenciate, however expect to see differentiate.
  • Examine src/ir/intel.py and observe comparision, however expect to see comparison.

Notes

Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.

https://github.com/timgates42/decompiler/pull/new/bugfix_typos

Thanks.

IDA UI is broken

The IDA UI was not fixed after most of the decompiler code was rewritten. It is most certainly broken at present.
