Comments (7)

pfalcon commented on July 29, 2024

Ah, and yes, IR needs to be documented ;-).

from decompiler.

EiNSTeiN- commented on July 29, 2024

I'm not sure what you mean by that. Converting assembly into IR is done by language-specific modules (see src/ir and src/host/*/dis); it's not intended to be done by hand. The "ir parser" is meant for testing the decompiler steps without coupling the tests to a specific disassembler. Outside of that specific use case, it's much too limited to drive a full-fledged decompiler, because there would be no way to express things like operand size or operator type (floating-point vs. integer addition, etc.).

pfalcon commented on July 29, 2024

> I'm not sure what you mean by that. Converting assembly into IR is done by language-specific modules (see src/ir and src/host/*/dis)

Then you strongly couple your decompiler to particular mundane disassemblers. If I don't own IDA and have an architecture not supported by Capstone (and it supports only the popular-to-the-point-of-boring ones ;-) ), then I'm hosed, in the sense that I need to dig very deep into many aspects of your decompiler to interface it to something else, instead of just making that "something else" output a standard textual IR form and feeding it into your decompiler.

> it's much too limited to drive a full-fledged decompiler, because there would be no way to express things like operand size or operator type (floating-point vs. integer addition, etc.).

OK, if your IR supports all those features, can you please consider extending the syntax and adding parsing support for that (I assume dump support already works)?

And I assume you made your own IR syntax for a reason, and I can only give +1 on that, because when you look into some existing solution, you immediately get the impression that it's over-engineered. But well, a subset of LLVM syntax might work ;-).

EiNSTeiN- commented on July 29, 2024

> Then you strongly couple your decompiler to particular mundane disassemblers there.

src/ir is the generic disassembler-to-IR code. src/host is strongly coupled to the underlying disassembler (IDA, Capstone) but can be ported to other disassemblers fairly easily, with the generic part (in src/ir) staying the same.

> If I don't own IDA and have architecture not supported by Capstone [...] I need to dig very deep into many aspects of your decompiler to interface it to something else

Currently this decompiler only supports Intel assembly (and not all instructions either), so if your goal is to decompile anything else, you will need to write the disassembler-to-IR code for whichever combination of host software and assembly language you wish to decompile.

> OK, if your IR supports all those features, can you please consider extending the syntax and adding parsing support for that (I assume dump support already works)?

The IR is not meant to be parsed from text with ir_parser.py; that is only for testing purposes. If you want to parse assembly into IR, you would not go through an intermediary "text" that can be dumped and parsed back. What you would do is write support for your target assembly language in src/ir and then write a host-specific module in src/host for the disassembler you want to use.

pfalcon commented on July 29, 2024

> If you want to parse assembly into IR, you would not go through an intermediary "text" that can be dumped/parsed out.

Sorry, that's exactly what I will do, and that's the basic requirement. It's complex stuff, so having a good (human-friendly) representation for intermediate steps is vital. Also, nobody will be able to write a "decompiler for everything", so that reduces people to writing a "decompiler for X", which immediately and drastically prunes the target user base. That's the reason decompilation is where it is, with unmaintainable C crapware like Boomerang in ashes for a decade, and a bunch of folks writing new crippled toy-likes. E.g., this dude https://github.com/electrojustin/triad-decompiler has a segfaulting thing which can decompile (simple) loops but can't eliminate superfluous assignments because it doesn't do SSA; yours does well at contracting expressions in acyclic code but doesn't do loops; etc., etc.

The only solution to that problem is to completely decouple "decompiler" from "convert machine-specific asm to a generic IR" part. Then maybe there will be critical mass to work on "decompiler" part. It's oh so sad that people don't see this obvious solution ;-).

EiNSTeiN- commented on July 29, 2024

> The only solution to that problem is to completely decouple "decompiler" from "convert machine-specific asm to a generic IR" part. Then maybe there will be critical mass to work on "decompiler" part. It's oh so sad that people don't see this obvious solution ;-).

These two parts are already well decoupled in my code. I guess you could write a dumper and parser for the IR as it is now, but currently ir_parser.py is just a toy for testing purposes; it was never meant to parse text for decompilation. I use it mostly for testing the SSA form.

Right now the only way to dump out IR (or any other intermediate decompilation step) is to use the C output class (src/output/c.py), but that is a lossy translation, as it's meant to look like readable C. As I mentioned, it loses tons of information about operands.

You could very well write a new output module that is more verbose just for IR. I would merge a PR for this without problem, but it's not on my roadmap to write one.
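To make the idea of a more verbose, lossless text form concrete, here is a minimal sketch of a dump/parse round trip that keeps operand size and operator type. Everything in it is an assumption made for illustration: the `Operand` and `Assign` classes, the `name:bits` notation, and the `add`/`fadd` operator names are hypothetical and are not the decompiler's actual IR classes or syntax.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    name: str  # hypothetical: register or variable name
    bits: int  # operand width in bits


@dataclass(frozen=True)
class Assign:
    dest: Operand
    op: str      # hypothetical operator names, e.g. "add" (integer) or "fadd" (float)
    args: tuple  # tuple of Operand


def dump(stmt):
    """Render a statement as text, keeping sizes and the operator type."""
    args = ", ".join(f"{a.name}:{a.bits}" for a in stmt.args)
    return f"{stmt.dest.name}:{stmt.dest.bits} = {stmt.op} {args}"


_OPERAND = re.compile(r"^(\w+):(\d+)$")


def _parse_operand(text):
    match = _OPERAND.match(text.strip())
    if match is None:
        raise ValueError(f"bad operand: {text!r}")
    return Operand(match.group(1), int(match.group(2)))


def parse(line):
    """Parse one line produced by dump() back into an Assign."""
    lhs, rhs = line.split("=", 1)
    op, _, rest = rhs.strip().partition(" ")
    args = tuple(_parse_operand(a) for a in rest.split(","))
    return Assign(_parse_operand(lhs), op, args)


int_add = Assign(Operand("eax", 32), "add", (Operand("eax", 32), Operand("ebx", 32)))
float_add = Assign(Operand("xmm0", 64), "fadd", (Operand("xmm0", 64), Operand("xmm1", 64)))
for stmt in (int_add, float_add):
    text = dump(stmt)
    print(text)
    assert parse(text) == stmt  # nothing is lost in the round trip
```

The point of the sketch is only that the text form has to carry every attribute the later decompilation steps rely on; a real output module would need to cover the full set of IR node types.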

pfalcon commented on July 29, 2024

Thanks for the explanation. I'll think about it, but the lack of loop support prioritizes getting back to looking at other folks' stuff. For reference, proper conversion out of SSA in the presence of loops is where I got stuck with my crippled toy, a compiler-in-Python https://github.com/pfalcon/llvm-codegen-py . I smartly left the boring parts to things like clang, and thought that using LLVM IR, which is already in SSA, would make my task much easier. Turns out, conversion out of SSA requires about the same effort and similar algos as converting to SSA, and actually one algo is similar to register allocation, so it itches to combine them, but then it only gets more complex... End result: project stuck.
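For readers unfamiliar with the step being described, the following is a minimal sketch of the naive way to leave SSA: each phi node is replaced by copies placed at the end of the predecessor blocks. The block layout (a dict with "phis" and "body"), the statement strings, and the `eliminate_phis` name are all hypothetical and are not taken from any of the projects mentioned in this thread.

```python
def eliminate_phis(blocks):
    """Naively replace phi nodes with copies in predecessor blocks.

    blocks maps a block name to {"phis": [(dest, {pred: src})], "body": [str]}.
    Assumption: every body ends with exactly one terminator (jump or branch).
    """
    out = {name: list(block["body"]) for name, block in blocks.items()}
    for block in blocks.values():
        for dest, sources in block["phis"]:
            for pred, src in sources.items():
                # Put the copy just before the predecessor's terminator.
                out[pred].insert(len(out[pred]) - 1, f"{dest} = {src}")
    return out


# A simple counting loop in SSA form: i1 = phi(entry: i0, loop: i2)
cfg = {
    "entry": {"phis": [], "body": ["i0 = 0", "goto loop"]},
    "loop": {
        "phis": [("i1", {"entry": "i0", "loop": "i2"})],
        "body": ["i2 = i1 + 1", "if i2 < 10 goto loop"],
    },
}
for name, body in eliminate_phis(cfg).items():
    print(name, body)
```

This works for the simple loop above, but the naive scheme is known to produce wrong code in some loop shapes (the so-called "lost copy" and "swap" problems), which is a large part of why a correct conversion ends up needing liveness or interference information much like a register allocator does.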

Ah, and also for reference, next in my queue is https://github.com/pfalcon-mirrors/decomp-6502-arm . That does conversion out of SSA, but bugs were reported for loops, surprise. Funnily, the guy eventually just deleted the repo, prompting me to mirror this GPL code, as a tribute to vain community efforts ;-).
