snowman's People

Contributors

yegord

snowman's Issues

Use relocations to disassemble more accurately

Snowman gets pretty confused with the following dll:
http://people.mozilla.org/~jmuizelaar/snowman/switch.dll

__declspec(dllexport)
const char *get(int k)
{
        switch (k+1) {
                case 0:
                        return "zero";
                case 1:
                        return "one";
                case 2:
                        return "two";
                case 3:
                        return "three";
                default:
                        return "other";
        }
}

const char *get2(int k)
{
        switch (k) {
                case 0:
                        return "zero";
                case 1:
                        return "one";
                case 2:
                        return "two";
                case 3:
                        return "three";
                default:
                        return "other";
        }
}

__declspec(dllexport)
const char *(*get3)(int k) = get2;

int DllMain(long handle, long reason, void* reserved)
{
        return 1;
}

Using the relocations in .reloc we can avoid treating the addresses in the jump table as instructions for disassembly.
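
A minimal sketch of that idea, assuming the relocation RVAs have already been read from .reloc (parsing is omitted) and that each relocation patches a 4-byte slot, as with IMAGE_REL_BASED_HIGHLOW on 32-bit PE. The function names are hypothetical and are not snowman APIs. The heuristic relies on relocations patching address fields rather than opcode bytes, so an instruction should not start inside a relocated slot; a jump table, being a run of relocated slots, is then skipped as a whole.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of one relocated slot; 4 bytes is an assumption matching
// IMAGE_REL_BASED_HIGHLOW entries.
constexpr std::size_t kSlotSize = 4;

// Builds a per-byte mask for a section of `sectionSize` bytes that starts at
// RVA `sectionRva`. mask[i] is true when byte i is covered by a relocation.
std::vector<bool> markRelocatedBytes(const std::vector<std::uint32_t> &relocRvas,
                                     std::uint32_t sectionRva,
                                     std::size_t sectionSize) {
    std::vector<bool> mask(sectionSize, false);
    for (std::uint32_t rva : relocRvas) {
        if (rva < sectionRva) {
            continue;  // relocation belongs to an earlier section
        }
        const std::size_t offset = rva - sectionRva;
        for (std::size_t i = 0; i < kSlotSize && offset + i < sectionSize; ++i) {
            mask[offset + i] = true;
        }
    }
    return mask;
}

// A disassembler can refuse to start decoding at any relocated byte.
bool mayStartInstruction(const std::vector<bool> &mask, std::size_t offset) {
    return offset < mask.size() && !mask[offset];
}
```

Relocated operands inside real instructions are also masked, which is harmless here because the check only concerns where an instruction may begin.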

Unbounded recursion in PropagateLiveness

0 libsystem_malloc.dylib 0x00007fff81846ca0 tiny_malloc_from_free_list + 12
1 libsystem_malloc.dylib 0x00007fff818473c3 szone_malloc_should_clear + 320
2 libsystem_malloc.dylib 0x00007fff81849868 malloc_zone_malloc + 71
3 libsystem_malloc.dylib 0x00007fff8184a27c malloc + 42
4 libc++.1.dylib 0x00007fff898b528e operator new(unsigned long) + 30
5 snowman 0x0000000103c26df0 std::__1::pair<boost::unordered::iterator_detail::iterator<boost::unordered::detail::ptr_node<nc::core::ir::Term const*> >, bool> boost::unordered::detail::table_impl<boost::unordered::detail::set<std::__1::allocator<nc::core::ir::Term const*>, nc::core::ir::Term const*, boost::hash<nc::core::ir::Term const*>, std::__1::equal_to<nc::core::ir::Term const*> > >::emplace_impl<nc::core::ir::Term const* const&>(nc::core::ir::Term const* const&, nc::core::ir::Term const* const&&&) + 208 (unique.hpp:410)
6 snowman 0x0000000103c2b85f nc::core::ir::liveness::LivenessAnalyzer::makeLive(nc::core::ir::Term const*) + 79 (Liveness.h:46)
7 snowman 0x0000000103c2bb30 nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 592 (LivenessAnalyzer.cpp:220)
8 snowman 0x0000000103c2bcda nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 1018 (LivenessAnalyzer.cpp:237)
9 snowman 0x0000000103c2bbcb nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 747 (iterator:1171)
10 snowman 0x0000000103c2bb30 nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 592 (LivenessAnalyzer.cpp:220)
11 snowman 0x0000000103c2bcda nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 1018 (LivenessAnalyzer.cpp:237)
12 snowman 0x0000000103c2bbcb nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 747 (iterator:1171)
13 snowman 0x0000000103c2bb30 nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 592 (LivenessAnalyzer.cpp:220)
14 snowman 0x0000000103c2bcda nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 1018 (LivenessAnalyzer.cpp:237)
15 snowman 0x0000000103c2bbcb nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 747 (iterator:1171)
16 snowman 0x0000000103c2bb30 nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 592 (LivenessAnalyzer.cpp:220)
17 snowman 0x0000000103c2bcda nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 1018 (LivenessAnalyzer.cpp:237)
18 snowman 0x0000000103c2bbcb nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 747 (iterator:1171)
19 snowman 0x0000000103c2bb30 nc::core::ir::liveness::LivenessAnalyzer::propagateLiveness(nc::core::ir::Term const*) + 592 (LivenessAnalyzer.cpp:220)

I'll send the binary via email.

RFC: External dependencies like libbfd.

@yegord would you mind if we integrated CMake scripts for finding external dependencies like libbfd or libELF?

The ELF parser we have today is good, but since different platforms (notably HPPA, SPARC and MIPS) have different extensions to their ELF ABI, wouldn't it be good to let an external library handle this when one is found?

What are those "void**" types for the argument and return registers of a function?

This simple strlen implementation for Allegrex:

89014c4:    move       $v1, $a0
89014c8:    lb         $v0, 0($v1)
89014cc:    bnez       $v0, 0x089014C8
89014d0:    addiu      $v1, $v1, 1
89014d4:    nor        $v0, $zr, $a0
89014d8:    jr         $ra
89014dc:    addu       $v0, $v1, $v0

where only registers a0, v1 and v0 are used, gave me this strange C++ output:

void** strlen(unsigned char* a0, void** a1, void** a2, void** a3, void** t0, void** t1, void* t2) {
    unsigned char* v1_8;

    v1_8 = a0;
    while (*v1_8) {
        ++v1_8;
    }
    return (uint32_t)(v1_8 + 1) + ~(uint32_t)a0;
}

Note: Allegrex can pass up to 8 arguments in registers (a0-a3, t0-t3).

Assertion failed: DefinitionGenerator.cpp, line 515.

Application Specific Information: Assertion failed: ((jump->thenTarget().basicBlock() == thenBB && jump->elseTarget().basicBlock() == elseBB) || (jump->thenTarget().basicBlock() == elseBB && jump->elseTarget().basicBlock() == thenBB)), function makeExpression, file src/nc/core/ir/cgen/DefinitionGenerator.cpp, line 515.

I can send the binary required to reproduce this by email if desired.

Snowman should use a more sophisticated disassembly technique than linear sweep

ARM decompilation currently seems to suffer quite a bit from confusing code and data, and this should help there.

There are lots of options for a better technique:

  • mcsema has something better, but I don't know much about it.
  • ByteWeight (http://security.ece.cmu.edu/byteweight/) seems to be what BAP is switching to, or is at least a good candidate.
  • Dagger uses MCObjectDisassembler (a recursive traversal disassembler) from LLVM, which made it upstream but was later removed. In my experience it did not work very well.
  • I haven't looked at what radare uses.
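
For readers unfamiliar with the term, here is a rough sketch of recursive traversal on a made-up two-byte ISA. The opcodes and the `recursiveTraversal` function are hypothetical, chosen only for illustration, and have nothing to do with snowman, LLVM or the other tools listed. Unlike a linear sweep, decoding follows control flow from an entry point, so bytes that no path reaches are never decoded as instructions. Indirect jumps, such as jump tables, are not handled, which is the usual weakness of this approach.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy ISA: every instruction is two bytes, [opcode, operand].
enum Opcode : std::uint8_t {
    NOP = 0,  // falls through to the next instruction
    JMP = 1,  // unconditional jump to the absolute offset in the operand
    JCC = 2,  // conditional jump: operand offset or fall-through
    RET = 3   // ends the current path
};

// Returns the offsets at which instructions were found, following control
// flow from `entry` instead of decoding every byte in order.
std::set<std::size_t> recursiveTraversal(const std::vector<std::uint8_t> &code,
                                         std::size_t entry) {
    std::set<std::size_t> starts;
    std::vector<std::size_t> worklist{entry};
    while (!worklist.empty()) {
        std::size_t pc = worklist.back();
        worklist.pop_back();
        // Follow one path until it ends or reaches already decoded code.
        while (pc + 1 < code.size() && starts.count(pc) == 0) {
            const std::uint8_t opcode = code[pc];
            const std::size_t target = code[pc + 1];
            if (opcode > RET) {
                break;  // not a valid instruction: abandon this path
            }
            starts.insert(pc);
            if (opcode == RET) {
                break;
            }
            if (opcode == JMP) {
                pc = target;
                continue;
            }
            if (opcode == JCC) {
                worklist.push_back(target);  // explore the taken branch later
            }
            pc += 2;  // NOP and JCC both fall through
        }
    }
    return starts;
}
```

With the bytes `{JCC, 6, RET, 0, 0xFF, 0xFF, NOP, 0, JMP, 2}` and entry 0, the two 0xFF bytes at offset 4 are never visited, whereas a linear sweep would try to decode them.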

Unable to set register as unsigned.

The following:

            case MIPS_INS_MULTU: {
                auto operand0 = operand(0);
                auto operand1 = operand(1);
                _[
                    regizter(MipsRegisters::hilo()) ^= (zero_extend(std::move(operand0), 64) * zero_extend(std::move(operand1), 64))
                ];
                break;
            }

will result in extra typecasts, since snowman assumes that 'hilo' is signed (int64_t) rather than unsigned (uint64_t), as it should be in this case.
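
To make the signedness difference concrete, here is a small standalone illustration in plain C++ (not snowman IR). The helper names `multu` and `mult` are hypothetical and merely mirror the MIPS mnemonics: the first zero-extends both 32-bit operands before multiplying, the second sign-extends them.

```cpp
#include <cstdint>

// Unsigned 32x32 -> 64 multiply: both operands are zero-extended first,
// which is what the zero_extend(..., 64) calls in the snippet above express.
std::uint64_t multu(std::uint32_t rs, std::uint32_t rt) {
    return static_cast<std::uint64_t>(rs) * static_cast<std::uint64_t>(rt);
}

// Signed 32x32 -> 64 multiply: both operands are sign-extended first.
std::int64_t mult(std::int32_t rs, std::int32_t rt) {
    return static_cast<std::int64_t>(rs) * static_cast<std::int64_t>(rt);
}
```

For the bit pattern 0xFFFFFFFF times 2, `multu` yields 0x1FFFFFFFE, while `mult(-1, 2)` yields -2, so the 64-bit results differ in their upper half and the type of the destination matters.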

"intrinsic" on test eax, eax

Hi,

Recently I integrated snowman into x64dbg, and we noticed a weird thing. This code:

00007FF66E0D1444 | 48 83 EC 38              | sub rsp,38                              |
00007FF66E0D1448 | 48 83 64 24 20 00        | and qword ptr ss:[rsp+20],0             |
00007FF66E0D144E | 41 B9 01 00 00 00        | mov r9d,1                               |
00007FF66E0D1454 | 4C 8D 44 24 40           | lea r8,qword ptr ss:[rsp+40]            |
00007FF66E0D1459 | 41 8D 51 10              | lea edx,dword ptr ds:[r9+10]            |
00007FF66E0D145D | 48 C7 C1 FE FF FF FF     | mov rcx,FFFFFFFFFFFFFFFE                |
00007FF66E0D1464 | E8 AB FA FC FF           | call 7FF66E0A0F14                       |
00007FF66E0D1469 | 85 C0                    | test eax,eax                            |
00007FF66E0D146B | 78 0A                    | js plzplz.7FF66E0D1477                  |
00007FF66E0D146D | 80 7C 24 40 00           | cmp byte ptr ss:[rsp+40],0              |
00007FF66E0D1472 | 75 03                    | jnz plzplz.7FF66E0D1477                 |
00007FF66E0D1474 | CC                       | int3                                    |
00007FF66E0D1475 | EB 00                    | jmp plzplz.7FF66E0D1477                 |
00007FF66E0D1477 | 48 83 C4 38              | add rsp,38                              |
00007FF66E0D147B | C3                       | ret                                     |

Shows as:

void fun_140001444() {
    int32_t eax1;
    signed char v2;

    eax1 = fun_13ffd0f14();
    if (!"intrinsic"() && v2 == 0) {
    }
    return;
}

The tree appears to go into infinite recursion: tree

Here is a binary (modified to only have this function, no malware): https://mega.co.nz/#!TgZz2LJa!RhX6Cc-SUiw8-IQufGamfLncm7PorI5odyCQf8mkk7Y

Adding ALLEGREX architecture to snowman

I am considering adding a new architecture rather than using the current MIPS architecture, which is a work in progress, because ALLEGREX is not recognized by the Capstone framework, and the fact that the latter uses LLVM makes implementing ALLEGREX there too complex. There are subtle differences that make the MIPS architecture not viable for decompiling ALLEGREX code.

So I am about to provide a specific disassembler (the one provided by pspdecompiler, based on the prxtools one, but with the addition of a decomposer) for the architecture analyser. This means handling all instructions, including VFPU.

I am pretty sure people from the uOFW project may be interested as well. But I am also pretty sure that it will be hard to decompile kernel modules, because they may use tricks that are not ABI compliant, so I expect more work than just making a simple disassembler/analyzer.

As for the author of http://lists.derevenets.com/pipermail/snowman/2015-August/000002.html, it would be great if he/she contributed here as well (PRX handling).

Issue with stack handling.

How does snowman handle the stack pointer? It seems like instructions/registers trying to access the stack pointer register will fail on MIPS.

Why doesn't snowman resolve symbols on OS X?

I just decompiled the hello world binary from the examples page and got this:

// snowman doesn't resolve the symbol
int64_t g100001010 = 0x100000fa0;

void fun_100000f88(int64_t rdi) {
    goto g100001010;
}

int64_t _main() {
    fun_100000f88("Hello, World!");
    return 0;
}

int64_t g100001000 = 0;

void fun_100000fa0() {
    goto g100001000;
}

Shouldn't it be the following?

int _main() {
    puts("Hello, World!");
    return 0x0;
}

Btw, the disassembly doesn't show symbol info in the asm code either.
Any suggestions?

Platform:
OSX 10.10

Use the $gp register as a global pointer and not as a local per function.

On MIPS the $gp register is preserved across calls between functions.
How should this be taken care of? For now it gets marked as a local variable, and then the liveness analyzer kills it off.

"For the N32 and N64 ABIs, a function must preserve the $s0-$s7 registers, the global pointer ($gp or $28), the stack pointer ($sp or $29) and the frame pointer ($30). The O32 ABI is the same except the calling function is required to save the $gp register instead of the called function."

Add support for conditional calls.

On MIPS, for example, there are not only conditional jumps but also conditional calls.

Any hint on how to implement this would be nice.

Also, I don't see why the else target of a conditional jump cannot be a nullptr. On MIPS you have branch delay slots: the delay-slot instruction becomes the directSuccessor and is run before the branch takes effect, so what I actually want is to jump to the directSuccessor()'s directSuccessor(). Any hints?

Detect C-NULL-pointers when transforming code.

Sometimes I see pointers in the generated code that look like '(void **)0', a fairly obvious example of a 'NULL' declaration for x86/AMD64. But there were (are?) architectures where this is actually a valid pointer.

I guess that if we could match NULL pointers for every arch and replace them with 'NULL', readability would improve for the novice.
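
A toy sketch of what such a replacement could look like at the printing stage. `PointerConstant`, `render` and the `zeroIsNull` flag are hypothetical names for illustration, not part of snowman's code generator; the flag stands in for a per-architecture setting, since address 0 may be a valid pointer on some targets.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a pointer-typed constant in decompiled code.
struct PointerConstant {
    std::uint64_t value;  // numeric value of the constant
    std::string type;     // pointer type as it would be printed, e.g. "void**"
};

// Prints a pointer constant, using NULL for zero only when the target
// architecture treats address 0 as the null pointer.
std::string render(const PointerConstant &constant, bool zeroIsNull) {
    if (constant.value == 0 && zeroIsNull) {
        return "NULL";
    }
    return "(" + constant.type + ")" + std::to_string(constant.value);
}
```

With `zeroIsNull` set, `(void**)0` prints as `NULL`; with it cleared, the cast is kept so that a legitimate address 0 is not misrepresented.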

Add function signatures / declarations...

With some effort this could be implemented based on van Emmerik's approach in Boomerang, allowing header files with function declarations to be customized in place. It would improve the readability and the type checking of decompiled code.

Does snowman handle intrinsic functions?

Most compilers allow special instructions to be inserted through intrinsic functions. This avoids the need for .asm files and helps the compiler know which registers an intrinsic function involves, so that it can clobber the necessary ones.

It would be interesting to let snowman emit a specific intrinsic function instead of inline assembly, so that it can link the terms used as arguments with previous statements and the result of the function with later statements, which inline assembly statements cannot do.

Of course, those intrinsic functions are specific to an architecture and do not obey the same ABI rules as standard functions. They must be seen as user-named N-ary operators.

Multiplication dropped in decompiled program

The following program is not properly decompiled when compiled for x86-64 with clang (Apple LLVM version 6.0):

long f(int x)
{
        long l = 0;
        while (x)
        {
                l *= l;
                l += x;
                x--;
        }
        return l;
}

_f:
pushq   %rbp
movq    %rsp, %rbp
xorl    %eax, %eax
testl   %edi, %edi
je  0x37
movslq  %edi, %rcx
incq    %rcx
xorl    %edx, %edx
nopw    %cs:_f(%rax,%rax)
movq    %rdx, %rax
imulq   %rax, %rax
decl    %edi
leaq    -0x1(%rcx,%rax), %rdx
leaq    -0x1(%rcx), %rcx
jne 0x20
addq    %rcx, %rax
popq    %rbp
retq

http://people.mozilla.org/~jmuizelaar/snowman/f.o

0 constant gets replaced with function name

The same program as in issue #30 is decompiled as follows, with _f used in place of the constant 0:

int64_t _f(int32_t edi) {
    int64_t rax2;
    int64_t rcx3;
    int64_t rdx4;
    int64_t rax5;
    int64_t rax6;

    *(int32_t*)&rax2 = (int32_t)_f;
    *((int32_t*)&rax2 + 1) = (int32_t)_f;
    if (edi != _f) {
        rcx3 = edi + 1;
        *(int32_t*)&rdx4 = (int32_t)_f;
        *((int32_t*)&rdx4 + 1) = (int32_t)_f;
        do {
            rax5 = rdx4;
            rax6 = rax5 * rax5;
            --edi;
            rdx4 = rcx3 + rax6 + -1;
            --rcx3;
        } while (!(int1_t)(edi == _f));
        rax2 = rax6 + rcx3;
    }
    return rax2;
}

Build fail in ida

Hey @yegord
I am trying to build snowman for IDA 6.6 with Qt 4.8.7 and idasdk6.6.
Nobody was able to load the last two prebuilt packages from http://derevenets.com/
(0.6 and 0.7) into IDA: the plugin could not be loaded.
I have rebuilt Qt 4.8.7 for Windows with namespace QT, in Release.

Then, in snowman, the CMake configure line is:
cmake -G "Visual Studio 12" -D -DCMAKE_BUILD_TYPE=Release -D QT_NAMESPACE=QT -D IDA_PLUGIN_ENABLED=YES -D IDA_64_BIT_EA_T=NO -D NC_QT5=NO ../src

cmake --build .

This fails in the command prompt but builds in Visual Studio.

A note on the CMake configuration:
CMake apparently disregards the =Release flag and tries to build as Debug anyway.
Building as Release in Visual Studio works without trouble.

The build is recognised by IDA,
but after I switch to the snowman window, IDA crashes.
It looks like some heap problem.
A pastie of the crash dump is here:
http://pastebin.com/MyGDm2WF

The standalone snowman.exe works great, just not the plugin.

Any ideas?

EDIT!
I rebuilt Qt 4.8.4 with namespace QT, the version IDA itself uses, and got the same issue.

Is constant folding and propagation working for IR expression?

Isn't constant folding supposed to simplify the expression v1 = v0 + 0; into v1 = v0;? I have a lot of lines like that because on Allegrex move $v1, $v0 is actually encoded as addiu $v1, $v0, $zero, whether $zero is modeled as a register assigned the constant 0 or simply as the constant 0 itself.

Is constant propagation also supported?
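
For reference, the kind of rewrite being asked about can be sketched on a toy expression tree. This is not snowman's IR; `Expr`, `simplify` and the helper constructors are hypothetical names used only to show constant folding (2 + 3 becomes 5) and the additive-identity rule (x + 0 becomes x).

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical toy expression tree with constants, variables and additions.
struct Expr {
    enum Kind { Constant, Variable, Add };
    Kind kind;
    std::uint64_t value;  // constant value, or variable id
    std::unique_ptr<Expr> left, right;
    Expr(Kind k, std::uint64_t v) : kind(k), value(v) {}
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr constant(std::uint64_t v) { return ExprPtr(new Expr(Expr::Constant, v)); }
ExprPtr variable(std::uint64_t id) { return ExprPtr(new Expr(Expr::Variable, id)); }
ExprPtr add(ExprPtr l, ExprPtr r) {
    ExprPtr e(new Expr(Expr::Add, 0));
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Simplifies bottom-up: folds constant + constant and drops "+ 0".
ExprPtr simplify(ExprPtr e) {
    if (e->kind != Expr::Add) {
        return e;
    }
    e->left = simplify(std::move(e->left));
    e->right = simplify(std::move(e->right));
    const bool leftConst = e->left->kind == Expr::Constant;
    const bool rightConst = e->right->kind == Expr::Constant;
    if (leftConst && rightConst) {
        return constant(e->left->value + e->right->value);  // wraps modulo 2^64
    }
    if (rightConst && e->right->value == 0) {
        return std::move(e->left);  // x + 0 -> x
    }
    if (leftConst && e->left->value == 0) {
        return std::move(e->right);  // 0 + x -> x
    }
    return e;
}
```

Applied to `add(variable(0), constant(0))`, `simplify` returns the bare variable, which corresponds to turning v1 = v0 + 0 into v1 = v0.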

use issues instead of todo.asciidoc?

Currently there is a todo file in the repository, but is there a reason not to use issues for those? (I could add the issues if you don't have time for it).

Can you upgrade the Qt version to Qt 5.4.0?

Boost version: 1.57.0
CMake Error at D:/CMake/share/cmake-3.3/Modules/FindQt4.cmake:1326 (message):
Found unsuitable Qt version "5.4.0" from
C:/Qt/Qt5.4.0/5.4/msvc2013/bin/qmake.exe, this code requires Qt 4.x
Call Stack (most recent call first):
CMakeLists.txt:107 (find_package)

Configuring incomplete, errors occurred!
See also "D:/snowman/src/build/CMakeFiles/CMakeOutput.log".

I tried to use CMake to generate snowman's MSVC project, but it tells me that only Qt 4.x is supported.

ambiguous declarations

gcc 4.6.4 complains about this:
[ 25%] Building CXX object nc/core/CMakeFiles/nc-core.dir/ir/cgen/CodeGenerator.cpp.o
In file included from /home/markus/Downloads/snowman/src/nc/core/likec/FunctionDeclaration.h:31:0,
from /home/markus/Downloads/snowman/src/nc/core/likec/FunctionDefinition.h:32,
from /home/markus/Downloads/snowman/src/nc/core/ir/cgen/CodeGenerator.cpp:42:
/home/markus/Downloads/snowman/src/nc/core/likec/ArgumentDeclaration.h: In constructor ‘nc::core::likec::ArgumentDeclaration::ArgumentDeclaration(nc::core::likec::Tree&, const QString&, const nc::core::likec::Type*)’:
/home/markus/Downloads/snowman/src/nc/core/likec/ArgumentDeclaration.h:47:48: error: conversion from ‘int’ to ‘std::unique_ptr<nc::core::likec::Expression>’ is ambiguous
/home/markus/Downloads/snowman/src/nc/core/likec/ArgumentDeclaration.h:47:48: note: candidates are:
/usr/include/c++/4.6/bits/unique_ptr.h:136:17: note: constexpr std::unique_ptr<_Tp, _Dp>::unique_ptr(std::nullptr_t) [with _Tp = nc::core::likec::Expression, _Dp = std::default_delete<nc::core::likec::Expression>, std::nullptr_t = std::nullptr_t]
/usr/include/c++/4.6/bits/unique_ptr.h:120:7: note: std::unique_ptr<_Tp, _Dp>::unique_ptr(std::unique_ptr<_Tp, _Dp>::pointer) [with _Tp = nc::core::likec::Expression, _Dp = std::default_delete<nc::core::likec::Expression>, std::unique_ptr<_Tp, _Dp>::pointer = nc::core::likec::Expression*]
make[2]: *** [nc/core/CMakeFiles/nc-core.dir/ir/cgen/CodeGenerator.cpp.o] Error 1
make[1]: *** [nc/core/CMakeFiles/nc-core.dir/all] Error 2
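
The diagnostic says an int is being converted to a std::unique_ptr, and this compiler cannot choose between the nullptr_t constructor and the raw-pointer constructor. A possible workaround, assuming the offending line passes a literal 0 or NULL (the actual source line is not shown in this report, so that is a guess), is to spell the empty pointer as nullptr. The classes below are hypothetical stand-ins, not the real nc::core::likec types.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for nc::core::likec::Expression.
struct Expression {
    int value;
};

// Hypothetical stand-in for a declaration owning an optional initial value.
class Declaration {
    std::unique_ptr<Expression> initialValue_;

public:
    // nullptr has type std::nullptr_t, so exactly one constructor matches;
    // a literal 0 or NULL is what the compiler above reports as ambiguous.
    Declaration() : initialValue_(nullptr) {}

    bool hasInitialValue() const { return initialValue_ != nullptr; }

    void setInitialValue(std::unique_ptr<Expression> value) {
        initialValue_ = std::move(value);
    }
};
```

Default-constructing the member (omitting the initializer entirely) would have the same effect.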

Support for more than one register for return values.

AFAIK x86 uses more than one register for return values. MIPS has a total of 4 return registers (2 for integer operations and 2 for floating-point operations), and a single pseudo-register cannot overlap all 4 at once.

Weird return value for a MIPS/Allegrex function

Decompilation of

8900470:    addiu      $sp, $sp, -8
8900474:    sw         $ra, 4($sp)
8900478:    jal        0x0890C484
890047c:    nop
8900480:    lw         $ra, 4($sp)
8900484:    lui        $v1, 0x0891
8900488:    sw         $v0, 11904($v1)
890048c:    jr         $ra
8900490:    addiu      $sp, $sp, 8

gave me

int32_t startTest(void** a0, void** a1, void** a2) {
    int32_t v0_4;

    v0_4 = sceKernelGetSystemTimeLow();
    startSystemTime = v0_4;
    return 0x8910000;
}

I would expect something like:

int32_t startTest() {
    int32_t v0_4;

    v0_4 = sceKernelGetSystemTimeLow();
    startSystemTime = v0_4;
    return v0_4;
}

Ability to rename calls to registers...

As @aerosoul94 pointed out, the way a syscall is implemented on most architectures would not yield a function call like '__syscall_XXX', which would be preferable:

[16/10/15 18:45:03 ] aerosoul94: so it would be like __syscall_XXX(r3, r4);
[16/10/15 18:45:06 ] Markus: yes
[16/10/15 18:45:31 ] aerosoul94: i think your method would output r11(r3, r4);

So we need either a way to mark a call as a syscall or the ability to rename the output of call().

'expressions' : a namespace with this name does not exist (mips/allegrex related)

This happens in the MIPS and Allegrex projects, specifically:
AllegrexInstructionAnalyzer:189,197
MipsInstructionAnalyzer:97,103
in the hlide and nihilus projects.
I'm completely clueless as to how to solve these, so I'm just leaving the issue here.
Maybe someone else can fix it.
I'm using Visual Studio 2010 with Qt 4.8.6 and Boost 1.58.

"no member named 'abs' in namespace 'std';"

/Users/nietzsche/Downloads/snowman/src/nc/core/likec/BinaryOperator.cpp:84:25: error: 
      no member named 'abs' in namespace 'std'; did you mean simply 'abs'?
    int absPrecedence = std::abs(precedence);
                        ^~~~~~~~
                        abs
/usr/include/stdlib.h:129:6: note: 'abs' declared here
int      abs(int) __pure2;
         ^
#include <cstdlib> fixed this issue.
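
A minimal illustration of the fix: the integer overload of std::abs is declared in <cstdlib>, so the header has to be included directly rather than relied on transitively. The wrapper function name is hypothetical.

```cpp
#include <cstdlib>  // declares std::abs(int)

// Mirrors the failing line from BinaryOperator.cpp in a standalone function.
int absolutePrecedence(int precedence) {
    return std::abs(precedence);
}
```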
