zyantific / zasm


302.0 15.0 42.0 1.34 MB

x86-64 Assembler based on Zydis

License: MIT License

C++ 100.00%

asmjit zydis cmkr x86-64 assembler-x86 jit reverse-engineering x86-assembly cpp17 assembler

zasm's People

Forkers

crackercat fengjixuchui mrexodia clayne push0-ret differentprogramming fake-cheater gitter-badger backuphouse fcccode killvxk gmh5225 y11en germanaizek dpreed dtpeters hksoobe data-man nasingfaund wbaby d3v1l401 gekas4488 codingman psyirius raineldev xr4zz3rs webstorage119 ludoplex untyper brugarolas ego ajgappmark gauravssnl ezhangle wxfcc graftmc brinkqiang anthonyprintup khwc crackedmatter audaine thomasxm cnguoyj

zasm's Issues

Invalid encoding

Incorrect instruction encoding. I ran into a problem where every instruction I encode comes out wrong. After looking at the code for a bit, I realized that the encoder thinks I am passing the wrong operand size. That is just the preface; here is what I found when I looked further.

I want to encode push 0xDEADC0DE (opcode 0x68). I get an error because I use the x64 build of the library to encode 32-bit instructions, and the encoder treats my immediate operand as signed. The size is decided in this function:

static ZyanU8 ZydisGetSignedImmSize(ZyanI64 imm)
{
    if (imm >= ZYAN_INT8_MIN && imm <= ZYAN_INT8_MAX)
    {
        return 8;
    }
    if (imm >= ZYAN_INT16_MIN && imm <= ZYAN_INT16_MAX)
    {
        return 16;
    }
    if (imm >= ZYAN_INT32_MIN && imm <= ZYAN_INT32_MAX)
    {
        return 32;
    }

    return 64;
}

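All three checks fail and I get a size of 64 bits, which is wrong. I hit this in an x64 application: the immediate is held as a 64-bit value, so 0xDEADC0DE is the positive number 3735929054, which is larger than ZYAN_INT32_MAX.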

To make the encoder pick a 32-bit immediate, I have to pass the value sign-extended (0xFFFFFFFFDEADC0DE), which is not a convenient way to do it.

a.emit(ZYDIS_MNEMONIC_PUSH, Imm((int32_t)0xdeadc0de));

Solved, but how can I do this in a more convenient way?
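For reference, here is a standalone sketch of why the cast helps. It copies the range checks from ZydisGetSignedImmSize and adds a hypothetical `imm32` helper (not part of zasm) that sign-extends a 32-bit pattern, which is effectively what the `(int32_t)` cast in the workaround does before the value reaches `Imm`. Assumes C++17.

```cpp
#include <cstdint>

// Standalone copy of the range checks in ZydisGetSignedImmSize (quoted above),
// so the behavior can be tried without linking Zydis.
constexpr int signedImmSize(std::int64_t imm)
{
    if (imm >= INT8_MIN && imm <= INT8_MAX)
        return 8;
    if (imm >= INT16_MIN && imm <= INT16_MAX)
        return 16;
    if (imm >= INT32_MIN && imm <= INT32_MAX)
        return 32;
    return 64;
}

// Hypothetical helper (not a zasm API): reinterpret a 32-bit pattern as a signed
// 32-bit value, then sign-extend it to 64 bits.
constexpr std::int64_t imm32(std::uint32_t bits)
{
    return static_cast<std::int64_t>(static_cast<std::int32_t>(bits));
}

// 0xDEADC0DE widened to 64 bits is 3735929054, which is above INT32_MAX.
static_assert(signedImmSize(0xDEADC0DE) == 64);
// Sign-extended it is 0xFFFFFFFFDEADC0DE (-559038242), which fits in 32 bits.
static_assert(signedImmSize(imm32(0xDEADC0DE)) == 32);
```

With such a helper the call above could read `a.emit(ZYDIS_MNEMONIC_PUSH, Imm(imm32(0xDEADC0DE)));`, assuming `Imm` accepts an `int64_t`.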

Reduce STL dependencies within the interfaces

Compiling question

Hello, I am trying to incorporate zasm into my project, and I found that to make it compile I have to include the headers and link the libraries not only of zasm but also of Zydis (Zydis.lib and its headers). Is that okay, or am I doing something wrong?

Impossible to set ReadWrite access to file descriptor in FileStream

Hello, there seems to be a bug that prevents opening a file in read/write mode even though the StreamMode::ReadWrite enum value exists. The problem is this line in FileStream::open:

_wfopen_s(&fp, path.wstring().c_str(), mode == StreamMode::Read ? L"rb" : L"wb");

If we pass StreamMode::ReadWrite to FileStream::open or to the constructor, the file is opened in "wb" mode and its content is overwritten. This is not a problem for the load and store functions: as I understand it, there is no sense in passing a ReadWrite FileStream to the save function, because it has to clear the content before writing its bytes anyway. It is a problem for users who want to use FileStream as a general file wrapper, because they cannot open a file in read/write mode with it. I could create a pull request, but I can't figure out how to fix this without breaking the load and store functions.

P.S. As an option, we could write something like this:

_wfopen_s(&fp, path.wstring().c_str(), mode == StreamMode::Read ? L"rb" : mode == StreamMode::Write ? L"wb" : L"r+b");

but that would break the save function, because the file would not be cleared before our Program is written.
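One possible way out, sketched under assumptions: keep "wb" for Write, map ReadWrite to "r+b" (which keeps the contents), and let the save path ask explicitly for truncation ("w+b") when it needs a cleared file. The enum below is a hypothetical stand-in for StreamMode and `truncate` is an invented parameter name; for `_wfopen_s` the same strings would be written as wide literals (L"r+b" and so on).

```cpp
#include <string_view>

// Hypothetical stand-in for zasm's StreamMode; the real enum may differ.
enum class StreamMode
{
    Read,
    Write,
    ReadWrite,
};

// Maps a stream mode to a C stdio mode string:
//   "rb"  read only, the file must exist
//   "wb"  write only, existing content is truncated
//   "r+b" read and write, the file must exist, content is kept
//   "w+b" read and write, existing content is truncated
// `truncate` lets a caller such as the save path request "w+b".
// The returned views point at string literals, so data() is null-terminated.
constexpr std::string_view stdioMode(StreamMode mode, bool truncate = false)
{
    switch (mode)
    {
    case StreamMode::Read:
        return "rb";
    case StreamMode::Write:
        return "wb";
    case StreamMode::ReadWrite:
        return truncate ? "w+b" : "r+b";
    }
    return "rb"; // not reached for valid enumerators
}
```

With this split, a plain ReadWrite open no longer destroys the file, while the save path keeps its "clear before writing" behavior by passing `truncate = true`.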

Immediate too large

#include "examples.common.hpp"

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

static void* allocatePage(std::size_t codeSize)
{
#ifdef _WIN32
    return VirtualAlloc(0, codeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
#else
    // TODO: mmap for Linux.
    return nullptr;
#endif
}

static std::size_t estimateCodeSize(const zasm::Program& program)
{
    std::size_t size = 0;
    for (auto* node = program.getHead(); node != nullptr; node = node->getNext())
    {
        if (auto* nodeData = node->getIf<zasm::Data>(); nodeData != nullptr)
        {
            size += nodeData->getTotalSize();
        }
        else if (auto* nodeInstr = node->getIf<zasm::Instruction>(); nodeInstr != nullptr)
        {
            const auto& instrInfo = nodeInstr->getDetail(program.getMode());
            if (instrInfo.hasValue())
            {
                size += instrInfo->getLength();
            }
            else
            {
                std::cout << "Error: Unable to get instruction info\n";
            }
        }
        else if (auto* nodeEmbeddedLabel = node->getIf<zasm::EmbeddedLabel>(); nodeEmbeddedLabel != nullptr)
        {
            const auto bitSize = nodeEmbeddedLabel->getSize();
            if (bitSize == zasm::BitSize::_32)
                size += 4;
            if (bitSize == zasm::BitSize::_64)
                size += 8;
        }
    }
    return size;
}

int main()
{
    using namespace zasm;

    const uint64_t address = 0x7ff7cf0055b3;

    const std::vector<uint8_t> code = { 
        0x48, 0x2B, 0x1D, 0x00, 0x00, 0x00, 0x00 
        //sub    rbx,QWORD PTR [rip+0x0]  
    };

    Program program(MachineMode::AMD64);
    x86::Assembler assembler(program);

    Decoder decoder(program.getMode());
    Serializer serializer;

    size_t bytesDecoded = 0;

    while (bytesDecoded < code.size())
    {
        const auto curAddress = address + bytesDecoded;

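        const auto decoderRes = decoder.decode(code.data() + bytesDecoded, code.size() - bytesDecoded, curAddress);
        if (!decoderRes)
        {
            std::cout << "Failed to decode at " << std::hex << curAddress << "\n";
            return 1;
        }

        const auto& instrInfo = *decoderRes;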

        const auto instr = instrInfo.getInstruction();
        printf( "%s\n", formatter::toString(&instr).c_str());
        assembler.emit(instr);

        bytesDecoded += instrInfo.getLength();
    }

    const auto requiredSize = estimateCodeSize(program);
    void* pCodePage = allocatePage(requiredSize);
    Error error = serializer.serialize(program, reinterpret_cast<uint64_t>(pCodePage)); // Error

    if (error != Error::None){
        printf("error = %s\n", getErrorName(error));
    }
    system("pause");
    return 0;
}

output:

sub rbx, qword ptr ds:[rel 0x7ff7cf0055ba]
imm = 139215200671155 | 7E9D909555B3
error = Error::ImpossibleInstruction

https://github.com/zyantific/zydis/blob/ffde0f46398a86417c860462f6af0556331cb5bb/src/Encoder.c#L455
https://github.com/zyantific/zydis/blob/ffde0f46398a86417c860462f6af0556331cb5bb/src/Encoder.c#L1598
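A hedged guess at what is going on: the decoded `sub rbx, [rip+0x0]` refers to 0x7ff7cf0055ba, but the program is re-serialized at the address returned by VirtualAlloc, and a RIP-relative operand can only reach targets within a signed 32-bit displacement (about ±2 GiB) of the next instruction. The logged imm (0x7E9D909555B3) is far outside that range; if that value is the displacement the encoder attempted, it cannot be encoded. The standalone check below (a hypothetical helper, not a zasm API) computes the needed displacement and reports when it does not fit.

```cpp
#include <cstdint>
#include <optional>

// For a RIP-relative operand the CPU computes: target = address of the next instruction + disp32.
// Returns the disp32 needed to reach `target` from an instruction placed at `instrAddress`
// with length `instrLength`, or std::nullopt if the distance does not fit in a signed 32-bit field.
std::optional<std::int32_t> ripRelDisp(std::uint64_t instrAddress, std::uint64_t instrLength,
                                       std::uint64_t target)
{
    const std::int64_t disp = static_cast<std::int64_t>(target - (instrAddress + instrLength));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return std::nullopt;
    return static_cast<std::int32_t>(disp);
}
```

If the check fails, the code has to be placed within reach of the target (for example by allocating near it), or the operand rewritten so the absolute address is loaded into a register first.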

Better abstraction for multiple architectures

A few touches to make zasm useable for something

Let's say I want to use this to write a JIT. In that case I need to be able to pass in the addresses of library routines and to get the addresses of generated routines out.

I.e., I need to be able to take the address of a label after serialization, and to SET a constant address for some labels before serialization.

I don't see a way to do these things. There are certainly no examples that do them.

I made a fork to do this myself.

Please provide an example project that is ready to build.

I couldn't use it in my own project. It gives linker errors.

Build started...
1>------ Build started: Project: Test, Configuration: Release x64 ------
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::Error __cdecl zasm::Serializer::serialize(class zasm::Program const &,__int64)" (?serialize@Serializer@zasm@@QEAA?AW4Error@2@AEBVProgram@2@_J@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Serializer::~Serializer(void)" (??1Serializer@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Serializer::Serializer(void)" (??0Serializer@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::Error __cdecl zasm::x86::Assembler::emit(class zasm::Instruction const &)" (?emit@Assembler@x86@zasm@@QEAA?AW4Error@3@AEBVInstruction@3@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: virtual __cdecl zasm::x86::Assembler::~Assembler(void)" (??1Assembler@x86@zasm@@UEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::x86::Assembler::Assembler(class zasm::Program &)" (??0Assembler@x86@zasm@@QEAA@AEAVProgram@2@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::MachineMode __cdecl zasm::Program::getMode(void)const " (?getMode@Program@zasm@@QEBA?AW4MachineMode@2@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Program::~Program(void)" (??1Program@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Program::Program(enum zasm::MachineMode)" (??0Program@zasm@@QEAA@W4MachineMode@1@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: class zasm::Expected<class zasm::Instruction,enum zasm::Error> __cdecl zasm::Decoder::decode(void const *,unsigned __int64,unsigned __int64)" (?decode@Decoder@zasm@@QEAA?AV?$Expected@VInstruction@zasm@@W4Error@2@@2@PEBX_K1@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Decoder::Decoder(enum zasm::MachineMode)" (??0Decoder@zasm@@QEAA@W4MachineMode@1@@Z)

More basic examples needed

I asked before for the ability to get the address of a label, and you supplied that.

Now I need a few other things:
Examples of basic things
a) actually calling code you assembled. There are no examples of calling code you created.
b) an example of calling into user or library code from assembly.
c) disassembling to human-readable form. Getting the ABI right for a platform is complex, and you pretty much need to be able to disassemble and see the code that the compiler generates, so that you can make templates for entry and exit code that are correct in every detail. It would also be useful for making sure that the code you're generating is what you thought it was.
d) deallocating the result of an assembly - for a jit you not only create code, but you throw it away when you're done with it and are replacing it with altered code.

Edit: I guess I can just use the debugger for disassembly.
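For a) and d), here is a minimal sketch that is not zasm-specific and assumes POSIX on x86-64: copy the bytes into anonymous memory, switch the page to read+execute, call it through a function pointer, and unmap it when the code is replaced. The byte array stands in for `serializer.getCode()` / `serializer.getCodeSize()`; `JitFunction` and `kReturn42` are hypothetical names. On Windows the same steps map to VirtualAlloc, VirtualProtect and VirtualFree.

```cpp
#include <sys/mman.h>

#include <cstdint>
#include <cstring>
#include <stdexcept>

// Owns one block of executable memory (POSIX, x86-64 assumed).
class JitFunction
{
public:
    JitFunction(const std::uint8_t* code, std::size_t size) : size_(size)
    {
        // 1. Get writable memory (mmap returns page-aligned memory).
        mem_ = mmap(nullptr, size_, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem_ == MAP_FAILED)
            throw std::runtime_error("mmap failed");
        // 2. Copy the assembled bytes.
        std::memcpy(mem_, code, size_);
        // 3. Make the page read+execute (never writable and executable at once).
        if (mprotect(mem_, size_, PROT_READ | PROT_EXEC) != 0)
        {
            munmap(mem_, size_);
            throw std::runtime_error("mprotect failed");
        }
    }

    // 4. Release the code, e.g. when it is replaced by a newly generated version.
    ~JitFunction() { munmap(mem_, size_); }

    JitFunction(const JitFunction&) = delete;
    JitFunction& operator=(const JitFunction&) = delete;

    // Reinterprets the code start as a function pointer of type Fn.
    template <typename Fn> Fn as() const { return reinterpret_cast<Fn>(mem_); }

private:
    void* mem_ = nullptr;
    std::size_t size_ = 0;
};

// mov eax, 42 ; ret
constexpr std::uint8_t kReturn42[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
```

Usage: `JitFunction fn(kReturn42, sizeof(kReturn42)); int v = fn.as<int (*)()>()();` leaves 42 in `v`; destroying `fn` unmaps the code.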

Calculating offset of memory operand type relative to instruction

In the above example I need to know the offset of the uint (highlighted in grey) for memory operands, for the purpose of fixing up RVAs in a mutation engine.

Please can you suggest a way to either extract the size of the mnemonic, so I can work it out that way, or, even better, find the offset of the operand's bytes relative to the bytes of the full instruction?

Thanks :)
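Once the offset of the displacement is known, fixing up a RIP-relative RVA is plain arithmetic. For the `sub rbx, qword ptr [rip+disp32]` encoding used elsewhere on this page (48 2B 1D followed by 4 bytes), the displacement starts at offset 3: REX prefix, opcode, ModRM. In Zydis, `ZydisDecodedInstruction` reports this as `raw.disp.offset` (and immediates as `raw.imm[i].offset`); whether zasm exposes the same data is not stated here. The hypothetical helper below assumes a little-endian host and rewrites the displacement so the absolute target stays the same after the instruction moves.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Re-targets a RIP-relative disp32 after moving an instruction from oldAddress to newAddress,
// keeping the absolute target unchanged. dispOffset is the byte offset of the disp32 inside
// the instruction. Assumes a little-endian host and that the new displacement fits in 32 bits.
void relocateRipRelative(std::vector<std::uint8_t>& instr, std::size_t dispOffset,
                         std::uint64_t oldAddress, std::uint64_t newAddress)
{
    std::int32_t disp;
    std::memcpy(&disp, instr.data() + dispOffset, sizeof(disp));

    // RIP points at the next instruction, so the target is old end + disp.
    const std::uint64_t target = oldAddress + instr.size() + static_cast<std::int64_t>(disp);

    // New displacement measured from the end of the instruction at its new address.
    const auto newDisp = static_cast<std::int32_t>(
        static_cast<std::int64_t>(target - (newAddress + instr.size())));
    std::memcpy(instr.data() + dispOffset, &newDisp, sizeof(newDisp));
}
```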

Add align node type to align data and code

How to assemble more complex instructions?

How would I go about assembling the following instruction, as an example:

lea rsp, [rbp+50h]

Could you provide bindings for Go (golang)?

Zasm wrong jump calculation

Hello, recently I ran into a problem with incorrect jump encoding. Don't look for meaning in the code; it is just an example.
    Program program(MachineMode::I386);
    x86::Assembler assembler(program);

    auto label = assembler.createLabel();

    ASSERT_EQ(assembler.jmp(label), Error::None);
    for (int i = 0; i < 100; i++)
        ASSERT_EQ(assembler.nop(), Error::None);
    ASSERT_EQ(assembler.bind(label), Error::None);
    ASSERT_EQ(assembler.int3(), Error::None);
    ASSERT_EQ(assembler.align(Align::Type::Code, 10), Error::None);

    Serializer serializer;
    ASSERT_EQ(serializer.serialize(program, 0x0000000000401000), Error::None);

My jmp should go to the int3 instruction, but it lands 3 bytes further. I looked at the zasm source code and noticed what I think is incorrect logic. When the jmp is encoded the first time, its size is 5. On the extra pass its size is 2, because the label is now bound and the jmp changes from near (rel32) to short (rel8). So ctx.drift is now 3, and another pass should run. But because of the alignment at the end of the code, the drift becomes 0 (3 - 3), so zasm does not run a third pass (even though the alignment is outside the range of the jmp), and it still thinks the offset of int3 is 105 instead of 102. Am I doing something wrong, or is this a bug?

P.S. The code is just an example; don't take it as something meaningful.
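The scenario can be reproduced with a small standalone model (not zasm's serializer; all names are hypothetical): lay out the program, re-encode the jmp using the previous pass's label offset, and stop either when the total size stops changing or only when the label offset is stable. Stopping on total size alone returns the stale offset 105 while the int3 actually sits at 102, a 3-byte overshoot.

```cpp
#include <vector>

// Toy program: jmp label; 100 x nop; label: int3; align 10.
enum class Kind { JmpToLabel, Nop, Label, Int3, Align };
struct Node { Kind kind; int arg = 0; }; // arg = alignment for Kind::Align

struct Layout { int labelOffset = -1; int totalSize = 0; };

// One layout pass. The jmp is 2 bytes (rel8) when short, otherwise 5 bytes (rel32).
Layout layoutPass(const std::vector<Node>& nodes, bool jmpShort)
{
    Layout out;
    int pc = 0;
    for (const Node& n : nodes)
    {
        switch (n.kind)
        {
        case Kind::JmpToLabel: pc += jmpShort ? 2 : 5; break;
        case Kind::Nop:
        case Kind::Int3: pc += 1; break;
        case Kind::Label: out.labelOffset = pc; break;
        case Kind::Align: pc = (pc + n.arg - 1) / n.arg * n.arg; break;
        }
    }
    out.totalSize = pc;
    return out;
}

// Runs passes until convergence and returns the label offset the jmp was last encoded against.
// The jmp is the first node, so its short form reaches the label if label - 2 <= 127.
// checkLabels == false: stop when the total size is unchanged (the behavior described above).
// checkLabels == true:  stop only when the label offset is unchanged.
int encodedJumpTarget(const std::vector<Node>& nodes, bool checkLabels)
{
    Layout prev = layoutPass(nodes, false); // first pass: label not bound yet, assume rel32
    for (;;)
    {
        const bool jmpShort = prev.labelOffset - 2 <= 127;
        const Layout cur = layoutPass(nodes, jmpShort);
        const bool converged = checkLabels ? cur.labelOffset == prev.labelOffset
                                           : cur.totalSize == prev.totalSize;
        if (converged)
            return prev.labelOffset; // this pass encoded the jmp with prev's label offset
        prev = cur;
    }
}

std::vector<Node> exampleProgram()
{
    std::vector<Node> nodes;
    nodes.push_back({ Kind::JmpToLabel });
    nodes.insert(nodes.end(), 100, Node{ Kind::Nop });
    nodes.push_back({ Kind::Label });
    nodes.push_back({ Kind::Int3 });
    nodes.push_back({ Kind::Align, 10 });
    return nodes;
}
```

Comparing per-label (or per-node) offsets between passes, rather than only the accumulated drift, is one way to make the termination check robust against padding that absorbs the change.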

Calculating addresses of call, jmp etc.

With Zydis, you can call ZydisCalcAbsoluteAddressEx, but there is no way to retrieve the data needed for it from the Decoder.

Can you suggest any way to do this please?
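Until the decoder exposes this, the rule itself is simple enough to apply by hand: for a relative branch, target = instruction address + instruction length + signed displacement, which is what ZydisCalcAbsoluteAddress computes for relative operands. The standalone sketch below is a hypothetical helper, not a zasm API, and handles only a few unprefixed opcodes.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Absolute target of a few common relative branches, from raw bytes:
//   EB rel8 (jmp short), 70..7F rel8 (jcc short),
//   E9 rel32 (jmp near), E8 rel32 (call), 0F 80..8F rel32 (jcc near).
// Ignores prefixes; returns std::nullopt for anything else. Little-endian host assumed.
std::optional<std::uint64_t> relativeBranchTarget(const std::uint8_t* code, std::size_t size,
                                                  std::uint64_t address)
{
    auto rel32 = [&](std::size_t offset) {
        std::int32_t v;
        std::memcpy(&v, code + offset, sizeof(v));
        return static_cast<std::int64_t>(v);
    };

    if (size >= 2 && (code[0] == 0xEB || (code[0] >= 0x70 && code[0] <= 0x7F)))
        return address + 2 + static_cast<std::int64_t>(static_cast<std::int8_t>(code[1]));
    if (size >= 5 && (code[0] == 0xE9 || code[0] == 0xE8))
        return address + 5 + rel32(1);
    if (size >= 6 && code[0] == 0x0F && code[1] >= 0x80 && code[1] <= 0x8F)
        return address + 6 + rel32(2);
    return std::nullopt;
}
```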

jmp imm bug

My target address is 0x347486 (0x224AB8A1486), but the generated assembly instruction's target address is 0x347485 (0x224AB8A1485). Is this a bug?

    zasm::Program program(zasm::MachineMode::AMD64);
    zasm::x86::Assembler assembler(program);

    assembler.jmp(zasm::Imm(instr.jmp_rva));

    zasm::Serializer serializer{};
    auto res = serializer.serialize(program, instr.rva);
    if (res == zasm::ErrorCode::None) {
        const auto* ptr = serializer.getCode();
        const auto size = serializer.getCodeSize();
        std::memcpy(reinterpret_cast<void *>(base.module_base + instr.rva), ptr, size);
    }

memory bug while deleting label with destroyNode

When creating a label, we see the following code:

    static Label createLabel_(detail::ProgramState& state, StringPool::Id nameId, StringPool::Id modId, LabelFlags flags)
    {
        const auto labelId = static_cast<Label::Id>(state.labels.size());

        auto& entry = state.labels.emplace_back();
        entry.id = labelId;
        entry.flags = flags;
        entry.nameId = nameId;
        entry.moduleId = modId;

        return Label{ labelId };
    }

Here we add a label to the label list, but when we destroy the label node with the destroyNode method, the label list does not shrink:

    static void destroyNode(detail::ProgramState& state, Node* node, bool quickDestroy)
    {
        // Keep index before destroying the object.
        const auto nodeIdx = static_cast<std::size_t>(node->getId());

        notifyObservers<true>(&Observer::onNodeDestroy, state.observer, node);

        // If this is called from clear or from destructor we can skip unlinking.
        if (!quickDestroy)
        {
            // Ensure node is not in the list anymore.
            detach_<false>(node, state);
        }

        // Release.
        auto* nodeToDestroy = detail::toInternal(node);
        state.nodePool.destroy(nodeToDestroy);

        if (!quickDestroy)
        {
            // Release memory, when quickDestroy is true the entire pool will be cleared at once.
            state.nodePool.deallocate(nodeToDestroy, 1);

            // Remove mapping.
            auto& nodeMap = state.nodeMap;
            assert(nodeIdx < nodeMap.size());

            // Null out the slot.
            nodeMap[nodeIdx] = nullptr;

            while (!nodeMap.empty() && nodeMap.back() == nullptr)
            {
                nodeMap.pop_back();
            }
        }
    }

    void Program::destroy(Node* node)
    {
        destroyNode(*_state, node, false);
    }

Here I don't see the label list being shrunk. Is this a bug, or am I missing something?
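Label IDs are created as `state.labels.size()`, i.e. they double as indices, so simply erasing an entry would shift every later ID. A hypothetical way to shrink the list safely is to reuse the trick destroyNode already applies to nodeMap: mark the slot dead and pop only dead slots at the end. The standalone class below illustrates that idea; it is not zasm's code, and whether destroyNode should touch label entries at all is exactly the open question here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical label table: IDs are indices into entries_, like Label::Id in createLabel_.
class LabelTable
{
public:
    std::uint32_t create()
    {
        entries_.push_back(true);
        return static_cast<std::uint32_t>(entries_.size() - 1);
    }

    void destroy(std::uint32_t id)
    {
        if (id >= entries_.size())
            return;
        entries_[id] = false; // keep the slot so other IDs stay valid
        // Same idea as the nodeMap loop in destroyNode: drop dead slots at the end only.
        while (!entries_.empty() && !entries_.back())
            entries_.pop_back();
    }

    bool isAlive(std::uint32_t id) const { return id < entries_.size() && entries_[id]; }
    std::size_t size() const { return entries_.size(); }

private:
    std::vector<bool> entries_; // true = slot in use
};
```

One trade-off: after trimming, `create()` hands out the freed IDs again, so a stale label handle that still holds one of them would refer to a new label.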

Mem constructor for [base + index]

Should the Mem constructor that takes base and index registers have scale set to 1 instead of 0?

zasm/include/zasm/x86/memory.hpp, line 27 (commit 012062f):

    static constexpr Mem ptr(BitSize bitSize, const Gp& base, const Gp& index) noexcept

Succeeds:

a.mov(zasm::x86::al, zasm::x86::byte_ptr(zasm::x86::rsi, zasm::x86::rcx, 1, 0));

Fails impossible instruction:

a.mov(zasm::x86::al, zasm::x86::byte_ptr(zasm::x86::rsi, zasm::x86::rcx));

Likely cause:

    // ptr [base + index]
    // ex.: mov eax, ptr [ecx+edx]
    static constexpr Mem ptr(BitSize bitSize, const Gp& base, const Gp& index) noexcept
    {
        return Mem(bitSize, Seg{}, base, index, 0, 0);
    }
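This would be consistent with x86 encoding rules: the SIB byte's 2-bit scale field can only express 1, 2, 4 or 8, so a [base + index] operand with scale 0 has no encoding. The standalone sketch below (hypothetical helpers, not zasm code) shows the mapping and a normalization that treats a missing scale as 1, which is what changing the constructor call to `Mem(bitSize, Seg{}, base, index, 1, 0)` would amount to.

```cpp
#include <optional>

// SIB scale field: 00 -> x1, 01 -> x2, 10 -> x4, 11 -> x8. Anything else (including 0)
// has no encoding.
constexpr std::optional<unsigned> sibScaleBits(unsigned scale)
{
    switch (scale)
    {
    case 1: return 0u;
    case 2: return 1u;
    case 4: return 2u;
    case 8: return 3u;
    default: return std::nullopt;
    }
}

// Hypothetical normalization for a [base + index] helper: no explicit scale means x1.
constexpr unsigned normalizeScale(bool hasIndex, unsigned scale)
{
    return (hasIndex && scale == 0) ? 1u : scale;
}
```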

ZydisGetInstructionSegments() in zasm

I had been using Zydis for a while and recently decided to move to zasm. I can't find an equivalent of this feature in zasm. Is there one?

Serialiser fail at 64bit relative addresses

Based on the provided example, serializing a jmp/call or a memory read/write instruction with a RIP-relative operand results in an ImpossibleInstruction error.
For example:

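#include "examples.common.hpp"

#include <cstdint>
#include <iostream>
#include <vector>

static void reproduce()
{
    using namespace zasm;

    const uint64_t address = 0x00000001400019A4;
    const std::vector<uint8_t> code = {
        0xFF, 0x15, 0x73, 0x16, 0x00, 0x00 // CALL QWORD PTR DS:[0x0000000140003028]
    };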

    Program program(MachineMode::AMD64);
    x86::Assembler assembler(program);
    Decoder decoder(program.getMode());
    Serializer serializer;

    // Decode all bytes.
    size_t bytesDecoded = 0;
    while (bytesDecoded < code.size())
    {
        const auto curAddress = address + bytesDecoded;
        auto decoderRes = decoder.decode(code.data() + bytesDecoded, code.size() - bytesDecoded, curAddress);
        if (!decoderRes)
        {
            std::cout << "Failed to decode at " << std::hex << curAddress  << "\n";
            return;
        }
        const auto& instr = decoderRes.value();
        assembler.emit(instr);
        bytesDecoded += instr.getLength();
    }
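    const Error error = serializer.serialize(program, address); // Error: ImpossibleInstruction
    if (error != Error::None)
    {
        std::cout << "Serialization failed: " << getErrorName(error) << "\n";
        return;
    }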
    const auto codeDump = getHexDump(serializer.getCode(), serializer.getCodeSize());
    std::cout << codeDump << "\n";
}

Implement `operator|=` and `operator&=` in `InstrCPUFlags`

Currently I have to write flags = flags | (modifiedFlags & ~readFlags) instead of flags |= modifiedFlags & ~readFlags (the expression itself is nonsensical, but I ran into this while implementing liveness analysis).
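A minimal sketch of the requested operators, assuming InstrCPUFlags is a scoped enum with an unsigned underlying type. `CpuFlags` below is a hypothetical stand-in with only a few bits (CF, ZF, SF, OF at their RFLAGS positions); the compound operators are defined in terms of the binary ones so the two cannot drift apart.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for InstrCPUFlags; the real names and underlying type may differ.
enum class CpuFlags : std::uint32_t
{
    None = 0,
    CF = 1u << 0,
    ZF = 1u << 6,
    SF = 1u << 7,
    OF = 1u << 11,
};

constexpr CpuFlags operator|(CpuFlags a, CpuFlags b)
{
    using U = std::underlying_type_t<CpuFlags>;
    return static_cast<CpuFlags>(static_cast<U>(a) | static_cast<U>(b));
}

constexpr CpuFlags operator&(CpuFlags a, CpuFlags b)
{
    using U = std::underlying_type_t<CpuFlags>;
    return static_cast<CpuFlags>(static_cast<U>(a) & static_cast<U>(b));
}

constexpr CpuFlags operator~(CpuFlags a)
{
    using U = std::underlying_type_t<CpuFlags>;
    return static_cast<CpuFlags>(~static_cast<U>(a));
}

// The compound forms requested in the issue.
constexpr CpuFlags& operator|=(CpuFlags& a, CpuFlags b) { return a = a | b; }
constexpr CpuFlags& operator&=(CpuFlags& a, CpuFlags b) { return a = a & b; }
```

With these in place, `flags |= modifiedFlags & ~readFlags;` compiles as written.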

Improve error handling for serialization.

Currently it's quite difficult to tell which node causes the error. There should be a way to get extended information about what went wrong and which node is causing it.

Performance problem.

I'm currently writing a program that re-encodes nearly every instruction of a big program (a game), so the re-encode callback will be called millions of times. (see here.)

The problem is that if I declare the Program and Assembler as local variables, there will be millions of object and memory allocations and deallocations, which will have a big impact on speed.

But if I declare them as global variables, I couldn't find any method to clear them, so the generated code accumulates on every call.

Is there any solution to this?

Needs proper documentation!

The example code is useful but barely scratches the surface of any kind of practical application. Documentation definitely needed.

Serializer reports "Impossible instruction" error with valid instruction

Hello,

I compiled Zasm from source code 4 days ago using latest Visual Studio 2022 (CMake tools) on Windows 11.
When using the Assembler class, I noticed a strange behavior where serializing would fail with error "Impossible instruction" even though the instruction is valid.

This behavior should be reproducible with following code (using latest MSVC for compilation):

#include "zasm/formatter/formatter.hpp"
#include "zasm.hpp"

#include <stdio.h>
int main()
{
    using namespace zasm;
    using namespace zasm::x86;

    Program     Program(MachineMode::AMD64);
    Assembler   Assembler(Program);
    Serializer  Serializer;
    Error       ZasmError;

    Assembler.mov(ecx, Imm(0xFFFFFFFF));

    ZasmError = Serializer.serialize(Program, 0);

    if (ZasmError.getCode() != ErrorCode::None)
    {
        printf("%s\n", ZasmError.getErrorMessage()); // This line gets hit: "Error at node "mov ecx, 0xffffffff" with id 0: Impossible instruction"
        exit(1);
    }

    exit(0);
}

When setting the immediate value to a slightly lower value (for example 0x1FFFFFFF), there is no error.
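The contrast with 0x1FFFFFFF hints at a signed range check (an inference, not confirmed here): 0x1FFFFFFF is below INT32_MAX, while 0xFFFFFFFF held in 64 bits is 4294967295, which is not. For a 32-bit operand like `mov ecx, imm32`, however, 0xFFFFFFFF and -1 produce the same four bytes. The hypothetical check below accepts a value if it fits the field as either a signed or an unsigned N-bit number.

```cpp
#include <cstdint>

// True if `value` fits an N-bit immediate field read either as signed or as unsigned,
// i.e. value is in [-2^(N-1), 2^N - 1]. Valid for 1 <= bits <= 62.
constexpr bool fitsImmediate(std::int64_t value, unsigned bits)
{
    const std::int64_t signedMin = -(std::int64_t{ 1 } << (bits - 1));
    const std::int64_t unsignedMax = (std::int64_t{ 1 } << bits) - 1;
    return value >= signedMin && value <= unsignedMax;
}
```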

Invalid relative instructions size estimation

Hello, first of all, thanks for all that huge work. I've been having some problems with program size estimation that I want to report, and maybe you can help me fix them (if it's even possible).

In the example below I've used the estimateCodeSize and allocatePage from the basic_jit example.

static std::size_t estimate_code_size(const zasm::Program& program) {
    std::size_t size = 0;
    for (auto* node = program.getHead(); node != nullptr; node = node->getNext()) {
        if (auto* nodeData = node->getIf<zasm::Data>(); nodeData != nullptr) {
            size += nodeData->getTotalSize();
        } else if (auto* nodeInstr = node->getIf<zasm::Instruction>(); nodeInstr != nullptr) {
            const auto& instrInfo = nodeInstr->getDetail(program.getMode());
            if (instrInfo.hasValue()) {
                size += instrInfo->getLength();
            } else {
                std::cout << "Error: Unable to get instruction info\n";
            }
        } else if (auto* nodeEmbeddedLabel = node->getIf<zasm::EmbeddedLabel>(); nodeEmbeddedLabel != nullptr) {
            const auto bitSize = nodeEmbeddedLabel->getSize();
            if (bitSize == zasm::BitSize::_32)
                size += 4;
            if (bitSize == zasm::BitSize::_64)
                size += 8;
        }
    }
    return size;
}

static void* allocate_page(std::size_t codeSize) {
    return VirtualAlloc(0, codeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
}

int main(int argc, char* argv[]) {
    using namespace zasm;

    Program program(MachineMode::AMD64);
    x86::Assembler a(program);

    {
        auto test_cond = program.createLabel("test_cond");

        a.cmp(x86::rcx, 0x1337);
        a.jz(test_cond);
        a.ret();

        a.bind(test_cond);
        a.nop();
        a.ret();
    }

    const auto estimated_size = estimate_code_size(program);
    void* code_page = allocate_page(estimated_size);

    Serializer serializer;
    if (auto err = serializer.serialize(program, reinterpret_cast<int64_t>(code_page)); err != zasm::Error::None) {
        std::cout << "Serialization failure: " << zasm::getErrorName(err) << "\n";
        return EXIT_FAILURE;
    }

    const auto serialized_size = serializer.getCodeSize();
    std::memcpy(code_page, serializer.getCode(), serialized_size);

    auto fmt_args = disasm::c_instance::fmt_args_t{.data_ptr = (uint8_t*)code_page,
                                                   .address = code_page,
                                                   .dump_opcodes = true,
                                                   .dump_address = true,
                                                   .dump_offset = true};
    logger::info("Generated:\n{}", disasm::get().format_range(fmt_args, estimated_size));

    logger::warn<1>("estimated size: {} | serialized_size: {}", estimated_size, serialized_size);

    return 0;
}

The results of executing this function are:

As you can see, the estimated code size and the serialized code size differ because of the JZ instruction: while estimating the code size it was encoded in its 6-byte (rel32) form, but during serialization it was encoded in its 2-byte (rel8) form. Because of that, my format function printed 2 invalid instructions at the end.

This happens because, while estimating the code size, instructions are encoded one by one without knowing the addresses of the previous/next instructions, so the relative offset can't be determined.

Originally I thought about just creating a PR to fix this, but the more I think about it, the more questions I have.
We can surely fix this behavior for JMPs/JCCs that target an instruction label, but I am not entirely sure how to calculate the right instruction size when we are dealing with immediate addresses.
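One workaround, sketched under assumptions: since the final size of a relative branch is only known during serialization, the estimate can count every branch at its largest encoding (jmp rel32 = 5 bytes, jcc rel32 = 6 bytes) so the allocation is never too small, and the actual size from `serializer.getCodeSize()` is used for copying and formatting. In the example above, passing `serialized_size` instead of `estimated_size` to the format call would avoid decoding the trailing bytes. The types below are hypothetical and ignore branch prefixes and alignment padding.

```cpp
#include <cstddef>
#include <vector>

// Simplified node description: the caller supplies the measured size of ordinary
// instructions; relative branches to labels are counted at their rel32 size.
enum class NodeKind { Jmp, Jcc, Other };
struct SizedNode { NodeKind kind; std::size_t measuredSize = 0; };

// Upper bound on the serialized size: jmp rel32 is 5 bytes (E9 cd) and jcc rel32 is
// 6 bytes (0F 8x cd). Short forms (2 bytes) can only make the real code smaller.
std::size_t upperBoundCodeSize(const std::vector<SizedNode>& nodes)
{
    std::size_t size = 0;
    for (const SizedNode& n : nodes)
    {
        switch (n.kind)
        {
        case NodeKind::Jmp: size += 5; break;
        case NodeKind::Jcc: size += 6; break;
        case NodeKind::Other: size += n.measuredSize; break;
        }
    }
    return size;
}
```

For the program above (cmp rcx, 0x1337 = 7 bytes; jz; ret; nop; ret) this gives 16 bytes, while the serialized code with a short jz is 12 bytes.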

Inserting a node before the head is currently impossible with assembler

Move Formatter out of Program

Formatter should be its own component and not be a part of Program