zyantific / zasm
x86-64 Assembler based on Zydis
License: MIT License
Incorrect instruction encoding. I ran into a problem where I cannot encode any instruction correctly. After looking at the code for a bit, I realized that the encoder thinks I am passing the wrong operand size to the instruction.
I want to encode push 0xDEADC0DE (opcode 0x68). I get an error because I use the x64 build of the library to encode 32-bit instructions: the encoder treats my immediate operand as a signed value, and in an x64 application that value is 64 bits wide when it reaches this function:
static ZyanU8 ZydisGetSignedImmSize(ZyanI64 imm)
{
if (imm >= ZYAN_INT8_MIN && imm <= ZYAN_INT8_MAX)
{
return 8;
}
if (imm >= ZYAN_INT16_MIN && imm <= ZYAN_INT16_MAX)
{
return 16;
}
if (imm >= ZYAN_INT32_MIN && imm <= ZYAN_INT32_MAX)
{
return 32;
}
return 64;
}
All checks fail and I get a size of 64 bits, which is wrong. This happens because in an x64 application 0xDEADC0DE arrives as a positive 64-bit value that is larger than ZYAN_INT32_MAX.
To make the function treat it as a 32-bit value I would have to pass a value of the form 0xffffffff????????, which is not convenient.
a.emit(ZYDIS_MNEMONIC_PUSH, Imm((int32_t)0xdeadc0de));
The cast above solves it, but is there a more convenient way to do this?
Hello, there seems to be a bug that prevents opening a file in read/write mode, even though the StreamMode::ReadWrite enum value exists. The problem is in the following line of the FileStream::open function:
_wfopen_s(&fp, path.wstring().c_str(), mode == StreamMode::Read ? L"rb" : L"wb");
If we pass the ReadWrite enum value to FileStream::open or to the constructor, the file is opened in wb mode and its content is overwritten. This is not a problem for the load and store functions: as I understand it, there is no point in passing a ReadWrite FileStream to the save function, because it has to clear the content before writing bytes in its own format. But it is a problem for a user who wants to use FileStream as a general wrapper for working with files, because there is no way to open a file in read/write mode with it. I could create a pull request, but I cannot figure out how to fix this without breaking the load and store functions.
P.S. As an option we could write something like this:
_wfopen_s(&fp, path.wstring().c_str(), mode == StreamMode::Read ? L"rb" : mode == StreamMode::Write ? L"wb" : L"r+b");
but that would break the save function, because the file would not be cleared before our Program is written.
#include "examples.common.hpp"
#include <iostream>
#include <windows.h>
static void* allocatePage(std::size_t codeSize)
{
#ifdef _WIN32
return VirtualAlloc(0, codeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
#else
// TODO: mmap for Linux.
return nullptr;
#endif
}
static std::size_t estimateCodeSize(const zasm::Program& program)
{
std::size_t size = 0;
for (auto* node = program.getHead(); node != nullptr; node = node->getNext())
{
if (auto* nodeData = node->getIf<zasm::Data>(); nodeData != nullptr)
{
size += nodeData->getTotalSize();
}
else if (auto* nodeInstr = node->getIf<zasm::Instruction>(); nodeInstr != nullptr)
{
const auto& instrInfo = nodeInstr->getDetail(program.getMode());
if (instrInfo.hasValue())
{
size += instrInfo->getLength();
}
else
{
std::cout << "Error: Unable to get instruction info\n";
}
}
else if (auto* nodeEmbeddedLabel = node->getIf<zasm::EmbeddedLabel>(); nodeEmbeddedLabel != nullptr)
{
const auto bitSize = nodeEmbeddedLabel->getSize();
if (bitSize == zasm::BitSize::_32)
size += 4;
if (bitSize == zasm::BitSize::_64)
size += 8;
}
}
return size;
}
int main()
{
using namespace zasm;
const uint64_t address = 0x7ff7cf0055b3;
const std::vector<uint8_t> code = {
0x48, 0x2B, 0x1D, 0x00, 0x00, 0x00, 0x00
//sub rbx,QWORD PTR [rip+0x0]
};
Program program(MachineMode::AMD64);
x86::Assembler assembler(program);
Decoder decoder(program.getMode());
Serializer serializer;
size_t bytesDecoded = 0;
while (bytesDecoded < code.size())
{
const auto curAddress = address + bytesDecoded;
const auto decoderRes = decoder.decode(code.data() + bytesDecoded, code.size() - bytesDecoded, curAddress);
const auto& instrInfo = *decoderRes;
const auto instr = instrInfo.getInstruction();
printf( "%s\n", formatter::toString(&instr).c_str());
assembler.emit(instr);
bytesDecoded += instrInfo.getLength();
}
const auto requiredSize = estimateCodeSize(program);
void* pCodePage = allocatePage(requiredSize);
Error error = serializer.serialize(program, reinterpret_cast<uint64_t>(pCodePage)); // Error
if (error != Error::None){
printf("error = %s\n", getErrorName(error));
}
system("pause");
return 0;
}
output:
sub rbx, qword ptr ds:[rel 0x7ff7cf0055ba]
imm = 139215200671155 | 7E9D909555B3
error = Error::ImpossibleInstruction
https://github.com/zyantific/zydis/blob/ffde0f46398a86417c860462f6af0556331cb5bb/src/Encoder.c#L455
https://github.com/zyantific/zydis/blob/ffde0f46398a86417c860462f6af0556331cb5bb/src/Encoder.c#L1598
Let's say I want to use this to write a JIT. In that case I need to be able to pass in the addresses of library routines and to get the addresses of generated routines back out.
That is, I need to be able to take the address of a label after serialization, and I need to be able to set a constant address for some labels before serialization.
I don't see a way to do these things, and none of the examples do them.
I made a fork to do this myself.
I couldn't use it in my own project. It gives linker errors.
Build started...
1>------ Build started: Project: Test, Configuration: Release x64 ------
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::Error __cdecl zasm::Serializer::serialize(class zasm::Program const &,__int64)" (?serialize@Serializer@zasm@@QEAA?AW4Error@2@AEBVProgram@2@_J@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Serializer::~Serializer(void)" (??1Serializer@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Serializer::Serializer(void)" (??0Serializer@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::Error __cdecl zasm::x86::Assembler::emit(class zasm::Instruction const &)" (?emit@Assembler@x86@zasm@@QEAA?AW4Error@3@AEBVInstruction@3@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: virtual __cdecl zasm::x86::Assembler::~Assembler(void)" (??1Assembler@x86@zasm@@UEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::x86::Assembler::Assembler(class zasm::Program &)" (??0Assembler@x86@zasm@@QEAA@AEAVProgram@2@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: enum zasm::MachineMode __cdecl zasm::Program::getMode(void)const " (?getMode@Program@zasm@@QEBA?AW4MachineMode@2@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Program::~Program(void)" (??1Program@zasm@@QEAA@XZ)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Program::Program(enum zasm::MachineMode)" (??0Program@zasm@@QEAA@W4MachineMode@1@@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: class zasm::Expected<class zasm::Instruction,enum zasm::Error> __cdecl zasm::Decoder::decode(void const *,unsigned __int64,unsigned __int64)" (?decode@Decoder@zasm@@QEAA?AV?$Expected@VInstruction@zasm@@W4Error@2@@2@PEBX_K1@Z)
1>Test.obj : error LNK2001: unresolved external symbol "public: __cdecl zasm::Decoder::Decoder(enum zasm::MachineMode)" (??0Decoder@zasm@@QEAA@W4MachineMode@1@@Z)
I asked before for the ability to get the address of a label, and you supplied that.
Now I need a few other things:
Examples of basic things:
a) Actually calling code you assembled. There are no examples of calling code you created.
b) An example of calling into user or library code from assembly.
c) Disassembling to human-readable form. Getting the ABI right for a platform is complex, and you pretty much need to be able to disassemble and inspect the code the compiler generates so that you can build templates from entry and exit code that is correct in every detail. It would also be useful for checking that the code you are generating is what you thought it was.
d) Deallocating the result of an assembly. For a JIT you not only create code, you also throw it away when you are done with it and replace it with altered code.
Edit: I guess I can just use the debugger for disassembly.
In the above example I need to know the offset of the uint (highlighted in grey) for memory operands, for the purpose of fixing up RVAs in a mutation engine.
Can you suggest a way to either extract the size of the mnemonic so I can work it out from there or, even better, to get the offset of the operand's bytes relative to the start of the full instruction?
Thanks :)
How would I go about assembling the following instruction, as an example:
lea rsp, [rbp+50h]
Hello, I recently ran into a problem with incorrect jump encoding. Don't look for meaning in the code; it is just an example.
Program program(MachineMode::I386);
x86::Assembler assembler(program);
auto label = assembler.createLabel();
ASSERT_EQ(assembler.jmp(label), Error::None);
for (int i = 0; i < 100; i++)
ASSERT_EQ(assembler.nop(), Error::None);
ASSERT_EQ(assembler.bind(label), Error::None);
ASSERT_EQ(assembler.int3(), Error::None);
ASSERT_EQ(assembler.align(Align::Type::Code, 10), Error::None);
Serializer serializer;
ASSERT_EQ(serializer.serialize(program, 0x0000000000401000), Error::None);
My jmp should go to the int3 instruction, but it lands 3 bytes further. I looked at the zasm source code and noticed what I think is incorrect logic. When the jmp is encoded the first time, its size is 5. On the extra pass its size is 2, because the label is now bound and the jmp changes from the 5-byte form to the short form. So ctx.drift is now 3 and the pass should run once more. But because of the alignment at the end of the code, the drift becomes 0 (3 - 3), so zasm does not run a third pass, even though the alignment is outside the range of our jmp, and it still thinks the offset of int3 is 105 instead of 102. Am I doing something wrong, or is this a bug?
P.S. The code is just an example; don't take it as something meaningful.
With Zydis, you can call ZydisCalcAbsoluteAddressEx, but there is no way to retrieve the data it needs from the Decoder.
Can you suggest any way to do this please?
My target address is 0x347486 (0x224AB8A1486), but the generated assembly instruction's target address is 0x347485 (0x224AB8A1485). Is this a bug?
zasm::Program program(zasm::MachineMode::AMD64);
zasm::x86::Assembler assembler(program);
assembler.jmp(zasm::Imm(instr.jmp_rva));
zasm::Serializer serializer{};
auto res = serializer.serialize(program, instr.rva);
if (res == zasm::ErrorCode::None) {
auto ptr = serializer.getCode();
auto size = serializer.getCodeSize();
std::memcpy(reinterpret_cast<void *>(base.module_base + instr.rva),
serializer.getCode(), serializer.getCodeSize());
}
When creating a label, we see the following code:
static Label createLabel_(detail::ProgramState& state, StringPool::Id nameId, StringPool::Id modId, LabelFlags flags)
{
const auto labelId = static_cast<Label::Id>(state.labels.size());
auto& entry = state.labels.emplace_back();
entry.id = labelId;
entry.flags = flags;
entry.nameId = nameId;
entry.moduleId = modId;
return Label{ labelId };
}
Here a label is added to the label list, but when we destroy the label node using the destroyNode method, the label list does not shrink:
static void destroyNode(detail::ProgramState& state, Node* node, bool quickDestroy)
{
// Keep index before destroying the object.
const auto nodeIdx = static_cast<std::size_t>(node->getId());
notifyObservers<true>(&Observer::onNodeDestroy, state.observer, node);
// If this is called from clear or from destructor we can skip unlinking.
if (!quickDestroy)
{
// Ensure node is not in the list anymore.
detach_<false>(node, state);
}
// Release.
auto* nodeToDestroy = detail::toInternal(node);
state.nodePool.destroy(nodeToDestroy);
if (!quickDestroy)
{
// Release memory, when quickDestroy is true the entire pool will be cleared at once.
state.nodePool.deallocate(nodeToDestroy, 1);
// Remove mapping.
auto& nodeMap = state.nodeMap;
assert(nodeIdx < nodeMap.size());
// Null out the slot.
nodeMap[nodeIdx] = nullptr;
while (!nodeMap.empty() && nodeMap.back() == nullptr)
{
nodeMap.pop_back();
}
}
}
void Program::destroy(Node* node)
{
destroyNode(*_state, node, false);
}
Here the label list is never reduced. Is this a bug, or am I missing something?
Should the Mem constructor that takes base and index registers have scale set to 1 instead of 0?
zasm/include/zasm/x86/memory.hpp
Line 27 in 012062f
Succeeds:
a.mov(zasm::x86::al, zasm::x86::byte_ptr(zasm::x86::rsi, zasm::x86::rcx, 1, 0));
Fails with an impossible instruction error:
a.mov(zasm::x86::al, zasm::x86::byte_ptr(zasm::x86::rsi, zasm::x86::rcx));
Likely cause:
// ptr [base + index]
// ex.: mov eax, ptr [ecx+edx]
static constexpr Mem ptr(BitSize bitSize, const Gp& base, const Gp& index) noexcept
{
return Mem(bitSize, Seg{}, base, index, 0, 0);
}
I used Zydis for a while and recently decided to move to zasm. I am not able to find an equivalent feature in zasm. Is there one?
Based on the provided example, trying to serialise a far jmp / call or r/w mem instructions results in an ImpossibleInstruction error, e.g.:
using namespace zasm;
const uint64_t address = 0x00000001400019A4;
const std::vector<uint8_t> code = {
0xFF, 0x15, 0x73, 0x16, 0x00, 0x00 // CALL QWORD PTR DS:[0x0000000140003028]
};
Program program(MachineMode::AMD64);
x86::Assembler assembler(program);
Decoder decoder(program.getMode());
Serializer serializer;
// Decode all bytes.
size_t bytesDecoded = 0;
while (bytesDecoded < code.size())
{
const auto curAddress = address + bytesDecoded;
auto decoderRes = decoder.decode(code.data() + bytesDecoded, code.size() - bytesDecoded, curAddress);
if (!decoderRes)
{
std::cout << "Failed to decode at " << std::hex << curAddress << "\n";
return;
}
const auto& instr = decoderRes.value();
assembler.emit(instr);
bytesDecoded += instr.getLength();
}
serializer.serialize(program, address); // Error
const auto codeDump = getHexDump(serializer.getCode(), serializer.getCodeSize());
std::cout << codeDump << "\n";
}
Currently I have to write flags = flags | (modifiedFlags & ~readFlags)
instead of flags |= modifiedFlags & ~readFlags
(the expression itself is nonsensical, but I ran into this while implementing liveness analysis)
Currently it's quite difficult to tell which node causes an error. There should be a way to get extended information about what went wrong and which node is causing it.
I'm currently writing a program which will re-encode nearly every instruction of a big program (a game).
So there will be millions of calls to the re-encode callback. (see here.)
The problem is that if I declare the Program and Assembler as local variables, there will be millions of object/memory allocations and deallocations, which will have a big impact on speed.
But if I declare them as global variables, I don't see any clear method, so the generated code will accumulate with every call.
Are there any solutions to this?
The example code is useful but barely scratches the surface of any kind of practical application. Documentation definitely needed.
Hello,
I compiled Zasm from source code 4 days ago using latest Visual Studio 2022 (CMake tools) on Windows 11.
When using the Assembler class, I noticed a strange behavior where serializing would fail with error "Impossible instruction" even though the instruction is valid.
This behavior should be reproducible with following code (using latest MSVC for compilation):
#include "zasm/formatter/formatter.hpp"
#include "zasm.hpp"
#include <stdio.h>
int main()
{
using namespace zasm;
using namespace zasm::x86;
Program Program(MachineMode::AMD64);
Assembler Assembler(Program);
Serializer Serializer;
Error ZasmError;
Assembler.mov(ecx, Imm(0xFFFFFFFF));
ZasmError = Serializer.serialize(Program, 0);
if (ZasmError.getCode() != ErrorCode::None)
{
printf("%s\n", ZasmError.getErrorMessage()); // This line gets hit: "Error at node "mov ecx, 0xffffffff" with id 0: Impossible instruction"
exit(1);
}
exit(0);
}
When setting the immediate value to a slightly lower value (for example 0x1FFFFFFF), there is no error.
Hello, first of all, thanks for all the hard work. I've been running into a problem with program size estimation that I want to report; maybe you can help me fix it (if that is even possible).
In the example below I've used estimateCodeSize and allocatePage from the basic_jit example.
static std::size_t estimate_code_size(const zasm::Program& program) {
std::size_t size = 0;
for (auto* node = program.getHead(); node != nullptr; node = node->getNext()) {
if (auto* nodeData = node->getIf<zasm::Data>(); nodeData != nullptr) {
size += nodeData->getTotalSize();
} else if (auto* nodeInstr = node->getIf<zasm::Instruction>(); nodeInstr != nullptr) {
const auto& instrInfo = nodeInstr->getDetail(program.getMode());
if (instrInfo.hasValue()) {
size += instrInfo->getLength();
} else {
std::cout << "Error: Unable to get instruction info\n";
}
} else if (auto* nodeEmbeddedLabel = node->getIf<zasm::EmbeddedLabel>(); nodeEmbeddedLabel != nullptr) {
const auto bitSize = nodeEmbeddedLabel->getSize();
if (bitSize == zasm::BitSize::_32)
size += 4;
if (bitSize == zasm::BitSize::_64)
size += 8;
}
}
return size;
}
static void* allocate_page(std::size_t codeSize) {
return VirtualAlloc(0, codeSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
}
int main(int argc, char* argv[]) {
using namespace zasm;
Program program(MachineMode::AMD64);
x86::Assembler a(program);
{
auto test_cond = program.createLabel("test_cond");
a.cmp(x86::rcx, 0x1337);
a.jz(test_cond);
a.ret();
a.bind(test_cond);
a.nop();
a.ret();
}
const auto estimated_size = estimate_code_size(program);
void* code_page = allocate_page(estimated_size);
Serializer serializer;
if (auto err = serializer.serialize(program, reinterpret_cast<int64_t>(code_page)); err != zasm::Error::None) {
std::cout << "Serialization failure: " << zasm::getErrorName(err) << "\n";
return EXIT_FAILURE;
}
const auto serialized_size = serializer.getCodeSize();
std::memcpy(code_page, serializer.getCode(), serialized_size);
auto fmt_args = disasm::c_instance::fmt_args_t{.data_ptr = (uint8_t*)code_page,
.address = code_page,
.dump_opcodes = true,
.dump_address = true,
.dump_offset = true};
logger::info("Generated:\n{}", disasm::get().format_range(fmt_args, estimated_size));
logger::warn<1>("estimated size: {} | serialized_size: {}", estimated_size, serialized_size);
return 0;
}
The results of executing this function are:
As you can see, the estimated code size and the serialized code size are different because of the JZ instruction, which produced a 5-byte instruction when we were estimating the code size and a 2-byte instruction when we were serializing the output. Because of that, the output of my format function contains 2 invalid instructions at the end.
The reason this happens is that when we estimate the code size we encode instructions one by one without knowing the addresses of the previous and next instructions, so we cannot determine the relative offset.
Originally I thought about just creating a PR to fix this, but the more I think about it, the more questions I have.
Surely we can fix this behaviour for JMPs/JCCs that target a label, but I am not entirely sure how to calculate the right instruction size when dealing with immediate addresses.
Formatter should be its own component and not be a part of Program