libxdc's People

Contributors

adrianherrera, eqv, schumilo, tklengyel, vient

libxdc's Issues

Support Visual Studio compiler?

I want to try this as a static library on Windows. My project is a Visual Studio project that links the libipt static library, and I want to try libxdc to get better performance. How can I do that? Thanks.

API usage questions.

I'm trying to build my own fuzzer using the perf_event_open API for capturing the bitstream as a fun side project.
I've got a lot of questions regarding library usage, partly because I lack background information. (I've only skimmed the Intel manual on the subject.)

Let's start with basic library usage, as far as I understand it.
Create decoder -> run decoder -> the decoder fills the bitmap according to the trace data, where 1 means branch taken and 0 means not taken.
That much is clear to me. What counts as a branch is, I presume, left to the capture configuration.

The arguments could be documented more clearly, though: I had to check the source to find out what they meant.

libxdc_init:

  • Filter: Are these absolute and/or physical addresses or virtual?

    • What happens if you capture a guest VM from the host, is the address of the host or the guest?
    • What if you don't filter by address at record time but just by CR3, do you simply take the whole address range?
  • page_cache_fetch: The purpose of page_cache_fetch is to fetch data from the fuzzed target. However, the argument names and their purpose are not documented, so I'm guessing what's going on based on the tests.

    • void* self_ptr, uint64_t page, bool* success
      • self_ptr: opaque pointer
      • page: pointer to data it requests? (What is the size? Size of a page of the traced process?)
      • success: Was the memory fetch successful?
      • Return value: The address to access the data?
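
One plausible reading of the callback, sketched in C under assumptions that the question above leaves open: that `page` is a virtual address in the traced target, that the callback returns a pointer to a 4 KiB buffer holding that page, and that `*success` reports whether the page could be read. The `snapshot_t` type and the `snapshot_fetch` name are hypothetical. A real fuzzer would read guest or process memory instead of a local array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the traced target's memory: one contiguous
 * region of `num_pages` pages mapped at virtual address `base`. */
typedef struct {
    uint64_t base;       /* page-aligned virtual address of the first page */
    size_t   num_pages;  /* number of pages held in `data` */
    uint8_t *data;       /* num_pages * PAGE_SIZE bytes */
} snapshot_t;

/* Same argument list as quoted in the issue: opaque pointer, address, flag.
 * The void* return type is an assumption. */
void *snapshot_fetch(void *self_ptr, uint64_t page, bool *success)
{
    snapshot_t *snap = (snapshot_t *)self_ptr;
    uint64_t aligned = page & ~(PAGE_SIZE - 1);  /* tolerate unaligned input */

    if (aligned < snap->base ||
        (aligned - snap->base) / PAGE_SIZE >= snap->num_pages) {
        *success = false;  /* page is not part of the snapshot */
        return NULL;
    }

    *success = true;
    return snap->data + (aligned - snap->base);
}
```

The opaque pointer carries the snapshot, so the callback needs no global state.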

libxdc_decode: Pretty clear.

  • decoder: libxdc_init context
  • trace: The recorded Intel PT trace bitstream
  • trace_size: Size of the trace (+ 1 for the 0x55 byte)
  • Return value: Status code
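
The `+ 1` above refers to a trailing 0x55 byte placed after the recorded trace. A minimal sketch of preparing such a buffer follows. The helper name `copy_with_terminator` is hypothetical, and whether the length passed to `libxdc_decode` should count the extra byte is an assumption to check against the library's README.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TRACE_END_BYTE 0x55  /* terminator value mentioned in the issue */

/* Copies `len` bytes of raw trace into a new heap buffer and appends the
 * terminator. Returns NULL on allocation failure; the caller frees the
 * result. `*out_len` receives the buffer length including the terminator. */
uint8_t *copy_with_terminator(const uint8_t *trace, size_t len, size_t *out_len)
{
    uint8_t *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;

    if (len > 0)
        memcpy(buf, trace, len);
    buf[len] = TRACE_END_BYTE;

    *out_len = len + 1;
    return buf;
}
```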

libxdc_register_bb_callback: Called for each NEW basic block, e.g. function calls?

  • void* opaque_ptr,
  • uint64_t start_addr: Source address
  • uint64_t cofi_addr: Destination address

libxdc_register_edge_callback: Called on each branch, e.g. if statements, switch cases?

  • void* opaque_ptr:
  • uint64_t src: Source address
  • uint64_t dst: Destination address

I made a simple test case but haven't had any success so far.
That might be due to improper capture setup, improper library usage, or both.
It's relatively small (~200 LOC); I can post it if you want to take a look. (I need to clean it up first, though.)

Thread safety

Is libxdc thread safe? It's not documented either way, and I noticed that page_cache_lock and page_cache_unlock are empty functions, which might or might not be a complete red herring!

Truncated virtual address

I'm seeing a strange issue with libxdc reporting truncated addresses:

Starting trace from 0xffffffffc05f803c.
Writing 8 bytes of input to 0xffffffffc05fa010
Starting fuzz loop
[TRACER cpuid] RIP: 0xffffffffc05f808f
CPUID leaf 13371337
	 Harness signal on finish
Stopping fuzz loop.
PSB
MODE
MODE
FUP    	c05f803c (TNT: 0)
VMCS
PSBEND
PGE    	c05f803c (TNT: 0)
[IPT] 0xffffffffffffffff -> 0xc05f803c
FUP    	c05f803c (TNT: 0)
MODE


disasm(c05f803c,c05f803c)	TNT: 0
DIS mode 64
PTCOV fetch page of 0xc05f803c

I saved the buffer and ptdump works on it no problem

0000000000000000  psb
0000000000000010  pad
0000000000000011  pad
0000000000000012  pad
0000000000000013  mode.tsx
0000000000000015  mode.exec  cs.l
0000000000000017  fup        6: ffffffffc05f803c
0000000000000020  pad
0000000000000021  pad
0000000000000022  pad
0000000000000023  pad
0000000000000024  pad
0000000000000025  pad
0000000000000026  pip        1a2606000, nr             cr3  00000001a2606000
000000000000002e  pad
000000000000002f  pad
0000000000000030  pad
0000000000000031  pad
0000000000000032  pad
0000000000000033  pad
0000000000000034  pad
0000000000000035  pad
0000000000000036  vmcs       3db4bc000                 vmcs 00000003db4bc000
000000000000003d  pad
000000000000003e  pad
000000000000003f  pad
0000000000000040  cbr        29
0000000000000044  psbend
0000000000000046  pad
0000000000000047  tip.pge    6: ffffffffc05f803c
0000000000000050  pad
0000000000000051  pad
0000000000000052  pad
0000000000000053  pad
0000000000000054  pad
0000000000000055  pad
0000000000000056  pad
0000000000000057  fup        6: ffffffffc05f803c
0000000000000060  tip.pgd    0: ????????????????
0000000000000061  pad
0000000000000062  pad
0000000000000063  pad
0000000000000064  pad
0000000000000065  pad
0000000000000066  pad
0000000000000067  pad
0000000000000068  pad
0000000000000069  pad
000000000000006a  pad
000000000000006b  pad
000000000000006c  pad
000000000000006d  pad
000000000000006e  pad
000000000000006f  pad
0000000000000070  cbr        29
0000000000000074  pad
0000000000000075  mode.exec  cs.l
0000000000000077  tip.pge    6: ffffffffc05f803c
0000000000000080  pad
0000000000000081  pad
0000000000000082  pad
0000000000000083  pad
0000000000000084  pad
0000000000000085  pad
0000000000000086  pad
0000000000000087  fup        6: ffffffffc05f803c
0000000000000090  tip.pgd    0: ????????????????
0000000000000091  pad
0000000000000092  pad
0000000000000093  pad
0000000000000094  pad
0000000000000095  pad
0000000000000096  pad
0000000000000097  pad
0000000000000098  pad
0000000000000099  pad
000000000000009a  pad
000000000000009b  pad
000000000000009c  pad
000000000000009d  pad
000000000000009e  pad
000000000000009f  pad
00000000000000a0  cbr        29
00000000000000a4  pad
00000000000000a5  mode.exec  cs.l
00000000000000a7  tip.pge    6: ffffffffc05f803c
00000000000000b0  tnt.8      .!!
00000000000000b1  pad
00000000000000b2  pad
00000000000000b3  pad
00000000000000b4  pad
00000000000000b5  pad
00000000000000b6  pad
00000000000000b7  fup        6: ffffffffc05f808f
00000000000000c0  tip.pgd    0: ????????????????
00000000000000c1  pad
00000000000000c2  pad
00000000000000c3  pad
00000000000000c4  pad
00000000000000c5  pad
00000000000000c6  pad
00000000000000c7  pad
00000000000000c8  pad
00000000000000c9  pad
00000000000000ca  pad
00000000000000cb  pad
00000000000000cc  pad
00000000000000cd  pad
00000000000000ce  pad
00000000000000cf  pad

Strangely enough, the same code works without problems on another machine. The only difference is that the problem occurs on Ubuntu 20.04, whereas on Debian Buster it works just fine. I tried switching to the same compiler on Ubuntu, but it made no difference. Do you have any clue what the issue might be?
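
For comparing the two dumps: the number ptdump prints before each address (`6:`, `0:`) is the packet's IPBytes field, which says how many address bytes the packet carries and how they combine with the previously decoded IP. The sketch below follows the IPBytes encoding described in the Intel SDM. The function name `pt_apply_ip` is hypothetical and this is not libxdc's code. With IPBytes 6 the packet carries the full 64-bit address, so the expected result for the FUP above is 0xffffffffc05f803c rather than 0xc05f803c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Applies an IP payload to the last decoded IP according to IPBytes.
 * Returns false when the packet carries no usable IP (suppressed or
 * reserved encoding), in which case *last_ip is left unchanged. */
bool pt_apply_ip(uint64_t *last_ip, uint8_t ipbytes, uint64_t payload)
{
    switch (ipbytes) {
    case 0:  /* IP suppressed, shown by ptdump as ???????????????? */
        return false;
    case 1:  /* payload replaces bits 15:0 */
        *last_ip = (*last_ip & ~0xffffULL) | (payload & 0xffffULL);
        return true;
    case 2:  /* payload replaces bits 31:0 */
        *last_ip = (*last_ip & ~0xffffffffULL) | (payload & 0xffffffffULL);
        return true;
    case 3: {  /* 48-bit payload, sign-extended from bit 47 */
        uint64_t ip = payload & 0xffffffffffffULL;
        if (ip & (1ULL << 47))
            ip |= 0xffff000000000000ULL;
        *last_ip = ip;
        return true;
    }
    case 4:  /* payload replaces bits 47:0, upper 16 bits kept */
        *last_ip = (*last_ip & 0xffff000000000000ULL) |
                   (payload & 0xffffffffffffULL);
        return true;
    case 6:  /* full 64-bit IP in the payload */
        *last_ip = payload;
        return true;
    default: /* reserved encodings */
        return false;
    }
}
```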

Low stability reported by AFL

When I use libxdc to gather basic-block information, AFL reports a low stability score. Currently I register a bb callback and feed the dst address received in that function to AFL as a location that was instrumented. Stability hovers around 18%. In my setup interrupts are blocked and the code being fuzzed is tiny, with no external calls. The low stability score only shows up with PT+libxdc; breakpoint-based tracing yields stability in the ~95% range.
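
For context on how per-block addresses usually end up in an AFL coverage map, here is a minimal sketch of AFL-style edge hashing driven from a basic-block callback. The names `cov_state_t`, `mix_addr` and `on_basic_block`, the address mixing function, and the callback argument order (opaque pointer, `start_addr`, `cofi_addr`, as listed in the API questions above) are assumptions, not libxdc's or AFL's exact code.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE (1u << 16)  /* AFL's default coverage map size */

/* Hypothetical per-run coverage state passed as the opaque pointer. */
typedef struct {
    uint8_t  map[MAP_SIZE];
    uint32_t prev_loc;       /* previous block id, shifted right by one */
} cov_state_t;

/* Folds a 64-bit address into a map index (illustrative mixer). */
static uint32_t mix_addr(uint64_t addr)
{
    addr ^= addr >> 33;
    addr *= 0xff51afd7ed558ccdULL;
    addr ^= addr >> 33;
    return (uint32_t)addr & (MAP_SIZE - 1);
}

/* Records the edge (previous block -> this block) the way AFL's
 * instrumentation does: index = cur ^ prev, then prev = cur >> 1. */
void on_basic_block(void *opaque, uint64_t start_addr, uint64_t cofi_addr)
{
    cov_state_t *st = (cov_state_t *)opaque;
    uint32_t cur = mix_addr(start_addr);

    (void)cofi_addr;  /* unused in this sketch */
    st->map[cur ^ st->prev_loc]++;
    st->prev_loc = cur >> 1;
}
```

Because each map index depends on both the current and the previous block, any run-to-run difference in the sequence of reported addresses changes map bytes, which AFL counts against stability.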

Decode error, no callbacks

Hey guys, so I'm running into an issue and I'm a bit stuck. I have a 64k PT buffer recorded by Xen and ptdump seems to be able to parse the buffer no problem.

00000000000010a0  psb
00000000000010b0  pad
00000000000010b1  pad
00000000000010b2  pad
00000000000010b3  mode.tsx
00000000000010b5  mode.exec  cs.l
00000000000010b7  fup        3: 00007f918d853264
00000000000010be  pad
00000000000010bf  pad
00000000000010c0  pad
00000000000010c1  pad
00000000000010c2  pad
00000000000010c3  pad
00000000000010c4  pad
00000000000010c5  pad
00000000000010c6  pip        b2619800, nr              cr3  00000000b2619800
00000000000010ce  pad
00000000000010cf  pad
00000000000010d0  pad
00000000000010d1  pad
00000000000010d2  pad
00000000000010d3  pad
00000000000010d4  pad
00000000000010d5  pad
00000000000010d6  vmcs       2b5b75000                 vmcs 00000002b5b75000
00000000000010dd  pad
00000000000010de  pad
00000000000010df  pad
00000000000010e0  pad
00000000000010e1  pad
00000000000010e2  pad
00000000000010e3  pad
00000000000010e4  pad
00000000000010e5  pad
00000000000010e6  tsc        412444672534
00000000000010ee  pad
00000000000010ef  pad
00000000000010f0  cbr        8
00000000000010f4  psbend
00000000000010f6  tnt.8      ..!.!.
:

I've enabled DEBUG_TRACES in libxdc but this is as far as it gets:

PSB
MODE
MODE
FUP     7f918d853264 (TNT: 0)
VMCS

Afterwards libxdc_decode just returns with the value 4. I have the buffer saved to a file if that helps with debugging this further.

Pass active cr3 info to page_cache_fetch function

Currently the page_cache_fetch function only receives the virtual address to be retrieved. While this is sufficient for small traces where the target process is known, if we are tracing across processes or between kernel and userspace, we need to know which page table to use to translate the virtual address and grab the underlying page. As this information is carried in the PT buffer, keeping an "active pt" variable should add very little overhead on the libxdc side.

Can libxdc rebuild Ptrix-type path coverage without disassembly?

libxdc is a great piece of work among Intel PT decoders, particularly for hardware-assisted fuzzing. I noticed that the evaluation in this repo shows libxdc to be faster than Ptrix, which rebuilds coverage without disassembly. Although libxdc uses many micro-optimizations to accelerate coverage reconstruction, I'm interested in whether libxdc would be faster when rebuilding Ptrix-type path coverage without disassembly.
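
To make the question concrete, here is a minimal sketch of what a disassembly-free coverage signal can look like: TNT bits and TIP targets are folded into a running hash as packets are parsed, and a bitmap slot is marked at each TIP. This is an illustration only. It is neither Ptrix's actual algorithm nor something libxdc provides, and all names (`path_cov_t`, `path_cov_tnt`, `path_cov_tip`) are hypothetical.

```c
#include <stdint.h>

#define PATH_MAP_SIZE (1u << 16)

/* Hypothetical state: a coverage map plus a hash over all packets so far. */
typedef struct {
    uint8_t  map[PATH_MAP_SIZE];
    uint64_t hash;
} path_cov_t;

/* One FNV-1a step over a single byte. */
static uint64_t fnv_step(uint64_t h, uint8_t byte)
{
    return (h ^ byte) * 0x100000001b3ULL;
}

/* Restarts the path hash (FNV-1a offset basis); does not clear the map. */
void path_cov_reset(path_cov_t *pc)
{
    pc->hash = 0xcbf29ce484222325ULL;
}

/* Folds one taken/not-taken bit into the path hash. */
void path_cov_tnt(path_cov_t *pc, uint8_t taken)
{
    pc->hash = fnv_step(pc->hash, taken ? 1 : 0);
}

/* Folds a TIP target into the path hash and marks the resulting slot. */
void path_cov_tip(path_cov_t *pc, uint64_t target)
{
    for (int i = 0; i < 8; i++)
        pc->hash = fnv_step(pc->hash, (uint8_t)(target >> (8 * i)));
    pc->map[pc->hash & (PATH_MAP_SIZE - 1)]++;
}
```

No target memory is read here, so no page cache callback is needed, at the cost of a signal that identifies paths rather than individual basic blocks or edges.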

Please add better documentation about what page cache means in the context of this library

Hi,
You explain in the readme file that libxdc must receive a callback which helps to "request memory":

"To disassemble the target, a callback page_cache_fetch_fptr has to be provided that allows libxdc to request memory"

This is not at all understandable, at least for me. Do you mean the default page cache of the operating system? If so, why do you need it? I would very much appreciate detailed documentation about why it is needed. Intel's own PT decoder (libipt) does not require that, so what is the benefit here?

Wrong disassembly?

Hey guys,
so I'm trying to verify that the edge information I get from libxdc matches what I expect, and so far it doesn't. The target code being traced flows like this when stepped through with MTF, disassembling each instruction with Capstone:

    0: 0xffffffffc03af03c      movsx [7, next: 0xffffffffc03af043] 0f be 0d cd 1f 00 00 bf  04 00 00 00 89 c8 99      ...............
    1: 0xffffffffc03af043        mov [5, next: 0xffffffffc03af048] bf 04 00 00 00 89 c8 99  f7 ff 83 fa 03 74 1f      .............t.
    2: 0xffffffffc03af048        mov [2, next: 0xffffffffc03af04a] 89 c8 99 f7 ff 83 fa 03  74 1f 83 fa 02 74 15      ........t....t.
    3: 0xffffffffc03af04a        cdq [1, next: 0xffffffffc03af04b] 99 f7 ff 83 fa 03 74 1f  83 fa 02 74 15 89 ce      ......t....t...
    4: 0xffffffffc03af04b       idiv [2, next: 0xffffffffc03af04d] f7 ff 83 fa 03 74 1f 83  fa 02 74 15 89 ce 40      .....t....t...@
    5: 0xffffffffc03af04d        cmp [3, next: 0xffffffffc03af050] 83 fa 03 74 1f 83 fa 02  74 15 89 ce 40 80 e6      ...t....t...@..
    6: 0xffffffffc03af050         je [2, next: 0xffffffffc03af052] 74 1f 83 fa 02 74 15 89  ce 40 80 e6 03 74 09      t....t...@...t.
    7: 0xffffffffc03af052        cmp [3, next: 0xffffffffc03af055] 83 fa 02 74 15 89 ce 40  80 e6 03 74 09 ff ca      ...t...@...t...
    8: 0xffffffffc03af055         je [2, next: 0xffffffffc03af057] 74 15 89 ce 40 80 e6 03  74 09 ff ca 75 10 83      t...@...t...u..
    9: 0xffffffffc03af057        mov [2, next: 0xffffffffc03af059] 89 ce 40 80 e6 03 74 09  ff ca 75 10 83 c1 0c      ..@...t...u....
   10: 0xffffffffc03af059        and [4, next: 0xffffffffc03af05d] 40 80 e6 03 74 09 ff ca  75 10 83 c1 0c eb 0b      @...t...u......
   11: 0xffffffffc03af05d         je [2, next: 0xffffffffc03af05f] 74 09 ff ca 75 10 83 c1  0c eb 0b ff c1 eb 07      t...u..........
   12: 0xffffffffc03af068        inc [2, next: 0xffffffffc03af06a] ff c1 eb 07 6b c9 0c eb  02 ff c9 48 8b 05 96      ....k......H...
   13: 0xffffffffc03af06a        jmp [2, next: 0xffffffffc03af06c] eb 07 6b c9 0c eb 02 ff  c9 48 8b 05 96 1f 00      ..k......H.....
   14: 0xffffffffc03af073        mov [7, next: 0xffffffffc03af07a] 48 8b 05 96 1f 00 00 48  39 05 7f 1f 00 00 75      H......H9.....u
   15: 0xffffffffc03af07a        cmp [7, next: 0xffffffffc03af081] 48 39 05 7f 1f 00 00 75  07 89 0c 25 00 00 00      H9.....u...%...
   16: 0xffffffffc03af081        jne [2, next: 0xffffffffc03af083] 75 07 89 0c 25 00 00 00  00 b8 37 13 37 13 0f      u...%.....7.7..
   17: 0xffffffffc03af08a        mov [5, next: 0xffffffffc03af08f] b8 37 13 37 13 0f a2 31  f6 48 c7 c7 77 00 3b      .7.7...1.H..w.;
   18: 0xffffffffc03af08f      cpuid [2, next: 0xffffffffc03af091] 0f a2 31 f6 48 c7 c7 77  00 3b c0 e8 f4 56 d6      ..1.H..w.;...V.

The full decode log with the disassembly is:

PSB
MODE
MODE
FUP     ffffffffc03af03c (TNT: 0)
VMCS
PSBEND
PGE     ffffffffc03af03c (TNT: 0)
[IPT] 0xffffffffffffffff -> 0xffffffffc03af03c
FUP     ffffffffc03af03c (TNT: 0)


disasm(ffffffffc03af03c,ffffffffc03af03c)       TNT: 0
[IPT] Caching page 0xffffffffc03af
DISASM @ 0xffffffffc03af03e add byte ptr [rax], al
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af045 cmp qword ptr [rip + 0x1f7f], rax
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af047 jne 0xffffffffc03af04e
DISASM FOUND COFI
PGD     ffffffffc03af03c (TNT: 0)


disasm(ffffffffc03af03c,ffffffffc03af03c)       TNT: 0
[IPT] 0xffffffffc03af03c -> 0xffffffffffffffff
MODE
PGE     ffffffffc03af03c (TNT: 0)
[IPT] 0xffffffffffffffff -> 0xffffffffc03af03c
FUP     ffffffffc03af03c (TNT: 0)


disasm(ffffffffc03af03c,ffffffffc03af03c)       TNT: 0
PGD     ffffffffc03af03c (TNT: 0)


disasm(ffffffffc03af03c,ffffffffc03af03c)       TNT: 0
[IPT] 0xffffffffc03af03c -> 0xffffffffffffffff
MODE
PGE     ffffffffc03af03c (TNT: 0)
[IPT] 0xffffffffffffffff -> 0xffffffffc03af03c
TNT 16
FUP     ffffffffc03af08f (TNT: 3)


disasm(ffffffffc03af03c,ffffffffc03af08f)       TNT: 3
[IPT] 0xffffffffc03af045 -> 0xffffffffc03af047
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af04e mov dword ptr [0], ecx
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af053 mov eax, 0x13371337
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af055 cpuid
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af057 xor esi, esi
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af05e mov rdi, -0x3fc4ff89
[IPT] Cached page found 0xffffffffc03af
DISASM @ 0xffffffffc03af063 call 0xffffffff81114757
DISASM FOUND COFI
[IPT] 0xffffffffc03af05e -> 0xffffffff81114757
[IPT] Caching page 0xffffffff81114

The disassembly looks off compared to what it should be. Right at the start it begins disassembling:

disasm(ffffffffc03af03c,ffffffffc03af03c)       TNT: 0
[IPT] Caching page 0xffffffffc03af
DISASM @ 0xffffffffc03af03e

when there is no instruction at that location. Why does it start to disassemble from 0xffffffffc03af03e instead of 0xffffffffc03af03c?

ERR: TNT ...

A new issue I've encountered: the PT buffer is getting processed AFAICT, but there are still no calls to the bb or page-cache callback functions.

This is my init code:

    uint64_t filter[4][2] = {0};
    void* bitmap = malloc(0x10000);
    libxdc_t* decoder = libxdc_init(filter, &page_cache_fetch, NULL, bitmap, 0x10000);
    libxdc_register_bb_callback(decoder, &trace_log, NULL);
    ret = libxdc_decode(decoder, buf, pt_buf_size);
    libxdc_free(decoder);
    free(bitmap);

The processing stops with an error message after a bit. With DEBUG_TRACES enabled I see this at the end.

disasm(ffffffff816f3b07,0)      TNT: 30270
TNT 5a
TIP     ffffffff8114160c (TNT: 30275)


disasm(ffffffff810e4403,0)      TNT: 30275
TNT 4
TIP     ffffffff811415be (TNT: 30276)


disasm(ffffffff8114160c,0)      TNT: 30276
TNT e
TIP     ffffffff811415e5 (TNT: 30278)


disasm(ffffffff811415be,0)      TNT: 30278

ERR:    TNT 30278

It seems to have gotten quite far into the trace. Any recommendation on how to further debug this?
