I'm seeing a strange issue with libxdc reporting truncated addresses: <div class="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

ptdump log <div class="snippet-clipboard-content notranslate position-relative ove

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Truncated virtual address,about nyx-fuzz/libxdc

Comments (51)

tklengyel commented on September 25, 2024 1

@schumilo I've got a patch that fixes it, just need to check if there is any paperwork I need to file before I send a PR. This code is indeed buggy but it's also buggy in simple-pt where this was taken from. But on top of that the "optimizations" that were added to the simple-pt decoder are pretty much nonsensical. Looks like it was just pure luck that it has worked so far ;)

from libxdc.

bamiaux commented on September 25, 2024 1

ptdump log

0000000000000000  psb
0000000000000010  psbend
0000000000000012  cbr        27
0000000000000016  pad
0000000000000017  mode.exec  cs.l
0000000000000019  tip.pge    3: 00007ff7e5de1280
0000000000000020  pad
0000000000000021  pad
0000000000000022  tip        1: ????????????3660
0000000000000025  tnt.8      ...!!!
0000000000000026  tnt.8      !!....
0000000000000027  tnt.8      ..!!.
0000000000000028  pad
0000000000000029  pad
000000000000002a  tip        1: ????????????264e

Your last patch does fix this issue for me here, thanks !

from libxdc.

schumilo commented on September 25, 2024 1

@tklengyel Well, it really depends on the target. If the trace target executes a bunch of indirect jumps in a hot loop (like for example libxdc does), then it will definitely hurt the decoding performance.

from libxdc.

vient commented on September 25, 2024 1

I guess speed is not that important during actual fuzzing but I definitely see some (small) speed improvements in libxdc_experiments tests with such changes.

The real reason why I made my own version is because I wanted to make it fully conformant. In your PR, ipc == 4 seems to be not handled correctly (may cause troubles if we have two distant trace ranges in kernel mode? upper bits will be set to 0), sign extension is always applied (not a problem with usual 4-level paging), last_ip is not set to 0 on PSB (may cause troubles if we trace both kernel and userspace simultaneously?). I wonder if I missed any other edge cases in documentation.

If we talk about performance, your version is slower, half of the time seems to be spent in for loop, copying 4 bytes: https://quick-bench.com/q/YWb5akmV1DjCEJquaoyaO0niDUk (hope I didn't break anything since I needed to replace all returns with if ... else).

from libxdc.

vient commented on September 25, 2024 1

I mean not copying topmost 16 bits from previous IP value. Looks like it may be a problem only when we use 5-level paging, which is a rather unusual case.

Edit: I said that it may show up when "we have two distant trace ranges in kernel mode" (more than 4GB apart) but in your current version it is mitigated by always sign extending, so this case is too reduced to "possible troubles when using 5-level paging".

from libxdc.

eqv commented on September 25, 2024

It's somewhat surprising that the trace starts with a FUP. We usually expect a TIP.PGE which we use to determine the disassembly mode (16/32/64 bits). Maybe the disassembler assumes 32 bits? Sergej mentioned, that starting the trace with a FUP could be due to the fact that you did not enable trace regions in the MSR.

from libxdc.

tklengyel commented on September 25, 2024

We do not have trace regions set in the MSR, correct. I checked the disassembly mode used and it's 64-bit.

from libxdc.

tklengyel commented on September 25, 2024

Also note that in the ptdump output there is a TIP.PGE before the FUP, so I'm unsure why libxdc would skip that

from libxdc.

eqv commented on September 25, 2024

I don't see it?

from libxdc.

eqv commented on September 25, 2024

Next best guess would be: the FUP is compressed, and address decompression fails because we didn't see the last address. though this is quite unlikely, as the FUP appears to be 9 byte in size. Still something you could look into...

from libxdc.

tklengyel commented on September 25, 2024

The TIP.PGE is after psbend. I checked the ptdump output on the machine where it works and there too the TIP.PGE is only after psbend.

from libxdc.

eqv commented on September 25, 2024

0000000000000000  psb
0000000000000010  pad
0000000000000011  pad
0000000000000012  pad
0000000000000013  mode.tsx
0000000000000015  mode.exec  cs.l
0000000000000017  fup        6: ffffffffc05f803c

I don't see the TIP.PGE?

from libxdc.

tklengyel commented on September 25, 2024

0000000000000044  psbend
0000000000000046  pad
0000000000000047  tip.pge    6: ffffffffc05f803c

from libxdc.

tklengyel commented on September 25, 2024

On the machine where it works the first FUP without TIP.PGE before is decoded correctly

Starting trace from 0xffffffffc03d603c.
Writing 4 bytes of input to 0xffffffffc03d8010
Starting fuzz loop
[TRACER cpuid] RIP: 0xffffffffc03d608fCF limit: 0/18446744073709551615
CPUID leaf 13371337
         Harness signal on finish
Stopping fuzz loop.
PSB
MODE
MODE
FUP     ffffffffc03d603c (TNT: 0)

from libxdc.

eqv commented on September 25, 2024

The problem starts before the PGE though, so the second one is irrelevant for us. Very weird... Does the same dumped trace work on the other machine?

from libxdc.

tklengyel commented on September 25, 2024

It's kinda hard to transfer the buffer from one machine to another to test a replay. Different VMs, different VAs, etc.

from libxdc.

eqv commented on September 25, 2024

so its the same testcase that works, but potentially with a different pt dump?

from libxdc.

tklengyel commented on September 25, 2024

Right, the code is the same, but the underlying hw/dom0/vm's are different.

from libxdc.

eqv commented on September 25, 2024

https://github.com/nyx-fuzz/libxdc/blob/master/src/decoder.c#L315 <- you might want to investigate if you are triggering this case.

from libxdc.

tklengyel commented on September 25, 2024

I've added a printf to that switch case and it doesn't trigger;

Here is the ptdump output from the system where it works:

0000000000000000  psb
0000000000000010  pad
0000000000000011  pad
0000000000000012  pad
0000000000000013  mode.tsx
0000000000000015  mode.exec  cs.l
0000000000000017  fup        3: ffffffffc03d603c
000000000000001e  pad
000000000000001f  pad
0000000000000020  pad
0000000000000021  pad
0000000000000022  pad
0000000000000023  pad
0000000000000024  pad
0000000000000025  pad
0000000000000026  pip        ae5d2000, nr              cr3  00000000ae5d2000
000000000000002e  pad
000000000000002f  pad
0000000000000030  pad
0000000000000031  pad
0000000000000032  pad
0000000000000033  pad
0000000000000034  pad
0000000000000035  pad
0000000000000036  vmcs       148d35000                 vmcs 0000000148d35000
000000000000003d  pad
000000000000003e  pad
000000000000003f  pad
0000000000000040  cbr        27
0000000000000044  psbend
0000000000000046  pad
0000000000000047  tip.pge    3: ffffffffc03d603c
000000000000004e  pad
000000000000004f  pad
0000000000000050  pad
0000000000000051  pad
0000000000000052  pad
0000000000000053  pad
0000000000000054  pad
0000000000000055  pad
0000000000000056  pad
0000000000000057  fup        3: ffffffffc03d603c
000000000000005e  pad
000000000000005f  pad
0000000000000060  tip.pgd    0: ????????????????
0000000000000061  pad
0000000000000062  pad
0000000000000063  pad
0000000000000064  pad
0000000000000065  pad
0000000000000066  pad
0000000000000067  pad
0000000000000068  pad
0000000000000069  pad
000000000000006a  pad
000000000000006b  pad
000000000000006c  pad
000000000000006d  pad
000000000000006e  pad
000000000000006f  pad
0000000000000070  cbr        27
0000000000000074  pad
0000000000000075  mode.exec  cs.l
0000000000000077  tip.pge    3: ffffffffc03d603c
000000000000007e  pad
000000000000007f  pad
0000000000000080  pad
0000000000000081  pad
0000000000000082  pad
0000000000000083  pad
0000000000000084  pad
0000000000000085  pad
0000000000000086  pad
0000000000000087  fup        3: ffffffffc03d603c
000000000000008e  pad
000000000000008f  pad
0000000000000090  tip.pgd    0: ????????????????
0000000000000091  pad
0000000000000092  pad
0000000000000093  pad
0000000000000094  pad
0000000000000095  pad
0000000000000096  pad
0000000000000097  pad
0000000000000098  pad
0000000000000099  pad
000000000000009a  pad
000000000000009b  pad
000000000000009c  pad
000000000000009d  pad
000000000000009e  pad
000000000000009f  pad
00000000000000a0  cbr        27
00000000000000a4  pad
00000000000000a5  mode.exec  cs.l
00000000000000a7  tip.pge    3: ffffffffc03d603c
00000000000000ae  pad
00000000000000af  pad
00000000000000b0  tnt.8      .!!
00000000000000b1  pad
00000000000000b2  pad
00000000000000b3  pad
00000000000000b4  pad
00000000000000b5  pad
00000000000000b6  pad
00000000000000b7  fup        3: ffffffffc03d608f
00000000000000be  pad
00000000000000bf  pad
00000000000000c0  tip.pgd    0: ????????????????
00000000000000c1  pad
00000000000000c2  pad
00000000000000c3  pad
00000000000000c4  pad
00000000000000c5  pad
00000000000000c6  pad
00000000000000c7  pad
00000000000000c8  pad
00000000000000c9  pad
00000000000000ca  pad
00000000000000cb  pad
00000000000000cc  pad
00000000000000cd  pad
00000000000000ce  pad
00000000000000cf  pad

from libxdc.

tklengyel commented on September 25, 2024

I'm thinking perhaps the compiler version matters? Behavior changed for some bitshifting op? I tried downgrading to gcc-7 on the Ubuntu machine but that didn't make a difference.

from libxdc.

tklengyel commented on September 25, 2024

Nope, compiler doesn't seem to matter, tried with clang and same behavior.

from libxdc.

eqv commented on September 25, 2024

ubsan maybe? -O0 vs -O3? Dunno...

from libxdc.

tklengyel commented on September 25, 2024

Looks like the bitshifting magic in get_ip_val is where the issue is. The variable aligned_pp has the correct full VA in it but what comes out at the end in aligned_last_ip is missing the upper 32-bits.

from libxdc.

tklengyel commented on September 25, 2024

Hm, and this seems to be because len is calculated to be 6, while on the machine where it works its 3. The bitshifting magic has a 4-len in it, which we evidently underflow.

from libxdc.

tklengyel commented on September 25, 2024

Yea, looks like this code is buggy in libxdc, it confuses ip compression mode and the packet length as compared to https://github.com/intel/libipt/blob/master/libipt/src/pt_packet.c#L145

from libxdc.

schumilo commented on September 25, 2024

Can you share the trace files? I would try to fix that.

from libxdc.

tklengyel commented on September 25, 2024

@schumilo Looks like this is going to take a while so if in the interim you want to take a crack at fixing it I can tell you whether it works now or not. ¯\_(ツ)_/¯

from libxdc.

tklengyel commented on September 25, 2024

Actually got the approval sooner then expected to open-source the fixes. Yay for that ;)

from libxdc.

schumilo commented on September 25, 2024

Thanks for the patch! The klockwork patch looks good to me. We will merge it. But is there any chance to get a small reproducer for the TIP compression issue? I would like to have a look first. In addition, it would be great to add another sample to our regression tests (https://github.com/nyx-fuzz/libxdc_experiments/tree/master/test_data).

from libxdc.

tklengyel commented on September 25, 2024

I can get you the PT log and the memory-page cache saved. It's unclear to me what format you guys use in those experiments for these but you are welcome to take my saves and format them to your liking.

from libxdc.

tklengyel commented on September 25, 2024

FYI https://github.com/nyx-fuzz/libxdc_experiments/pull/2/files

from libxdc.

bamiaux commented on September 25, 2024

FWIW, applying this patch break decoding on my trace. I get a truncated vaddr, while it is correct on master.
Maybe something is still incorrect ?

PGE     7ff7e5de1280 (TNT: 0)
TIP       7ff7e5de3660 (TNT: 0)

PGE    	7ff7e5de1280 (TNT: 0)
TIP    	3660 (TNT: 0)

My CPU is i7-8650U Kaby Lake

from libxdc.

tklengyel commented on September 25, 2024

Odd, works on my kaby lake. Can you run your trace buffer through libipt/ptdump and paste its output? Would be also interested to see what ipc mode you have at that truncated TIP.

from libxdc.

tklengyel commented on September 25, 2024

Can you check with this patch added: 09fc990? Seems like libxdc had an expectation to have short ip's padded up to the previous ip's length. With this the libxdc_experiments all run fine AFAICT.

from libxdc.

vient commented on September 25, 2024

Here is my attempt at fixing IP decoding issue. Sadly it is a bit slower (~5%) than current version but I tried to make it fully conformant.

diff --git a/src/decoder.c b/src/decoder.c
index d2dd500..96fa0d4 100644
--- a/src/decoder.c
+++ b/src/decoder.c
@@ -318,21 +318,29 @@ static inline void decoder_handle_fup(decoder_state_machine_t *self, uint64_t fu
 }
 
 static inline uint64_t get_ip_val(uint8_t **pp, uint64_t *last_ip){
-       register uint8_t len = (*(*pp)++ >> PT_PKT_TIP_SHIFT);
-       if(unlikely(!len))
-               return 0;
-       uint64_t aligned_last_ip, aligned_pp;
-       memcpy(&aligned_pp, *pp, sizeof(uint64_t));
-       memcpy(&aligned_last_ip, last_ip, sizeof(uint64_t));
-       
-       aligned_last_ip = ((int64_t)((uint64_t)( 
-               ((aligned_pp & (0xFFFFFFFFFFFFFFFF >> ((4-len)*16))) | (aligned_last_ip & (0xFFFFFFFFFFFFFFFF << ((len)*16))) ) 
-       )<< (64 - 48))) >> (64 - 48);
-       
-       memcpy(last_ip, &aligned_last_ip, sizeof(uint64_t));
-
-       *pp += (len*2);
-       return *last_ip;
+        const uint8_t type = (*(*pp)++ >> 5);
+        uint64_t aligned_last_ip, aligned_pp;
+        memcpy(&aligned_pp, *pp, sizeof(uint64_t));
+        memcpy(&aligned_last_ip, last_ip, sizeof(uint64_t));
+
+        if (type == 0) {
+            return 0;
+        }
+
+        const uint8_t len = 0xFF08FF06060402FFull >> (type * 8);
+        if (unlikely(len == 0xFF)) {
+            //assert(false);
+        }
+        if (type == 3) {
+            aligned_last_ip = (int64_t)(aligned_pp << 16) >> 16;
+        } else {
+            const uint8_t new_bits = len * 8;
+            const uint8_t old_bits = sizeof(aligned_last_ip) * 8 - new_bits;
+            aligned_last_ip = ((aligned_last_ip >> new_bits) << new_bits) | ((aligned_pp << old_bits) >> old_bits);
+        }
+        memcpy(last_ip, &aligned_last_ip, sizeof(uint64_t));
+        *pp += len;
+        return aligned_last_ip;
 }
 
 static inline uint64_t get_val(uint8_t **pp, uint8_t len){
@@ -845,6 +853,7 @@ __attribute__((hot)) decoder_result_t decode_buffer(decoder_t* self, uint8_t* ma
                                DISPATCH_L1();
 
                        case __extension__ 0b10000010:  /* PSB */
+                               self->last_tip = 0;
                                p += PT_PKT_PSB_LEN;
                                LOGGER("PSB\n");
                                #ifdef DECODER_LOG

from libxdc.

eqv commented on September 25, 2024

still needs testing / measurement but this should be an equivalent version that compiles to branchfree code: https://godbolt.org/z/cr3b1s6WK

from libxdc.

eqv commented on September 25, 2024

Damn, GCC just doesn't like to emit cmovs ^^

from libxdc.

vient commented on September 25, 2024

Looks like branchless version is slower ~~https://quick-bench.com/q/3U07RR8XYl0nSNMe5bkbyiFmpck~~ sorry, tested on type 0, here are proper tests for type 2 (1.2 times slower) and type 3 (5 times slower). There are jumps inserted by the benchmark but they should be predicted perfectly.

from libxdc.

eqv commented on September 25, 2024

that's a pretty cool page, TIL!

from libxdc.

eqv commented on September 25, 2024

on gcc it's even worse ^^ 14x.

from libxdc.

schumilo commented on September 25, 2024

I've just added the icelake traces to our regression tests.
The test will obviously fail with the current master, but works great with @vient's proposed patch.

@vient
As @eqv already mentioned, it still needs more testing and some additional performance measurements.
And I wonder if there is still some room for performance improvements. But still, awesome patch - Thanks!

from libxdc.

vient commented on September 25, 2024

I wonder if there is still some room for performance improvements

I looked a bit more and it seems that adding some bit magic does the trick: https://quick-bench.com/q/ghcyJBnMJiCWIEcvrhaVs-g-rmQ on type 2 (and, equally, on types 1, 4, 6) new versions are nearly as fast as current code. I've changed current version a bit for benchmarking, it shouldn't have slowed down anything. On types 0 and 3 new version should be faster.

Note that NewNew and NewNew2 are equal on assembly level so it's up to you which version to choose, if any. Changes are small and look correct to me, I'll test a bit later with libxdc_experiments if these versions are actually correct.

from libxdc.

tklengyel commented on September 25, 2024

FYI I don't see performance difference with @vient 's patch above and my PR branch when fuzzing the same target with AFL++. So doing tricks like const uint8_t len = 0xFF08FF06060402FFull >> (type * 8); don't really seem to make much difference ¯\_(ツ)_/¯

from libxdc.

tklengyel commented on September 25, 2024

ipc == 4 seems to be not handled correctly

What do you mean?

from libxdc.

tklengyel commented on September 25, 2024

That quick-bench site is cool

from libxdc.

vient commented on September 25, 2024

I tested with libxdc_experiments a bit and it occurred that all my patches were defective since they all perform shifts by 64 in case of type 6, somehow only the last test failed. Corrected version is 0-12% slower than current code - 12% is achieved on performance test 6, other ones show only single-digit percent slowdown. ~~Microbenchmark shows that it is 1.4 times slower than current version~~ these benchmarks seem to be too unreliable, while restarting same benchmark gives almost same results, deleting one function changes the proportion between other two a lot.

from libxdc.

eqv commented on September 25, 2024

@vient would you like to create pull request for your patch?

from libxdc.

vient commented on September 25, 2024

Yeah, I can do it a bit later.

from libxdc.

eqv commented on September 25, 2024

Sweet, we can obviously just merge your patch manually (we already have the code for eval purposes), but we'd like you to get proper credit :)

from libxdc.

schumilo commented on September 25, 2024

Thanks again for your patch! :)

from libxdc.

Truncated virtual address about libxdc HOT 51 CLOSED

Comments (51)

Related Issues (11)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent