Comments (51)
@schumilo I've got a patch that fixes it, just need to check if there is any paperwork I need to file before I send a PR. This code is indeed buggy but it's also buggy in simple-pt where this was taken from. But on top of that the "optimizations" that were added to the simple-pt decoder are pretty much nonsensical. Looks like it was just pure luck that it has worked so far ;)
from libxdc.
ptdump log
0000000000000000 psb
0000000000000010 psbend
0000000000000012 cbr 27
0000000000000016 pad
0000000000000017 mode.exec cs.l
0000000000000019 tip.pge 3: 00007ff7e5de1280
0000000000000020 pad
0000000000000021 pad
0000000000000022 tip 1: ????????????3660
0000000000000025 tnt.8 ...!!!
0000000000000026 tnt.8 !!....
0000000000000027 tnt.8 ..!!.
0000000000000028 pad
0000000000000029 pad
000000000000002a tip 1: ????????????264e
Your last patch does fix this issue for me here, thanks !
from libxdc.
@tklengyel Well, it really depends on the target. If the trace target executes a bunch of indirect jumps in a hot loop (like for example libxdc does), then it will definitely hurt the decoding performance.
from libxdc.
I guess speed is not that important during actual fuzzing but I definitely see some (small) speed improvements in libxdc_experiments tests with such changes.
The real reason why I made my own version is because I wanted to make it fully conformant. In your PR, ipc == 4
seems to be not handled correctly (may cause troubles if we have two distant trace ranges in kernel mode? upper bits will be set to 0), sign extension is always applied (not a problem with usual 4-level paging), last_ip
is not set to 0 on PSB (may cause troubles if we trace both kernel and userspace simultaneously?). I wonder if I missed any other edge cases in documentation.
If we talk about performance, your version is slower, half of the time seems to be spent in for loop, copying 4 bytes: https://quick-bench.com/q/YWb5akmV1DjCEJquaoyaO0niDUk (hope I didn't break anything since I needed to replace all return
s with if ... else
).
from libxdc.
I mean not copying topmost 16 bits from previous IP value. Looks like it may be a problem only when we use 5-level paging, which is a rather unusual case.
Edit: I said that it may show up when "we have two distant trace ranges in kernel mode" (more than 4GB apart) but in your current version it is mitigated by always sign extending, so this case is too reduced to "possible troubles when using 5-level paging".
from libxdc.
It's somewhat surprising that the trace starts with a FUP. We usually expect a TIP.PGE which we use to determine the disassembly mode (16/32/64 bits). Maybe the disassembler assumes 32 bits? Sergej mentioned, that starting the trace with a FUP could be due to the fact that you did not enable trace regions in the MSR.
from libxdc.
We do not have trace regions set in the MSR, correct. I checked the disassembly mode used and it's 64-bit.
from libxdc.
Also note that in the ptdump output there is a TIP.PGE before the FUP, so I'm unsure why libxdc would skip that
from libxdc.
I don't see it?
from libxdc.
Next best guess would be: the FUP is compressed, and address decompression fails because we didn't see the last address. though this is quite unlikely, as the FUP appears to be 9 byte in size. Still something you could look into...
from libxdc.
The TIP.PGE is after psbend
. I checked the ptdump output on the machine where it works and there too the TIP.PGE is only after psbend.
from libxdc.
0000000000000000 psb
0000000000000010 pad
0000000000000011 pad
0000000000000012 pad
0000000000000013 mode.tsx
0000000000000015 mode.exec cs.l
0000000000000017 fup 6: ffffffffc05f803c
I don't see the TIP.PGE?
from libxdc.
0000000000000044 psbend
0000000000000046 pad
0000000000000047 tip.pge 6: ffffffffc05f803c
from libxdc.
On the machine where it works the first FUP without TIP.PGE before is decoded correctly
Starting trace from 0xffffffffc03d603c.
Writing 4 bytes of input to 0xffffffffc03d8010
Starting fuzz loop
[TRACER cpuid] RIP: 0xffffffffc03d608fCF limit: 0/18446744073709551615
CPUID leaf 13371337
Harness signal on finish
Stopping fuzz loop.
PSB
MODE
MODE
FUP ffffffffc03d603c (TNT: 0)
from libxdc.
The problem starts before the PGE though, so the second one is irrelevant for us. Very weird... Does the same dumped trace work on the other machine?
from libxdc.
It's kinda hard to transfer the buffer from one machine to another to test a replay. Different VMs, different VAs, etc.
from libxdc.
so its the same testcase that works, but potentially with a different pt dump?
from libxdc.
Right, the code is the same, but the underlying hw/dom0/vm's are different.
from libxdc.
https://github.com/nyx-fuzz/libxdc/blob/master/src/decoder.c#L315 <- you might want to investigate if you are triggering this case.
from libxdc.
I've added a printf to that switch case and it doesn't trigger;
Here is the ptdump output from the system where it works:
0000000000000000 psb
0000000000000010 pad
0000000000000011 pad
0000000000000012 pad
0000000000000013 mode.tsx
0000000000000015 mode.exec cs.l
0000000000000017 fup 3: ffffffffc03d603c
000000000000001e pad
000000000000001f pad
0000000000000020 pad
0000000000000021 pad
0000000000000022 pad
0000000000000023 pad
0000000000000024 pad
0000000000000025 pad
0000000000000026 pip ae5d2000, nr cr3 00000000ae5d2000
000000000000002e pad
000000000000002f pad
0000000000000030 pad
0000000000000031 pad
0000000000000032 pad
0000000000000033 pad
0000000000000034 pad
0000000000000035 pad
0000000000000036 vmcs 148d35000 vmcs 0000000148d35000
000000000000003d pad
000000000000003e pad
000000000000003f pad
0000000000000040 cbr 27
0000000000000044 psbend
0000000000000046 pad
0000000000000047 tip.pge 3: ffffffffc03d603c
000000000000004e pad
000000000000004f pad
0000000000000050 pad
0000000000000051 pad
0000000000000052 pad
0000000000000053 pad
0000000000000054 pad
0000000000000055 pad
0000000000000056 pad
0000000000000057 fup 3: ffffffffc03d603c
000000000000005e pad
000000000000005f pad
0000000000000060 tip.pgd 0: ????????????????
0000000000000061 pad
0000000000000062 pad
0000000000000063 pad
0000000000000064 pad
0000000000000065 pad
0000000000000066 pad
0000000000000067 pad
0000000000000068 pad
0000000000000069 pad
000000000000006a pad
000000000000006b pad
000000000000006c pad
000000000000006d pad
000000000000006e pad
000000000000006f pad
0000000000000070 cbr 27
0000000000000074 pad
0000000000000075 mode.exec cs.l
0000000000000077 tip.pge 3: ffffffffc03d603c
000000000000007e pad
000000000000007f pad
0000000000000080 pad
0000000000000081 pad
0000000000000082 pad
0000000000000083 pad
0000000000000084 pad
0000000000000085 pad
0000000000000086 pad
0000000000000087 fup 3: ffffffffc03d603c
000000000000008e pad
000000000000008f pad
0000000000000090 tip.pgd 0: ????????????????
0000000000000091 pad
0000000000000092 pad
0000000000000093 pad
0000000000000094 pad
0000000000000095 pad
0000000000000096 pad
0000000000000097 pad
0000000000000098 pad
0000000000000099 pad
000000000000009a pad
000000000000009b pad
000000000000009c pad
000000000000009d pad
000000000000009e pad
000000000000009f pad
00000000000000a0 cbr 27
00000000000000a4 pad
00000000000000a5 mode.exec cs.l
00000000000000a7 tip.pge 3: ffffffffc03d603c
00000000000000ae pad
00000000000000af pad
00000000000000b0 tnt.8 .!!
00000000000000b1 pad
00000000000000b2 pad
00000000000000b3 pad
00000000000000b4 pad
00000000000000b5 pad
00000000000000b6 pad
00000000000000b7 fup 3: ffffffffc03d608f
00000000000000be pad
00000000000000bf pad
00000000000000c0 tip.pgd 0: ????????????????
00000000000000c1 pad
00000000000000c2 pad
00000000000000c3 pad
00000000000000c4 pad
00000000000000c5 pad
00000000000000c6 pad
00000000000000c7 pad
00000000000000c8 pad
00000000000000c9 pad
00000000000000ca pad
00000000000000cb pad
00000000000000cc pad
00000000000000cd pad
00000000000000ce pad
00000000000000cf pad
from libxdc.
I'm thinking perhaps the compiler version matters? Behavior changed for some bitshifting op? I tried downgrading to gcc-7 on the Ubuntu machine but that didn't make a difference.
from libxdc.
Nope, compiler doesn't seem to matter, tried with clang and same behavior.
from libxdc.
ubsan maybe? -O0 vs -O3? Dunno...
from libxdc.
Looks like the bitshifting magic in get_ip_val
is where the issue is. The variable aligned_pp
has the correct full VA in it but what comes out at the end in aligned_last_ip
is missing the upper 32-bits.
from libxdc.
Hm, and this seems to be because len
is calculated to be 6
, while on the machine where it works its 3
. The bitshifting magic has a 4-len in it, which we evidently underflow.
from libxdc.
Yea, looks like this code is buggy in libxdc
, it confuses ip compression mode and the packet length as compared to https://github.com/intel/libipt/blob/master/libipt/src/pt_packet.c#L145
from libxdc.
Can you share the trace files? I would try to fix that.
from libxdc.
@schumilo Looks like this is going to take a while so if in the interim you want to take a crack at fixing it I can tell you whether it works now or not. ¯\_(ツ)_/¯
from libxdc.
Actually got the approval sooner then expected to open-source the fixes. Yay for that ;)
from libxdc.
Thanks for the patch! The klockwork patch looks good to me. We will merge it. But is there any chance to get a small reproducer for the TIP compression issue? I would like to have a look first. In addition, it would be great to add another sample to our regression tests (https://github.com/nyx-fuzz/libxdc_experiments/tree/master/test_data).
from libxdc.
I can get you the PT log and the memory-page cache saved. It's unclear to me what format you guys use in those experiments for these but you are welcome to take my saves and format them to your liking.
from libxdc.
FYI https://github.com/nyx-fuzz/libxdc_experiments/pull/2/files
from libxdc.
FWIW, applying this patch break decoding on my trace. I get a truncated vaddr, while it is correct on master.
Maybe something is still incorrect ?
PGE 7ff7e5de1280 (TNT: 0)
TIP 7ff7e5de3660 (TNT: 0)
vs
PGE 7ff7e5de1280 (TNT: 0)
TIP 3660 (TNT: 0)
My CPU is i7-8650U Kaby Lake
from libxdc.
Odd, works on my kaby lake. Can you run your trace buffer through libipt/ptdump and paste its output? Would be also interested to see what ipc mode you have at that truncated TIP.
from libxdc.
Can you check with this patch added: 09fc990? Seems like libxdc had an expectation to have short ip's padded up to the previous ip's length. With this the libxdc_experiments all run fine AFAICT.
from libxdc.
Here is my attempt at fixing IP decoding issue. Sadly it is a bit slower (~5%) than current version but I tried to make it fully conformant.
diff --git a/src/decoder.c b/src/decoder.c
index d2dd500..96fa0d4 100644
--- a/src/decoder.c
+++ b/src/decoder.c
@@ -318,21 +318,29 @@ static inline void decoder_handle_fup(decoder_state_machine_t *self, uint64_t fu
}
static inline uint64_t get_ip_val(uint8_t **pp, uint64_t *last_ip){
- register uint8_t len = (*(*pp)++ >> PT_PKT_TIP_SHIFT);
- if(unlikely(!len))
- return 0;
- uint64_t aligned_last_ip, aligned_pp;
- memcpy(&aligned_pp, *pp, sizeof(uint64_t));
- memcpy(&aligned_last_ip, last_ip, sizeof(uint64_t));
-
- aligned_last_ip = ((int64_t)((uint64_t)(
- ((aligned_pp & (0xFFFFFFFFFFFFFFFF >> ((4-len)*16))) | (aligned_last_ip & (0xFFFFFFFFFFFFFFFF << ((len)*16))) )
- )<< (64 - 48))) >> (64 - 48);
-
- memcpy(last_ip, &aligned_last_ip, sizeof(uint64_t));
-
- *pp += (len*2);
- return *last_ip;
+ const uint8_t type = (*(*pp)++ >> 5);
+ uint64_t aligned_last_ip, aligned_pp;
+ memcpy(&aligned_pp, *pp, sizeof(uint64_t));
+ memcpy(&aligned_last_ip, last_ip, sizeof(uint64_t));
+
+ if (type == 0) {
+ return 0;
+ }
+
+ const uint8_t len = 0xFF08FF06060402FFull >> (type * 8);
+ if (unlikely(len == 0xFF)) {
+ //assert(false);
+ }
+ if (type == 3) {
+ aligned_last_ip = (int64_t)(aligned_pp << 16) >> 16;
+ } else {
+ const uint8_t new_bits = len * 8;
+ const uint8_t old_bits = sizeof(aligned_last_ip) * 8 - new_bits;
+ aligned_last_ip = ((aligned_last_ip >> new_bits) << new_bits) | ((aligned_pp << old_bits) >> old_bits);
+ }
+ memcpy(last_ip, &aligned_last_ip, sizeof(uint64_t));
+ *pp += len;
+ return aligned_last_ip;
}
static inline uint64_t get_val(uint8_t **pp, uint8_t len){
@@ -845,6 +853,7 @@ __attribute__((hot)) decoder_result_t decode_buffer(decoder_t* self, uint8_t* ma
DISPATCH_L1();
case __extension__ 0b10000010: /* PSB */
+ self->last_tip = 0;
p += PT_PKT_PSB_LEN;
LOGGER("PSB\n");
#ifdef DECODER_LOG
from libxdc.
still needs testing / measurement but this should be an equivalent version that compiles to branchfree code: https://godbolt.org/z/cr3b1s6WK
from libxdc.
Damn, GCC just doesn't like to emit cmovs ^^
from libxdc.
Looks like branchless version is slower https://quick-bench.com/q/3U07RR8XYl0nSNMe5bkbyiFmpck sorry, tested on type 0, here are proper tests for type 2 (1.2 times slower) and type 3 (5 times slower). There are jumps inserted by the benchmark but they should be predicted perfectly.
from libxdc.
that's a pretty cool page, TIL!
from libxdc.
on gcc it's even worse ^^ 14x.
from libxdc.
I've just added the icelake traces to our regression tests.
The test will obviously fail with the current master, but works great with @vient's proposed patch.
@vient
As @eqv already mentioned, it still needs more testing and some additional performance measurements.
And I wonder if there is still some room for performance improvements. But still, awesome patch - Thanks!
from libxdc.
I wonder if there is still some room for performance improvements
I looked a bit more and it seems that adding some bit magic does the trick: https://quick-bench.com/q/ghcyJBnMJiCWIEcvrhaVs-g-rmQ on type 2 (and, equally, on types 1, 4, 6) new versions are nearly as fast as current code. I've changed current version a bit for benchmarking, it shouldn't have slowed down anything. On types 0 and 3 new version should be faster.
Note that NewNew and NewNew2 are equal on assembly level so it's up to you which version to choose, if any. Changes are small and look correct to me, I'll test a bit later with libxdc_experiments if these versions are actually correct.
from libxdc.
FYI I don't see performance difference with @vient 's patch above and my PR branch when fuzzing the same target with AFL++. So doing tricks like const uint8_t len = 0xFF08FF06060402FFull >> (type * 8);
don't really seem to make much difference ¯\_(ツ)_/¯
from libxdc.
ipc == 4 seems to be not handled correctly
What do you mean?
from libxdc.
That quick-bench site is cool
from libxdc.
I tested with libxdc_experiments a bit and it occurred that all my patches were defective since they all perform shifts by 64 in case of type 6, somehow only the last test failed. Corrected version is 0-12% slower than current code - 12% is achieved on performance test 6, other ones show only single-digit percent slowdown. Microbenchmark shows that it is 1.4 times slower than current version these benchmarks seem to be too unreliable, while restarting same benchmark gives almost same results, deleting one function changes the proportion between other two a lot.
from libxdc.
@vient would you like to create pull request for your patch?
from libxdc.
Yeah, I can do it a bit later.
from libxdc.
Sweet, we can obviously just merge your patch manually (we already have the code for eval purposes), but we'd like you to get proper credit :)
from libxdc.
Thanks again for your patch! :)
from libxdc.
Related Issues (11)
- Fatal error: can't create build/cfg.o: No such file or directory HOT 1
- API usage questions. HOT 6
- Please add better documentation about what does page cache means in the context of this library
- Thread safety
- Decode error, no callbacks HOT 3
- ERR: TNT ... HOT 8
- Pass active cr3 info to page_cache_fetch function HOT 3
- Wrong disassembly? HOT 1
- Low stability reported by AFL HOT 6
- Support Visual Studio compiler? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from libxdc.