Comments (40)
Thanks so much for the sample! I'll look at it later today and fix it, it indeed seems that a block is probably skipped.
from evtx.
Should be fixed with #11.
This was an interesting sample because of the issue I described in the PR: the file has more record blocks than the header specifies.
Feel free to reopen if you think there is still a problem.
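For readers following along, here is a minimal sketch of how a mismatch between the header's chunk count and the chunks actually present could be detected. It is not the code from the PR. The layout constants (4096-byte file header, 64 KiB chunks, the "ElfChnk" signature, a 16-bit chunk count at offset 42) are assumptions taken from the publicly documented EVTX format, and count_chunks is a hypothetical helper name.

```python
import struct

# Assumed layout constants from the publicly documented EVTX format.
FILE_HEADER_SIZE = 4096        # size of the file header block
CHUNK_SIZE = 65536             # each chunk is 64 KiB
CHUNK_MAGIC = b"ElfChnk\x00"   # signature at the start of every chunk
CHUNK_COUNT_OFFSET = 42        # little-endian u16 chunk count in the file header


def count_chunks(data: bytes):
    """Return (chunks declared in the header, chunks actually present)."""
    (declared,) = struct.unpack_from("<H", data, CHUNK_COUNT_OFFSET)
    present = 0
    offset = FILE_HEADER_SIZE
    # Walk the file by size instead of trusting the declared count.
    while offset + CHUNK_SIZE <= len(data):
        if data[offset:offset + len(CHUNK_MAGIC)] == CHUNK_MAGIC:
            present += 1
        offset += CHUNK_SIZE
    return declared, present
```

If the second number is larger than the first, a parser that stops at the declared count will skip records.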
from evtx.
released 0.1.7 to solve this!
Just a heads up: 0.2.0 is coming with JSON support, it will be released this week or the next if you'd like to try it :)
from evtx.
JSONL output would be incredible. Would make ingestion into NoSQL solutions the easiest task.
from evtx.
@devgc it's indeed coming very soon! I'd love some feedback on the format when it's out :)
from evtx.
@omerbenamram Let's re-open this issue. I found more EVTX files where data is being skipped. I am working on getting these files uploaded so you can debug against them.
You can ignore the filename, I only kept the one from the EvtxECmd output.
from evtx.
Here are the files.
https://www.dropbox.com/s/hwcpgmde1vtqrxt/Issues-20190418.zip?dl=0
(link expires 2019/06/01 or will be removed after resolution)
from evtx.
@devgc reopened, I'll have a look at all of them, thanks for the samples again.
from evtx.
@devgc just letting you know, these files use some parts of the spec that I hadn't implemented yet, because I had not received samples containing them (they immediately error on "unimplemented" code branches).
It should be trivial to fix :) I'll have it fixed soon, and I'll cover some additional branches while at it.
from evtx.
@devgc all of your samples are working as expected with #14 .
I'm still adding some extra features (arrays of f64, f32) while at it, and improving the tests by adding some snapshot testing.
Question: Regarding XML output - I've seen that Zimmerman's parser escapes XML contained in values, while Metz's parser doesn't.
I'm not sure about the correct thing to do here, I think I like the unescaped output better.
xml-rs can obviously do both, what do you think?
from evtx.
@devgc released 0.1.9
please close this after running your checks :)
from evtx.
Still finding differences in another file:
evtxecmd - 2-system-Microsoft-Windows-GroupPolicy%4Operational.evtx.evtxecmd.xml -> 3028 Events
evtx_dump - 2-system-Microsoft-Windows-GroupPolicy%4Operational.evtx.evtx_dump.xml -> 2931 Events
Here is the file (link expires 2019/06/01 or will be removed after resolution):
https://www.dropbox.com/s/5eolz6e7ek215rf/Issues-20190419.zip?dl=0
These appear to be the records missing:
[2562, 536, 1570, 1583, 562, 564, 569, 1597, 2827, 1609, 2132, 605, 2669, 634, 141, 654, 143, 148, 661, 663, 668, 157, 1694, 159, 1696, 164, 1701, 182, 188, 193, 202, 204, 1742, 1744, 209, 1749, 2776, 2778, 2783, 1767, 1773, 241, 1778, 247, 1274, 1787, 252, 1789, 1794, 262, 2824, 267, 2829, 273, 2834, 1812, 2327, 1818, 1308, 286, 1823, 292, 297, 306, 1843, 308, 1845, 313, 1850, 1350, 1368, 349, 351, 356, 365, 367, 372, 1398, 2427, 1415, 2199, 403, 405, 410, 2717, 2992, 435, 2998, 3003, 1474, 3012, 3014, 3019, 2011, 1516, 1518, 1523]
from evtx.
@devgc all of your samples are working as expected with #14 .
I'm still adding some extra features (arrays of f64, f32) while at it, and improving the tests by adding some snapshot testing.
Question: Regarding XML output - I've seen that Zimmerman's parser escapes XML contained in values, while Metz's parser doesn't.
I'm not sure about the correct thing to do here, I think I like the unescaped output better.
xml-rs can obviously do both, what do you think?
IMO it should be escaped. Doing post-processing on it generally breaks XML parsers. If a binary value contains XML, I think it should be handled independently post parsing.
For example, after running evtx_dump and evtxecmd, I made a script to create JSONL output from the XML. I get many errors running it on evtx_dump output because data is unescaped. I got 1 error out of evtxecmd.
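To illustrate the point about unescaped data, here is a small sketch using only the Python standard library. It does not reproduce either tool's output. The wrap and is_well_formed helpers and the <Data> element are hypothetical, chosen only to show why a downstream XML parser rejects unescaped values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap(value: str, escaped: bool) -> str:
    """Place a value inside a <Data> element, optionally escaping it first."""
    body = escape(value) if escaped else value
    return f"<Data>{body}</Data>"


def is_well_formed(doc: str) -> bool:
    """Return True if a standard XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


value = "a < b & c"
print(is_well_formed(wrap(value, escaped=False)))  # raw '<' and '&' are rejected
print(is_well_formed(wrap(value, escaped=True)))   # escaped text parses
```

With escaping, the parser also hands back the original value unchanged, so nothing is lost for later processing.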
from evtx.
@devgc this is actually quite strange.
The reason the parser was failing is that it was too strict regarding boolean values.
Other parsers just assume that if the value is non-zero then it's true.
For these particular records I was getting 8 or 16, which is strange, since the spec specifies it should have been strictly 0 or 1.
I've changed this to emit a warning and to fall back to true, since I'm not sure what else is possible to do here.
Metz:
https://github.com/libyal/libfvalue/blob/master/libfvalue/libfvalue_integer.c#L1108
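The lenient behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the parser's actual Rust code. The function name coerce_bool, the logger name and the warning text are all made up for the example.

```python
import logging

logger = logging.getLogger("evtx_sketch")  # hypothetical logger name


def coerce_bool(raw: int) -> bool:
    """Interpret a raw integer as a boolean, tolerating out-of-spec values."""
    if raw not in (0, 1):
        # The spec only allows 0 or 1; warn, then treat any non-zero value as true.
        logger.warning("unexpected boolean value %d, treating it as true", raw)
    return raw != 0
```

The trade-off is that a record with an odd value such as 8 or 16 is still emitted instead of being dropped, at the cost of hiding whatever the extra bits meant.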
from evtx.
@devgc 0.2.0 was released with this fix, and with JSON output :)
Let me know what you think
from evtx.
Here are more issues. See the differences.tsv file.
(link expires 2019/06/01 or will be removed after resolution)
https://www.dropbox.com/s/iq8z1pujuz39xfh/ProblemEvents_201904220133.zip?dl=0
from evtx.
Many of these files are from Volume Shadow Snapshots and were extracted using libvshadow (dfvfs via pyvshadow). Not all of these files maintain the same hash extracting using libvshadow vs Windows API via symlinks. It would appear that using libvshadow can result in "dirtier" files than Windows API.
from evtx.
@devgc there are still places with differences, but after #20 and #22 we are a lot closer / have more events.
You can give it a try with cargo install --git https://github.com/omerbenamram/evtx.git --branch bugfix/missing-events-2-empty-chunks
(that branch contains both fixes).
For now this is the scoreboard:
evtxecmd evtx_dump diff (less is better) file
0 79 -79 2-system-Microsoft-Windows-PowerShell%4Operational.evtx
1749 1863 -114 2-vss_0-Microsoft-Windows-AppXDeployment%4Operational.evtx
0 1910 -1910 2-vss_0-Microsoft-Windows-RemoteDesktopServices-RdpCoreTS%4Operational.evtx
0 387 -387 2-vss_0-Microsoft-Windows-Security-LessPrivilegedAppContainer%4Operational.evtx
1677 1774 -97 2-vss_0-Microsoft-Windows-TerminalServices-RemoteConnectionManager%4Operational.evtx
2 2 0 2-vss_0-Microsoft-Windows-User Device Registration%4Admin.evtx
0 977 -977 2-vss_0-Microsoft-Windows-Windows Defender%4Operational.evtx
0 1864 -1864 2-vss_1-Microsoft-Windows-AppXDeployment%4Operational.evtx
0 1911 -1911 2-vss_1-Microsoft-Windows-RemoteDesktopServices-RdpCoreTS%4Operational.evtx
0 387 -387 2-vss_1-Microsoft-Windows-Security-LessPrivilegedAppContainer%4Operational.evtx
1765 1796 -31 2-vss_1-Microsoft-Windows-TerminalServices-RemoteConnectionManager%4Operational.evtx
2 2 0 2-vss_1-Microsoft-Windows-User Device Registration%4Admin.evtx
0 1034 -1034 2-vss_1-Microsoft-Windows-Windows Defender%4Operational.evtx
1865 1979 -114 2-vss_2-Microsoft-Windows-AppXDeployment%4Operational.evtx
0 1906 -1906 2-vss_2-Microsoft-Windows-RemoteDesktopServices-RdpCoreTS%4Operational.evtx
28308 28308 0 2-vss_2-Microsoft-Windows-Store%4Operational.evtx
1782 1813 -31 2-vss_2-Microsoft-Windows-TerminalServices-RemoteConnectionManager%4Operational.evtx
0 1060 -1060 2-vss_2-Microsoft-Windows-Windows Defender%4Operational.evtx
0 33813 -33813 2-vss_2-Security.evtx
0 1864 -1864 2-vss_3-Microsoft-Windows-AppXDeployment%4Operational.evtx
0 1900 -1900 2-vss_3-Microsoft-Windows-RemoteDesktopServices-RdpCoreTS%4Operational.evtx
28308 28308 0 2-vss_3-Microsoft-Windows-Store%4Operational.evtx
0 1103 -1103 2-vss_3-Microsoft-Windows-Windows Defender%4Operational.evtx
0 33925 -33925 2-vss_3-Security.evtx
0 1864 -1864 2-vss_4-Microsoft-Windows-AppXDeployment%4Operational.evtx
0 1943 -1943 2-vss_4-Microsoft-Windows-RemoteDesktopServices-RdpCoreTS%4Operational.evtx
0 1160 -1160 2-vss_4-Microsoft-Windows-Windows Defender%4Operational.evtx
0 1943 -1943 2-vss_5-Microsoft-Windows-RemoteDesktopServices-RdpCoreTS%4Operational.evtx
3208 3208 0 2-vss_7-Application.evtx
571 570 1 2-vss_7-Microsoft-Client-Licensing-Platform%4Admin.evtx
1053 1052 1 2-vss_7-Microsoft-Windows-AppModel-Runtime%4Admin.evtx
270 270 0 2-vss_7-Microsoft-Windows-AppXDeployment%4Operational.evtx
2249 2043 206 2-vss_7-Microsoft-Windows-AppXDeploymentServer%4Operational.evtx
1533 1013 520 2-vss_7-Microsoft-Windows-AppxPackaging%4Operational.evtx
28 28 0 2-vss_7-Microsoft-Windows-Audio%4PlaybackManager.evtx
395 394 1 2-vss_7-Microsoft-Windows-DeviceSetupManager%4Operational.evtx
2437 2340 97 2-vss_7-Microsoft-Windows-GroupPolicy%4Operational.evtx
666 666 0 2-vss_7-Microsoft-Windows-Kernel-PnP%4Configuration.evtx
294 269 25 2-vss_7-Microsoft-Windows-LiveId%4Operational.evtx
388 387 1 2-vss_7-Microsoft-Windows-Security-LessPrivilegedAppContainer%4Operational.evtx
873 873 0 2-vss_7-Microsoft-Windows-Security-Mitigations%4KernelMode.evtx
681 681 0 2-vss_7-Microsoft-Windows-Storage-Storport%4Operational.evtx
23372 23372 0 2-vss_7-Microsoft-Windows-Store%4Operational.evtx
437 436 1 2-vss_7-Microsoft-Windows-TaskScheduler%4Maintenance.evtx
1200 424 776 2-vss_7-Microsoft-Windows-WMI-Activity%4Operational.evtx
1003 379 624 2-vss_7-Microsoft-Windows-Windows Defender%4Operational.evtx
244 244 0 2-vss_7-Microsoft-Windows-WindowsUpdateClient%4Operational.evtx
617 616 1 2-vss_7-Microsoft-Windows-Winlogon%4Operational.evtx
119 53 66 2-vss_7-Security.evtx
1160 1160 0 2-vss_7-System.evtx
from evtx.
I created some new comparisons. I extracted the files out using Kape (Windows API) which leads to cleaner EVTX files.
You can see the tool output and comparisons here: https://www.dropbox.com/s/pzgcsnnbw3jb6bh/DEFCON_2018_DESKTOP_EVTX_COMPARISON_201004241442.zip?dl=0
A comparison chart can be found in DEFCON_2018_DESKTOP_EVTX_COMPARISON.zip\output_comparison.xlsx. These numbers are generated by counting <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> in the output and are not a guarantee of completeness.
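As a rough sketch of that counting approach (the count_events helper is hypothetical and is not the script used to produce the spreadsheet):

```python
# Opening tag that marks the start of each event in the XML output.
EVENT_OPEN_TAG = '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'


def count_events(xml_text: str) -> int:
    """Count events by counting occurrences of the opening <Event> tag."""
    return xml_text.count(EVENT_OPEN_TAG)
```

An event that was emitted empty or truncated still contributes its opening tag, which is why the count alone cannot show that every record is complete.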
You can find the raw files used for parsing here: https://www.dropbox.com/s/0vejq9lsjq1cskq/DEFCON_2018_DESKTOP_KAPE_EVTX_SET.zip?dl=0
This is a script I made to compare missing unique record ids (you may find it useful):
import re
import sys

# Captures the record ID of every <Event> element. The pattern is bytes
# because the files are opened in binary mode.
EVENT_RECORD_ID = re.compile(
    rb"<Event (.*?)<EventRecordID>(\d{1,})</EventRecordID>(.*?)</Event>", re.S
)


def read_record_ids(path):
    """Return the set of EventRecordID values found in an XML dump."""
    with open(path, "rb") as fh:
        data = fh.read()
    return {int(match.group(2), 10) for match in EVENT_RECORD_ID.finditer(data)}


def main():
    source_events_1 = read_record_ids(sys.argv[1])
    source_events_2 = read_record_ids(sys.argv[2])
    # Record IDs present in the first file but missing from the second.
    event_diff = list(source_events_1 - source_events_2)
    print(event_diff)


if __name__ == "__main__":
    main()
from evtx.
I updated with new numbers using a new build of EvtxECmd.exe: https://www.dropbox.com/s/pzgcsnnbw3jb6bh/DEFCON_2018_DESKTOP_EVTX_COMPARISON_201004241442.zip?dl=0
I added new columns for valid Record IDs. It looks like some of the records that are being output are empty. This generally appears to be the last record, but not always.
EvtxECmd.exe - Total Count: 452453
evtx_dump.exe - Total Count: 447382
from evtx.
Also, you can find the script I am using to generate the numbers here: https://gist.github.com/devgc/c8db383e1340e629aafb35e86b03eebf
from evtx.
@devgc a lot of this has improved in #28
edit: now version 0.2.4
from evtx.
It's looking good.
Here is an EVTX where evtx_dump is missing two records that EvtxECmd is retrieving.
https://www.dropbox.com/s/ff92c6isb7rog1h/EvtxIssue_161-968.zip?dl=0
The missing EventRecordIDs are 161 and 968.
It almost looks like the issue occurs when there are <RenderingInfo> tags?
from evtx.
It was an issue with multiple inner EOF markers.
It only occurred with the XML output, JSON was OK though.
I've added a test, and relaxed the parser in #31
from evtx.
Here is another file I found that evtx_dump was unable to grab records from. EvtxECmd was able to parse 12.
https://www.dropbox.com/s/4c4qxthcc803a3q/EvtxIssue002.zip?dl=0
from evtx.
@forensicmatt this is an interesting one.
It has some really messed up things inside the fragments that will need additional recovery methods in the parser.
libevtx actually exports 24 records from this sample, I'll work on it.
from evtx.
@forensicmatt I've added a small fix that allows parsing those same 12 records.
They are incomplete in both this parser and EvtxECmd; it seems that libevtx is able to produce a more complete document.
So I'm still looking into it.
from evtx.
@forensicmatt I've tracked down the problem.
It's related to usage of sized binary xml in substitutions.
EvtxECmd was actually printing malformed records.
It's fixed in #33, and I will merge soon.
The parser now returns 335 records, with 5 still left as errors because of ProcessingInstructionTarget tokens, which are not yet implemented.
from evtx.
@forensicmatt released 0.3.0 with this fix!
from evtx.
so i would say you should be emitting valid XML, that is, xml that an xml parser will not complain about. If you cannot take the XML and drop it into something that can parse it without error, it's wrong. so if the libevtx project is just printing what LOOKS like XML, it would be incorrect and should be escaped.
my parser will fail the record if the XML cannot be loaded by the .net XmlDocument class (fail early, fail often), so we know when there is some goofy character or escape that needs to happen in there.
i would love to have the goofy files to improve my parser as well. there are certainly some strange things in some logs that are edge cases, but as we find them they can be addressed.
from evtx.
@forensicmatt as you test these updates please let me know where things aren't lining up so we can keep moving the ball forward on these
from evtx.
@forensicmatt I've tracked down the problem.
It's related to usage of sized binary xml in substitutions.
EvtxECmd was actually printing malformed records.
It's fixed in #33, and I will merge soon.
The parser now returns 335 records, with 5 still left as errors because of ProcessingInstructionTarget tokens which are yet unimplemented.
can i see an example of what was being output here?
from evtx.
@EricZimmerman what I meant is that it had emitted data which was an incorrect representation of the binary XML, not that the XML structure itself was malformed. EvtxECmd was skipping a part of the inner template; my parser was a bit stricter originally and didn't emit anything.
Please look at this sample: https://github.com/omerbenamram/evtx/blob/master/samples/E_Windows_system32_winevt_logs_Microsoft-Windows-CAPI2%254Operational.evtx
The first record should look like:
<?xml version="1.0" encoding="utf-8"?>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-CAPI2" Guid="{5bbca4a8-b209-48dc-a8c7-b23d3e5216fb}">
    </Provider>
    <EventID>80</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>80</Task>
    <Opcode>1</Opcode>
    <Keywords>0x4000000000000040</Keywords>
    <TimeCreated SystemTime="2017-05-19 02:02:36 UTC">
    </TimeCreated>
    <EventRecordID>1</EventRecordID>
    <Correlation>
    </Correlation>
    <Execution ProcessID="1396" ThreadID="2132">
    </Execution>
    <Channel>Microsoft-Windows-CAPI2/Operational</Channel>
    <Computer>WIN-M5327EF98B9</Computer>
    <Security UserID="S-1-5-21-1223297778-3299746493-1462173606-500">
    </Security>
  </System>
  <UserData>
    <WinVerifyTrustStart>
      <EventAuxInfo ProcessName="Setup.exe">
      </EventAuxInfo>
      <CorrelationAuxInfo TaskId="{1CB1FE4B-D685-48FC-A3FA-42893E4C1717}" SeqNumber="1">
      </CorrelationAuxInfo>
    </WinVerifyTrustStart>
  </UserData>
</Event>
from evtx.
oh all the capi logs are goofy for me. that's on my TODO for sure. is this an inner template with more binary xml? i'll have to walk that through my IDE, but all the CAPI files look to be the same structure.
from evtx.
@EricZimmerman yes. You probably had the same bug I've had with the inner templates, look at this
from evtx.
ah, ok. event logs are the worst format! hehe
from evtx.
can you link me to the place in your code where you handle this situation?
from evtx.
@EricZimmerman https://github.com/omerbenamram/evtx/blob/master/src/binxml/tokens.rs#L218,
notice I keep an extra bit of state if coming from inner templates.
from evtx.
The numbers look good. I think at this point I am okay with closing this issue and if I find any more missing records I will open individual issues.
from evtx.
@forensicmatt thanks again for everything!
from evtx.