williballenthin / process-forest
Reconstruct process trees from event logs
License: Apache License 2.0
I have a few evtx files where the record_xml being passed to to_lxml() is unicode, with no odd characters. LXML (3.7.3) outputs the following:
$ python process-forest/src/process_forest.py SYSMON%4OPERATIONAL.EVTX summary
INFO:process-forest.global:using evtx log file
DEBUG:Evtx.Evtx:FILE HEADER at 0x0.
DEBUG:Evtx.Evtx:CHUNK HEADER at 0x1000.
DEBUG:Evtx.Evtx:Record at 0x1200.
Traceback (most recent call last):
File "process-forest/src/process_forest.py", line 483, in <module>
main()
File "process-forest/src/process_forest.py", line 461, in main
analyzer.analyze(get_entries_with_eids(evtx, set([4688, 4689, 1, 5])))
File "process-forest/src/process_forest.py", line 211, in analyze
for entry in entries:
File "process-forest/src/process_forest.py", line 193, in get_entries_with_eids
for entry in get_entries(evtx):
File "process-forest/src/process_forest.py", line 183, in get_entries
yield Entry(xml, record)
File "process-forest/src/process_forest.py", line 73, in __init__
self._node = to_lxml(self._xml)
File "process-forest/src/process_forest.py", line 25, in to_lxml
record_xml.replace("xmlns=\"http://schemas.microsoft.com/win/2004/08/events/event\"", ""))
File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:79010)
File "src/lxml/parser.pxi", line 1843, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:118282)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
One possible solution, at least for this instance, is to wrap the return in to_lxml() in a try block, catch the ValueError, set up an etree.XMLParser that uses utf-8 encoding, and encode the result of record_xml.replace() as well:
...
except ValueError:
utf8_parser = etree.XMLParser(encoding='utf-8')
return etree.fromstring("<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>%s" % record_xml.replace("xmlns=\"http://schemas.microsoft.com/win/2004/08/events/event\"", "").encode('utf-8'), parser=utf8_parser)
Or just encode manually, without defining a new parser:
...
return etree.fromstring("<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>%s" % record_xml.replace("xmlns=\"http://schemas.microsoft.com/win/2004/08/events/event\"", "").encode('utf-8'))
(I have made some other changes to the code before testing this, so I am not 100% sure this is correct, but it should be)
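As a hedged sketch rather than the project's actual fix: on Python 3, formatting a str with "%s" and a bytes value embeds the b'...' repr instead of the text, so one option is to build the whole document as text and encode it once at the end. The names record_to_bytes, EVENT_NS and XML_DECL below are hypothetical, and the standard-library ElementTree parser is used only as a stand-in so the example runs without lxml installed.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in so the sketch runs without lxml

# The namespace attribute that to_lxml() strips, and the declaration it prepends.
EVENT_NS = 'xmlns="http://schemas.microsoft.com/win/2004/08/events/event"'
XML_DECL = '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'


def record_to_bytes(record_xml):
    """Build the full document as text, then encode it once to UTF-8 bytes."""
    if isinstance(record_xml, bytes):
        # Assumption: byte input is already UTF-8 encoded.
        record_xml = record_xml.decode("utf-8")
    document = XML_DECL + record_xml.replace(EVENT_NS, "")
    return document.encode("utf-8")


# A made-up record, only for illustration.
doc = record_to_bytes(
    '<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">'
    "<System><EventID>1</EventID></System></Event>"
)
root = ET.fromstring(doc)
print(root.find("System/EventID").text)
```

The resulting bytes could be handed to etree.fromstring in lxml instead, since the error message itself asks for bytes input when an encoding declaration is present.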
When main is asked to serialize a file, it opens the file in "wb" mode and passes it to ProcessTreeAnalyzer.serialize as an argument. But ProcessTreeAnalyzer.serialize tries to write the result of json.dumps to it, and json.dumps (according to the documentation) returns a string, which results in the following error:
Traceback (most recent call last):
File "process-forest/src/process_forest.py", line 504, in <module>
main()
File "process-forest/src/process_forest.py", line 498, in main
analyzer.serialize(f)
File "process-forest/src/process_forest.py", line 339, in serialize
f.write(s)
TypeError: a bytes-like object is required, not 'str'
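One way to avoid this mismatch, sketched under assumptions: encode the string returned by json.dumps before writing it to a handle opened in binary mode. The function name serialize_json and the sample data are hypothetical; the real body of ProcessTreeAnalyzer.serialize is not shown here, and io.BytesIO stands in for a file opened with "wb".

```python
import io
import json


def serialize_json(obj, f):
    """Write obj as UTF-8 encoded JSON to a file object opened in binary mode."""
    s = json.dumps(obj)         # json.dumps returns str, not bytes
    f.write(s.encode("utf-8"))  # encode before writing to a "wb" handle


buf = io.BytesIO()  # stands in for open(path, "wb")
serialize_json({"pid": 4, "children": []}, buf)
print(buf.getvalue())
```

Opening the file in "w" (text) mode instead would also work, since the string could then be written without encoding.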
Dear,
With the minute patch in place, where a b is inserted on line 25, I now run into unexpected output:
INFO:process-forest.global:using evtx log file
DEBUG:Evtx.Evtx:FILE HEADER at 0x0.
DEBUG:Evtx.Evtx:CHUNK HEADER at 0x1000.
DEBUG:Evtx.Evtx:Record at 0x1200.
DEBUG:Evtx.Evtx:Record at 0x1c10.
DEBUG:Evtx.Evtx:Record at 0x20b8.
DEBUG:Evtx.Evtx:Record at 0x2370.
DEBUG:Evtx.Evtx:Record at 0x2a80.
....
DEBUG:Evtx.Evtx:Record at 0x10fc8.
DEBUG:Evtx.Evtx:CHUNK HEADER at 0x11000.
DEBUG:Evtx.Evtx:Record at 0x11200.
.....
This is with the latest master version of process-forest.
It would be nice to see a setup.py and possibly see this published to the Python package repo. At the very least, it would make it installable via pip using git+ssh.
Thanks.
Please add pytz to requirements.