Comments (6)
Not sure if you've had a chance to look at this, but from reading through the code and some debugging, it looks like the slowdown is in asyncio, more specifically in the stream reader call below. Are you seeing the same? Thanks
new_data = yield From(stream.read(self.DEFAULT_BATCH_SIZE))
from pyshark.
I've rolled back to an older release of pyshark (0.2.6), from before asyncio was introduced, and see vastly better performance, although some other things seem broken, such as frame_info, display filters on capture, and certain field matches. I was able to process 420K packets in under 8 minutes. Thanks
-- Packet Processing Progress Report --
Runtime: 0:07:21.040105 -> Packet Count: [420000]
Processing Loop Time (Single Packet): 0:00:00.001523
BPDU Accounting Dictionary Size: [6463] entries
Yes, it seems asyncio severely reduced performance (on Unix anyway; oddly, I'm seeing good performance on Windows).
I'll try optimizing it over the weekend, hopefully, and if push comes to shove I'll remove asyncio. Conceptually it should not have lower performance, but we'll see.
On Wednesday, December 17, 2014, chrconlo wrote:
I've backed down to using older rev of pyShark (0.2.6), prior to asyncio, and see vastly better performance although some other things seem broken like frame_info, display filter on capture and getting certain field matches to work. I was able to process 420K packets in under 8 minutes. Thanks
-- Packet Processing Progress Report --
Runtime: 0:07:21.040105 -> Packet Count: [420000]
Processing Loop Time (Single Packet): 0:00:00.001523
BPDU Accounting Dictionary Size: [6463] entries
I've (probably) isolated the problem to: https://github.com/KimiNewt/pyshark/blob/master/src/pyshark/capture/capture.py#L147
What seems to be happening is that a large amount of data (tshark XML) builds up in the subprocess stdout pipe. We consume it one packet at a time, so the pending XML grows larger and larger as time goes on.
That line copies what might be a very large string, which took ~40 ms on a large capture file I tried.
The solution is probably to extract ALL the packets at once from the data received instead of one at a time (we can't use lxml for this, as it does not support parsing partial XML). I'll try to find a solution for that.
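As a rough illustration of the "extract all packets at once" idea, here is a minimal sketch. It is not the code in pyshark or the eventual fix: `extract_packets` is a hypothetical helper, and it assumes the buffered data is tshark PDML in which each packet is delimited by plain `<packet>` and `</packet>` tags. It scans the buffer once, collects every complete packet, and slices off the unparsed remainder a single time per batch instead of once per packet.

```python
def extract_packets(buffer: bytes):
    """Split a buffer into complete <packet> elements and a leftover tail.

    Hypothetical helper: assumes packets are delimited by plain
    <packet> ... </packet> tags, as in tshark PDML output.
    Returns (list_of_packet_bytes, remaining_bytes).
    """
    open_tag, close_tag = b"<packet>", b"</packet>"
    packets = []
    pos = 0  # everything before pos has already been consumed
    while True:
        start = buffer.find(open_tag, pos)
        if start == -1:
            break  # no further packet starts in the buffer
        end = buffer.find(close_tag, start)
        if end == -1:
            break  # last packet is incomplete; wait for more data
        end += len(close_tag)
        packets.append(buffer[start:end])
        pos = end
    # One slice of the remainder per batch, not one per packet.
    return packets, buffer[pos:]


data = b"<pdml><packet>a</packet>\n<packet>b</packet>\n<packet>c"
packets, rest = extract_packets(data)
print(len(packets), rest)
```

The caller would append newly read data to `rest` and call the helper again; each complete packet can then be parsed on its own (for example with lxml), since only the partial tail is left unparsed.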
Interesting. Any luck finding a solution for this? Thanks
Fixed by PR #66