Comments (6)

chrconlo commented on July 21, 2024

Not sure whether you've had a chance to look at this, but after reading through the code and adding some debug output, it looks like the slowdown is in asyncio, more specifically in the stream read below. Are you seeing the same? Thanks

new_data = yield From(stream.read(self.DEFAULT_BATCH_SIZE))
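
One way to check whether the time really goes into the stream reads is to time only the read awaits. The following is a minimal sketch in modern asyncio syntax (`await` rather than `yield From`), not pyshark's code: `timed_read_all`, `demo`, and the `BATCH_SIZE` value are hypothetical names and numbers chosen for illustration, and an in-memory `asyncio.StreamReader` stands in for the tshark stdout pipe.

```python
import asyncio
import time

BATCH_SIZE = 2 ** 16  # hypothetical value; stands in for DEFAULT_BATCH_SIZE


async def timed_read_all(stream, batch_size=BATCH_SIZE):
    """Read a stream to EOF in batches, timing only the read() awaits."""
    total_bytes = 0
    read_calls = 0
    seconds_in_read = 0.0
    while True:
        started = time.perf_counter()
        new_data = await stream.read(batch_size)
        seconds_in_read += time.perf_counter() - started
        if not new_data:  # b"" signals end of stream
            break
        total_bytes += len(new_data)
        read_calls += 1
    return total_bytes, read_calls, seconds_in_read


async def demo():
    # Stand-in for the subprocess stdout: a StreamReader fed by hand.
    stream = asyncio.StreamReader()
    stream.feed_data(b"<packet/>" * 1000)
    stream.feed_eof()
    return await timed_read_all(stream, batch_size=4096)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If the time spent in the reads is small compared to the total runtime, the bottleneck is more likely in what happens to the data after each read.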

from pyshark.

chrconlo commented on July 21, 2024

I've backed down to an older revision of pyshark (0.2.6), from before asyncio was introduced, and I see vastly better performance, although some other things seem broken, such as frame_info, display filters on capture, and certain field matches. I was able to process 420K packets in under 8 minutes. Thanks

-- Packet Processing Progress Report --
Runtime: 0:07:21.040105 -> Packet Count: [420000]
Processing Loop Time (Single Packet): 0:00:00.001523
BPDU Accounting Dictionary Size: [6463] entries


KimiNewt commented on July 21, 2024

Yes, it seems asyncio severely reduced performance (on Unix, anyway; oddly, I'm seeing good performance on Windows).
I'll try to optimize it over the weekend, and if push comes to shove I'll remove asyncio. Conceptually it should not perform worse, but we'll see.

(Reply by email to chrconlo's comment above, dated Wednesday, December 17, 2014.)


KimiNewt commented on July 21, 2024

I've (probably) isolated the problem to: https://github.com/KimiNewt/pyshark/blob/master/src/pyshark/capture/capture.py#L147
What seems to be happening is that a large amount of data (tshark XML) accumulates in the subprocess stdout pipe. We consume it one packet at a time, while the buffered XML keeps growing as time goes on.
That line copies what might be a very large string; on a large capture file I tried, the copy took ~40ms.

The solution is probably to extract ALL the packets at once from the data received instead of one at a time (we can't use lxml for this, as it does not support parsing partial XML). I'll try to find a solution for that.
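
To illustrate the difference between the two approaches, here is a rough sketch. It is not pyshark's actual code, nor the fix that was eventually merged. It assumes each packet in the tshark XML stream is wrapped in `<packet>...</packet>`, and the names `PACKET_RE`, `extract_one`, and `extract_all` are hypothetical.

```python
import re

# Assumption: every packet in the XML stream is wrapped in <packet>...</packet>.
PACKET_RE = re.compile(rb"<packet>.*?</packet>", re.DOTALL)


def extract_one(data):
    """One-at-a-time approach: return (packet or None, remaining data).

    Building the remainder copies everything still in the buffer, so each
    call costs time proportional to the buffer size.
    """
    match = PACKET_RE.search(data)
    if match is None:
        return None, data
    return match.group(0), data[match.end():]


def extract_all(data):
    """Batch approach: return (list of complete packets, remaining data).

    All complete packets are collected in a single scan, and the remainder
    (a possibly incomplete trailing packet) is copied only once.
    """
    packets = []
    end = 0
    for match in PACKET_RE.finditer(data):
        packets.append(match.group(0))
        end = match.end()
    return packets, data[end:]
```

With the one-at-a-time version, a buffer holding N packets is copied about N times, so the total cost grows roughly quadratically with the amount of buffered data. The batch version copies the leftover tail once per chunk of received data.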


chrconlo commented on July 21, 2024

Interesting. Any luck finding a solution for this? Thanks


KimiNewt commented on July 21, 2024

Fixed by PR #66
