fd leak about libevent HOT 6 CLOSED

libevent commented on September 23, 2024
fd leak

from libevent.

Comments (6)

nmathewson commented on September 23, 2024

I'm probably missing something, but I don't see any libevent code in that example. What is making you conclude that this is a libevent bug? This all looks like Thrift code to me.

zmldndx commented on September 23, 2024

Oh, yes, it is the simplest Thrift code using the nonblocking server, and the nonblocking server uses libevent. Maybe it's a bug in thrift-0.8.0; I cannot figure it out.

seregasheypak commented on September 23, 2024

We see the same effect: 31000 open descriptors, all of them like this:
java 36452 flume 4274u sock 0,6 0t0 943856105 can't identify protocol
We use the Thrift server in Flume.

zmldndx commented on September 23, 2024

I figured out it's a Thrift bug; please see apache/thrift#70.

logicouter commented on September 23, 2024

I have resolved this problem by increasing the number of io threads and worker threads.

errzey commented on September 23, 2024

usleep is, and always will be THE BIGGEST HACK EVER when it comes to non-blocking code. I mean, the lack of sleep is the whole reason an application is called "non-blocking".

How about nanosleep, OMG, that fixes it too!

The reality is: under sustained (or bursty) data, check out top(1). You will see your application and its threads sitting in the "D" state, and your CPUs with a high amount of wait states. This is not a fix; this is a hack of hacks, bound to backfire on you at any time. 👎

If this were a client application with that "sleep" fix, then even if 1024-65535 were your ephemeral port range (which is never the default on any OS), you would easily max it out, using every single port.

If I still worked at the company that shall not be named, who maintained thrift, and saw such a thing, I would... Well I wouldn't do anything.... I wasn't allowed to work on Libevent..

My old coworker wrote an entire article on this after I pointed it out and schooled her:

https://medium.com/@fun_cuddles/opening-files-in-node-js-considered-harmful-d7de566d499f#.2k8z5v9a0

I reiterate

SLEEP_IS_NEVER_THE_SOLUTION

But I also maintain

Thrift is awesome cool, but when reading through the source code (FB version), there were a lot of "what in the..?"'s. The version I (attempted to) work on still relied on libevent 1.0.x, and a huge clustr-f of wrappers to make it do strange things that could have been easily done in 2.x.

Then again, this was the company that rejected a small patch to Thrift that increased server performance by 150%: https://gist.github.com/ellzey/01150710fe6001dfe9c5 to which the answer was "We don't care".

If your system is unable to maintain high levels of connections, and the box becomes finicky, write code to fix it:

  • Use rate-limiting per connection.
  • Use rate-limiting groups.
  • Do not blindly accept() new connections. You don't have to; let them wait when the system is overwhelmed.
  • Turn off EV_READ if a single connection has a massive amount of pending data. Use Libevent's write_cb to inform you when everything has been successfully written, then go ahead and enable EV_READ again.
  • Muck around with sysctl settings.
  • Just because you set a backlog of 532423423 doesn't mean everything will work.
  • Starting up a bunch of new threads is just going to delay the time at which all of the bad things happen (on a default install). Once again: not a fix, but a band-aid.
  • Dynamically tune the way your application works. Even adapting the nice levels at the tpid / parent level could result in good numbers and a lower footprint.
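
To make the per-connection rate-limiting idea from the list above concrete, here is a minimal token-bucket sketch in plain C. It is not libevent or Thrift code; the struct and the tb_* function names are made up for illustration. The assumption is that the caller refills the bucket from a periodic timer and stops reading from the connection (for example by dropping EV_READ) whenever tb_take() grants 0 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection token bucket (illustrative names only). */
struct token_bucket {
    size_t tokens; /* bytes the connection may still consume right now */
    size_t rate;   /* bytes added on every timer tick */
    size_t burst;  /* upper bound on accumulated tokens */
};

/* Start with a full bucket so a new connection may burst once. */
void tb_init(struct token_bucket *tb, size_t rate, size_t burst) {
    assert(tb != NULL && burst >= rate);
    tb->tokens = burst;
    tb->rate = rate;
    tb->burst = burst;
}

/* Call once per timer tick: add `rate` tokens, capped at `burst`. */
void tb_refill(struct token_bucket *tb) {
    size_t room = tb->burst - tb->tokens;
    tb->tokens += (tb->rate < room) ? tb->rate : room;
}

/* Ask to consume `want` bytes. Returns how many were granted and charges
 * them to the bucket. A return of 0 means: stop reading until the next
 * refill. */
size_t tb_take(struct token_bucket *tb, size_t want) {
    size_t granted = (want < tb->tokens) ? want : tb->tokens;
    tb->tokens -= granted;
    return granted;
}
```

A rate-limiting group would be the same structure shared by several connections, so that one noisy peer eats into the budget of its whole group rather than the whole server.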

When I was writing Mandiant's reverse proxy (https://github.com/mandiant/RProxy/tree/feature/ratelimiting/src), an external (very skilled) security auditing company made the assertion that they were, after many attempts, unable to take the server down (otherwise known as DoS) due to the many methods I used to stop "fast client, slow server", and "fast server, slow client" conditions. The "unable to DoS" part just so happened to be a side-effect of that.

There is only so much memory the kernel will give you for both RX and TX. Even with the "sleep" fix, check out netstat and look at the Recv-Q and Send-Q columns: those are the pending receive and send queues. $5 says they all have big numbers.
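
If you would rather look at the receive side of that from inside the process, a rough sketch is below. It assumes a platform where ioctl(FIONREAD) works on sockets (it is widely supported, but not part of POSIX), and the helper name pending_recv_bytes is made up for this example. Linux also has SIOCOUTQ for the send side, which is not shown here.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Bytes the kernel has queued on fd that the application has not read
 * yet. Returns -1 on error. */
int pending_recv_bytes(int fd) {
    int n = 0;
    assert(fd >= 0);
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```

A number that keeps growing on many connections at once means the application is not draining its sockets as fast as its peers are sending.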

But then again, the first thing I always do to make sure that file descriptors never leak in forked or threaded code is set O_CLOEXEC on each file descriptor. If you need to share a file descriptor across autonomous processes every so often, there is always CMSG. But this, just like that patch, is a sign of poor design, unless you are doing privsep or something.
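
For reference, a small sketch of what setting close-on-exec looks like with plain POSIX calls. The helper names (open_cloexec, set_cloexec, is_cloexec) are made up for this example. Note that the flag takes effect at exec time: a forked child still inherits the descriptor, but it is closed as soon as that child calls one of the exec functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open a file with close-on-exec set atomically at open() time, so there
 * is no window in which another thread can fork+exec and leak it. */
int open_cloexec(const char *path, int flags) {
    assert(path != NULL);
    return open(path, flags | O_CLOEXEC);
}

/* Retrofit close-on-exec onto a descriptor obtained elsewhere.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd) {
    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags == -1)
        return -1;
    return (fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) == -1) ? -1 : 0;
}

/* Returns 1 if fd is close-on-exec, 0 if not, -1 on error. */
int is_cloexec(int fd) {
    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) ? 1 : 0;
}
```

The two-step fcntl() route is racy in threaded programs, which is why setting the flag at creation time is preferable wherever the call that creates the descriptor allows it.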

As for the sysctl tuning I mentioned earlier, this really is all about disabling things like Nagle (if you are expecting small, polling-type I/O), increasing kernel buffer sizes, and, most importantly in this case, lowering the FIN_WAIT timers.
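
The per-socket half of that tuning can be sketched with standard socket options, as below. The helper names are made up for this example, and the kernel is free to clamp or adjust the buffer sizes you request. The FIN_WAIT timers are system-wide rather than per-socket (on Linux, for example, the net.ipv4.tcp_fin_timeout sysctl), so they are not shown in code.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error. */
int disable_nagle(int fd) {
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Returns 1 if Nagle is disabled on fd, 0 if not, -1 on error. */
int nagle_disabled(int fd) {
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
        return -1;
    return val != 0;
}

/* Request larger kernel receive and send buffers for one socket.
 * This is a request only; system-wide limits still apply.
 * Returns 0 on success, -1 on error. */
int request_buffers(int fd, int bytes) {
    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
}
```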

That "fix" you posted makes me die a little inside.
