Memleak in v0.9.3-v0.9.4 (edgevpn, closed)

mudler commented on July 17, 2024
Memleak in v0.9.3-v0.9.4

from edgevpn.

Comments (10)

vyzo commented on July 17, 2024

I am not sure what could be causing a memleak in libp2p.

The main new feature in 0.18 (btw, rc4 is out) is the resource manager, which is a very desirable feature.
Note that the default rcmgr limits are rather conservative, so you may have to adjust them.

Still, there shouldn't be any memory leaks there.

vyzo commented on July 17, 2024

can you take a goroutine dump and see if there is runaway goroutine buildup?
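
For readers who have not taken a goroutine dump before, here is a minimal sketch using only the Go standard library. The helper name `goroutineDump` is hypothetical and is not part of edgevpn or libp2p; it only shows one way to get the stacks of all goroutines from inside a running process.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// goroutineDump (hypothetical helper) returns the stacks of all goroutines
// as text. With debug=2 the output uses the same layout as a panic traceback,
// one block per goroutine.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	dump, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	// A goroutine count that keeps growing between dumps hints at a buildup.
	fmt.Printf("%d goroutines running\n", runtime.NumGoroutine())
	fmt.Print(dump)
}
```

Passing 0 instead of 2 as the debug argument writes the compressed binary profile that `go tool pprof` reads.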

mudler commented on July 17, 2024

v0.9.5 reverts to libp2p 0.17; I'm going to check whether the issue shows up again with that one.

mudler commented on July 17, 2024

Thanks a lot for the input, @vyzo! I'm going to check on my end. Temporarily downgrading to v0.17.x made the issue disappear, so I'm going to take a deeper look and let you know.

mudler commented on July 17, 2024

I've re-tested with 0.18 rc4 and the issue still seems to persist. I've also made sure that all streams are closed after each connection, to rule out a problem on that end. Besides gossip pub/sub, that's the only code path which creates streams.

I can still see constant memory growth that is never released. The pprof stack points me to quic-go, so I'm trying to figure out whether quic-go/quic-go@v0.24.0...v0.25.0 might have had any impact.

I'm going to keep collecting pprof runs for a while, hoping to trigger an OOM.

mudler commented on July 17, 2024

> can you take a goroutine dump and see if there is runaway goroutine buildup?

Hmm, I'm not sure if that's expected, but it looks like swarm has created a lot of goroutines and is parking them:

❯ go tool pprof -top collect/goroutine631
File: edgevpn
Type: goroutine
Time: Feb 20, 2022 at 3:31pm (CET)
Showing nodes accounting for 12085, 100% of 12086 total
Dropped 202 nodes (cum <= 60)
      flat  flat%   sum%        cum   cum%
     12085   100%   100%      12085   100%  runtime.gopark
         0     0%   100%        257  2.13%  github.com/libp2p/go-libp2p-discovery.Advertise.func1
         0     0%   100%        108  0.89%  github.com/libp2p/go-libp2p-quic-transport.(*conn).AcceptStream
         0     0%   100%        158  1.31%  github.com/libp2p/go-libp2p-swarm.(*Conn).start.func1
         0     0%   100%      11031 91.27%  github.com/libp2p/go-libp2p-swarm.(*Swarm).dialWorkerLoop
         0     0%   100%        156  1.29%  github.com/libp2p/go-libp2p/p2p/protocol/identify.(*peerHandler).loop
         0     0%   100%        111  0.92%  github.com/lucas-clemente/quic-go.(*client).dial.func1
         0     0%   100%        108  0.89%  github.com/lucas-clemente/quic-go.(*incomingBidiStreamsMap).AcceptStream
         0     0%   100%        111  0.92%  github.com/lucas-clemente/quic-go.(*sendQueue).Run
         0     0%   100%        108  0.89%  github.com/lucas-clemente/quic-go.(*session).AcceptStream
         0     0%   100%        111  0.92%  github.com/lucas-clemente/quic-go.(*session).run
         0     0%   100%        111  0.92%  github.com/lucas-clemente/quic-go.(*session).run.func1
         0     0%   100%        108  0.89%  github.com/lucas-clemente/quic-go.(*streamsMap).AcceptStream
         0     0%   100%         64  0.53%  internal/poll.(*pollDesc).wait
         0     0%   100%         60   0.5%  internal/poll.(*pollDesc).waitRead
         0     0%   100%         64  0.53%  internal/poll.runtime_pollWait
         0     0%   100%         64  0.53%  runtime.netpollblock
         0     0%   100%      11992 99.22%  runtime.selectgo
~/prof
❯ go tool pprof -top collect/goroutine626
File: edgevpn
Type: goroutine
Time: Feb 20, 2022 at 3:29pm (CET)
Showing nodes accounting for 11474, 100% of 11475 total
Dropped 152 nodes (cum <= 57)
      flat  flat%   sum%        cum   cum%
     11474   100%   100%      11474   100%  runtime.gopark
         0     0%   100%        255  2.22%  github.com/libp2p/go-libp2p-discovery.Advertise.func1
         0     0%   100%         59  0.51%  github.com/libp2p/go-libp2p-quic-transport.(*conn).AcceptStream
         0     0%   100%         74  0.64%  github.com/libp2p/go-libp2p-swarm.(*Conn).start.func1
         0     0%   100%      10833 94.41%  github.com/libp2p/go-libp2p-swarm.(*Swarm).dialWorkerLoop
         0     0%   100%         73  0.64%  github.com/libp2p/go-libp2p/p2p/protocol/identify.(*peerHandler).loop
         0     0%   100%         59  0.51%  github.com/lucas-clemente/quic-go.(*client).dial.func1
         0     0%   100%         59  0.51%  github.com/lucas-clemente/quic-go.(*incomingBidiStreamsMap).AcceptStream
         0     0%   100%         59  0.51%  github.com/lucas-clemente/quic-go.(*sendQueue).Run
         0     0%   100%         59  0.51%  github.com/lucas-clemente/quic-go.(*session).AcceptStream
         0     0%   100%         59  0.51%  github.com/lucas-clemente/quic-go.(*session).run
         0     0%   100%         59  0.51%  github.com/lucas-clemente/quic-go.(*session).run.func1
         0     0%   100%         59  0.51%  github.com/lucas-clemente/quic-go.(*streamsMap).AcceptStream
         0     0%   100%      11428 99.59%  runtime.selectgo

[image: profile002]

Note that I'm taking snapshots from the same process that I left running while collecting the memory pprofs. Here are the pprofs (the goroutine profiles are included too):
heaps.zip

mudler commented on July 17, 2024

I should note that the issue seems more noticeable when the node is reachable from outside via a public IP. When the node is behind NAT, the memory growth seems to happen more slowly.

vyzo commented on July 17, 2024

There are 11k dialWorkerLoop goroutines... not good.

cc @marten-seemann seems we have a problem here.
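
A count like the 11k above can also be checked from inside a process by scanning a text goroutine dump for a function name. The sketch below is a generic illustration and does not reproduce the libp2p swarm code: `parkedWorker` and `startParkedWorkers` are hypothetical stand-ins for worker goroutines that stay parked, and `countGoroutinesIn` is an assumed helper name.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"sync"
)

// countGoroutinesIn (hypothetical helper) reports how many goroutines have
// a stack whose text contains name.
func countGoroutinesIn(name string) int {
	var buf bytes.Buffer
	// debug=2 prints every goroutine's stack, with blank lines between goroutines.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return 0
	}
	count := 0
	for _, stack := range strings.Split(buf.String(), "\n\n") {
		if strings.Contains(stack, name) {
			count++
		}
	}
	return count
}

// parkedWorker is a stand-in for a worker loop: it signals that it has
// started and then parks until stop is closed.
func parkedWorker(started *sync.WaitGroup, stop <-chan struct{}) {
	started.Done()
	<-stop
}

// startParkedWorkers launches n workers and returns once all are running.
func startParkedWorkers(n int, stop <-chan struct{}) {
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go parkedWorker(&started, stop)
	}
	started.Wait()
}

func main() {
	stop := make(chan struct{})
	startParkedWorkers(100, stop)
	fmt.Println("parked workers:", countGoroutinesIn(".parkedWorker("))
	close(stop) // releasing the workers lets them exit
}
```

If such a count only ever grows while the process runs, the workers are never being released, which matches the pattern visible in the profiles.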

mudler commented on July 17, 2024

OK, it seems that running it for a few minutes is enough to notice the accumulation:

❯ go tool pprof -top collect/goroutine2
File: edgevpn
Type: goroutine
Time: Feb 20, 2022 at 4:09pm (CET)
Showing nodes accounting for 74, 100% of 74 total
      flat  flat%   sum%        cum   cum%
        72 97.30% 97.30%         72 97.30%  runtime.gopark
         1  1.35% 98.65%          7  9.46%  internal/poll.(*pollDesc).wait
         1  1.35%   100%          1  1.35%  runtime/pprof.runtime_goroutineProfileWithLabels
         0     0%   100%          1  1.35%  github.com/ipfs/go-log/writer.(*MirrorWriter).logRoutine
         0     0%   100%          4  5.41%  github.com/jbenet/goprocess.(*process).Go.func1
         0     0%   100%          1  1.35%  github.com/jbenet/goprocess/context.CloseAfterContext.func1
         0     0%   100%          1  1.35%  github.com/labstack/echo/v4.(*Echo).ServeHTTP
         0     0%   100%          1  1.35%  github.com/labstack/echo/v4.(*Echo).Start
         0     0%   100%          1  1.35%  github.com/labstack/echo/v4.(*Echo).add.func1
         0     0%   100%          1  1.35%  github.com/labstack/echo/v4.tcpKeepAliveListener.Accept
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-connmgr.(*BasicConnMgr).background
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-connmgr.(*decayer).process
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-discovery.Advertise.func1
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-kad-dht.(*IpfsDHT).fixLowPeersRoutine
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-kad-dht.(*IpfsDHT).persistRTPeersInPeerStore
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-kad-dht.(*IpfsDHT).rtPeerLoop
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-kad-dht.(*subscriberNotifee).subscribe
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-kad-dht/providers.(*ProviderManager).run
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-kad-dht/providers.NewProviderManager.func1
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-kad-dht/rtrefresh.(*RtRefreshManager).loop
         0     0%   100%          2  2.70%  github.com/libp2p/go-libp2p-peerstore/pstoremem.(*memoryAddrBook).background
         0     0%   100%          8 10.81%  github.com/libp2p/go-libp2p-pubsub.(*GossipSubRouter).connector
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-pubsub.(*GossipSubRouter).heartbeatTimer
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-pubsub.(*PubSub).processLoop
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-pubsub.(*Subscription).Next
         0     0%   100%          8 10.81%  github.com/libp2p/go-libp2p-pubsub.(*validation).validateWorker
         0     0%   100%          4  5.41%  github.com/libp2p/go-libp2p-quic-transport.(*reuse).gc
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p-resource-manager.(*resourceManager).background
         0     0%   100%          2  2.70%  github.com/libp2p/go-libp2p-swarm.(*DialBackoff).background
         0     0%   100%          3  4.05%  github.com/libp2p/go-libp2p-swarm.(*Swarm).AddListenAddr.func2
         0     0%   100%          3  4.05%  github.com/libp2p/go-libp2p-transport-upgrader.(*listener).Accept
         0     0%   100%          3  4.05%  github.com/libp2p/go-libp2p-transport-upgrader.(*listener).handleIncoming
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/discovery/mdns_legacy.(*mdnsService).pollForEntries
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/host/autonat.(*AmbientAutoNAT).background
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/host/autonat.(*autoNATService).background
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/host/autorelay.(*AutoRelay).background
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).background
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/host/pstoremanager.(*PeerstoreManager).background
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/protocol/circuitv2/client.(*Listener).Accept
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/protocol/holepunch.(*Service).watchForPublicAddr
         0     0%   100%          1  1.35%  github.com/libp2p/go-libp2p/p2p/protocol/identify.(*ObservedAddrManager).worker

After a few minutes:

File: edgevpn
Type: goroutine
Time: Feb 20, 2022 at 4:33pm (CET)
Showing nodes accounting for 1473, 99.86% of 1475 total
Dropped 140 nodes (cum <= 7)
      flat  flat%   sum%        cum   cum%
      1473 99.86% 99.86%       1473 99.86%  runtime.gopark
         0     0% 99.86%         27  1.83%  bufio.(*Reader).Read
         0     0% 99.86%         21  1.42%  github.com/libp2p/go-libp2p-discovery.Advertise.func1
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-libp2p-noise.(*secureSession).Read
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-libp2p-noise.(*secureSession).readNextInsecureMsgLen
         0     0% 99.86%          8  0.54%  github.com/libp2p/go-libp2p-pubsub.(*GossipSubRouter).connector
         0     0% 99.86%          8  0.54%  github.com/libp2p/go-libp2p-pubsub.(*validation).validateWorker
         0     0% 99.86%         57  3.86%  github.com/libp2p/go-libp2p-quic-transport.(*conn).AcceptStream
         0     0% 99.86%         84  5.69%  github.com/libp2p/go-libp2p-swarm.(*Conn).start.func1
         0     0% 99.86%          9  0.61%  github.com/libp2p/go-libp2p-swarm.(*Stream).Read
         0     0% 99.86%       1020 69.15%  github.com/libp2p/go-libp2p-swarm.(*Swarm).dialWorkerLoop
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-libp2p-yamux.(*conn).AcceptStream
         0     0% 99.86%         82  5.56%  github.com/libp2p/go-libp2p/p2p/protocol/identify.(*peerHandler).loop
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-yamux/v3.(*Session).AcceptStream
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-yamux/v3.(*Session).recv
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-yamux/v3.(*Session).recvLoop
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-yamux/v3.(*Session).send
         0     0% 99.86%         27  1.83%  github.com/libp2p/go-yamux/v3.(*Session).sendLoop
         0     0% 99.86%         57  3.86%  github.com/lucas-clemente/quic-go.(*client).dial.func1
         0     0% 99.86%          9  0.61%  github.com/lucas-clemente/quic-go.(*closedLocalSession).run
         0     0% 99.86%         57  3.86%  github.com/lucas-clemente/quic-go.(*incomingBidiStreamsMap).AcceptStream
         0     0% 99.86%         57  3.86%  github.com/lucas-clemente/quic-go.(*sendQueue).Run
         0     0% 99.86%         57  3.86%  github.com/lucas-clemente/quic-go.(*session).AcceptStream
         0     0% 99.86%         57  3.86%  github.com/lucas-clemente/quic-go.(*session).run
         0     0% 99.86%         57  3.86%  github.com/lucas-clemente/quic-go.(*session).run.func1
         0     0% 99.86%         57  3.86%  github.com/lucas-clemente/quic-go.(*streamsMap).AcceptStream
         0     0% 99.86%          8  0.54%  github.com/mudler/edgevpn/pkg/vpn.connectionWorker
         0     0% 99.86%          9  0.61%  github.com/multiformats/go-varint.ReadUvarint
         0     0% 99.86%         29  1.97%  internal/poll.(*FD).Read
         0     0% 99.86%         35  2.37%  internal/poll.(*pollDesc).wait
         0     0% 99.86%         35  2.37%  internal/poll.(*pollDesc).waitRead (inline)
         0     0% 99.86%         35  2.37%  internal/poll.runtime_pollWait
         0     0% 99.86%         27  1.83%  io.ReadAtLeast
         0     0% 99.86%         27  1.83%  io.ReadFull (inline)
         0     0% 99.86%         28  1.90%  net.(*conn).Read
         0     0% 99.86%         28  1.90%  net.(*netFD).Read
         0     0% 99.86%         17  1.15%  runtime.chanrecv
         0     0% 99.86%         11  0.75%  runtime.chanrecv2
         0     0% 99.86%         35  2.37%  runtime.netpollblock

collect2.zip

mudler commented on July 17, 2024

Closing this: it was fixed upstream in libp2p/go-libp2p-swarm#315. master is now at libp2p 0.18.0-rc5.
