Comments (6)

orsenthil commented on June 5, 2024

Thanks for the report and the Pull Request. Have you done any measurements with and without this change? Could you share the differences?

from amazon-vpc-cni-k8s.

GnatorX commented on June 5, 2024

Not yet. I will update once I have tested this.

GnatorX commented on June 5, 2024

@orsenthil I am wondering if it makes sense to cache nodes at all. K8s caches use List + Watch, and the List calls on startup are extremely expensive. The CNI only cares about the node it is running on, and a call by node name is indexed on the k8s side, so it is relatively fast. Rather than filtering, why not just use non-cached calls to get that information?

The availability difference between a watch and a direct call isn't that large.
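To make the comparison concrete, here is a minimal sketch of the two request shapes being contrasted: a direct read of one Node by name versus the cluster-wide list that an informer cache starts with. It only builds the Kubernetes REST paths with the Go standard library; the helper names and the node name are hypothetical, and this is not how the CNI or client-go issues these calls.

```go
package main

import (
	"fmt"
	"net/url"
)

// nodeGetPath returns the REST path for reading a single Node by name.
// (Hypothetical helper, for illustration only.)
func nodeGetPath(nodeName string) string {
	return "/api/v1/nodes/" + url.PathEscape(nodeName)
}

// nodeListPath returns the REST path for listing every Node in the cluster,
// which is what a cache over all nodes has to fetch on startup.
func nodeListPath() string {
	return "/api/v1/nodes"
}

func main() {
	name := "ip-10-0-0-1.ec2.internal" // hypothetical node name
	fmt.Println("direct get:", nodeGetPath(name))
	fmt.Println("full list: ", nodeListPath())
}
```

The direct read returns one object regardless of cluster size, while the response to the list grows with the number of nodes.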

GnatorX commented on June 5, 2024

I took a pprof of the issue.
[Screenshot: pprof output, 2024-04-29]

It looks like the stream watcher consumes memory as the cluster size increases. It seems to need quite a bit of memory to process all nodes and store them. Even though the memory consumption isn't very high, it's still unnecessary to store all node information in the cache.

I need to re-test this with my change. However, I do believe the real solution is to avoid performing a list-watch against all nodes and to watch only the node events specific to the CNI.
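One way to narrow a watch to a single node is a field selector on `metadata.name`, which the Kubernetes API supports for all resource types. The sketch below only builds the corresponding request path with the standard library; the helper name and node name are hypothetical, and it is not the change proposed in the pull request.

```go
package main

import (
	"fmt"
	"net/url"
)

// nodeWatchPath returns the REST path for watching only the Node with the
// given name, using a metadata.name field selector.
// (Hypothetical helper, for illustration only.)
func nodeWatchPath(nodeName string) string {
	q := url.Values{}
	q.Set("watch", "true")
	q.Set("fieldSelector", "metadata.name="+nodeName)
	// Encode sorts parameters by key and percent-encodes the values.
	return "/api/v1/nodes?" + q.Encode()
}

func main() {
	fmt.Println(nodeWatchPath("ip-10-0-0-1")) // hypothetical node name
}
```

With this selector the API server filters events before sending them, so the client only has to decode events for its own node.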

orsenthil commented on June 5, 2024

> K8s caches which use List + watches on startup are extremely expensive calls

> Even though the memory consumption isn't very high, it's still unnecessary to store all node information in cache.

> I do believe the real solution is to avoid performing list watch against all nodes and only watch for node events specific to the CNI.

It is pretty standard for k8s client calls to use the cached client. It would be good to measure the difference in memory usage and in the performance of the various operations on large clusters before we decide not to use the cache.

With your changes, if you see any difference in memory or performance, please share an update here.

GnatorX commented on June 5, 2024

> It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the various operations in the large clusters before we decide to not use the cache.

Agreed.

When I tested my changes, they didn't yield a significant difference in memory utilization. I believe, as shown in the pprof, the memory usage comes from the stream watcher attempting to unmarshal incoming data. I think that, rather than using an informer cache, a raw watch against the node itself may be more efficient(?).

I can close the issue for now, since I likely don't have time to look into writing a direct watcher instead, and I think the memory spike isn't large enough to be a concern.
