
Comments (12)

debugloop avatar debugloop commented on July 25, 2024

Hi, I've some hints on my usage, no concrete answers, sorry :)

So far, a single goflow instance has handled every flow volume I've directed at it, which is not saying much, since that's been 1:32 sampled NFv9 from a regional research and education provider with a dozen ASR9k routers. Still, I am using https://github.com/sleinen/samplicator (developed at another research network), which is a kind of UDP multiplexer that supports source address spoofing for the datagrams. You'll need to set cap_net_raw+eip on its binary to allow the spoofing, and maybe increase the host's receive buffer sysctl net.core.rmem_max. This would not help you, however, if you try to do some kind of round-robin multiplexing on a single router's Netflow, as the samplicator only supports directing spoofed UDP by its source address. I'm using it with a single goflow instance anyway; the multiplexing is for other Netflow collectors.
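To illustrate why net.core.rmem_max matters, here is a minimal Python sketch (not part of goflow or samplicator) that asks the kernel for a larger UDP receive buffer and reads back what was actually granted. The function name request_receive_buffer is made up for this example. On Linux the request is silently capped at net.core.rmem_max, and the value read back is typically double the effective setting because the kernel accounts for bookkeeping overhead.

```python
import socket


def request_receive_buffer(size_bytes: int) -> int:
    """Ask for a UDP receive buffer of size_bytes and return what the kernel granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # The kernel may grant less than requested (on Linux: capped by net.core.rmem_max).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    requested = 1 << 20  # 1 MiB, an arbitrary example value
    print("requested:", requested, "granted:", request_receive_buffer(requested))
```

If the granted value is far below the requested one, raising the sysctl is probably the first thing to try.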

If you really have just the one router and flows from a single source address, goflow will probably be fine. If you're still concerned about goflow's performance, this router most likely has different flow exporting interfaces, or even just different exporter maps on a single interface. These will use (speaking for Cisco here, but it'll be similar for other vendors) distinct source ports while having the same source address, making some kind of round-robin routing thingie possible using iptables. I've never done that before; I'm just saying you should be able to use the source port and save yourself the nginx (which would need to spoof its datagrams, dunno how to do that).

Edit: Just for completeness' sake: You could of course configure the routers to use different targets for the flows, but I figure you've opened the issue because that's not possible in your setting?

from goflow.

lspgn avatar lspgn commented on July 25, 2024

Responding a bit late:

You may put a load balancer in front, but NetFlow v9 and IPFIX are stateful (data records can only be decoded with the templates sent earlier by the same exporter) and the collector identifies the exporter by the source IP of the UDP packet.
Ideally, for that case, you want the samples from the same router to arrive on the same collector. This leads to hashed load balancing, which may not always be perfectly equal.
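As a rough illustration of hashed load balancing, the Python sketch below pins each exporter to one collector by hashing its source address. The function name pick_collector and the addresses are made up for the example; a real load balancer or ECMP router uses its own hash, and this says nothing about how nginx does it.

```python
import hashlib


def pick_collector(source_ip: str, collectors: list[str]) -> str:
    """Map an exporter's source IP to one collector, always the same one."""
    # A stable hash (unlike Python's built-in hash()) gives the same result across restarts.
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]


if __name__ == "__main__":
    collectors = ["198.51.100.10", "198.51.100.11"]  # hypothetical collector addresses
    for router in ["192.0.2.1", "192.0.2.2", "192.0.2.3"]:
        print(router, "->", pick_collector(router, collectors))
```

With only a few routers the split can be lopsided, which is the "not always perfectly equal" caveat. Note also that changing the number of collectors remaps routers, so the new collector has to wait for templates again.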

What I'd suggest would be benchmarking the decoding time GoFlow is taking, using the metric flow_summary_decoding_time_us (the Prometheus endpoint is /metrics). The decoding can be parallelized on multiple cores using a thread pool (you can pass the -workers N CLI argument to use N threads).
Let's say the decoding of one sample takes 30 µs: this means you can decode around 33k samples per second with one worker. If you have 8 cores and configure 8 workers, that's around 250k samples per second.
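The arithmetic can be written down as a small Python helper. It assumes decoding scales linearly with the number of workers, which is an upper bound rather than a measurement; the function name is made up for the example.

```python
def samples_per_second(decode_time_us: float, workers: int = 1) -> float:
    """Upper bound on decoded samples per second for a given per-sample decode time."""
    # One worker decodes 1_000_000 / decode_time_us samples per second.
    return workers * 1_000_000 / decode_time_us


if __name__ == "__main__":
    print(round(samples_per_second(30)))     # 33333
    print(round(samples_per_second(30, 8)))  # 266667
```

The ideal figure for 8 workers is about 267k, so quoting "around 250k" leaves some room for scheduling and I/O overhead.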

I have not tried @debugloop's suggestions, but you can tune your OS receive buffers to handle sudden increases in flows without dropping them, or dedicate more CPU to those tasks.

Potential solutions or suggestions:

  • Look into sFlow: stateless, embeds router address inside the sample.
  • Have ECMP doing the load-balancing (this is what we use, with the same service IPs present on multiple devices).
  • Extend the code in order to push the UDP packets into Kafka and decouple decoding from the real-time reception required (also useful to recreate duplication without the UDP overhead).
  • Use different collector addresses for different routers (Daniel's suggestion, not so advised, to avoid "snowflakes" in the configuration).

Let me know if that helps.
The solutions listed above may not necessarily apply to your case @MIKNOTAURO but we will require more information/limitations to answer more accurately.


debugloop avatar debugloop commented on July 25, 2024

I actually have a service address configured on my goflow instance, but I've been reluctant to add a second instance. Will goflow keep flow records for which no template was received yet long enough? Or will some flows drop in the event of a change of routing from a router to another goflow instance?

Still, I'll spin up some more goflow instances at some point for redundancy, if not with equal cost then with different metrics as hot standby. ECMP will be hard in our case anyways, as we're BGP-only.

I talked about this with a colleague from the IP team just now, and he's had this idea of running goflow on the router, as our chassis apparently support spawning containers. He wasn't being serious tho...

Also I'd like to contest the "snowflakes" argument 😄 The router configurations will all be the same, pointing their exporter map to either a samplicator service which multiplexes/spoofs UDP or a host running iptables. The goflow configurations may also be equal on any hosts receiving Netflow from either multiplexing variant. Granted, iptables will be kind of fiddly, and the samplicator is my first choice only because we require the raw Netflow for other collectors.


lspgn avatar lspgn commented on July 25, 2024

Interesting use-case.
What do you mean by BGP only?

At the moment, it's just dropping the samples as it generates an error (template not found).
It would be possible with some extensive modifications (send the flow to be reprocessed later on, or have the processing triggered when a template is received).
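To make the second idea concrete, here is a minimal Python sketch of buffering data records until their template shows up. It is not how goflow works (as said above, goflow currently drops these samples); the class name, the key layout (exporter address, observation domain or source ID, template ID) and the buffer limit are all assumptions for illustration.

```python
from collections import defaultdict, deque


class PendingSamples:
    """Hold undecodable payloads until the matching template is received."""

    def __init__(self, max_per_template: int = 1000):
        self.templates = {}
        # Bounded queues so a missing template cannot exhaust memory.
        self.pending = defaultdict(lambda: deque(maxlen=max_per_template))

    def on_data(self, key, payload):
        """Return (template, payload) if decodable now, else buffer and return None."""
        template = self.templates.get(key)
        if template is None:
            self.pending[key].append(payload)
            return None
        return (template, payload)

    def on_template(self, key, template):
        """Store the template and return every buffered payload that it unblocks."""
        self.templates[key] = template
        waiting = self.pending.pop(key, ())
        return [(template, payload) for payload in waiting]


if __name__ == "__main__":
    store = PendingSamples()
    key = ("192.0.2.1", 0, 256)  # hypothetical exporter, domain, template ID
    print(store.on_data(key, b"record-1"))       # None: buffered
    print(store.on_template(key, "template-256"))  # buffered record released
```

The oldest payloads are discarded once a queue is full, so a long template interval would still lose data; it only shortens the gap after a routing change.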

I thought about that in the past to avoid the 10 minutes of cold start.
There's an http endpoint for /templates to access the definition of samples.
I thought about some kind of synchronization as well from a static JSON dump of the templates but never had enough incentive to code it.

I spoke too fast by saying "snowflake configurations", also taking my use-case where we try to have the configurations that are the same everywhere. Nothing that's impossible to manage with good automation :) .


MIKNOTAURO avatar MIKNOTAURO commented on July 25, 2024

Thanks for your time and answers guys.

For now it is not exactly a problem. It works just fine (several routers send flows to one collector, which decodes them and saves them to a Kafka cluster), but thinking about high availability, I believe we need to have a UDP load balancer in front of the collectors, or something like it, for failover.

My problem starts here.
We put a UDP LB (nginx) in front of the goflow instances (2 right now) as I mentioned above, but when we parse the flows from Kafka, what we get as the "SamplerAddress" is the load balancer's IP.

My original problem.
We need to attach client data to each flow in order to provide a network traffic report per client. We have a registry of local IPs, but since we have many routers, the local IP alone is not enough to match the flow data (SrcAddr) with a client, so we need to know which router (SamplerAddress) the flow comes from (we also have a registry of routers).
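The matching described here boils down to a lookup keyed by router first and local address second. A minimal Python sketch, assuming SamplerAddress and SrcAddr have already been converted to strings; the registry contents, client names and the function name find_client are invented for the example.

```python
import ipaddress

# Hypothetical registry: the same local range is reused behind two different routers.
REGISTRY = {
    "192.0.2.1": [(ipaddress.ip_network("10.0.0.0/24"), "client-a")],
    "192.0.2.2": [(ipaddress.ip_network("10.0.0.0/24"), "client-b")],
}


def find_client(sampler_address: str, src_addr: str):
    """Return the client owning src_addr behind the given router, or None."""
    addr = ipaddress.ip_address(src_addr)
    for network, client in REGISTRY.get(sampler_address, []):
        if addr in network:
            return client
    return None


if __name__ == "__main__":
    print(find_client("192.0.2.1", "10.0.0.5"))  # client-a
    print(find_client("192.0.2.2", "10.0.0.5"))  # client-b
```

This is why the SamplerAddress must be the real router address: once the load balancer's IP replaces it, both flows above become indistinguishable.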

I can not use sFlow because our routers (Mikrotik) do not have this option to export flows (just IPFIX and Netflow 5/9).

I also want to say that our application is based on Python, so we built pb/flow.pro for this language, and when we parse the flow some fields do not exist as the README file says... for example SampleAddress is RouterAddr (I think)

Now I'm trying with this version goflow-v3.2.0-linux-x86_64

I didn't know about ECMP (but finding out) Meanwhile if you have any advice/recommendations would be appreciated


MIKNOTAURO avatar MIKNOTAURO commented on July 25, 2024

This is what we are trying to achieve

Diagrama3


lspgn avatar lspgn commented on July 25, 2024

That is unfortunately a problem with NetFlow/IPFIX, due to the agent address being the source IP of the packet.
In order to do load-balancing, IP-level will work (ECMP), or techniques that allow to keep the source IP.
Quickly searched and found an article on nginx: https://www.nginx.com/blog/ip-transparency-direct-server-return-nginx-plus-transparent-proxy/

SamplerAddress is the source address of the NetFlow/IPFIX packet and Agent IP in an sFlow packet. From v2 to v3, some field names changed.


debugloop avatar debugloop commented on July 25, 2024

"What do you mean by BGP only?"

We do not run OSPF in our DC, and as I understand it, having equal cost with BGP is kind of involved.

@MIKNOTAURO funny how originally, traffic accounting was also the primary use case for the project I'm working on. I think your options are:

  1. UDP multiplexing: either using nginx with IP transparency (direct server return does not matter, Netflow is not a conversation) or samplicator with spoofing
  2. IP routing: using ECMP or some more basic form of service IP routing (for instance tie break by hop distance between equal announcements)

As for the IP address matching, I am using this module https://github.com/bwNetFlow/ip_prefix_trie within an enrichment tool which takes the flow ingress topic in Kafka and adds some more data while copying to a more advanced topic.
I won't recommend you use my stuff just yet, but the algorithm for fast IP matching might be useful to you. I also have some version in Python I think, if that would help you more, but I'll have to open source that first. A colleague ported my code to Python some time ago, as the ipaddress module is very slow too.
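For readers unfamiliar with the technique, the following is a generic binary prefix trie with longest-prefix matching in Python. It is a sketch written for this thread, not the code from ip_prefix_trie or its Python port, and it still uses the ipaddress module to parse inputs, so it does not reproduce their performance.

```python
import ipaddress


class PrefixTrie:
    """Binary trie mapping IP prefixes to values, with longest-prefix lookup."""

    def __init__(self):
        # One root per IP version; each node is [child_for_0, child_for_1, value].
        self.roots = {4: [None, None, None], 6: [None, None, None]}

    def insert(self, prefix: str, value) -> None:
        net = ipaddress.ip_network(prefix)
        bits, width = int(net.network_address), net.max_prefixlen
        node = self.roots[net.version]
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1  # walk from the most significant bit
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = value

    def lookup(self, address: str):
        addr = ipaddress.ip_address(address)
        bits, width = int(addr), addr.max_prefixlen
        node = self.roots[addr.version]
        best = node[2]  # value of a default route, if one was inserted
        for i in range(width):
            node = node[(bits >> (width - 1 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # remember the most specific match so far
        return best


if __name__ == "__main__":
    trie = PrefixTrie()
    trie.insert("10.0.0.0/8", "customer-a")
    trie.insert("10.1.0.0/16", "customer-b")
    print(trie.lookup("10.1.2.3"))  # customer-b
    print(trie.lookup("10.2.0.1"))  # customer-a
```

A lookup costs at most 32 steps for IPv4 (128 for IPv6) regardless of how many prefixes are stored, whereas testing every subnet in turn grows with the size of the registry.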


lspgn avatar lspgn commented on July 25, 2024

@debugloop that's a cool module! I will take a look. Thanks for sharing.
Internally, we also use the Kentik Patricia/prefix-trie library: https://github.com/kentik/patricia
for everything that's mapping IP ranges to ASN, countries, plans... But for routers, a simple hash map is good.
Worth pointing out: ClickHouse can do this through dictionaries.


debugloop avatar debugloop commented on July 25, 2024

Interesting, I hadn't had kentik/patricia on my radar yet. Too bad it wasn't around when I got started with my own trie, but I'll look into it now I guess. The main differences would be that I've not considered GC at all, and that I've not tried to make the trie skip sparse nodes.

We're actually using it to tag flows with customer IDs only, which is about the same trick as tagging ASN except our customers are largely not an AS in their own right. For countries we're using some Maxmind dataset and the default API it provides, which is working ok.

I've been wanting to look into ClickHouse too since I read about it in goflow's README.


MIKNOTAURO avatar MIKNOTAURO commented on July 25, 2024

@debugloop That would be really appreciated! And sorry if I'm asking silly questions, I'm new in this field, but actually, my pipeline works (Routers + Goflow + Kafka + Faust/Python + InfluxDB/Elasticsearch).

I have other questions about these types of architectures, but since they are not related to goflow, if you can help me, I think it would be better to use something like Slack. What do you think, guys? @lspgn


debugloop avatar debugloop commented on July 25, 2024

@MIKNOTAURO I had to butcher a coworker's larger script, but here it is. It's quite simple really, but much more efficient than iterating over all your subnets as ipaddress objects and checking membership against them.

I don't have Slack, but you can mail me using my GH profile's email or find me on IRC, same nick as here, for instance on freenode.

