
f2k's Introduction


Flow 2 Kafka (f2k)

NetFlow to JSON/Kafka collector.

Setup

To build and install it, run the usual ./configure && make && make install

Usage

Basic usage

The most important configuration parameters are:

  • Output parameters:

    • --kafka=127.0.0.1@rb_flow, broker@topic to send the NetFlow JSON to
  • Input parameters: can be either a UDP port or a Kafka topic

    • --collector-port=2055, UDP port to listen for NetFlow on
    • --kafka-netflow-consumer=kafka@rb_flow_pre, Kafka host/topic to consume NetFlow from
  • Configuration

    • --rb-config=/opt/rb/etc/f2k/config.json, file with the sensors config (see Sensors config)

Sensors config

You need to specify each sensor you want to read NetFlow from in a JSON file:

{
	"sensors_networks": {
		"4.3.2.1":{
			"observations_id": {
				"1":{
					"enrichment":{
						"sensor_ip":"4.3.2.1",
						"sensor_name":"flow_test",
						"observation_id":1
					}
				}
			}
		}
	}
}

With this file, f2k will listen for NetFlow coming from 4.3.2.1 (this could also be a network, e.g. 4.3.2.0/24), and the JSON output will carry those sensor_ip, sensor_name, and observation_id keys.
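Since the sensors file is plain JSON, it can be sanity-checked before starting f2k. A minimal sketch that walks the structure shown above (the list_sensors helper is our own, not part of f2k):

```python
import json

# Same structure as the example config above.
CONFIG = """
{
  "sensors_networks": {
    "4.3.2.1": {
      "observations_id": {
        "1": {
          "enrichment": {
            "sensor_ip": "4.3.2.1",
            "sensor_name": "flow_test",
            "observation_id": 1
          }
        }
      }
    }
  }
}
"""

def list_sensors(config_text):
    """Return (network, observation_id, enrichment) tuples from a sensors config."""
    config = json.loads(config_text)
    out = []
    for network, sensor in config["sensors_networks"].items():
        for obs_id, obs in sensor["observations_id"].items():
            out.append((network, obs_id, obs.get("enrichment", {})))
    return out

sensors = list_sensors(CONFIG)
print(sensors[0][0], sensors[0][2]["sensor_name"])  # 4.3.2.1 flow_test
```

Running a check like this before deploying catches malformed JSON early, since f2k reads the file at startup.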

Others configuration parameters

Multi-thread

--num-threads=N can be used to specify the number of netflow processing threads.

Long flow separation

Use --separate-long-flows if you want to divide flows with duration > 60s into one-minute slices. For example, if a flow lasts 1m30s, f2k will send one message containing 2/3 of the bytes and packets for the first minute, and another with the remaining 1/3 for the last 30 seconds, as if two different flows had been received.

(see Test 0017 for more information about how flows are divided)
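The split described above is proportional: each emitted message carries the fraction of bytes and packets corresponding to its slice of the flow's duration. A rough illustration of the arithmetic (not f2k's actual code; rounding behavior is an assumption):

```python
def split_long_flow(duration_s, total_bytes, total_pkts, slice_s=60):
    """Split a flow into per-slice messages, prorating bytes/packets by duration."""
    slices = []
    remaining = duration_s
    while remaining > 0:
        chunk = min(slice_s, remaining)
        fraction = chunk / duration_s
        slices.append((chunk, round(total_bytes * fraction), round(total_pkts * fraction)))
        remaining -= chunk
    return slices

# A 90 s flow with 900 bytes / 90 packets: 2/3 for the first minute,
# 1/3 for the last 30 seconds.
print(split_long_flow(90, 900, 90))  # [(60, 600, 60), (30, 300, 30)]
```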

Geo information

f2k can add geographic information if you specify the MaxMind GeoLite database locations using:

  • --as-list=/opt/rb/share/GeoIP/asn.dat,
  • --country-list=/opt/rb/share/GeoIP/country.dat,

Names resolution

You can include more flow information, such as many object names, with the option --hosts-path=/opt/rb/etc/objects/. This folder needs to contain files with the expected names so that f2k can read them.

Mac vendor information (mac_vendor)

With --mac-vendor-list=mac_vendors f2k can translate flow source and destination MAC addresses, which will then be sent in the JSON output as in_src_mac_name, out_src_mac_name, and so on.

The file mac_vendors should look like:

FCF152|Sony Corporation
FCF1CD|OPTEX-FA CO.,LTD.
FCF528|ZyXEL Communications Corporation

You can generate it using make manuf, which obtains it automatically from the IEEE Registration Authority.
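The file format is a simple OUI-prefix/vendor pair per line, separated by a pipe. A sketch of how such a file can be parsed (the load_mac_vendors helper is ours, not f2k's):

```python
def load_mac_vendors(text):
    """Parse 'OUI|Vendor' lines (mac_vendors format) into a prefix -> vendor dict."""
    vendors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blanks and malformed lines
        oui, vendor = line.split("|", 1)
        vendors[oui.upper()] = vendor
    return vendors

SAMPLE = """FCF152|Sony Corporation
FCF1CD|OPTEX-FA CO.,LTD.
FCF528|ZyXEL Communications Corporation"""

vendors = load_mac_vendors(SAMPLE)
print(vendors["FCF152"])  # Sony Corporation
```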

Applications/engine ID (applications, engines)

f2k can translate application and engine IDs if you provide lists with them, like:

  • <hosts-path>/engines

    None            0
    IANA-L3         1
    PANA-L3         2
    IANA-L4         3
    PANA-L4         4
    ...
    
  • <hosts-path>/applications

    3com-amp3                 50332277
    3com-tsmux                50331754
    3pc                       16777250
    914c/g                    50331859
    ...
    
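Both files follow the same shape: a name and a numeric ID separated by whitespace, one entry per line. A sketch of parsing such a list into an ID-to-name map (the helper name is ours):

```python
def load_id_list(text):
    """Parse 'name  id' lines (engines/applications file format) into id -> name."""
    ids = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blanks and malformed lines
        name, num = parts
        ids[int(num)] = name
    return ids

ENGINES = """None            0
IANA-L3         1
PANA-L3         2
IANA-L4         3
PANA-L4         4"""

engines = load_id_list(ENGINES)
print(engines[1])  # IANA-L3
```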

Hosts, domains, vlan (hosts, http_domains, vlans)

You can include more information about the flow source and destination (src_name and dst_name) using a hosts list with the same format as /etc/hosts. The same format can be used for the vlan, domains, and macs files.
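Since the format matches /etc/hosts, each line is an address followed by one or more names, with # starting a comment. A sketch of reading such a file (helper name and sample data are ours):

```python
def load_hosts(text):
    """Parse /etc/hosts-style lines into an address -> first-name dict."""
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) >= 2:
            hosts[parts[0]] = parts[1]  # first name wins, extra aliases ignored
    return hosts

SAMPLE = """# example hosts list
10.0.0.1   gateway
10.0.0.2   fileserver fs"""

print(load_hosts(SAMPLE)["10.0.0.1"])  # gateway
```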

Netflow probe nets

You can specify per-probe home networks, which will be taken into account when deciding which address is the client and which is the target.

You can specify them using home_nets:

{
	"sensors_networks": {
		"4.3.2.0/24": {
			"2055": {
				"sensor_name": "test1",
				"sensor_ip": "",
				"home_nets": [
					{"network": "10.13.30.0/16", "network_name": "users"},
					{"network": "2001:0428:ce00:0000:0000:0000:0000:0000/48",
						"network_name": "users6"}
				]
			}
		}
	}
}
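The client/target decision reduces to a membership test of each flow address against the configured home nets. A sketch of that test using Python's ipaddress module (the in_home_net helper is ours; strict=False is needed because the example /16 has host bits set):

```python
import ipaddress

HOME_NETS = [
    {"network": "10.13.30.0/16", "network_name": "users"},
    {"network": "2001:0428:ce00::/48", "network_name": "users6"},
]

def in_home_net(ip, home_nets):
    """Return the matching network_name if ip falls inside a home net, else None."""
    addr = ipaddress.ip_address(ip)
    for net in home_nets:
        if addr in ipaddress.ip_network(net["network"], strict=False):
            return net["network_name"]
    return None

# The home-net side of a flow is treated as the client; the other side as the target.
print(in_home_net("10.13.99.7", HOME_NETS))  # users
print(in_home_net("8.8.8.8", HOME_NETS))     # None
```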

DNS

f2k can perform reverse DNS lookups to obtain some host names. To enable them, use:

  • --enable-ptr-dns, general enable
  • --dns-cache-size-mb, size of the DNS cache used to avoid repeating PTR queries
  • --dns-cache-timeout-s, cache entry timeout
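f2k's cache is internal, but the two cache options above are easy to picture with a toy TTL cache (this sketch is ours, not f2k code; names are illustrative):

```python
import time

class PtrCache:
    """Tiny TTL cache illustrating what dns-cache-timeout-s controls."""
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.entries = {}  # ip -> (name, inserted_at)

    def put(self, ip, name, now=None):
        now = time.monotonic() if now is None else now
        self.entries[ip] = (name, now)

    def get(self, ip, now=None):
        now = time.monotonic() if now is None else now
        hit = self.entries.get(ip)
        if hit and now - hit[1] < self.timeout_s:
            return hit[0]          # fresh entry: no PTR query needed
        return None                # miss or expired: would trigger a PTR query

cache = PtrCache(timeout_s=300)
cache.put("4.3.2.1", "flow-test.example", now=0)
print(cache.get("4.3.2.1", now=100))  # flow-test.example (fresh)
print(cache.get("4.3.2.1", now=400))  # None (expired)
```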

Template cache

Using folder

You can specify a folder to save/load templates using --template-cache=/opt/rb/var/f2k/templates.

If you want to use ZooKeeper to share templates between f2k instances, you can specify the ZooKeeper host using --zk-host=127.0.0.1:2181 and a suitable read timeout with --zk-timeout=30. Note that you need to compile f2k with --enable-zookeeper.

librdkafka options

Any librdkafka option can be passed using the -X (flow producer), -Y (flow consumer), or -Z (flow discarder) parameters. The argument is passed directly to the librdkafka config, so you can use whatever configuration you need.

Example:

--kafka-discarder=kafka@rb_flow_discard     # Define host and topic
--kafka-netflow-consumer=kafka@rb_flow_pre  # Define host and topic
-X=socket.max.fails=3                       # Define configuration for flow producer
-X=retry.backoff.ms=100                     # Define configuration for flow producer
-Y=group.id=f2k                             # Define configuration for consumer
-Z=group.id=f2k                             # Define configuration for discard producer

Recommended options are:

  • socket.max.fails=3,
  • delivery.report.only.error=true,

f2k's People

Contributors

arodriguezdlc, bigomby, eugpermar, manegron


f2k's Issues

Discard Netflow to a Kafka topic

f2k has an internal database with the IP addresses of the sensors (read from the configuration file). When NetFlow data is received, f2k checks whether it comes from the IP address of a sensor stored in this database. Data received from unknown IPs is ignored.

It's necessary to implement a way to dump this ignored data from unknown sensors to a Kafka topic so it can be processed.

Errors in Make manuf

I have detected some errors in the script called by make manuf, which is tools/manuf.py. The shebang is incorrect, and the script fails when it tries to write the mac_vendors file due to an incorrect file encoding.

I have fixed these errors and created the following pull request: #39

Coverage is broken

Currently, builds on Travis CI are not reporting coverage to Coveralls (they always send 0%).

Allow tests to be skipped

Integration tests involve some services that may not be available during testing, so they should be disabled by default and enabled using:

./configure --enable-integration-tests

Integration tests

Add integration tests. This should be done by running:

  • Zookeeper
  • Kafka
  • f2k

Then send NetFlow traffic to f2k and check the data read from Kafka.

Segmentation fault when executing f2k without config files on CentOS 7

I have compiled f2k and installed its dependencies on a CentOS 7 system. When I execute it, I get the following output:

./f2k
10/Jan/2017 11:18:22 [f2k.c:1614] Welcome to f2k v.6.13.170110 (1.0.1-61-g2e1b16) for Linux
10/Jan/2017 11:18:22 [f2k.c:2307] Welcome to f2k v.6.13.170110 for Linux
10/Jan/2017 11:18:22 [f2k.c:2356] Flows ASs will not be computed
10/Jan/2017 11:18:22 [util.c:828] nProbe changed user to 'nobody'
Violación de segmento

Lots of errors while executing "configure && make && make install"

I am trying to install f2k on my Ubuntu 14 machine by cloning the git repo and running "./configure && make && make install". However, I'm getting lots of "no such header file" errors. Please find below the snippet I got after running the step. There were other errors too, which I resolved by removing some dependent packages, but for the errors below I'm not able to find any proper solution.

root@ubuntu:/home/devops/f2k# ./configure
checking for OS or distribution... ok (Ubuntu)
checking for C compiler from CC env... failed
checking for gcc (by command)... ok
checking executable ld... ok
checking executable nm... ok
checking executable objdump... ok
checking executable strip... ok
checking for pkgconfig (by command)... ok
checking for install (by command)... ok
checking for __atomic_32 (by compile)... ok
checking for __atomic_64 (by compile)... ok
checking for socket (by compile)... ok
checking for librd (by pkg-config)... failed
checking for librd (by compile)... failed (fail)
checking for pcap (by pkg-config)... failed
checking for pcap (by compile)... failed (fail)
checking for librdkafka (by pkg-config)... failed
checking for librdkafka (by compile)... ok
checking for rb_mac_vendor (by pkg-config)... failed
checking for rb_mac_vendor (by compile)... failed (fail)
checking for geoip (by pkg-config)... ok
checking for zookeeper (by pkg-config)... failed
checking for zookeeper (by compile)... ok
checking for udns (by pkg-config)... failed
checking for udns (by compile)... failed (fail)
checking for HAVE_JSON (by pkg-config)... failed
checking for HAVE_JSON (by compile)... ok
checking for optreset (by compile)... failed (disable)
checking for pthread (by pkg-config)... failed
checking for pthread (by compile)... ok
checking for pthread_setaffinity_np (by compile)... failed (disable)
checking for sin6_len (by compile)... failed (disable)
checking for netfilter (by pkg-config)... failed
checking for netfilter (by compile)... failed (disable)
checking for sctp (by compile)... failed (disable)
checking for pcap_next_ex (by compile)... failed (disable)
checking for pf_ring (by pkg-config)... failed
checking for pf_ring (by compile)... failed (disable)
librd ()
module: f2k
action: fail
reason:
compile check failed:
CC: CC
flags: -lrd -lpthread -lz -lrt
gcc -Wno-missing-field-initializers -Wall -Wsign-compare -Wfloat-equal -Wpointer-arith -O2 -g -Wcast-qual -Wunused -Wextra -Wdisabled-optimization -Wshadow -Wmissing-declarations -Wundef -Wswitch-default -Wmissing-include-dirs -Wstrict-overflow=5 -Winit-self -Wlogical-op -Wcast-align -Wdisabled-optimization -DNDEBUG -D_GNU_SOURCE -DFORTIFY_SOURCE=2 -Wall -Werror -lrd -lpthread -lz -lrt _mkltmp8AkgWk.c -o _mkltmp8AkgWk.c.o :
_mkltmp8AkgWk.c:1:22: fatal error: librd/rd.h: No such file or directory
#include <librd/rd.h>
^
compilation terminated.
source: #include <librd/rd.h>
pcap ()
module: f2k
action: fail
reason:
compile check failed:
CC: CC
flags: -lpcap
gcc -Wno-missing-field-initializers -Wall -Wsign-compare -Wfloat-equal -Wpointer-arith -O2 -g -Wcast-qual -Wunused -Wextra -Wdisabled-optimization -Wshadow -Wmissing-declarations -Wundef -Wswitch-default -Wmissing-include-dirs -Wstrict-overflow=5 -Winit-self -Wlogical-op -Wcast-align -Wdisabled-optimization -DNDEBUG -D_GNU_SOURCE -DFORTIFY_SOURCE=2 -Wall -Werror -lpcap _mkltmpDRRB09.c -o _mkltmpDRRB09.c.o :
/usr/bin/ld: cannot find -lpcap
collect2: error: ld returned 1 exit status
source:
rb_mac_vendor (HAVE_RB_MAC_VENDORS)
module: f2k
action: fail
reason:
compile check failed:
CC: CC
flags: -lrb_mac_vendors
gcc -Wno-missing-field-initializers -Wall -Wsign-compare -Wfloat-equal -Wpointer-arith -O2 -g -Wcast-qual -Wunused -Wextra -Wdisabled-optimization -Wshadow -Wmissing-declarations -Wundef -Wswitch-default -Wmissing-include-dirs -Wstrict-overflow=5 -Winit-self -Wlogical-op -Wcast-align -Wdisabled-optimization -DNDEBUG -D_GNU_SOURCE -DFORTIFY_SOURCE=2 -Wall -Werror -lrb_mac_vendors _mkltmpmtvaLo.c -o _mkltmpmtvaLo.c.o :
_mkltmpmtvaLo.c:1:28: fatal error: rb_mac_vendors.h: No such file or directory
#include <rb_mac_vendors.h>
compilation terminated.
source: #include <rb_mac_vendors.h>
udns (HAVE_UDNS)
module: f2k
action: fail
reason:
compile check failed:
CC: CC
flags: -ludns
gcc -I/usr/include/ -Wno-missing-field-initializers -Wall -Wsign-compare -Wfloat-equal -Wpointer-arith -O2 -g -Wcast-qual -Wunused -Wextra -Wdisabled-optimization -Wshadow -Wmissing-declarations -Wundef -Wswitch-default -Wmissing-include-dirs -Wstrict-overflow=5 -Winit-self -Wlogical-op -Wcast-align -Wdisabled-optimization -DNDEBUG -D_GNU_SOURCE -DFORTIFY_SOURCE=2 -Wall -Werror -ludns _mkltmpNkVGfP.c -o _mkltmpNkVGfP.c.o :
/tmp/ccA5w4FZ.o: In function `f': /home/devops/f2k/_mkltmpNkVGfP.c:2: undefined reference to `dns_init'
collect2: error: ld returned 1 exit status
source: #include <udns.h>
void *f();void *f(){return dns_init;}

Travis-CI integration

The repository should be integrated with Travis CI for automated tests, builds and coverage.

Segmentation fault

We are attempting to run f2k on CentOS 7.

f2k immediately segfaults on start. This is not entirely deterministic, as it will occasionally continue to run:

16/Oct/2017 16:33:35 [rb_listener.c:154] Creating listening socket in port 9996
16/Oct/2017 16:33:35 [rb_kafka.c:58] Applying socket.keepalive.enable=true to rdkafka
16/Oct/2017 16:33:35 [rb_kafka.c:58] Applying socket.max.fails=3 to rdkafka
16/Oct/2017 16:33:35 [rb_kafka.c:58] Applying socket.keepalive.enable=true to rdkafka
16/Oct/2017 16:33:35 [rb_kafka.c:58] Applying socket.max.fails=3 to rdkafka
Segmentation fault (core dumped)

The segfault is due to readOnlyGlobals.rb_databases.sensors_info being NULL:

#0  get_sensor (database=0x0, ip=181607651) at rb_sensor.c:1329
#1  0x00000000004136e7 in netFlowCollectLoop0 (collector=<optimized out>, collector=<optimized out>) at rb_listener.c:87
#2  netFlowCollectLoop (_port_collector=0x625460) at rb_listener.c:136
#3  0x00007ffff7bc6dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff671d28d in clone () from /lib64/libc.so.6

The command line we use:

f2k --debug --kafka=10.212.225.48:[email protected] --collector-port=9996 --rb-config=config_basic.json -b99 --template ../templates/ --event-log=test.log

Any help or advice appreciated.

Parse options for discard topic

Parse the option to specify a Kafka broker and topic to dump discarded flows to.

  • --kafka-discarder=<kafka-broker-ip>@<kafka-topic>

Helgrind errors on tests

The following error is detected by Helgrind on an external library:

==29300== Thread #2: pthread_cond_{timed}wait called with un-held mutex
==29300==    at 0x4C35954: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==29300==    by 0x4C35A19: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==29300==    by 0x4E43C5C: rd_fifoq_pop0 (in /usr/local/lib/librd.so)
==29300==    by 0x410C55: popPacketFromQueue_timedwait (util.h:90)
==29300==    by 0x410C55: netFlowConsumerLoop (collect.c:1456)
==29300==    by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==29300==    by 0x50556F9: start_thread (pthread_create.c:333)
==29300==    by 0x6471B5C: clone (clone.S:109)

This would be a possible suppression:

{
   librd rdqueue 
   Helgrind:Misc
   obj:/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so
   obj:/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so
   fun:rd_fifoq_pop0
   fun:popPacketFromQueue_timedwait
   fun:netFlowConsumerLoop
   obj:/usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so
   fun:start_thread
   fun:clone
}

Add Kafka consumer to collect.c

It's necessary to implement a Kafka consumer to be able to get raw NetFlow data from a Kafka queue instead of from a UDP socket.

Test memory issues

It's necessary to recreate memory errors to test the application's robustness.

Allow sensors with dynamic IP address

The application should be able to accept unregistered IP addresses if the sensor sends an option template with a valid serial number.

Use case (current behavior):

  1. A UDP NetFlow packet arrives from a registered IP (the IP is in the config file).
  2. The listener gets the message and forwards it to a worker.
  3. The worker parses the message and sends a JSON with the data to Kafka.

Use case:

  1. A UDP NetFlow packet arrives from an unregistered IP (no such IP exists in the config file).
  2. The listener gets the message and forwards it to a worker with unknown_ip=1.
  3. The worker parses the message but does not send anything to Kafka.
  4. The worker checks whether an option template with a serial number field exists.
  5. The worker checks whether the serial number is registered (the SN exists in the config file).
  6. If the SN is registered, register the IP address.
