utoni / nDPId

Tiny nDPI based deep packet inspection daemons / toolkit.

License: GNU General Public License v3.0

Topics: dpi, ndpi, daemon, json-serialization, libndpi, toolkit, linux, json

ndpid's Introduction


References

ntop Webinar 2022 ntopconf 2023

Disclaimer

Please respect & protect the privacy of others.

The purpose of this software is not to spy on others, but to detect network anomalies and malicious traffic.

Abstract

nDPId is a set of daemons and tools to capture, process and classify network traffic. Its minimal dependencies (besides a half-way modern C library and POSIX threads) are libnDPI (>=4.9.0 or current github dev branch) and libpcap.

The daemon nDPId is capable of multithreaded packet processing, but deliberately avoids mutexes for performance reasons. Instead, synchronization is achieved by a packet distribution mechanism: to balance the workload across all threads (more or less) equally, a hash value is calculated from a 3-tuple consisting of the IPv4/IPv6 source/destination addresses, the layer4 protocol value from the IP header and (for TCP/UDP) the source/destination ports. Other protocols, e.g. ICMP/ICMPv6, lack relevance for DPI, so nDPId does not distinguish between different ICMP/ICMPv6 flows coming from the same host. This saves memory and CPU time, but might change in the future.
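
How such a hash-based distribution works can be sketched as follows (illustrative only; this is not the actual nDPId code, and the struct layout, hash function and names are assumptions made for this example):

/* Sketch: map every packet of a flow to exactly one reader thread by hashing
 * the fields described above, so threads never need a mutex while processing. */
#include <stddef.h>
#include <stdint.h>

struct flow_key
{
    uint8_t src_addr[16]; /* IPv4 addresses zero-padded to 16 bytes */
    uint8_t dst_addr[16];
    uint16_t src_port; /* 0 for protocols other than TCP/UDP */
    uint16_t dst_port;
    uint8_t l4_proto;  /* layer4 protocol value taken from the IP header */
};

/* FNV-1a over the key bytes; any reasonable hash works for load balancing.
 * Zero the whole struct before filling it so padding bytes are deterministic;
 * a real implementation would also normalize the direction (e.g. order the
 * addresses/ports) so both directions of a flow map to the same thread. */
static uint32_t flow_key_hash(struct flow_key const * const key)
{
    uint8_t const * const bytes = (uint8_t const *)key;
    uint32_t hash = 2166136261u;

    for (size_t i = 0; i < sizeof(*key); ++i)
    {
        hash ^= bytes[i];
        hash *= 16777619u;
    }

    return hash;
}

static size_t reader_thread_for_flow(struct flow_key const * const key, size_t reader_thread_count)
{
    return flow_key_hash(key) % reader_thread_count;
}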

nDPId uses libnDPI's JSON serialization interface to generate a JSON message for each event it receives from the library, and sends these messages to a UNIX socket (default: /tmp/ndpid-collector.sock). From that socket, nDPIsrvd (or any other custom application) can read the incoming JSON messages and distribute them to higher-level applications.

Unfortunately, nDPIsrvd does not yet support any encryption/authentication for TCP connections (TODO!).

Architecture

This project uses a kind of microservice architecture.

                connect to UNIX socket [1]        connect to UNIX/TCP socket [2]                
_______________________   |                                 |   __________________________
|     "producer"      |___|                                 |___|       "consumer"       |
|---------------------|      _____________________________      |------------------------|
|                     |      |        nDPIsrvd           |      |                        |
| nDPId --- Thread 1 >| ---> |>           |             <| ---> |< example/c-json-stdout |
| (eth0) `- Thread 2 >| ---> |> collector | distributor <| ---> |________________________|
|        `- Thread N >| ---> |>    >>> forward >>>      <| ---> |                        |
|_____________________|  ^   |____________|______________|   ^  |< example/py-flow-info  |
|                     |  |                                   |  |________________________|
| nDPId --- Thread 1 >|  `- send serialized data [1]         |  |                        |
| (eth1) `- Thread 2 >|                                      |  |< example/...           |
|        `- Thread N >|         receive serialized data [2] -'  |________________________|
|_____________________|                                                                   

where:

  • nDPId captures traffic, extracts traffic data (with libnDPI) and sends a JSON-serialized output stream to an already existing UNIX socket;

  • nDPIsrvd:

    • creates and manages an "incoming" UNIX socket (ref [1] above) to fetch data from a local nDPId;
    • applies buffering logic to the received data;
    • creates and manages an "outgoing" UNIX or TCP socket (ref [2] above) to relay matched events to connected clients;
  • consumers are stock or custom applications able to receive selected flows/events via either the UNIX or the TCP socket.

JSON stream format

JSON messages streamed by both nDPId and nDPIsrvd consist of:

  • a 5-digit decimal number giving the length of the entire JSON message, including the trailing newline \n;
  • the JSON message itself:
[5-digit-number][JSON message]

as in the following examples:

01223{"flow_event_id":7,"flow_event_name":"detection-update","thread_id":12,"packet_id":307,"source":"wlan0",[...]}
00458{"packet_event_id":2,"packet_event_name":"packet-flow","thread_id":11,"packet_id":324,"source":"wlan0",[...]]}
00572{"flow_event_id":1,"flow_event_name":"new","thread_id":11,"packet_id":324,"source":"wlan0",[...]}

The full stream of nDPId-generated JSON events can be retrieved directly from nDPId, without relying on nDPIsrvd, by providing a properly managed UNIX socket.

Technical details about the JSON message format can be obtained from the related .schema file included in the schema directory.

Events

nDPId generates JSON messages in which every message is assigned to a certain event. The event determines the contents (key-value pairs) of the JSON message. Events are divided into four categories, each with a number of subevents.

Error Events

There are 17 distinct error events, indicating that layer2/layer3 packet processing failed or that not enough flow memory was available:

  1. Unknown datalink layer packet
  2. Unknown L3 protocol
  3. Unsupported datalink layer
  4. Packet too short
  5. Unknown packet type
  6. Packet header invalid
  7. IP4 packet too short
  8. Packet smaller than IP4 header
  9. nDPI IPv4/L4 payload detection failed
  10. IP6 packet too short
  11. Packet smaller than IP6 header
  12. nDPI IPv6/L4 payload detection failed
  13. TCP packet smaller than expected
  14. UDP packet smaller than expected
  15. Captured packet size is smaller than expected packet size
  16. Max flows to track reached
  17. Flow memory allocation failed

Detailed JSON-schema is available here

Daemon Events

There are 4 distinct events indicating startup/shutdown or status, as well as a reconnect event emitted after a previous (collector) connection failure:

  1. init: nDPId startup
  2. reconnect: (UNIX) socket connection lost previously and was established again
  3. shutdown: nDPId terminates gracefully
  4. status: statistics about the daemon itself e.g. memory consumption, zLib compressions (if enabled)

Detailed JSON-schema is available here

Packet Events

There are 2 events containing base64-encoded packet payloads, depending on whether the packet belongs to a flow or not:

  1. packet: does not belong to any flow
  2. packet-flow: belongs to a flow e.g. TCP/UDP or ICMP

Detailed JSON-schema is available here

Flow Events

There are 9 distinct events related to a flow:

  1. new: a new TCP/UDP/ICMP flow was seen and will be tracked
  2. end: a TCP connection terminates
  3. idle: a flow timed out, because there was no packet on the wire for a certain amount of time
  4. update: inform nDPIsrvd or other apps about a long-lasting flow, whose detection was finished a long time ago but is still active
  5. analyse: provide some information about extracted features of a flow (Experimental; disabled per default, enable with -A)
  6. guessed: libnDPI was not able to reliably detect a layer7 protocol and falls back to IP/Port based detection
  7. detected: libnDPI successfully detected a layer7 protocol
  8. detection-update: libnDPI dissected more layer7 protocol data (after detection already done)
  9. not-detected: neither detected nor guessed

Detailed JSON-schema is available here. Also, a graphical representation of Flow Events timeline is available here.
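
To illustrate how a consumer might react to these flow events, here is a minimal sketch (names made up; a real consumer should use the helpers from dependencies/nDPIsrvd.h or a proper JSON parser instead of plain string search):

#include <stdio.h>
#include <string.h>

/* Decide from a single JSON message whether a consumer-side flow table entry
 * can be evicted. strstr() is only used here to keep the sketch short. */
static void handle_message(char const * const json)
{
    char const * const key = "\"flow_event_name\":\"";
    char const * const match = strstr(json, key);

    if (match == NULL)
        return; /* not a flow event (daemon, packet or error event) */

    char const * const name = match + strlen(key);

    if (strncmp(name, "end\"", 4) == 0 || strncmp(name, "idle\"", 5) == 0)
    {
        printf("flow finished (TCP FIN/RST seen or flow timed out)\n");
    }
    else if (strncmp(name, "detected\"", 9) == 0 || strncmp(name, "guessed\"", 8) == 0)
    {
        printf("layer7 classification available\n");
    }
}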

Flow States

A flow can have three different states while it is being tracked by nDPId.

  1. skipped: the flow will be tracked, but no detection will happen, to save memory. See command line arguments -I and -E
  2. finished: detection finished and the memory used for the detection is freed
  3. info: detection is in progress and all flow memory required for libnDPI is allocated (this state consumes most memory)

Build (CMake)

The nDPId build system is based on CMake:

git clone https://github.com/utoni/nDPId.git
[...]
cd nDPId
mkdir build
cd build
cmake ..
[...]
make

See below for a full test/live session.

Depending on your build environment and/or requirements, you may need:

mkdir build
cd build
ccmake ..

or to build with a statically linked libnDPI:

mkdir build
cd build
cmake .. -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir]

If you use the latter, make sure that you've configured libnDPI with ./configure --prefix=[path/to/your/libnDPI/installdir] and remember to set all necessary CMake variables to link against the shared libraries used by your nDPI build.

e.g.:

mkdir build
cd build
cmake .. -DSTATIC_LIBNDPI_INSTALLDIR=[path/to/your/libnDPI/installdir] -DNDPI_WITH_GCRYPT=ON -DNDPI_WITH_PCRE=OFF -DNDPI_WITH_MAXMINDDB=OFF

Or let a shell script do the work for you:

mkdir build
cd build
cmake .. -DBUILD_NDPI=ON

The CMake cache variable -DBUILD_NDPI=ON builds a version of libnDPI residing as a git submodule in this repository.

run

As mentioned above, in order to run nDPId, a UNIX socket needs to be provided to which it can stream its JSON data.

Such a UNIX socket can be provided either by the included nDPIsrvd daemon or, if you simply need a quick check, by the ncat utility with a simple ncat -U /tmp/listen.sock -l -k. Remember that OpenBSD netcat is not able to handle multiple connections reliably.
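
The same quick check can also be done with a few lines of C. The sketch below (illustrative only) creates the listening UNIX socket that nDPId connects to and prints everything it receives; like OpenBSD netcat it handles only a single connection, so combine it with -o max-reader-threads=1:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    char const * const path = "/tmp/listen.sock"; /* same path passed to nDPId via -c */
    struct sockaddr_un addr;
    int listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path); /* remove a stale socket file from a previous run */

    if (listen_fd < 0 || bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(listen_fd, 1) != 0)
    {
        perror("bind/listen");
        return 1;
    }

    int conn_fd = accept(listen_fd, NULL, NULL); /* nDPId connects as a client */
    if (conn_fd < 0)
    {
        perror("accept");
        return 1;
    }

    char buf[65536];
    ssize_t bytes_read;
    while ((bytes_read = read(conn_fd, buf, sizeof(buf))) > 0)
    {
        fwrite(buf, 1, (size_t)bytes_read, stdout);
    }

    close(conn_fd);
    close(listen_fd);
    return 0;
}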

Once the socket is ready, you can run nDPId to capture and analyze your own traffic with something similar to: sudo nDPId -c /tmp/listen.sock. If you're using OpenBSD netcat, you need to run: sudo nDPId -c /tmp/listen.sock -o max-reader-threads=1. Make sure that the UNIX socket is accessible to the user nDPId switches to (see -u); default: nobody.

Of course, both ncat and nDPId need to point to the same UNIX socket (nDPId provides the -c option exactly for this; by default, nDPId uses /tmp/ndpid-collector.sock, and the same default path is used by nDPIsrvd for its incoming socket).

Give nDPId some real traffic. You can capture your own traffic with something similar to:

socat -u UNIX-Listen:/tmp/listen.sock,fork - # does the same as `ncat`
sudo chown nobody:nobody /tmp/listen.sock # default `nDPId` user/group, see `-u` and `-g`
sudo ./nDPId -c /tmp/listen.sock -l

nDPId also supports UDP collector endpoints:

nc -d -u 127.0.0.1 7000 -l -k
sudo ./nDPId -c 127.0.0.1:7000 -l

or you can generate a nDPId-compatible JSON dump with:

./nDPId-test [path-to-a-PCAP-file]

You can also fire up both nDPId and nDPIsrvd automatically, with:

Daemons:

make -C [path-to-a-build-dir] daemon

Or a manual approach with:

./nDPIsrvd -d
sudo ./nDPId -d

or for a usage printout:

./nDPIsrvd -h
./nDPId -h

And why not a flow-info example?

./examples/py-flow-info/flow-info.py

or

./nDPIsrvd-json-dump

or anything below ./examples.
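
Writing your own consumer does not require much more than connecting to nDPIsrvd's distributor socket and reusing the length-prefix parsing shown in the JSON stream format section. A minimal sketch (illustrative only; the distributor socket path is passed as the first argument):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(int argc, char ** argv)
{
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s /path/to/distributor.sock\n", argv[0]);
        return 1;
    }

    int sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, argv[1], sizeof(addr.sun_path) - 1);

    if (sockfd < 0 || connect(sockfd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
    {
        perror("connect");
        return 1;
    }

    char buf[65536];
    ssize_t bytes_read;
    while ((bytes_read = read(sockfd, buf, sizeof(buf))) > 0)
    {
        /* dump the raw stream; a real consumer would reassemble and parse
           the length-prefixed JSON messages here */
        fwrite(buf, 1, (size_t)bytes_read, stdout);
    }

    close(sockfd);
    return 0;
}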

nDPId tuning

It is possible to change nDPId internals without recompiling by using -o subopt=value. But be careful: changing the default values may render nDPId useless and is not well tested.

Suboptions for -o:

Format: subopt (unit, comment): description

  • max-flows-per-thread (N, caution advised): affects max. memory usage
  • max-idle-flows-per-thread (N, safe): max. allowed idle flows whose memory gets freed after flow-scan-interval
  • max-reader-threads (N, safe): amount of packet processing threads, every thread can have a max. of max-flows-per-thread flows
  • daemon-status-interval (ms, safe): specifies how often daemon event status is generated
  • compression-scan-interval (ms, untested): specifies how often nDPId scans for inactive flows ready for compression
  • compression-flow-inactivity (ms, untested): the shortest period of time elapsed before nDPId considers compressing a flow that neither sent nor received any data
  • flow-scan-interval (ms, safe): min. amount of time after which nDPId scans for idle or long-lasting flows
  • generic-max-idle-time (ms, untested): time after which a non-TCP/UDP/ICMP flow times out
  • icmp-max-idle-time (ms, untested): time after which an ICMP flow times out
  • udp-max-idle-time (ms, caution advised): time after which a UDP flow times out
  • tcp-max-idle-time (ms, caution advised): time after which a TCP flow times out
  • tcp-max-post-end-flow-time (ms, caution advised): a TCP flow that received a FIN or RST waits this amount of time before flow tracking stops and the flow memory is freed
  • max-packets-per-flow-to-send (N, safe): max. packet-flow events generated for the first N packets of each flow
  • max-packets-per-flow-to-process (N, caution advised): max. amount of packets processed by libnDPI
  • max-packets-per-flow-to-analyze (N, safe): max. packets to analyze before sending an analyse event, requires -A
  • error-event-threshold-n (N, safe): max. error events to send until threshold time has passed
  • error-event-threshold-time (N, safe): time after which the error event threshold resets

test

The recommended way to run regression / diff tests:

mkdir build
cd build
cmake .. -DBUILD_NDPI=ON
make nDPId-test test

Alternatively you can run some integration tests manually:

./test/run_tests.sh [/path/to/libnDPI/root/directory] [/path/to/nDPId-test]

e.g.:

./test/run_tests.sh ${HOME}/git/nDPI ${HOME}/git/nDPId/build/nDPId-test

Remember that all test results are tied to a specific libnDPI commit hash as part of the git submodule. Using test/run_tests.sh for other commit hashes will most likely result in PCAP diffs.

Why not use examples/py-flow-dashboard/flow-dash.py to visualize nDPId's output?

Contributors

Special thanks to Damiano Verzulli (@verzulli) from GARRLab for providing server and test infrastructure.

ndpid's People

Contributors: abalkin, aparcar, baskerville, benbe, crondaemon, dependabot[bot], drbitboy, elelay, frnknstn, ghane, goriy, ivankravets, lyokha, macauleycheng, macgritsch, pks-t, pt300, simonsj, systemcrash, utoni, verzulli, zlolik, zserge


ndpid's Issues

OpenWRT: Unexpected flood of daemon events ("daemon_event_name":"status") right after start...

After upgrading nDPId on my OpenWRT box (built on commit 36f1786), I'm observing unexpected behaviour: right after launching the daemon with:

/usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -o max-packets-per-flow-to-send=0

where 192.168.0.128 is my local notebook, properly listening on UDP/9999, I receive a STORM of daemon-events.
In 5 to 10 seconds, I got 16358 JSONs, and only 13 of them were related to "flows". The other 16345 were daemon-related, with ~99% of them "status".
Is this an expected behaviour?

I've attached the above 16358 JSONs, should you need to check them. Again: they were received in ~10 seconds (maybe less).
raw_json.zip

nDPId: incomplete "flow_event_schema.json" schema definition

During analysis of the UDP stream generated by nDPId (as for ref), I got the following JSON, related to an HTTPS request:

  {
    "flow_event_id": 7,
    "flow_event_name": "detection-update",
    "flow_id": 54994,
    "flow_state": "info",
    "flow_packets_processed": 6,
    [...]
    "l3_proto": "ip4",
    "src_ip": "192.168.0.128",
    "dst_ip": "***.***.***.***",
    "src_port": 45396,
    "dst_port": 443,
    "l4_proto": "tcp",
    "ndpi": {
      "flow_risk": {
        "15": {
          "risk": "TLS (probably) Not Carrying HTTPS",
          "severity": "Low",
          "risk_score": { "total": 760, "client": 680, "server": 80 }
        }
      },
      "confidence": { "6": "DPI" },
      "proto": "TLS",
      "breed": "Safe",
      "category": "Web"
    },
    "tls": {
      "version": "TLSv1.2",
      "client_requested_server_name": "www.********.com",
      "ja3": "398430069e0a8ecfbc8db0778d658d77",
      "ja3s": "fbe78c619e7ea20046131294ad087f05",
      "unsafe_cipher": 0,
      "cipher": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
      "tls_supported_versions": "TLSv1.3,TLSv1.2"
    }
  }

Such a JSON contains the following confidence object:

  "confidence": { "6": "DPI" }

In the example JSON schema file included in the nDPId sources, the very same confidence attribute is declared this way:

  "confidence": {
      "type": "string",
      "enum": [
          "0",
          "1",
          "2",
          "3",
          "4"
      ]
  }

and the value "6" is missing.

The schema definition should be updated to also include the value "6", as well as other missing values (5?, 7?).

Improve collectd example.

The current collectd example that can be used together with the exec plugin prints bogus information about flows.
This issue is caused by the memset in c-collectd.c:512. The desired behavior is that the executable should not only print statistics between collectd time intervals and reset them afterwards, regardless of whether the flow is still active (e.g. it neither ended nor timed out).

Instead, it should also process idle and end events to check whether a flow is still active and adjust the statistics printed to stdout accordingly.

OpenWRT: Malformed JSON-UDP stream (when JSON string length is longer than 1024 bytes)

I just built and installed the OpenWRT package of nDPId built on commit 36f1786. I upgraded my previous nDPId, as I understood that the JSON format could have changed (as per PR #1725 on the nDPI project).

Unfortunately, some of the JSONs coming out of the current release are malformed: they seem to be "truncated" to 1024 chars. As such, if they are longer and get truncated, they actually come "joined" with the next one, generating a single malformed JSON (due to missing double quotes, commas and, for sure, the final }).
I'm attaching a JSON file obtained via a raw netcat, redirecting the output to a file. As you can see, there is a problem for every row beginning with a number bigger than 1024.

Can you confirm the problem is on the source side? Or should I have done something on the openwrt-package-building side?

Thanks,
DV

a.zip

Is it possible to carry the DPI information to the flow_event_end event?

In my case, I want a single event per flow. I tried the following:

 static void free_workflow(struct nDPId_workflow ** const workflow);
 static void serialize_and_send(struct nDPId_reader_thread * const reader_thread);
 static void jsonize_flow_event(struct nDPId_reader_thread * const reader_thread,
-                               struct nDPId_flow_extended * const flow_ext,
+                               struct nDPId_flow * const flow,
                                enum flow_event event);
 static void jsonize_flow_detection_event(struct nDPId_reader_thread * const reader_thread,
                                          struct nDPId_flow * const flow,
@@ -1788,11 +1788,11 @@ static void process_idle_flow(struct nDPId_reader_thread * const reader_thread,

                 if (flow->flow_extended.flow_basic.tcp_fin_rst_seen != 0)
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_END);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_END);
                 }
                 else
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_IDLE);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_IDLE);
                 }
                 break;
             }
@@ -1843,11 +1843,11 @@ static void process_idle_flow(struct nDPId_reader_thread * const reader_thread,
                 }
                 if (flow->flow_extended.flow_basic.tcp_fin_rst_seen != 0)
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_END);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_END);
                 }
                 else
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_IDLE);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_IDLE);
                 }
                 break;
             }
@@ -1897,11 +1897,11 @@ static void ndpi_flow_update_scan_walker(void const * const A, ndpi_VISIT which,
             case FS_INFO:
             {
                 struct nDPId_flow_extended * const flow_ext = (struct nDPId_flow_extended *)flow_basic;
-
+                struct nDPId_flow * const flow = (struct nDPId_flow *)flow_basic;
                 if (is_flow_update_required(workflow, flow_ext) != 0)
                 {
                     workflow->total_flow_updates++;
-                    jsonize_flow_event(reader_thread, flow_ext, FLOW_EVENT_UPDATE);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_UPDATE);
                     flow_ext->last_flow_update = workflow->last_thread_time;
                 }
                 break;
@@ -2644,11 +2644,12 @@ static void jsonize_packet_event(struct nDPId_reader_thread * const reader_threa

 /* I decided against ndpi_flow2json as it does not fulfill my needs. */
 static void jsonize_flow_event(struct nDPId_reader_thread * const reader_thread,
-                               struct nDPId_flow_extended * const flow_ext,
+                               struct nDPId_flow * const flow,
                                enum flow_event event)
 {
     struct nDPId_workflow * const workflow = reader_thread->workflow;
     char const ev[] = "flow_event_name";
+    struct nDPId_flow_extended * const flow_ext = &flow->flow_extended;

     ndpi_serialize_string_int32(&workflow->ndpi_serializer, "flow_event_id", event);
     if (event > FLOW_EVENT_INVALID && event < FLOW_EVENT_COUNT)
@@ -4086,7 +4087,7 @@ static void ndpi_process_packet(uint8_t * const args,
                     flow_to_process->flow_extended.last_flow_update = workflow->last_thread_time;
         flow_to_process->flow_extended.max_l4_payload_len[direction] = l4_payload_len;
         flow_to_process->flow_extended.min_l4_payload_len[direction] = l4_payload_len;
-        jsonize_flow_event(reader_thread, &flow_to_process->flow_extended, FLOW_EVENT_NEW);
+        jsonize_flow_event(reader_thread, flow_to_process, FLOW_EVENT_NEW);
     }

     if (nDPId_options.enable_data_analysis != 0 && flow_to_process->flow_extended.flow_analysis != NULL &&
@@ -4114,7 +4115,7 @@ static void ndpi_process_packet(uint8_t * const args,

         if (total_flow_packets == nDPId_options.max_packets_per_flow_to_analyse)
         {
-            jsonize_flow_event(reader_thread, &flow_to_process->flow_extended, FLOW_EVENT_ANALYSE);
+            jsonize_flow_event(reader_thread, flow_to_process, FLOW_EVENT_ANALYSE);
             free_analysis_data(&flow_to_process->flow_extended);
         }
     }

First, I tried using nDPId_flow instead of nDPId_flow_extended.

Then I added the ndpi_dpi2json call in jsonize_flow_event:

+            if (event == FLOW_EVENT_END){
+                if (ndpi_dpi2json(workflow->ndpi_struct,
+                                  &flow->info.detection_data->flow,
+                                  flow->flow_extended.detected_l7_protocol,
+                                  &workflow->ndpi_serializer) != 0)
+                {
+                    logger(1,
+                           "[%8llu, %4llu] ndpi_dpi2json failed for detected/detection-update flow",
+                           workflow->packets_captured,
+                           flow->flow_extended.flow_id);
+                }
+            }
+

It results in a segmentation fault. Even if I comment out free_detection_data, it still segfaults.
I don't want to add a line like if (event == FLOW_EVENT_END && flow->info.detection_completed == 1), because I want the end event to contain the DPI info.

I am wondering which function frees the data structure needed by ndpi_dpi2json: is it freed by nDPId or by nDPI?
What I want to achieve is: once the segfault is solved, I will not serialize the other events.
Do you have any suggestions?

centos7 support?

With cmake installed, nDPId doesn't seem to build with the CentOS 7 default build system, while libndpi is fine. I get this error:

/home/jiamo/diting_ndpid/dependencies/nDPIsrvd.h: In function 'nDPIsrvd_get_next_token':
/home/jiamo/diting_ndpid/dependencies/nDPIsrvd.h:920:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int i = *next_index + 1; i < sock->jsmn.tokens_found; ++i)

But if I run cmake like cmake -DBUILD_NDPI=off -DCMAKE_C_FLAGS="-std=c99" .., I get something like:

/home/jiamo/diting_ndpid/examples/c-captured/c-captured.c:663:34: error: 'optarg' undeclared (first use in this function)
                 pidfile = strdup(optarg);
                                  ^

I know the gcc in CentOS 7 is too old. Is there a good method to make CentOS 7 work?

I can not change source of ndpi

I need to change the name of the "proto_by_ip" field in the JSON log, and to that end I went to
nDPId/libnDPI/src/lib/ndpi_utils.c

In the ndpi_serialize_proto function:

ndpi_serialize_string_string(serializer, "protocol_by_ip", ndpi_get_proto_name(ndpi_struct, l7_protocol.protocol_by_ip));

After I change this line, I run make in
nDPId/libnDPI
but nothing changes when I run nDPId.

Where am I going wrong?

Can not connect to socket in FreeBSD

After working with ndpid I discovered that some logs are missing, and the reason is that some threads can not connect to the nDPIsrvd Collector at my UNIX socket.

The error is as follows:

nDPId [error]: Thread 10: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 11: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 15: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 14: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 13: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 12: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 9: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused

The command I use to create the socket:
nc -lkU /tmp/listen.socket

Can you help me fix this?

Originally posted by @fateme81 in #27 (comment)

nDPId: Add simil-netflow, UDP-based outgoing stream support

I'm interested in the possibility, for nDPId, to directly send out the JSON-stream of events, via UDP, to a remote host.

I'm thinking of a behaviour very similar to NetFlow/IPFIX, which I'm successfully using in OpenWRT (at home, inside my WDR4300 wireless router), collecting flows with softflowd and relaying them to a remote location via UDP. As such, I'm able to off-load/enrich netflow analysis with no technical constraint. Indeed, at my remote location, I'm enriching received flows with geo-referential data (provided by the free MaxMind library) and pushing them to an OpenSearch instance.

I'm trying to further enrich my data with high-level protocol information (provided by lib-nDPI), and nDPId fits perfectly in such a role. The only missing bit is the possibility to stream out flows directly.

Of course I can run a local "gateway" (fetching from nDPId and writing to the remote location), but this is not easy, as the whole thing needs to run inside OpenWRT boxes, which are VERY resource-constrained (BTW: libndpi and softflowd are already packaged for OpenWRT) and... I'm lacking C/C++ knowledge :-(

Is this an interesting feature, for the nDPId project?

BTW: thanks for developing nDPId

Info regarding detection of VERY LONG lasting connections

While analyzing flow-event data received by nDPId, I noticed that for an OpenVPN connection that I usually launch at startup on my notebook and that lasts for several days... I receive:

  • a NEW flow event;
  • a GUESSED flow event (OpenVPN, based on UDP/1194);
  • a DETECTED flow event (OpenVPN)
  • ...and PLENTY of UPDATE events. Currently I count 612 (six hundred!) UPDATEs, each one sent every ~50 seconds.

Is this normal behaviour? Is there some feature for "expiring" long-lasting flows?

Of course, I can handle the "long-lasting 'live' flows" on my side, within my analyzer.... But I'm curious if there is something I'm missing regarding nDPId

Thanks!

Details about "Flow STATES" and "Flow EVENTS"

As for "Flow STATES" and "Flow EVENTS" (whose description is reported in the README), I'm trying to better understand what exactly they means.

After running for ~24 hours my realtime analyzer receiving the UDP-stream of an nDPId instance running at the border of a small set of VPSs, I got these numbers:

[screenshot: numbers]

Please note that I'm interested ONLY in "flows" tracking.

As you can see, I got 1.735.579 messages ("in"), successfully processed as JSONs ("ok"), with zero errors ("err").

From those JSONs the analyzer skipped the 2111 JSONs NOT related to flows, and focused on the other 1.733.468.

From those 1.733.468 flow JSONs, it extracted "flow_state" and "flow_event_name", combining them in a string and counting related groups.

With a show counter I got the number of occurrences of those strings and, as you can see, I got:

  • info/new: 460232
  • info/detected: 429441
  • info/detection-update: 344050
  • info/not-detected: 28158
  • finished/end: 82356
  • info/guessed: 3112
  • info/end: 26788
  • finished/idle: 188938
  • finished/update: 7307
  • info/idle: 161786
  • info/update: 1300

whose sum is exactly 1.733.468

I'm trying to figure out the state diagram used by nDPI, to understand exactly which event (and which state) signals the termination of the activities performed by nDPI. I guess it's "finished/end"... but an "info/end" puts me in trouble :-(

I sketched out the following diagram:

[image: state_diagram]

Could you be so kind as to explain WHICH EVENT I should focus on, to know exactly when nDPI will finish processing a flow... so that I can expect that no other events related to that flow will be received by my analyzer?

At the moment, I'm keeping track of "everything", with an ever-increasing memory map of EVERY flow. What I want to achieve is to EXTRACT "completed flows" from such a table and forward them to the next processing stage.

Sorry if this sounds a bit cumbersome: I understand I'm not exactly clear with this request.... :-(

DFI over DPI?

Hi,
I assume you are still actively involved in the ndpi enhancement. Do you think there is a need for DFI (deep flow inspection) alongside the existing DPI (where the dissectors mostly check packet payload patterns or payload length) to detect applications accurately?
I was reading the paper below and want to discuss it with you before posting it as an issue on the ndpi repo.
https://reader.elsevier.com/reader/sd/pii/S187770581730276X?token=74B2C8BC7E1E9DEFCC8A8992234ED823EF2A7B8F4BAEA2C547AC049837EEE74362C1D8737D0C18B3CE68F82CA659FDB1&originRegion=eu-west-1&originCreation=20220103053518

In my understanding, if ndpi fails to get info from SNI or HTTP etc. parsing, i.e. up to L5, it falls back to pattern matching based on reverse-engineering methods learned from pcap files, which may produce false positives in the case of encrypted traffic. But the paper shows that dissectors built on a flow-based model give more accuracy than packet-payload-based matching. Any comment on this?

Thanks

Could not get netmask for pcap device vmx0: No such file or directory

The command I use:
./nDPId -u root -g root -l -c 127.0.0.1:7000

And the result is :

nDPId: version 1.5-199-g29904cfb

nDPI version: 4.7.0-4260-1f693c3f
 API version: 8445
pcap version: 1.10.4

gcrypt version: 1.8.6internal

nDPId: Capturing packets from default device: vmx0
nDPId: vmx0 IPv4 address netmask subnet: 192.168.162.61 255.255.255.0 192.168.162.0
nDPId [error]: Could not get netmask for pcap device vmx0: No such file or directory

and the result of ifconfig :

vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: port1
	options=8000b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
	ether 00:50:56:96:4c:e6
	inet6 fe80::250:56ff:fe96:4ce6%vmx0 prefixlen 64 scopeid 0x1
	inet 192.168.162.61 netmask 0xffffff00 broadcast 192.168.162.255
	media: Ethernet autoselect
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
	groups: enc
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=100<PROMISC> metric 0 mtu 33160
	groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
	groups: pfsync

Can you help me fix this?

On the meaning of several Flow-Event JSON attributes

While analyzing flow-event data received by nDPId I'm having some trouble understanding the gory detail of some JSON attributes.

I'm going to raise some questions here below, with a tentative answer I guessed from my analysis. Please check them as well.

Note that I'm also available to put them on a dedicated FAQ page, once their answers are defined.


Q1: Where can I get the detailed structure of the Flow-Event JSONs that nDPId will send me?

A1: A related schema file can be retrieved in the schema subfolder

--

Q2: What about the flow_first_seen, flow_src_last_pkt_time and flow_dst_last_pkt_time timestamp attributes? Which timestamps do they refer to?

A2: flow_first_seen is the timestamp registered by nDPId when it saw the very first packet originating this new flow. In contrast, the flow_src_last_pkt_time and flow_dst_last_pkt_time timestamps are continuously updated by nDPId as soon as it sees packets related to that flow. Based on the direction of such a packet (a request from SRC to DST, or a reply from DST to SRC), nDPId will update flow_src_last_pkt_time or flow_dst_last_pkt_time, respectively.

--

Q3: What about the flow_idle_time time attribute? Which time does it refer to?

A3: ...to be filled...

--

Q4: What about the thread_ts_usec timestamp attribute? Which timestamp does it refer to?

A4: ...to be filled...

--

Q5: What about the midstream attribute? It seems its value is always 0...

A5: ...to be filled...

--

Q6: as for update events, while examining a set of flow-events related to the same flow, I noticed:

  • an initial new event (as expected)
  • a following not-detected event (as it could be possible)
  • lots of following update events (in the order of tens...) that seem to be sent at mostly-regular intervals (in the order of tens of seconds).

Can you explain when update events are issued and confirm that the thread_ts_usec can be considered as the timestamp associated by nDPId to those events?

This is going to be an important question, especially in terms of inter-arrival-time analysis of those update events.

--

Duplicated "end" and "idle" events received for some flows

While analyzing my incoming UDP stream, I noticed that sometimes (in the order of once every one thousand) my nDPId-rt-analyzer receives two consecutive end events or two consecutive idle events referring to the very same flow_id.

This leads my analyzer to complain, as it expects that for every flow_id it should receive only one end|idle event.

I double-checked my analyzer, and I'm confident it really received the events twice, and that they are identical.

I have no problem getting rid of the spurious event... but probably, this could be of some interest to you.

I'm attaching a ZIP containing the JSON-dump of a selection of 4 distinct flows (id: 7337, 30684, 33023, 32921) where you can clearly see the final double events.

duplicated_evts_example.zip
