utoni / nDPId

Tiny nDPI-based deep packet inspection daemons / toolkit.

License: GNU General Public License v3.0

Languages: C 81.82%, CMake 7.72%, Shell 9.27%, Makefile 1.01%, Dockerfile 0.17%
Topics: dpi, ndpi, daemon, json-serialization, libndpi, toolkit, linux, json

nDPId's Issues

nDPId: Add NetFlow-like, UDP-based outgoing stream support

I'm interested in the possibility of having nDPId directly send out its JSON stream of events, via UDP, to a remote host.

I'm thinking of a behaviour much like NetFlow/IPFIX, which I'm successfully using on OpenWRT (at home, inside my WDR4300 wireless router): I collect flows with softflowd and relay them to a remote location via UDP. That way I can off-load and enrich netflow analysis without technical constraints. Indeed, at my remote location I enrich the received flows with geo-referential data (provided by the free MaxMind library) and push them to an OpenSearch instance.

I'm trying to further enrich my data with high-level protocol information (provided by libnDPI), and nDPId fits such a role perfectly. The only missing bit is the ability to stream flows out directly.

Of course I can run a local "gateway" (fetching from nDPId and writing to the remote location; see the sketch below), but this is not easy, as the whole stack needs to run inside OpenWRT boxes, which are VERY resource-constrained (BTW: libndpi and softflowd are already packaged for OpenWRT), and... I lack C/C++ knowledge :-(

Is this an interesting feature for the nDPId project?

BTW: thanks for developing nDPId
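
For context, a minimal local "gateway" of the kind described above could look like the sketch below (the distributor address 127.0.0.1:7000 and the collector address 192.0.2.1:2055 are placeholders, and error handling is kept to a bare minimum; this is not nDPId code):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* TCP connection to the local nDPIsrvd distributor (placeholder address). */
    int src = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in distributor = {.sin_family = AF_INET, .sin_port = htons(7000)};
    inet_pton(AF_INET, "127.0.0.1", &distributor.sin_addr);
    if (src < 0 || connect(src, (struct sockaddr *)&distributor, sizeof(distributor)) != 0)
    {
        perror("connect");
        return 1;
    }

    /* UDP socket towards the remote collector (placeholder address). */
    int dst = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in collector = {.sin_family = AF_INET, .sin_port = htons(2055)};
    inet_pton(AF_INET, "192.0.2.1", &collector.sin_addr);

    char buf[65536];
    ssize_t n;
    while ((n = read(src, buf, sizeof(buf))) > 0)
    {
        /* One datagram per read; a real gateway would split on message boundaries. */
        sendto(dst, buf, (size_t)n, 0, (struct sockaddr *)&collector, sizeof(collector));
    }

    close(src);
    close(dst);
    return 0;
}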

Duplicated "end" and "idle" events received for some flows

While analyzing my incoming UDP stream, I noticed that sometimes (on the order of once every thousand flows) my nDPId-rt-analyzer receives two consecutive end events or two consecutive idle events referring to the very same flow_id.

This leads my analyzer to complain, as it expects to receive exactly one end or idle event per flow_id.

I double-checked my analyzer, and I'm confident it really did receive the events twice; the two events are identical.

I have no problem discarding the spurious event on my side (see the sketch below)... but this could be of some interest to you.

I'm attaching a ZIP containing the JSON dump of a selection of 4 distinct flows (ids: 7337, 30684, 33023, 32921) where you can clearly see the final duplicate events.

duplicated_evts_example.zip
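
For what it's worth, the duplicate can be suppressed on the analyzer side with a per-flow "finished" flag; a minimal sketch (struct analyzer_flow and its lookup are placeholders for the analyzer's own bookkeeping, not nDPId code):

#include <stdint.h>

/* Placeholder per-flow state kept by the analyzer, keyed by flow_id. */
struct analyzer_flow
{
    uint64_t flow_id;
    int finished; /* set once an "end" or "idle" event has been processed */
};

/* Returns 1 if this terminal (end/idle) event is the first one seen for the
 * flow, 0 if it is a duplicate that should be dropped. */
static int accept_terminal_event(struct analyzer_flow * const f)
{
    if (f->finished != 0)
    {
        return 0; /* duplicate end/idle event */
    }
    f->finished = 1;
    return 1;
}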

Info regarding detection of VERY long-lasting connections

While analyzing flow-event data received from nDPId, I noticed that for an OpenVPN connection that I usually launch at startup on my notebook, and which lasts for several days, I receive:

  • a NEW flow event;
  • a GUESSED flow event (OpenVPN, based on UDP/1194);
  • a DETECTED flow event (OpenVPN)
  • ...and PLENTY of UPDATE events: currently I count 612 (six hundred and twelve!), each one sent roughly every 50 seconds.

Is this normal behaviour? Is there some feature for expiring long-lasting flows?

Of course, I can handle the long-lasting "live" flows on my side, within my analyzer... but I'm curious whether there is something I'm missing regarding nDPId.

Thanks!

OpenWRT: Malformed JSON-UDP stream (when JSON string length is longer than 1024 bytes)

I just built and installed the OpenWRT package of nDPId built on commit 36f1786. I upgraded my previous nDPId, as I understood that the JSON format might have changed (as per PR #1725 on the nDPI project).

Unfortunately, some of the JSON coming out of the current release is malformed: messages seem to be truncated at 1024 characters. When a message is longer and gets truncated, it ends up joined with the next one, generating a single malformed JSON (due to missing double quotes, commas and, of course, the final }).
I'm attaching a JSON file obtained via raw netcat, redirecting the output to a file. As you can see, there is a problem for every row beginning with a number bigger than 1024.
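
For reference, a receiver-side sanity check along these lines flags the truncated rows; a sketch, assuming each message is framed as "<decimal length><JSON object>" as seen in the dump (not nDPId code):

#include <stdlib.h>
#include <string.h>

/* Returns 0 if the message is intact, -1 if the announced length does not
 * match the payload, i.e. the JSON was truncated or joined with the next
 * message. */
static int validate_message(char const * const msg)
{
    char *json_start = NULL;
    unsigned long const announced = strtoul(msg, &json_start, 10);

    if (json_start == msg || *json_start != '{')
    {
        return -1; /* no length prefix, or no JSON object following it */
    }
    if (strlen(json_start) != announced)
    {
        return -1; /* payload shorter or longer than announced */
    }
    return 0;
}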

Can you confirm the problem is on the source side? Or should I have done something on the OpenWRT-package-building side?

Thanks,
DV

a.zip

DFI over DPI?

Hi,
I assume you are still actively involved in nDPI development. Do you think there is a need for DFI (deep flow inspection) alongside the existing DPI (where the dissectors mostly check packet payload patterns or payload lengths) to detect applications accurately?
I was reading the paper below and wanted to discuss it with you before posting an issue to the nDPI repo.
https://reader.elsevier.com/reader/sd/pii/S187770581730276X?token=74B2C8BC7E1E9DEFCC8A8992234ED823EF2A7B8F4BAEA2C547AC049837EEE74362C1D8737D0C18B3CE68F82CA659FDB1&originRegion=eu-west-1&originCreation=20220103053518

In my understanding, if nDPI fails to get info from SNI, HTTP, etc. parsing (i.e. up to L5), it falls back to pattern matching based on reverse-engineering methods learned from pcap files, which may produce false positives for encrypted traffic. But the paper shows that dissectors built on a flow-based model give better accuracy than packet-payload-based matching (see the sketch below for the kind of features involved). Any comment on this?
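
For context, the flow-based (DFI) models discussed in the paper work on per-flow statistics rather than payload bytes; a sketch of the kind of features meant (illustrative only, assuming a zero-initialized struct and microsecond timestamps):

#include <stdint.h>

/* Running per-flow statistics of the kind used as input by flow-based
 * classifiers, as opposed to payload pattern matching. */
struct flow_features
{
    uint64_t packet_count;
    uint64_t byte_count;
    uint64_t min_pkt_len, max_pkt_len;
    double mean_pkt_len;  /* running mean of packet lengths */
    uint64_t last_ts_usec;
    double mean_iat_usec; /* running mean of packet inter-arrival times */
};

static void flow_features_update(struct flow_features * const f,
                                 uint64_t const pkt_len,
                                 uint64_t const ts_usec)
{
    f->packet_count++;
    f->byte_count += pkt_len;
    if (f->packet_count == 1 || pkt_len < f->min_pkt_len)
    {
        f->min_pkt_len = pkt_len;
    }
    if (pkt_len > f->max_pkt_len)
    {
        f->max_pkt_len = pkt_len;
    }
    f->mean_pkt_len += ((double)pkt_len - f->mean_pkt_len) / (double)f->packet_count;

    if (f->packet_count > 1)
    {
        double const iat = (double)(ts_usec - f->last_ts_usec);
        f->mean_iat_usec += (iat - f->mean_iat_usec) / (double)(f->packet_count - 1);
    }
    f->last_ts_usec = ts_usec;
}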

Thanks

Cannot change the source of nDPI

I need to change the name of the "proto_by_ip" field in the JSON log, and to that end I went to
nDPId/libnDPI/src/lib/ndpi_utils.c

In the ndpi_serialize_proto function:

ndpi_serialize_string_string(serializer, "protocol_by_ip", ndpi_get_proto_name(ndpi_struct, l7_protocol.protocol_by_ip));

After changing this line I run make in
nDPId/libnDPI
but nothing changes when I run nDPId.

Where am I going wrong?

nDPId: incomplete "flow_event_schema.json" schema definition

During analysis of the UDP stream generated by nDPId (as per ref), I got the following JSON, related to an HTTPS request:

  {
    "flow_event_id": 7,
    "flow_event_name": "detection-update",
    "flow_id": 54994,
    "flow_state": "info",
    "flow_packets_processed": 6,
    [...]
    "l3_proto": "ip4",
    "src_ip": "192.168.0.128",
    "dst_ip": "***.***.***.***",
    "src_port": 45396,
    "dst_port": 443,
    "l4_proto": "tcp",
    "ndpi": {
      "flow_risk": {
        "15": {
          "risk": "TLS (probably) Not Carrying HTTPS",
          "severity": "Low",
          "risk_score": { "total": 760, "client": 680, "server": 80 }
        }
      },
      "confidence": { "6": "DPI" },
      "proto": "TLS",
      "breed": "Safe",
      "category": "Web"
    },
    "tls": {
      "version": "TLSv1.2",
      "client_requested_server_name": "www.********.com",
      "ja3": "398430069e0a8ecfbc8db0778d658d77",
      "ja3s": "fbe78c619e7ea20046131294ad087f05",
      "unsafe_cipher": 0,
      "cipher": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
      "tls_supported_versions": "TLSv1.3,TLSv1.2"
    }
  }

This JSON contains the following confidence object:

  "confidence": { "6": "DPI" }

In the example JSON schema file included in the nDPId sources, the very same confidence attribute is declared this way:

  "confidence": {
      "type": "string",
      "enum": [
          "0",
          "1",
          "2",
          "3",
          "4"
      ]
  }

and the value "6" is missing.

The schema definition should be updated to also include the value "6", as well as any other missing values (5? 7?).
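
A sketch of the corrected declaration (the authoritative list of valid values should be taken from nDPI's ndpi_confidence_t enum; "5" and "6" are added here based on the observed output, and further values may exist):

  "confidence": {
      "type": "string",
      "enum": [
          "0",
          "1",
          "2",
          "3",
          "4",
          "5",
          "6"
      ]
  }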

On the meaning of several Flow-Event JSON attributes

While analyzing flow-event data received from nDPId, I'm having some trouble understanding the gory details of some JSON attributes.

I'm going to raise some questions below, each with a tentative answer I guessed from my analysis. Please check them as well.

Note that I'm also willing to put them on a dedicated FAQ page once the answers are settled.


Q1: Where can I get the detailed structure of the flow-event JSONs that nDPId will send me?

A1: A related schema file can be retrieved from the schema subfolder.

--

Q2: What about the flow_first_seen, flow_src_last_pkt_time and flow_dst_last_pkt_time timestamp attributes? Which timestamps do they refer to?

A2: flow_first_seen is the timestamp registered by nDPId when it saw the very first packet originating the new flow. In contrast, flow_src_last_pkt_time and flow_dst_last_pkt_time are continuously updated by nDPId as it sees packets belonging to that flow: based on the direction of each packet (a request from SRC to DST, or a reply from DST to SRC), nDPId updates flow_src_last_pkt_time or flow_dst_last_pkt_time, respectively.
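
Assuming that reading is correct, the flow's current duration can be derived directly from those three attributes; a sketch (assuming all three timestamps share the same unit, e.g. microseconds):

#include <stdint.h>

/* Duration of a flow as seen so far: time of the most recent packet in
 * either direction, minus the time the first packet was seen. */
static uint64_t flow_duration(uint64_t const flow_first_seen,
                              uint64_t const flow_src_last_pkt_time,
                              uint64_t const flow_dst_last_pkt_time)
{
    uint64_t const last = (flow_src_last_pkt_time > flow_dst_last_pkt_time
                               ? flow_src_last_pkt_time
                               : flow_dst_last_pkt_time);
    return last - flow_first_seen;
}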

--

Q3: What about the flow_idle_time attribute? Which time does it refer to?

A3: ...to be filled...

--

Q4: What about the thread_ts_usec timestamp attribute? Which timestamp does it refer to?

A4: ...to be filled...

--

Q5: What about the midstream attribute? It seems its value is always 0...

A5: ...to be filled...

--

Q6: as for update events: while examining a set of flow events related to the same flow, I noticed:

  • an initial new event (as expected)
  • a following not-detected event (which can happen)
  • lots of subsequent update events (on the order of tens...), which seem to be sent at mostly regular intervals (on the order of tens of seconds).

Can you explain when update events are issued, and confirm that thread_ts_usec can be considered the timestamp nDPId associates with those events?

This is going to be an important question, especially in terms of inter-arrival-time analysis of those update events.
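
As an illustration of that analysis, here is a minimal sketch computing inter-arrival times from consecutive update events of a single flow (this assumes thread_ts_usec is a microsecond timestamp taken when the event is emitted, which is exactly what this question asks to confirm):

#include <stdint.h>
#include <stdio.h>

/* Called for every "update" event of one flow; ts_usec is the value of the
 * thread_ts_usec attribute taken from the event JSON. */
static void on_update_event(uint64_t const ts_usec)
{
    static uint64_t last_ts_usec = 0;

    if (last_ts_usec != 0)
    {
        printf("update inter-arrival time: %.3f s\n",
               (double)(ts_usec - last_ts_usec) / 1000000.0);
    }
    last_ts_usec = ts_usec;
}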

--

OpenWRT: Unexpected flood of daemon events ("daemon_event_name":"status") right after start...

After upgrading nDPId on my OpenWRT box (built on commit 36f1786), I'm seeing an unexpected behaviour right after launching the daemon with:

/usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -o max-packets-per-flow-to-send=0

where 192.168.0.128 is my local notebook, properly listening on UDP/9999, I receive a STORM of daemon events.
In 5 to 10 seconds, I got 16358 JSONs, and only 13 of them were related to flows. The other 16345 were daemon-related, with ~99% of them being "status" events.
Is this expected behaviour?

I've attached the above 16358 JSONs, should you need to check them. Again: they were received in ~10 seconds (maybe less).
raw_json.zip

Details about "Flow STATES" and "Flow EVENTS"

As for "Flow STATES" and "Flow EVENTS" (whose description is reported in the README), I'm trying to better understand what exactly they means.

After running my realtime analyzer for ~24 hours, receiving the UDP stream of an nDPId instance running at the border of a small set of VPSs, I got these numbers:

[screenshot: analyzer counters]

Please note that I'm interested ONLY in flow tracking.

As you can see, I got 1,735,579 messages ("in"), successfully processed as JSONs ("ok"), with zero errors ("err").

From those JSONs the analyzer skipped the 2,111 JSONs NOT related to flows, and focused on the other 1,733,468.

From those 1,733,468 flow JSONs, it extracted "flow_state" and "flow_event_name", combined them into a string, and counted the resulting groups.

Counting the occurrences of those strings, I got:

  • info/new: 460232
  • info/detected: 429441
  • info/detection-update: 344050
  • info/not-detected: 28158
  • finished/end: 82356
  • info/guessed: 3112
  • info/end: 26788
  • finished/idle: 188938
  • finished/update: 7307
  • info/idle: 161786
  • info/update: 1300

whose sum is exactly 1,733,468.

I'm trying to figure out the state diagram used by nDPId, to understand exactly which event (and which state) signals the end of nDPI's processing of a flow. I guess it's "finished/end"... but "info/end" puts me in trouble :-(

I sketched the following diagram:

[image: sketched flow state diagram]

Could you be so kind as to explain WHICH EVENT I should focus on, so I know exactly when nDPI has finished processing a flow and my analyzer can expect no further events related to that flow?

At the moment, I'm keeping track of everything, with an ever-growing memory map of EVERY flow. What I want to achieve is to EXTRACT "completed flows" from that table and forward them to the next processing stage (see the sketch below).

Sorry if this sounds a bit cumbersome: I understand I'm not being entirely clear with this request... :-(
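
For illustration, the extraction step could look roughly like the sketch below, under the assumption that "end" and "idle" are the terminal events (flow_table_remove and forward_to_next_stage are hypothetical analyzer helpers, not nDPId code):

#include <stdint.h>
#include <string.h>

struct analyzer_flow; /* the analyzer's own per-flow record (placeholder) */

/* Hypothetical helpers provided by the analyzer's flow table. */
extern struct analyzer_flow * flow_table_remove(uint64_t flow_id);
extern void forward_to_next_stage(struct analyzer_flow * flow);

/* On a terminal event, remove the flow from the in-memory table and hand it
 * over to the next processing stage. */
static void on_flow_event(char const * const flow_event_name, uint64_t const flow_id)
{
    if (strcmp(flow_event_name, "end") == 0 || strcmp(flow_event_name, "idle") == 0)
    {
        struct analyzer_flow * const f = flow_table_remove(flow_id);

        if (f != NULL)
        {
            forward_to_next_stage(f);
        }
    }
}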

Cannot connect to socket on FreeBSD

After working with nDPId I discovered that some logs are missing, and the reason is that some threads cannot connect to the nDPIsrvd Collector at my UNIX socket.

The error is as follows:

nDPId [error]: Thread 10: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 11: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 15: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 14: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 13: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 12: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 9: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused

The command I use to create the socket:
nc -lkU /tmp/listen.socket

Can you help me fix this?

Originally posted by @fateme81 in #27 (comment)

Improve collectd example.

The current collectd example, which can be used together with the exec plugin, prints bogus information about flows.
This is caused by the memset in c-collectd.c:512. The executable should not simply print statistics for each collectd time interval and reset them afterwards regardless of whether the flow is still active (i.e. has neither ended nor timed out).

Instead, it should also process idle and end events to determine whether a flow is still active, and adjust the statistics printed to stdout accordingly.
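
A sketch of the intended bookkeeping (illustrative only, not the actual c-collectd.c code; the PUTVAL identifier is a placeholder following the collectd exec plugin's line protocol):

#include <stdint.h>
#include <stdio.h>

struct interval_stats
{
    uint64_t flows_active; /* survives the interval reset */
    uint64_t new_flows;    /* per-interval counters, reset after printing */
    uint64_t end_or_idle_flows;
};

/* Called once per collectd interval. Only the per-interval counters are
 * reset; flows_active tracks flows that neither ended nor timed out, is
 * decremented by end/idle event handling, and is never zeroed here. */
static void print_and_reset(struct interval_stats * const s)
{
    printf("PUTVAL \"localhost/exec-nDPIsrvd/gauge-flows_active\" N:%llu\n",
           (unsigned long long)s->flows_active);
    s->new_flows = 0;
    s->end_or_idle_flows = 0;
}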

Is it possible to carry the DPI information to the flow_event_end event?

In my case, I want one event per flow. I tried the following:

 static void free_workflow(struct nDPId_workflow ** const workflow);
 static void serialize_and_send(struct nDPId_reader_thread * const reader_thread);
 static void jsonize_flow_event(struct nDPId_reader_thread * const reader_thread,
-                               struct nDPId_flow_extended * const flow_ext,
+                               struct nDPId_flow * const flow,
                                enum flow_event event);
 static void jsonize_flow_detection_event(struct nDPId_reader_thread * const reader_thread,
                                          struct nDPId_flow * const flow,
@@ -1788,11 +1788,11 @@ static void process_idle_flow(struct nDPId_reader_thread * const reader_thread,

                 if (flow->flow_extended.flow_basic.tcp_fin_rst_seen != 0)
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_END);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_END);
                 }
                 else
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_IDLE);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_IDLE);
                 }
                 break;
             }
@@ -1843,11 +1843,11 @@ static void process_idle_flow(struct nDPId_reader_thread * const reader_thread,
                 }
                 if (flow->flow_extended.flow_basic.tcp_fin_rst_seen != 0)
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_END);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_END);
                 }
                 else
                 {
-                    jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_IDLE);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_IDLE);
                 }
                 break;
             }
@@ -1897,11 +1897,11 @@ static void ndpi_flow_update_scan_walker(void const * const A, ndpi_VISIT which,
             case FS_INFO:
             {
                 struct nDPId_flow_extended * const flow_ext = (struct nDPId_flow_extended *)flow_basic;
-
+                struct nDPId_flow * const flow = (struct nDPId_flow *)flow_basic;
                 if (is_flow_update_required(workflow, flow_ext) != 0)
                 {
                     workflow->total_flow_updates++;
-                    jsonize_flow_event(reader_thread, flow_ext, FLOW_EVENT_UPDATE);
+                    jsonize_flow_event(reader_thread, flow, FLOW_EVENT_UPDATE);
                     flow_ext->last_flow_update = workflow->last_thread_time;
                 }
                 break;
@@ -2644,11 +2644,12 @@ static void jsonize_packet_event(struct nDPId_reader_thread * const reader_thread,

 /* I decided against ndpi_flow2json as it does not fulfill my needs. */
 static void jsonize_flow_event(struct nDPId_reader_thread * const reader_thread,
-                               struct nDPId_flow_extended * const flow_ext,
+                               struct nDPId_flow * const flow,
                                enum flow_event event)
 {
     struct nDPId_workflow * const workflow = reader_thread->workflow;
     char const ev[] = "flow_event_name";
+    struct nDPId_flow_extended * const flow_ext = &flow->flow_extended;

     ndpi_serialize_string_int32(&workflow->ndpi_serializer, "flow_event_id", event);
     if (event > FLOW_EVENT_INVALID && event < FLOW_EVENT_COUNT)
@@ -4086,7 +4087,7 @@ static void ndpi_process_packet(uint8_t * const args,
                     flow_to_process->flow_extended.last_flow_update = workflow->last_thread_time;
         flow_to_process->flow_extended.max_l4_payload_len[direction] = l4_payload_len;
         flow_to_process->flow_extended.min_l4_payload_len[direction] = l4_payload_len;
-        jsonize_flow_event(reader_thread, &flow_to_process->flow_extended, FLOW_EVENT_NEW);
+        jsonize_flow_event(reader_thread, flow_to_process, FLOW_EVENT_NEW);
     }

     if (nDPId_options.enable_data_analysis != 0 && flow_to_process->flow_extended.flow_analysis != NULL &&
@@ -4114,7 +4115,7 @@ static void ndpi_process_packet(uint8_t * const args,

         if (total_flow_packets == nDPId_options.max_packets_per_flow_to_analyse)
         {
-            jsonize_flow_event(reader_thread, &flow_to_process->flow_extended, FLOW_EVENT_ANALYSE);
+            jsonize_flow_event(reader_thread, flow_to_process, FLOW_EVENT_ANALYSE);
             free_analysis_data(&flow_to_process->flow_extended);
         }
     }

First, I tried using nDPId_flow instead of nDPId_flow_extended.

Then I added a call to ndpi_dpi2json in jsonize_flow_event:

+            if (event == FLOW_EVENT_END){
+                if (ndpi_dpi2json(workflow->ndpi_struct,
+                                  &flow->info.detection_data->flow,
+                                  flow->flow_extended.detected_l7_protocol,
+                                  &workflow->ndpi_serializer) != 0)
+                {
+                    logger(1,
+                           "[%8llu, %4llu] ndpi_dpi2json failed for detected/detection-update flow",
+                           workflow->packets_captured,
+                           flow->flow_extended.flow_id);
+                }
+            }
+

This results in a segmentation fault. Even if I comment out free_detection_data, it still segfaults.
I don't want to add a line like if (event == FLOW_EVENT_END && flow->info.detection_completed == 1), because I want the end event to contain the DPI info in all cases.

I am wondering: which function frees the data structure needed by ndpi_dpi2json? Is it freed by nDPId or by nDPI?
What I want to achieve: once the segfault is solved, I will stop serializing the other events.
Do you have any suggestions?

Could not get netmask for pcap device vmx0: No such file or directory

The command I use:
./nDPId -u root -g root -l -c 127.0.0.1:7000

And the result is:

nDPId: version 1.5-199-g29904cfb

nDPI version: 4.7.0-4260-1f693c3f
 API version: 8445
pcap version: 1.10.4

gcrypt version: 1.8.6internal

nDPId: Capturing packets from default device: vmx0
nDPId: vmx0 IPv4 address netmask subnet: 192.168.162.61 255.255.255.0 192.168.162.0
nDPId [error]: Could not get netmask for pcap device vmx0: No such file or directory

And the output of ifconfig:

vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	description: port1
	options=8000b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
	ether 00:50:56:96:4c:e6
	inet6 fe80::250:56ff:fe96:4ce6%vmx0 prefixlen 64 scopeid 0x1
	inet 192.168.162.61 netmask 0xffffff00 broadcast 192.168.162.255
	media: Ethernet autoselect
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
	groups: enc
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=100<PROMISC> metric 0 mtu 33160
	groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
	groups: pfsync

Can you help me fix this?

CentOS 7 support?

With cmake installed, nDPId doesn't seem to build with the CentOS 7 default build system, while libndpi is fine.
I get this error:

/home/jiamo/diting_ndpid/dependencies/nDPIsrvd.h: In function 'nDPIsrvd_get_next_token':
/home/jiamo/diting_ndpid/dependencies/nDPIsrvd.h:920:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int i = *next_index + 1; i < sock->jsmn.tokens_found; ++i)

But if I run cmake like cmake -DBUILD_NDPI=off -DCMAKE_C_FLAGS="-std=c99" .., I get something like:

/home/jiamo/diting_ndpid/examples/c-captured/c-captured.c:663:34: error: 'optarg' undeclared (first use in this function)
                 pidfile = strdup(optarg);
                                  ^

I know the gcc in CentOS 7 is too old. Is there a good method to make CentOS 7 work?
