utoni / nDPId
Tiny nDPI based deep packet inspection daemons / toolkit.
License: GNU General Public License v3.0
I'm interested in the possibility, for nDPId, to directly send out the JSON stream of events, via UDP, to a remote host.
I'm thinking of a behaviour much like NetFlow/IPFIX, which I'm successfully using in OpenWRT (at home, inside my WDR4300 wireless router): I collect flows with softflowd and relay them to a remote location via UDP. This way, I'm able to off-load/enrich netflow analysis with no technical constraint. Indeed, at my remote location I enrich the received flows with geo-referential data (provided by the free MaxMind library) and push them to an OpenSearch instance.
I'm trying to further enrich my data with high-level protocol information (provided by libnDPI), and nDPId fits perfectly in such a role. The only missing bit is the possibility to stream out flows directly.
Of course I can run a local "gateway" (fetching from nDPId and writing to the remote location), but this is not easy, as the whole thing needs to run inside OpenWRT boxes, which are VERY resource-constrained (BTW: libndpi and softflowd are already packaged for OpenWRT) and... I'm lacking C/C++ knowledge :-(
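Such a gateway can be sketched in a few lines of Python. This is only a sketch under assumptions: that nDPId is pointed at it with `-c 127.0.0.1:7000`, that each serialized JSON is prefixed with its length in decimal digits (as the rows seen over netcat suggest), and that the remote host/port are placeholders:

```python
import socket


def split_records(buf: bytes):
    """Split a buffer of length-prefixed records ("<digits>{json}") into
    complete JSON byte strings plus the unconsumed remainder."""
    records = []
    while True:
        brace = buf.find(b'{')
        if brace <= 0 or not buf[:brace].isdigit():
            break
        length = int(buf[:brace])  # assumed: digits give the JSON length
        if len(buf) - brace < length:
            break  # incomplete record: wait for more data
        records.append(buf[brace:brace + length])
        buf = buf[brace + length:]
    return records, buf


def gateway(listen_port=7000, remote=("collector.example", 2055)):
    """Accept one nDPId connection and relay each JSON as a UDP datagram.
    A real gateway would accept one connection per nDPId reader thread."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv = socket.create_server(("127.0.0.1", listen_port))
    conn, _ = srv.accept()  # nDPId connects here (-c 127.0.0.1:7000)
    buf = b""
    while True:
        data = conn.recv(65536)
        if not data:
            break
        buf += data
        records, buf = split_records(buf)
        for rec in records:
            udp.sendto(rec, remote)  # one JSON per UDP datagram
```

Even a sketch like this is heavier than native UDP output in the daemon, of course, which is why the feature request stands.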
Is this an interesting feature for the nDPId project?
BTW: thanks for developing nDPId
While analyzing my incoming UDP stream, I noticed that sometimes (on the order of once every one thousand) my nDPId-rt-analyzer receives two consecutive end events, or two consecutive idle events, referring to the very same flow_id.
This leads my analyzer to complain, as it expects to receive only one end|idle event per flow_id.
I double-checked my analyzer, and I am confident that it effectively received the events twice, despite the fact that they are identical.
I have no problem getting rid of the spurious event... but this could probably be of some interest to you.
I'm attaching a ZIP containing the JSON dump of a selection of 4 distinct flows (id: 7337, 30684, 33023, 32921) where you can clearly see the final duplicated events.
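For reference, the consumer-side workaround I use is to remember which flow_ids already emitted a terminal event and drop any duplicate. A minimal sketch (the field names come from the nDPId JSON; the helper itself is hypothetical, and a real implementation would also purge old entries from the set):

```python
def make_terminal_filter():
    """Return a predicate that accepts a flow event only the first time
    a terminal ('end' or 'idle') event is seen for its flow_id."""
    finished = set()

    def accept(event: dict) -> bool:
        if event.get("flow_event_name") not in ("end", "idle"):
            return True
        flow_id = event["flow_id"]
        if flow_id in finished:
            return False  # duplicate terminal event: drop it
        finished.add(flow_id)
        return True

    return accept
```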
While analyzing flow-event data received from nDPId, I noticed that for an OpenVPN connection that I usually launch at startup on my notebook, and which lasts for several days, I receive:
- a NEW flow event;
- a GUESSED flow event (OpenVPN, based on UDP/1194);
- a DETECTED flow event (OpenVPN);
- a long series of UPDATE events. Currently I count 612 (six hundred!) UPDATEs, each one sent every ~50 seconds.
Is this normal behaviour? Is there some feature for "expiring" long-lasting flows?
Of course, I can handle the "long-lasting 'live' flows" on my side, within my analyzer... but I'm curious whether there is something I'm missing regarding nDPId.
Thanks!
I just built and installed the OpenWRT package of nDPId, built on commit 36f1786. I upgraded my previous nDPId, as I understood that the JSON format could have changed (as per PR #1725 on the nDPI project).
Unfortunately, some of the JSON coming out of the current release is malformed: it seems to be "truncated" at 1024 chars. As such, if a JSON is longer and gets truncated, it actually comes "joined" with the next one, generating a single malformed JSON (due to missing double quotes, commas and, for sure, the final }).
I'm attaching a JSON file obtained via a raw netcat, redirecting the output to a file. As you can see, for every row beginning with a number bigger than 1024... there is a problem.
Can you confirm the problem is on the source side? Or should I have done something on the openwrt-package-building side?
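Since every serialized row starts with a decimal length prefix, truncation can be detected mechanically on the receiving side. A sketch, under the assumption that the digit prefix counts the bytes of the JSON that follows:

```python
import json


def check_record(line: bytes) -> bool:
    """Return True if the length prefix matches the JSON that follows
    and the payload parses as a single JSON object."""
    line = line.rstrip(b'\r\n')
    brace = line.find(b'{')
    if brace <= 0 or not line[:brace].isdigit():
        return False
    declared = int(line[:brace])
    payload = line[brace:]
    if len(payload) != declared:
        return False  # truncated, or joined with the next record
    try:
        json.loads(payload)
        return True
    except ValueError:
        return False
```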
Thanks,
DV
Hi,
I assume you are still actively involved in nDPI enhancement. Do you think there is a need for DFI (deep flow inspection) alongside the existing DPI (where the dissectors mostly check packet payload patterns or payload length) to detect applications accurately?
I was reading the paper below and wanted to discuss it with you before posting it as an issue on the nDPI repo.
https://reader.elsevier.com/reader/sd/pii/S187770581730276X?token=74B2C8BC7E1E9DEFCC8A8992234ED823EF2A7B8F4BAEA2C547AC049837EEE74362C1D8737D0C18B3CE68F82CA659FDB1&originRegion=eu-west-1&originCreation=20220103053518
In my understanding, if nDPI fails to get info from SNI or HTTP parsing etc. (i.e. up to L5), it falls back to pattern matching based on reverse-engineering methods learned from pcap files, which may produce false positives in the case of encrypted traffic. But the paper shows that dissectors built on a flow-based model give better accuracy than packet-payload-based matching. Any comment on this?
Thanks
I need to change the name of the "proto_by_ip" field in the JSON log, and to that end I went to
nDPId/libnDPI/src/lib/ndpi_utils.c
in the ndpi_serialize_proto function:
ndpi_serialize_string_string(serializer, "protocol_by_ip", ndpi_get_proto_name(ndpi_struct, l7_protocol.protocol_by_ip));
After changing this line I ran make in
nDPId/libnDPI
but nothing changes when I run nDPId.
Where am I going wrong?
During analysis of the UDP stream generated by nDPId (as for ref), I got the following JSON, related to an HTTPS request:
{
"flow_event_id": 7,
"flow_event_name": "detection-update",
"flow_id": 54994,
"flow_state": "info",
"flow_packets_processed": 6,
[...]
"l3_proto": "ip4",
"src_ip": "192.168.0.128",
"dst_ip": "***.***.***.***",
"src_port": 45396,
"dst_port": 443,
"l4_proto": "tcp",
"ndpi": {
"flow_risk": {
"15": {
"risk": "TLS (probably) Not Carrying HTTPS",
"severity": "Low",
"risk_score": { "total": 760, "client": 680, "server": 80 }
}
},
"confidence": { "6": "DPI" },
"proto": "TLS",
"breed": "Safe",
"category": "Web"
},
"tls": {
"version": "TLSv1.2",
"client_requested_server_name": "www.********.com",
"ja3": "398430069e0a8ecfbc8db0778d658d77",
"ja3s": "fbe78c619e7ea20046131294ad087f05",
"unsafe_cipher": 0,
"cipher": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
"tls_supported_versions": "TLSv1.3,TLSv1.2"
}
}
Such a JSON contains the following confidence object:
"confidence": { "6": "DPI" }
In the example JSON schema file included in the nDPId sources, the very same confidence attribute is declared this way:
"confidence": {
"type": "string",
"enum": [
"0",
"1",
"2",
"3",
"4"
]
}
and the value "6" is missing.
The schema definition should be updated to also include the value "6", as well as any other missing values (5?, 7?).
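A quick consumer-side check makes the mismatch visible. This sketch hard-codes the enum exactly as it appears in the bundled schema today; which confidence values (5? 6? 7?) actually exist is for the nDPI sources to answer, so nothing is assumed beyond the schema text quoted above:

```python
# The enum as declared in the bundled schema file today.
SCHEMA_CONFIDENCE_ENUM = {"0", "1", "2", "3", "4"}


def confidence_keys_outside_schema(flow_json: dict):
    """Return the confidence keys of a flow JSON that the bundled
    schema's enum would reject."""
    confidence = flow_json.get("ndpi", {}).get("confidence", {})
    return sorted(k for k in confidence if k not in SCHEMA_CONFIDENCE_ENUM)
```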
While analyzing flow-event data received from nDPId, I'm having some trouble understanding the gory details of some JSON attributes.
I'm going to raise some questions here below, each with a tentative answer I guessed from my analysis. Please check them as well.
Note that I'm also available to put them on a specific FAQ page, once their answers are settled.
Q1: Where can I get the detailed structure of the Flow-Event JSONs that nDPId will send me?
A1: A related schema file can be retrieved in the schema subfolder.
--
Q2: What about the flow_first_seen, flow_src_last_pkt_time and flow_dst_last_pkt_time timestamp attributes? Which timestamps do they refer to?
A2: flow_first_seen is the timestamp registered by nDPId when it saw the very first packet originating the new flow. On the contrary, the flow_src_last_pkt_time and flow_dst_last_pkt_time timestamps are continuously updated by nDPId as it sees packets related to that flow. Based on the direction of such a packet (a request from SRC to DST, or a reply from DST to SRC), nDPId will update flow_src_last_pkt_time or flow_dst_last_pkt_time, respectively.
--
Q3: What about the flow_idle_time attribute? Which time does it refer to?
A3: ...to be filled...
--
Q4: What about the thread_ts_usec timestamp attribute? Which timestamp does it refer to?
A4: ...to be filled...
--
Q5: What about the midstream attribute? It seems its value is always 0...
A5: ...to be filled...
--
Q6: As for update events: while examining a set of flow-events related to the same flow, I noticed:
- a new event (as expected);
- a not-detected event (as it could be possible);
- update events (on the order of tens...), which seem to be sent at mostly-regular intervals (on the order of tens of seconds).
Can you explain when update events are issued, and confirm that thread_ts_usec can be considered the timestamp associated by nDPId with those events?
This is going to be an important question, especially in terms of inter-arrival-time analysis of those update events.
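For the inter-arrival-time analysis mentioned above, here is the kind of computation I have in mind, under the (to-be-confirmed) assumption that thread_ts_usec is a microsecond timestamp attached to each event:

```python
def interarrival_seconds(events):
    """Given flow events in arrival order, return the gaps (in seconds)
    between consecutive 'update' events, using thread_ts_usec."""
    ts = [e["thread_ts_usec"] for e in events
          if e.get("flow_event_name") == "update"]
    return [(b - a) / 1_000_000 for a, b in zip(ts, ts[1:])]
```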
--
First I installed the latest version of libnDPI from GitHub:
https://github.com/ntop/nDPI/branches/stale
and its version is 4.6.0; however, the required version of libndpi for your software is '>= 4.7.0'. How can that happen?
I got the following error:
Package dependency requirement 'libndpi >= 4.7.0' could not be satisfied.
Package 'libndpi' has version '4.6.0', required version is '>= 4.7.0
We have host_server_name as a field of the EVENT_FLOW_DETECTED log. How can we bring it into EVENT_FLOW_END?
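Whether nDPId can repeat the field in the end event is for the maintainer to say; on the consumer side, one workable approach is to cache host_server_name per flow_id when the detected event arrives and join it onto the terminal event. A sketch (the helper name is hypothetical):

```python
def make_enricher():
    """Return a function that copies host_server_name from the
    'detected' event of a flow onto its later 'end'/'idle' event."""
    names = {}

    def enrich(event: dict) -> dict:
        flow_id = event.get("flow_id")
        name = event.get("host_server_name")
        if name is not None:
            names[flow_id] = name  # remember it for the terminal event
        if event.get("flow_event_name") in ("end", "idle"):
            cached = names.pop(flow_id, None)
            if cached is not None and "host_server_name" not in event:
                event["host_server_name"] = cached
        return event

    return enrich
```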
After upgrading nDPId on my OpenWRT box (built on commit 36f1786), I'm registering an unexpected behaviour: right after launching the daemon with:
/usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -o max-packets-per-flow-to-send=0
where 192.168.0.128 is my local notebook, properly listening on UDP/9999, I receive a STORM of daemon events.
In 5 to 10 seconds, I got 16358 JSONs, and only 13 of them were related to "flows". The other 16345 were daemon-related, with ~99% of them "status".
Is this expected behaviour?
I'm attaching the above 16358 JSONs, should you need to check them. Again: they were received in ~10 seconds (maybe less).
raw_json.zip
I'm trying to build this on FreeBSD and I get a whole lot of errors. Can you help me with this?
Do you have plans to support packet acquisition via PF_RING?
I only need some of the events from nDPId, not all of them!
All I want is the FLOW_EVENT_END logs on the Unix-domain socket.
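As far as I can tell nDPId does not filter events per consumer, so filtering is usually done on the receiving side. A minimal sketch that keeps only end flow events from a stream of JSON lines (pure stdlib; the field name comes from the nDPId JSON):

```python
import json


def end_events(lines):
    """Yield only FLOW_EVENT_END JSONs from an iterable of JSON lines."""
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip malformed lines
        if event.get("flow_event_name") == "end":
            yield event
```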
As for "Flow STATES" and "Flow EVENTS" (whose description is reported in the README), I'm trying to better understand what exactly they mean.
After running my realtime analyzer for ~24 hours, receiving the UDP stream of an nDPId instance running at the border of a small set of VPSs, I got these numbers:
please note that I'm interested ONLY in "flows" tracking.
As you can see, I got 1.735.579 messages ("in"), successfully processed as JSONs ("ok"), with zero errors ("err").
From those JSONs, the analyzer skipped the 2111 JSONs NOT related to flows, and focused on the other 1.733.468.
From those 1.733.468 flow JSONs, it extracted "flow_state" and "flow_event_name", combining them into a string and counting the related groups.
With a show counter I got the number of occurrences of those strings and, as you can see, I got:
whose sum is exactly 1.733.468.
I'm trying to figure out the state diagram used by nDPI, to understand exactly which event (and state) signals the termination of the activities performed by nDPI. I guess it's "finished/end"... but an "info/end" puts me in trouble :-(
I sketched the following diagram:
Could you be so kind as to explain WHICH EVENT I should focus on, to know when exactly nDPI has finished processing a flow... so that I can expect no other events related to that flow to be received by my analyzer?
At the moment, I'm keeping track of "everything", with an always-increasing memory map of EVERY flow. What I want to achieve is to EXTRACT "completed flows" from such a table and forward them to the next processing stage.
Sorry if this sounds a bit cumbersome: I understand I'm not being exactly clear with this request... :-(
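The extraction step described above can be sketched like this, under the assumption (which is exactly what this question asks to confirm) that end and idle are the terminal events after which a flow_id will not be seen again:

```python
def make_flow_table():
    """Track per-flow state; pop and return the completed flow on a
    terminal ('end'/'idle') event, None otherwise."""
    flows = {}

    def feed(event: dict):
        flow_id = event["flow_id"]
        state = flows.setdefault(flow_id, {"events": []})
        state["events"].append(event.get("flow_event_name"))
        if event.get("flow_event_name") in ("end", "idle"):
            return flows.pop(flow_id)  # completed: hand to next stage
        return None

    return feed
```

With such a table the memory map stops growing without bound, which is the behaviour I'm after.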
After working with nDPId, I discovered that some logs are missing, and the reason is that some threads cannot connect to the nDPIsrvd Collector at my Unix socket.
The errors are as follows:
nDPId [error]: Thread 10: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 11: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 15: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 14: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 13: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 12: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
nDPId [error]: Thread 9: Could not connect to nDPIsrvd Collector at /tmp/listen.socket, will try again later. Error: Connection refused
The command I use to create the socket:
nc -lkU /tmp/listen.socket
Can you help me fix this?
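One possible explanation (an assumption about the cause, not a confirmed diagnosis): nDPId opens one collector connection per reader thread, while a plain nc serves connections one at a time, so the later threads can be refused. A minimal multi-connection Unix-socket listener can be sketched with the Python standard library:

```python
import socketserver
import threading


class CollectorHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Each connecting nDPId reader thread gets its own handler
        # thread; forward every received line to the sink callback.
        for line in self.rfile:
            self.server.sink(line)


class Collector(socketserver.ThreadingUnixStreamServer):
    daemon_threads = True

    def __init__(self, path, sink):
        self.sink = sink
        super().__init__(path, CollectorHandler)


def start_collector(path, sink):
    """Listen on a Unix socket, accepting any number of connections."""
    server = Collector(path, sink)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

That said, the project's own nDPIsrvd is the intended collector; this sketch is only for quick ad-hoc capture.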
Originally posted by @fateme81 in #27 (comment)
The current collectd example, which can be used together with the exec plugin, prints bogus information about flows.
This issue is caused by the memset in c-collectd.c:512. The executable should not just print statistics between collectd time intervals and reset them afterwards, regardless of whether a flow is still active (i.e. it has neither ended nor timed out).
Instead, it should also process idle and end events to validate whether a flow is still active, and adjust the statistics printed to stdout accordingly.
In my case, I want only one event per flow. I tried the following:
static void free_workflow(struct nDPId_workflow ** const workflow);
static void serialize_and_send(struct nDPId_reader_thread * const reader_thread);
static void jsonize_flow_event(struct nDPId_reader_thread * const reader_thread,
- struct nDPId_flow_extended * const flow_ext,
+ struct nDPId_flow * const flow,
enum flow_event event);
static void jsonize_flow_detection_event(struct nDPId_reader_thread * const reader_thread,
struct nDPId_flow * const flow,
@@ -1788,11 +1788,11 @@ static void process_idle_flow(struct nDPId_reader_thread * const reader_thread,
if (flow->flow_extended.flow_basic.tcp_fin_rst_seen != 0)
{
- jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_END);
+ jsonize_flow_event(reader_thread, flow, FLOW_EVENT_END);
}
else
{
- jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_IDLE);
+ jsonize_flow_event(reader_thread, flow, FLOW_EVENT_IDLE);
}
break;
}
@@ -1843,11 +1843,11 @@ static void process_idle_flow(struct nDPId_reader_thread * const reader_thread,
}
if (flow->flow_extended.flow_basic.tcp_fin_rst_seen != 0)
{
- jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_END);
+ jsonize_flow_event(reader_thread, flow, FLOW_EVENT_END);
}
else
{
- jsonize_flow_event(reader_thread, &flow->flow_extended, FLOW_EVENT_IDLE);
+ jsonize_flow_event(reader_thread, flow, FLOW_EVENT_IDLE);
}
break;
}
@@ -1897,11 +1897,11 @@ static void ndpi_flow_update_scan_walker(void const * const A, ndpi_VISIT which,
case FS_INFO:
{
struct nDPId_flow_extended * const flow_ext = (struct nDPId_flow_extended *)flow_basic;
-
+ struct nDPId_flow * const flow = (struct nDPId_flow *)flow_basic;
if (is_flow_update_required(workflow, flow_ext) != 0)
{
workflow->total_flow_updates++;
- jsonize_flow_event(reader_thread, flow_ext, FLOW_EVENT_UPDATE);
+ jsonize_flow_event(reader_thread, flow, FLOW_EVENT_UPDATE);
flow_ext->last_flow_update = workflow->last_thread_time;
}
break;
@@ -2644,11 +2644,12 @@ static void jsonize_packet_event(struct nDPId_reader_thread * const reader_threa
/* I decided against ndpi_flow2json as it does not fulfill my needs. */
static void jsonize_flow_event(struct nDPId_reader_thread * const reader_thread,
- struct nDPId_flow_extended * const flow_ext,
+ struct nDPId_flow * const flow,
enum flow_event event)
{
struct nDPId_workflow * const workflow = reader_thread->workflow;
char const ev[] = "flow_event_name";
+ struct nDPId_flow_extended * const flow_ext = &flow->flow_extended;
ndpi_serialize_string_int32(&workflow->ndpi_serializer, "flow_event_id", event);
if (event > FLOW_EVENT_INVALID && event < FLOW_EVENT_COUNT)
@@ -4086,7 +4087,7 @@ static void ndpi_process_packet(uint8_t * const args,
flow_to_process->flow_extended.last_flow_update = workflow->last_thread_time;
flow_to_process->flow_extended.max_l4_payload_len[direction] = l4_payload_len;
flow_to_process->flow_extended.min_l4_payload_len[direction] = l4_payload_len;
- jsonize_flow_event(reader_thread, &flow_to_process->flow_extended, FLOW_EVENT_NEW);
+ jsonize_flow_event(reader_thread, flow_to_process, FLOW_EVENT_NEW);
}
if (nDPId_options.enable_data_analysis != 0 && flow_to_process->flow_extended.flow_analysis != NULL &&
@@ -4114,7 +4115,7 @@ static void ndpi_process_packet(uint8_t * const args,
if (total_flow_packets == nDPId_options.max_packets_per_flow_to_analyse)
{
- jsonize_flow_event(reader_thread, &flow_to_process->flow_extended, FLOW_EVENT_ANALYSE);
+ jsonize_flow_event(reader_thread, flow_to_process, FLOW_EVENT_ANALYSE);
free_analysis_data(&flow_to_process->flow_extended);
}
}
First, I tried using nDPId_flow instead of nDPId_flow_extended. Then I added an ndpi_dpi2json call in jsonize_flow_event:
+ if (event == FLOW_EVENT_END){
+ if (ndpi_dpi2json(workflow->ndpi_struct,
+ &flow->info.detection_data->flow,
+ flow->flow_extended.detected_l7_protocol,
+ &workflow->ndpi_serializer) != 0)
+ {
+ logger(1,
+ "[%8llu, %4llu] ndpi_dpi2json failed for detected/detection-update flow",
+ workflow->packets_captured,
+ flow->flow_extended.flow_id);
+ }
+ }
+
It segfaults. Even if I comment out free_detection_data, it still segfaults.
I don't want to add a line like if (event == FLOW_EVENT_END && flow->info.detection_completed == 1), because I want the end event to contain the DPI info.
I am wondering: the data structure needed by ndpi_dpi2json was freed by which function? Freed by nDPId, or freed by nDPI?
What I want to achieve is: once the segfault is solved, I will not serialize the other events.
Do you have any suggestions?
The command I use:
./nDPId -u root -g root -l -c 127.0.0.1:7000
And the result is:
nDPId: version 1.5-199-g29904cfb
nDPI version: 4.7.0-4260-1f693c3f
API version: 8445
pcap version: 1.10.4
gcrypt version: 1.8.6internal
nDPId: Capturing packets from default device: vmx0
nDPId: vmx0 IPv4 address netmask subnet: 192.168.162.61 255.255.255.0 192.168.162.0
nDPId [error]: Could not get netmask for pcap device vmx0: No such file or directory
And the result of ifconfig:
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: port1
options=8000b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
ether 00:50:56:96:4c:e6
inet6 fe80::250:56ff:fe96:4ce6%vmx0 prefixlen 64 scopeid 0x1
inet 192.168.162.61 netmask 0xffffff00 broadcast 192.168.162.255
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
groups: enc
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
inet 127.0.0.1 netmask 0xff000000
groups: lo
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=100<PROMISC> metric 0 mtu 33160
groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
groups: pfsync
Can you help me fix this?
With cmake installed, nDPId doesn't seem to support the CentOS 7 default build system, while libndpi builds fine.
I get this error:
/home/jiamo/diting_ndpid/dependencies/nDPIsrvd.h: In function 'nDPIsrvd_get_next_token':
/home/jiamo/diting_ndpid/dependencies/nDPIsrvd.h:920:5: error: 'for' loop initial declarations are only allowed in C99 mode
for (int i = *next_index + 1; i < sock->jsmn.tokens_found; ++i)
But if I run cmake like cmake -DBUILD_NDPI=off -DCMAKE_C_FLAGS="-std=c99" ..
I get something like:
/home/jiamo/diting_ndpid/examples/c-captured/c-captured.c:663:34: error: 'optarg' undeclared (first use in this function)
pidfile = strdup(optarg);
^
I know the gcc in CentOS 7 is too old. Is there a good way to make this work on CentOS 7?
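Two workarounds may help here (untested on CentOS 7 specifically, but they explain the optarg error: -std=c99 enables strict ANSI mode, which hides the POSIX declaration of optarg, while -std=gnu99 allows C99 for-loop declarations AND keeps the GNU/POSIX declarations visible):

```shell
# Option 1: use gnu99 instead of c99.
cmake -DBUILD_NDPI=off -DCMAKE_C_FLAGS="-std=gnu99" ..
make

# Option 2 (assumes the Software Collections repo is available):
# build with a newer gcc from devtoolset.
yum install -y centos-release-scl
yum install -y devtoolset-7
scl enable devtoolset-7 -- cmake ..
scl enable devtoolset-7 -- make
```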