There is an extension mechanism for entities, in order not to duplicate field definitions…


Extending data dictionaries? (ossem, 3 comments, open)

otrf commented on July 27, 2024

Comments (3)

hxnoyd commented on July 27, 2024

Hi @nicolasreich.

Thanks for the detailed explanation; it is now clearer what you mean by 'extending'. In a nutshell: break data dictionaries down by field prevalence, to avoid duplicates and keep the data dictionary YAML as clean as possible.

I see the benefit of such an approach for events from the same log source (keep it simple, reduce duplication), but it would increase the number of data dictionaries, since we would need to create 'common fields' data dictionaries (e.g. src_ip, dest_ip, etc.). On the one hand, we would have a schema with fewer duplicate fields; on the other, we would have more YAML data dictionaries to maintain.
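To make the tradeoff concrete, here is a back-of-the-envelope count. It is a minimal sketch under assumed numbers: `count_definitions` is a hypothetical helper, and it assumes every event type repeats the same common fields and the same shared section (e.g. the alert object) when dictionaries are self-contained, versus two extra shared dictionaries when they are composed.

```python
def count_definitions(n_events: int, common: int, shared: int, specific: list[int]) -> dict:
    """Compare self-contained dictionaries against composed ones (hypothetical model)."""
    if len(specific) != n_events:
        raise ValueError("need one specific-field count per event type")
    # Self-contained: every dictionary repeats the common and shared fields.
    standalone = {
        "files": n_events,
        "field_definitions": n_events * (common + shared) + sum(specific),
    }
    # Composed: common and shared fields live in two extra dictionaries
    # that every event-specific dictionary would extend.
    composed = {
        "files": n_events + 2,
        "field_definitions": common + shared + sum(specific),
    }
    return {"standalone": standalone, "composed": composed}


if __name__ == "__main__":
    # Hypothetical numbers: 5 alert types, 20 common fields, 10 alert fields.
    print(count_definitions(5, 20, 10, [8, 12, 6, 9, 15]))
```

With these made-up numbers, composition trades 2 extra files for 120 fewer field definitions (80 instead of 200), which is the "fewer duplicates vs. more files" balance described above.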

Field name duplication has been raised multiple times in the past, but we have always opted to keep the data dictionaries as close as possible to the original events, so that the community can customize them as needed. The main reason is to preserve data dictionary atomicity: each dictionary is an absolutely independent object, or, if you like, the source of truth in a single document. This lets the community model the data dictionaries to fit their own needs (e.g. Logstash pipelines).

Regardless, I think your suggestion is aligned with our vision for improving data dictionaries, possibly through a separate dictionary that provides a first layer of abstraction over them, where the community could better map events to entities and/or the detection data model. This would let us keep the source of truth, at the expense of maintaining another dictionary of modeled/standardized events.
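A separate abstraction layer of this kind could look roughly like the following. This is a minimal sketch, not the OSSEM design: the standardized names in `STANDARD_MAPPING` and the `model_event` helper are hypothetical. The raw dictionary is never modified; the mapping renames fields and records provenance so a modeled field can still be traced back to its raw source.

```python
# Hypothetical mapping from raw field names to standardized names.
# These target names are illustrative only, not a published OSSEM schema.
STANDARD_MAPPING = {
    "src_ip": "source.ip",
    "dest_ip": "destination.ip",
    "src_port": "source.port",
}


def model_event(raw_event: dict, mapping: dict) -> tuple[dict, dict]:
    """Rename mapped fields and remember which raw field each one came from."""
    modeled, provenance = {}, {}
    for raw_name, value in raw_event.items():
        # Unmapped fields keep their raw name so nothing is lost.
        std_name = mapping.get(raw_name, raw_name)
        modeled[std_name] = value
        provenance[std_name] = raw_name
    return modeled, provenance


if __name__ == "__main__":
    event = {"src_ip": "192.0.2.10", "dest_ip": "198.51.100.53", "flow_id": 1}
    print(model_event(event, STANDARD_MAPPING))
```

Keeping provenance alongside the modeled event is one way to preserve the "drill down to the source of truth" property while still offering a standardized view.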

Unfortunately the last few months have been insanely busy, and we haven't had the time to work on a PoC for this... but it is on the roadmap :)

hxnoyd commented on July 27, 2024

Hi @nicolasreich. First of all, sorry for the late reply.

So far we have developed data dictionaries as independent documents, as close as possible to the raw events produced by the sensor. The main goal is that you can always drill down (e.g. from the data model) to the source of truth of an event and its fields. One of the tradeoffs is, as you suggest, duplicated information, which becomes apparent when you consume multiple events from the same sensor.

We are, however, planning to improve Data Dictionaries to handle situations where event fields can have different definitions depending on the event type, or where a field contains a nested JSON object, a list, etc., that we could use to extend the event's fieldset.
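One way to derive an extended fieldset from a nested field is to flatten it into dotted field paths. The sketch below assumes a dotted-path convention with `[]` marking list elements; both conventions and the `field_paths` helper are hypothetical, not something the project has defined.

```python
from typing import Any


def field_paths(obj: Any, prefix: str = "") -> dict[str, str]:
    """Map each leaf field path (e.g. "dns.answers[].rrtype") to its value's type name."""
    if isinstance(obj, dict) and obj:
        paths: dict[str, str] = {}
        for key, value in obj.items():
            paths.update(field_paths(value, f"{prefix}.{key}" if prefix else key))
        return paths
    if isinstance(obj, list) and obj:
        paths = {}
        # All list elements contribute to the same "name[]" field set.
        for item in obj:
            paths.update(field_paths(item, f"{prefix}[]"))
        return paths
    # Leaf value (or empty container): record its Python type name.
    return {prefix: type(obj).__name__}


if __name__ == "__main__":
    event = {"src_ip": "192.0.2.10", "dns": {"type": "answer", "answers": [{"rrtype": "A", "ttl": 300}]}}
    for path, type_name in field_paths(event).items():
        print(path, type_name)
```

Each resulting path could then become an entry in the event's field list, so the nested structure extends the fieldset without being described by hand.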

Regardless, I would be interested in further exploring your use case.

nicolasreich commented on July 27, 2024

Hi @hxnoyd. No worries, it was the holidays for everyone.

The rationale for this question was Suricata Eve JSON logs, where you have common fields, then nested fields for specific data. So for any alert, you get common fields, like source and destination IP addresses, as well as an alert section, and a different section depending on the protocol that triggered the alert.

So for an alert triggered by a DNS request, you would get something like:

src_ip: ...,
dest_ip: ...,
...
other common fields
...
alert: { ... alert fields ... },
dns: { ... dns fields ... }

While for an alert triggered by an HTTP request:

src_ip: ...,
dest_ip: ...,
...
other common fields
...
alert: { ... alert fields ... },
http: { ... http fields ... }

So the common fields are present in every event; the alert object is present in every alert; and then, depending on the type of the underlying traffic, there might be other objects.
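The layering described above can be expressed as a simple split of an event's top-level keys. This is a sketch under stated assumptions: `PROTOCOL_KEYS` is a hypothetical, deliberately incomplete subset of protocol sections, and everything that is neither `alert` nor a listed protocol section is treated as a common field.

```python
import json

# Hypothetical subset of protocol-specific sections; not an exhaustive list
# of what Suricata can emit.
PROTOCOL_KEYS = frozenset({"dns", "http", "tls", "smtp", "ssh"})


def split_alert(event: dict) -> tuple[dict, dict, dict]:
    """Split an alert event into (common fields, alert object, protocol sections)."""
    common = {k: v for k, v in event.items() if k != "alert" and k not in PROTOCOL_KEYS}
    alert = event.get("alert", {})
    protocol = {k: v for k, v in event.items() if k in PROTOCOL_KEYS}
    return common, alert, protocol


if __name__ == "__main__":
    # Trimmed, made-up event for illustration only.
    line = (
        '{"src_ip": "192.0.2.10", "dest_ip": "198.51.100.53", "event_type": "alert",'
        ' "alert": {"signature_id": 1000001}, "dns": {"type": "query", "rrname": "example.com"}}'
    )
    print(split_alert(json.loads(line)))
```

Each of the three parts maps naturally onto one dictionary: a common-fields dictionary, an alert dictionary, and one dictionary per protocol section.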

It's obviously possible to have a data dictionary for each alert type, each containing the common fields and the alert fields; but that means a lot of duplication, many opportunities for mistakes, and what seems like unnecessary verbiage.

I think it would make sense to be able to extend a data dictionary, much as is already possible for entities. The rendered markdown version of the Data Dictionary would still be an independent document containing all the data.
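A minimal sketch of what such an extension mechanism might do, assuming a hypothetical `extends` key and a simplified field layout (neither is the current OSSEM YAML schema; the dictionaries are shown as Python dicts rather than loaded from YAML). Parents are resolved recursively, a child's definition overrides a parent field with the same name, and the rendered markdown is still one self-contained table.

```python
# Hypothetical registry; the "extends" key and field layout are assumptions.
DICTIONARIES = {
    "suricata_common": {
        "fields": [
            {"name": "src_ip", "type": "string", "description": "Source IP address"},
            {"name": "dest_ip", "type": "string", "description": "Destination IP address"},
        ],
    },
    "suricata_alert": {
        "extends": ["suricata_common"],
        "fields": [{"name": "alert.signature_id", "type": "integer", "description": "Signature ID"}],
    },
    "suricata_alert_dns": {
        "extends": ["suricata_alert"],
        "fields": [{"name": "dns.rrname", "type": "string", "description": "Queried name"}],
    },
}


def resolve_fields(name: str, registry: dict, _seen: frozenset = frozenset()) -> list[dict]:
    """Return the full field list for `name`: parent fields first, child definitions win."""
    if name in _seen:
        raise ValueError(f"circular extends chain at {name!r}")
    seen = _seen | {name}  # per-branch set: cycles fail, shared parents are allowed
    entry = registry[name]
    merged: dict[str, dict] = {}
    for parent in entry.get("extends", []):
        for field in resolve_fields(parent, registry, seen):
            merged[field["name"]] = field
    for field in entry.get("fields", []):
        merged[field["name"]] = field  # overrides a parent field with the same name
    return list(merged.values())


def render_markdown(name: str, registry: dict) -> str:
    """Render the fully resolved dictionary as one independent markdown document."""
    lines = [f"# {name}", "", "| Field | Type | Description |", "|---|---|---|"]
    for field in resolve_fields(name, registry):
        lines.append(f"| {field['name']} | {field['type']} | {field['description']} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_markdown("suricata_alert_dns", DICTIONARIES))
```

The YAML sources stay small, while the rendered output keeps the "one document, all fields" property the thread wants to preserve.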
