Comments (3)
Hi @nicolasreich.
Thanks for the detailed explanation; it is now clearer what you mean by 'extending'. In a nutshell: deconstruct data dictionaries depending on field prevalence, to avoid duplicates and keep the data dictionary YAML as clean as possible.
I see the benefit of such an approach for events in the same log source (keep it simple, reduce duplication), but it would increase the number of data dictionaries, since we would need to create 'common fields' data dictionaries (e.g. src_ip, dest_ip, etc.). On the one hand we would have a schema with few duplicate fields; on the other hand, we would have more YAML data dictionaries to maintain.
Field name duplication has been raised multiple times in the past, but we have always opted to keep the data dictionaries as close as possible to the original events, so that the community could customize them as needed. The main reason is to preserve data dictionary atomicity: each one is an absolutely independent object, or the source of truth in a single document if you like. By doing so we enable the community to model the data dictionaries as they like, to their own needs (e.g. Logstash pipelines).
Regardless, I think your suggestion is aligned with our vision for the improvement of data dictionaries, possibly with the creation of a separate dictionary that would provide a first layer of abstraction for data dictionaries, where the community would be able to better map events with entities, and/or the detection data model. This would allow us to keep the source of truth, at the expense of maintaining another dictionary with modeled/standardized events.
Unfortunately the last few months have been insanely busy, and we haven't had the time to work on a PoC for this... but it is on the roadmap :)
from ossem.
Hi @nicolasreich. First of all, sorry for the late reply.
So far we have developed data dictionaries as independent documents, as close as possible to the raw events produced by the sensor. The main goal is that you will always be able to drill down (e.g. from the data model) to the source of truth of an event and its fields. One of the tradeoffs is, as you suggest, duplicate information, which becomes apparent when you consume multiple events from the same sensor.
We are, however, planning to improve Data Dictionaries to deal with situations where event fields can have different definitions depending on the event type, or where a field contains nested JSON, a list, etc., that we could use to extend the fieldset of the event.
Regardless, I would be interested in further exploring your use case.
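As a rough illustration of how a nested JSON field could extend an event's fieldset, here is a minimal Python sketch that flattens nested objects into dotted field names. The function name, the dotted naming scheme, and the sample event are assumptions made for illustration; this is not how OSSEM handles nested fields.

```python
from typing import Any, Dict


def flatten_fields(event: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted field names.

    Non-dict values (including lists) are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in event.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so each nested key becomes its own field.
            flat.update(flatten_fields(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical event; field names and values are illustrative only.
sample_event = {
    "src_ip": "192.0.2.1",
    "dns": {"type": "query", "rrname": "example.com"},
}
flat_fields = flatten_fields(sample_event)
print(sorted(flat_fields))
```

Each flattened name could then be documented as a separate field in the data dictionary, while the raw event remains the source of truth.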
Hi @hxnoyd. No worries, it was the holidays for everyone.
The rationale for this question was Suricata Eve JSON logs, where you have common fields, then nested fields for specific data. So for any alert, you get common fields, like source and destination IP addresses, as well as an alert section, and a different section depending on the protocol that triggered the alert.
So for an alert triggered by a DNS request, you would get something like:
src_ip: ...,
dest_ip: ...,
...
other common fields
...
alert: { ... alert fields ... },
dns: { ... dns fields ... }
While for an alert triggered by an HTTP request:
src_ip: ...,
dest_ip: ...,
...
other common fields
...
alert: { ... alert fields ... },
http: { ... http fields ... }
So the common fields are present in every event; the alert object is present in every alert; and then, depending on the type of the underlying traffic, there might be other objects.
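The structure described above can be checked mechanically. The following Python sketch, under the assumption that events are already parsed into dicts, splits top-level keys into those present in every event and those present in only some. The function name and the placeholder values are hypothetical.

```python
from typing import Any, Dict, List, Set


def split_by_prevalence(events: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Separate top-level keys shared by all events from the rest."""
    key_sets = [set(event) for event in events]
    if not key_sets:
        return {"common": set(), "specific": set()}
    common = set.intersection(*key_sets)
    specific = set.union(*key_sets) - common
    return {"common": common, "specific": specific}


# Placeholder events mirroring the two shapes above; values are illustrative.
dns_alert = {"src_ip": "192.0.2.1", "dest_ip": "192.0.2.2", "alert": {}, "dns": {}}
http_alert = {"src_ip": "192.0.2.1", "dest_ip": "192.0.2.3", "alert": {}, "http": {}}

prevalence = split_by_prevalence([dns_alert, http_alert])
print(sorted(prevalence["common"]))
print(sorted(prevalence["specific"]))
```

The "common" set is what a shared dictionary would hold, and each key in "specific" would map to its own protocol-level dictionary.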
It's obviously possible to have a data dictionary for each alert type, each containing the common fields and the alert fields; but it means a lot of duplication, causing a lot of potential mistakes, and what seems like unnecessary verbiage.
I think it would make sense to be able to extend a data dictionary, much like it's possible for entities. The rendered markdown version of the Data Dictionary would still be an independent document containing all the data.
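To make the idea concrete, here is a minimal Python sketch of how an extension mechanism might be resolved before rendering. The `extends` and `fields` keys and the dictionary names are hypothetical; they are not part of the current OSSEM data dictionary format.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical data dictionaries, shown as already-parsed YAML.
DICTIONARIES: Dict[str, Dict[str, Any]] = {
    "suricata_common": {"fields": [{"name": "src_ip"}, {"name": "dest_ip"}]},
    "suricata_alert": {"extends": ["suricata_common"], "fields": [{"name": "alert"}]},
    "suricata_alert_dns": {"extends": ["suricata_alert"], "fields": [{"name": "dns"}]},
}


def resolve_fields(
    name: str,
    dictionaries: Dict[str, Dict[str, Any]],
    _seen: Tuple[str, ...] = (),
) -> List[Dict[str, Any]]:
    """Return the full field list for `name`, parents first."""
    if name in _seen:
        chain = " -> ".join(_seen + (name,))
        raise ValueError(f"circular extends: {chain}")
    entry = dictionaries[name]
    fields: Dict[str, Dict[str, Any]] = {}
    for parent in entry.get("extends", []):
        for field in resolve_fields(parent, dictionaries, _seen + (name,)):
            fields[field["name"]] = field
    for field in entry.get("fields", []):
        # A field defined locally replaces an inherited one of the same name.
        fields[field["name"]] = field
    return list(fields.values())


resolved = resolve_fields("suricata_alert_dns", DICTIONARIES)
resolved_names = [field["name"] for field in resolved]
print(resolved_names)
```

The resolved list is what a renderer would write out, so the generated markdown stays self-contained even though the YAML sources share definitions.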