A library and a tool for converting audit logs to XML and JSON

Home Page: https://scribery.github.io/aushape/

License: GNU Lesser General Public License v2.1


Aushape

Aushape is a tool and a library for converting Linux audit log messages to JSON and XML, allowing both single-shot and streaming conversion.

At the moment Aushape can output to stdout, a file, or syslog(3); the syslog output sends one document or event per message.

NOTE: Aushape is in early development stage and anything about its interfaces and outputs can change. Use at your own risk.

Schemas

Aushape's output document schemas are still in flux, but the main idea is to aggregate the input records belonging to one event into a single output event object/element, while keeping the naming and structure as close to the original audit log as possible.

A truncated JSON example:

[
    {
        "serial"    : 123,
        "time"      : "2016-01-03T02:37:51.394+02:00",
        "host"      : "auditdtest.a1959.org",
        "text"   : [
            "node=auditdtest.a1959.org type=SYSCALL ...",
            "node=auditdtest.a1959.org type=PROCTITLE ...",
            ...
        ],
        "data"   : {
            "syscall"   : {
                "syscall"   : ["rt_sigaction","13"],
                "success"   : ["yes"],
                "exit"      : ["0"],
                ...
            },
            "proctitle" : {
                "proctitle" : ["bash","\"bash\""]
            },
            ...
        }
    },
    ...
]
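As a sketch of how such an aggregated event can be consumed, the following Python snippet parses a cut-down version of the example above (with the elisions removed) and pulls out one field. The interpreted-value-first, raw-value-second ordering of the field arrays is inferred from the example, not from a published schema:

```python
import json

# A minimal event in the (draft) aushape JSON schema; values are taken
# from the truncated example above, with the "..." elisions dropped.
doc = json.loads("""
[
  {
    "serial": 123,
    "time": "2016-01-03T02:37:51.394+02:00",
    "host": "auditdtest.a1959.org",
    "text": [
      "node=auditdtest.a1959.org type=SYSCALL ...",
      "node=auditdtest.a1959.org type=PROCTITLE ..."
    ],
    "data": {
      "syscall": {
        "syscall": ["rt_sigaction", "13"],
        "success": ["yes"],
        "exit": ["0"]
      },
      "proctitle": {
        "proctitle": ["bash", "\\"bash\\""]
      }
    }
  }
]
""")

event = doc[0]
# Each field is an array: the interpreted value first, then the raw value.
name, raw = event["data"]["syscall"]["syscall"]
print(name, raw)  # rt_sigaction 13
```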

A truncated XML example:

<?xml version="1.0" encoding="UTF-8"?>
<log>
    <event serial="194433" time="2016-01-03T02:37:51.394+02:00" host="auditdtest.a1959.org">
        <text>
            <line>node=auditdtest.a1959.org type=SYSCALL ...</line>
            <line>node=auditdtest.a1959.org type=PROCTITLE ...</line>
            ...
        </text>
        <data>
            <syscall>
                <syscall i="rt_sigaction" r="13"/>
                <success i="yes"/>
                <exit i="0"/>
                ...
            </syscall>
            <proctitle>
                <proctitle i="bash" r="&quot;bash&quot;"/>
            </proctitle>
            ...
        </data>
    </event>
    ...
</log>

There are a number of challenges, the main one being that the record structure is defined by both the Linux kernel and the Auditd code, sometimes changes from version to version, and has no official specification. Still, we have developed draft schemas for both JSON and XML, and will continue improving them in collaboration with the Auditd developers.

We encourage you to simply try running aushape on your logs to see what the output structure is like.

Dependencies

Aushape uses the Auparse library (part of the Auditd package) to parse audit logs. The development version of this library needs to be installed before building Aushape. It is available in the "audit-libs-devel" package on Fedora and RHEL, and in the "libauparse-dev" or "libaudit-dev" packages on Debian-based systems.

If you're installing an RPM package, the package manager will take care of the dependencies for you.

If you're building from a release tarball, then you can install the dependencies as follows.

On RPM-based systems:

sudo yum install -y gcc make audit-libs-devel

On Debian-based systems:

sudo apt-get install -y gcc make '^libau(dit|parse)-dev$'

If you're building from the Git source tree, then you can install the additional dependencies as follows.

On RPM-based systems:

sudo yum install -y autoconf automake libtool

On Debian-based systems:

sudo apt-get install -y autoconf automake libtool pkg-config

Building

If you'd like to build Aushape from the Git source tree, you need to first generate the build system files:

autoreconf -i -f

After that, or if you're building from a release tarball, you need to follow the usual configure & make approach:

./configure --prefix=/usr --sysconfdir=/etc && make

Installing

You can install Aushape with the usual make install:

sudo make install

Usage

Single-shot

For one-shot conversions simply use the aushape program. E.g. to convert an audit.log to the default JSON:

aushape audit.log

or explicitly:

aushape -l json audit.log

To convert to XML:

aushape -l xml audit.log

To write output to a file:

aushape audit.log > audit.json

or:

aushape -f audit.json audit.log

Live

You can also use Aushape as an Auditd Audispd plugin to convert messages as they are generated by the system. However, since Audispd doesn't support supplying more than two (unquoted) command-line arguments to plugins, you'll have to write a small wrapper script that invokes Aushape with the appropriate options, and specify that script to Audispd as the program to run.

If you would like your audit events converted to JSON and sent to syslog, one event per message, you can write this wrapper and put it, for example, into /usr/bin/aushape-audispd-plugin:

#!/bin/sh
exec /usr/bin/aushape -l json --events-per-doc=none --fold=all -o syslog

Don't forget to make it executable.

If you'd like to also log the original audit messages, add the --with-text option. If you'd like to limit the logged event message sizes, add the --max-event-size=SIZE option, e.g. --max-event-size=4k for a four-kilobyte limit.

You can then add it to Audispd configuration by putting this into /etc/audisp/plugins.d/aushape.conf:

active = yes
direction = out
path = /usr/bin/aushape-audispd-plugin
type = always
format = string

After Auditd is restarted, the events should be logged to syslog with the "authpriv" facility and "info" priority (you can change these with further aushape command-line options). Besides the systemd journal, if you also run rsyslog with its default configuration, they will end up in /var/log/secure on Fedora and RHEL, and in /var/log/auth.log on Debian-based systems.

NOTE: Some audit events can be large. For example, execve events can be on the order of megabytes for very long command lines. Most logging servers silently drop long messages. Make sure your audit configuration only logs events which are not too long, limit the maximum logged event size with the --max-event-size=SIZE option so oversized events get cropped, and/or configure your logging server to accept longer messages.

Forwarding to Elasticsearch

Once aushape messages hit the syslog(3) interface, whether it is provided by journald or another logging service, they can be forwarded to Elasticsearch for storage and analysis. Several logging services can do that, including Logstash, Fluentd, and rsyslog. Since rsyslog is included in most Linux distros, we'll use it as the example.

First of all, increase the maximum message size rsyslog can handle to be a bit more than the message sizes you expect to see from aushape. If you decided that 16kB is enough, then put this before any network setup in rsyslog.conf (the top of the file is safest):

$MaxMessageSize 16k

Then load the Elasticsearch output module:

$ModLoad omelasticsearch

Filtering out aushape messages

Before we can feed aushape messages to Elasticsearch, we need to strip them of the syslog data to get pure JSON, using a template:

template(name="aushape" type="list") {
    constant(value="{")
    property(name="msg"
             regex.expression="{\\(.*\\)"
             regex.submatch="1")
    constant(value="\n")
}
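To make the template's effect concrete, here is a small Python sketch that mimics what it does: keep everything from the first "{" of the message onward (the regex.submatch="1" capture), re-prepend the "{" constant, and append a newline. The sample message content is hypothetical; in reality rsyslog applies the regex itself:

```python
import re

def strip_to_json(msg: str) -> str:
    # Mimic the rsyslog "aushape" template: constant "{", then
    # submatch 1 of the regex "{\(.*\)" applied to the message,
    # then a trailing newline.
    m = re.search(r"\{(.*)", msg)
    return "{" + (m.group(1) if m else "") + "\n"

# A hypothetical syslog msg property carrying an aushape event:
line = ' {"serial":123,"host":"example.org"}'
print(strip_to_json(line), end="")  # {"serial":123,"host":"example.org"}
```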

Next, you'll likely want to filter out aushape messages to put them into a separate Elasticsearch index. You can set up an action condition that filters by the logging program name; Aushape logs with the program name aushape.

However, since any program can log under any program name, that is prone to log message spoofing. If you'd like to protect against it, you'll need to also filter by something harder to spoof, like the UID of the logging program, which will be zero for aushape running under auditd and audispd. However, the filtering needs to be done differently depending on where rsyslog receives aushape messages from.

If rsyslog serves the syslog(3) socket itself, then you'll need to make sure the corresponding imuxsock module has its Annotate and ParseTrusted options enabled, e.g. like this:

module(load="imuxsock" SysSock.Annotate="on" SysSock.ParseTrusted="on")

Then you can use this condition in your filtering action:

if $!uid == "0" and $programname == "aushape" then {
    # ... actions ...
}

If rsyslog receives aushape messages from journald, then no extra setup is needed, and the filtering condition can be this:

if $!_UID == "0" and $programname == "aushape" then {
    # ... actions ...
}

Note that the above only works with rsyslog v8.17.0 and later, due to an issue in earlier versions that prevented parsing variable names starting with an underscore.

Sending the messages

Once your rule condition is established, you can add the actual action sending aushape messages to Elasticsearch:

action(name="aushape-elasticsearch"
       type="omelasticsearch"
       server="localhost"
       searchIndex="aushape-rsyslog"
       searchType="aushape"
       bulkmode="on"
       template="aushape")

The action above sends messages formatted with the aushape template described above to an Elasticsearch server running on localhost and the default port, and puts them into the aushape-rsyslog index with type aushape, using the bulk interface.

Add the following action if you want to also send aushape messages to a dedicated file for debugging:

action(name="aushape-file"
       type="omfile"
       file="/var/log/aushape.log"
       fileCreateMode="0600"
       template="aushape")

Further, if you don't want aushape messages delivered anywhere else, you can add the discard action (~) after both of those (in newer rsyslog versions the "stop" statement is the preferred equivalent):

~

If you'd like to exclude aushape messages from all other logs, remember to put this rule before any other rules in rsyslog.conf.

Here is a complete example of a rule matching messages arriving from aushape via journald. It sends them to Elasticsearch running on localhost with the default port, puts them into the aushape-rsyslog index with type aushape using the bulk interface, stores them in the /var/log/aushape.log file, and then stops processing, not letting them get anywhere else.

if $!_UID == "0" and $programname == "aushape" then {
	action(name="aushape-elasticsearch"
		   type="omelasticsearch"
		   server="localhost"
		   searchIndex="aushape-rsyslog"
		   searchType="aushape"
		   bulkmode="on"
		   template="aushape")
	action(name="aushape-file"
		   type="omfile"
		   file="/var/log/aushape.log"
		   fileCreateMode="0600"
		   template="aushape")
	~
}

Other

See the aushape --help output and experiment!

Contributing

Feel free to open issues, submit pull requests and write to the author directly. All contributions are welcome!

Issues

Implement reporting conversion errors in-band

Various errors can occur during conversion, such as unknown records/fields, invalid field/record format, unexpected duplicated record types, etc.

Since aushape is supposed to run reliably under auditd, and can't simply stop processing the log, it needs to handle and report those errors somewhere.

Output events that failed to parse as a special type of event, containing the raw records and a description of the failure.

Output raw representation as array of lines

At the moment the raw lines are concatenated together, each terminated with a newline, which is hard to read when viewing human-oriented output.

Instead output each raw line in its own (array) element.

Ignore EOE events

Ignore events consisting of a single EOE record, which auparse sometimes produces.

Differentiate between continuous and discrete outputs

Make the converter differentiate between, and act differently for, continuous and discrete outputs, e.g. a file vs. a syslog output. Continuous outputs can receive data in arbitrary pieces; discrete outputs can only receive complete documents.
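The distinction can be sketched like this (the class and function names are illustrative, not aushape's C API): a continuous sink accepts arbitrary chunks, while a discrete sink must be handed complete documents, so a writer targeting it has to buffer a whole document first:

```python
# Illustrative sketch: a continuous sink (e.g. a file) accepts any chunk;
# a discrete sink (e.g. syslog, one document per message) must be given
# complete documents, so the writer buffers until a document is finished.
class ContinuousSink:
    def __init__(self):
        self.data = ""
    def write(self, chunk):
        self.data += chunk              # any piece, any size

class DiscreteSink:
    def __init__(self):
        self.documents = []
    def write(self, document):
        self.documents.append(document)  # one complete document

buf = []
def emit(sink, piece, done):
    if isinstance(sink, ContinuousSink):
        sink.write(piece)               # stream as-is
    else:
        buf.append(piece)               # buffer for the discrete sink
        if done:
            sink.write("".join(buf))
            buf.clear()

syslog = DiscreteSink()
emit(syslog, '[{"serial": 1}', done=False)
emit(syslog, ']', done=True)
print(syslog.documents[0])  # [{"serial": 1}]
```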

Don't assume there are only two formats

Make sure the conv code never assumes that there are only two possible formats, i.e. doesn't do something like:

if (format == AUSHAPE_FORMAT_XML) {
    /* Format is XML */
} else {
    /* *Assume* format is JSON */
}

Instead do this:

if (format == AUSHAPE_FORMAT_XML) {
    /* Format is XML */
} else if (format == AUSHAPE_FORMAT_JSON) {
    /* Format *is* JSON */
}

This way, if another format is added, there is no possibility of output in a mixed format.

Consider improving formatting code structure

Consider improving formatting code structure. E.g. make an entity output code not care about entity separators, let the invoking code deal with that. Look for other logic failures.

Handle repeated record types

Since JSON objects can't contain duplicate keys, and arrays of objects are hard to use in ElasticSearch, figure out what to do with repeated records of the same type in one event.

At the moment repeated execve records are stitched together. There are other repeated record types as well: AVC (in permissive mode), PATH, and OBJ_PID (if a signal is sent to multiple processes), at the least.

One option is to aggregate them, similarly to execve, but more complicated records would still have a problem of array of objects and ElasticSearch flattening.

Another option is to multiply events with repeated records, outputting each event with a single record from the sequence.

A third option is to simply output the records in an array, but this would be hit hardest by ElasticSearch's array flattening, and would be hard to access.
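The second option, multiplying events, can be sketched as follows (the dict shapes and names are illustrative, not aushape's actual code, which is written in C):

```python
import copy

# Sketch of the "multiply events" option: for an event whose record of a
# given type repeats (modeled here as a list of record dicts), emit one
# copy of the event per repeated record, each carrying a single record.
def multiply(event, rec_type):
    for rec in event["data"][rec_type]:
        out = copy.deepcopy(event)
        out["data"][rec_type] = rec
        yield out

event = {
    "serial": 1,
    "data": {"path": [{"name": "/etc"}, {"name": "/etc/passwd"}]},
}
split = list(multiply(event, "path"))
print(len(split))                        # 2
print(split[1]["data"]["path"]["name"])  # /etc/passwd
```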

Implement support for delivery to ElasticSearch

To be able to decide on the specific JSON schema to use, implement the necessary code to support log delivery to ElasticSearch. This includes running under audispd and accepting input from it #14, outputting one message per line #5, and logging to syslog(3) #7.

Consider having separate executables for streaming and converting

At the moment aushape has a single executable, which is used both for streaming the audit log to syslog (and possibly other targets) and for single-shot conversion. This results in a somewhat complicated interface, which might be confusing and difficult to understand for new users.

Consider making two separate programs using the same library: one for single-shot conversion, another for streaming.

The benefits would be a simpler interface and a clearer separation of purpose. The downside would be either the inability to stream an already-saved file, or the streaming program's interface ending up about as complex.

Ignore or warn about event not being trimmed to the required maximum

As event size can be inflated arbitrarily by users specifying arbitrary indent sizes and long hostnames, it is not possible to guarantee a minimum event size (unless we calculate it from the other settings, which would be complicated to implement and use).

Therefore, don't fail an assertion on failing to trim; instead, produce a warning somewhere, or just ignore it.

Limit event size

Provide an option to limit event size. Events exceeding the size can be replaced with an event with a special attribute saying event was truncated. This can be a good start. Later adaptive truncation can be implemented, such as truncating some records, perhaps with a separate record size limit, or truncating execve record argument list, also with a separate limit.
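The simple form of this can be sketched as follows (the "truncated" attribute name and the serialized-size check are assumptions for illustration, not a committed design):

```python
import json

# Sketch of the proposed limit: if an event's serialized form exceeds
# the maximum, replace it with a stub carrying a truncation marker.
# The "truncated" attribute name is hypothetical.
def limit_event(event, max_size):
    if len(json.dumps(event)) <= max_size:
        return event
    return {"serial": event.get("serial"), "truncated": True}

big = {"serial": 7, "data": {"execve": {"a0": ["x" * 100]}}}
small = {"serial": 8, "data": {}}
print(limit_event(big, 64))    # {'serial': 7, 'truncated': True}
print(limit_event(small, 64))  # {'serial': 8, 'data': {}}
```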

Move raw record output to the event level

Since one input record can affect several output records, put the raw records directly under the event level. This has the nice benefit of removing the need for a "fields" container in JSON, and also simplifies the record collectors.
