A library and a tool for converting audit logs to XML and JSON

Home Page: https://scribery.github.io/aushape/

License: GNU Lesser General Public License v2.1


Aushape

Aushape is a tool and a library for converting Linux audit log messages to JSON and XML, allowing both single-shot and streaming conversion.

At the moment Aushape can output to stdout, a file, or syslog(3); the syslog output sends one document or event per message.

NOTE: Aushape is in early development stage and anything about its interfaces and outputs can change. Use at your own risk.

Schemas

Aushape's output document schemas are still in flux, but the main idea is to aggregate the input records belonging to one event into a single output event object/element, while keeping the naming and structure as close to the original audit log as possible.

A truncated JSON example:

[
    {
        "serial"    : 123,
        "time"      : "2016-01-03T02:37:51.394+02:00",
        "host"      : "auditdtest.a1959.org",
        "text"   : [
            "node=auditdtest.a1959.org type=SYSCALL ...",
            "node=auditdtest.a1959.org type=PROCTITLE ...",
            ...
        ],
        "data"   : {
            "syscall"   : {
                "syscall"   : ["rt_sigaction","13"],
                "success"   : ["yes"],
                "exit"      : ["0"],
                ...
            },
            "proctitle" : {
                "proctitle" : ["bash","\"bash\""]
            },
            ...
        }
    },
    ...
]
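As a sketch of how such an aggregated event can be consumed, the following Python snippet parses a cut-down version of the example above (with the elisions removed) and pulls out one field. The interpreted-value-first, raw-value-second ordering of the field arrays is inferred from the example, not from a published schema:

```python
import json

# A minimal event in the (draft) aushape JSON schema; values are taken
# from the truncated example above, with the "..." elisions dropped.
doc = json.loads("""
[
  {
    "serial": 123,
    "time": "2016-01-03T02:37:51.394+02:00",
    "host": "auditdtest.a1959.org",
    "text": [
      "node=auditdtest.a1959.org type=SYSCALL ...",
      "node=auditdtest.a1959.org type=PROCTITLE ..."
    ],
    "data": {
      "syscall": {
        "syscall": ["rt_sigaction", "13"],
        "success": ["yes"],
        "exit": ["0"]
      },
      "proctitle": {
        "proctitle": ["bash", "\\"bash\\""]
      }
    }
  }
]
""")

event = doc[0]
# Each field is an array: the interpreted value first, then the raw value.
name, raw = event["data"]["syscall"]["syscall"]
print(name, raw)  # rt_sigaction 13
```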

A truncated XML example:

<?xml version="1.0" encoding="UTF-8"?>
<log>
    <event serial="194433" time="2016-01-03T02:37:51.394+02:00" host="auditdtest.a1959.org">
        <text>
            <line>node=auditdtest.a1959.org type=SYSCALL ...</line>
            <line>node=auditdtest.a1959.org type=PROCTITLE ...</line>
            ...
        </text>
        <data>
            <syscall>
                <syscall i="rt_sigaction" r="13"/>
                <success i="yes"/>
                <exit i="0"/>
                ...
            </syscall>
            <proctitle>
                <proctitle i="bash" r="&quot;bash&quot;"/>
            </proctitle>
            ...
        </data>
    </event>
    ...
</log>

There are a number of challenges, the main one being that the record structure is defined by both the Linux kernel and the Auditd code, sometimes changes from version to version, and has no official specification. Still, we have developed draft schemas for both JSON and XML, and will continue improving them in collaboration with the Auditd developers.

We encourage you to simply try running aushape on your logs to see what the output structure is like.

Dependencies

Aushape uses the Auparse library (part of the Auditd package) to parse audit logs. The development version of this library needs to be installed before building Aushape. It is available in the "audit-libs-devel" package on Fedora and RHEL, and in the "libauparse-dev" or "libaudit-dev" packages on Debian-based systems.

If you're installing an RPM package, the package manager will take care of the dependencies for you.

If you're building from a release tarball, then you can install the dependencies as follows.

On RPM-based systems:

sudo yum install -y gcc make audit-libs-devel

On Debian-based systems:

sudo apt-get install -y gcc make '^libau(dit|parse)-dev$'

If you're building from the Git source tree, then you can install the additional dependencies as follows.

On RPM-based systems:

sudo yum install -y autoconf automake libtool

On Debian-based systems:

sudo apt-get install -y autoconf automake libtool pkg-config

Building

If you'd like to build Aushape from the Git source tree, you need to first generate the build system files:

autoreconf -i -f

After that, or if you're building from a release tarball, you need to follow the usual configure & make approach:

./configure --prefix=/usr --sysconfdir=/etc && make

Installing

You can install Aushape with the usual make install:

sudo make install

Usage

Single-shot

For one-shot conversions simply use the aushape program. E.g. to convert an audit.log to the default JSON:

aushape audit.log

or explicitly:

aushape -l json audit.log

To convert to XML:

aushape -l xml audit.log

To write output to a file:

aushape audit.log > audit.json

or:

aushape -f audit.json audit.log

Live

You can also use Aushape as an Auditd Audispd plugin to convert messages as they are generated by the system. However, since Audispd doesn't support supplying more than two (unquoted) command-line arguments to plugins, you'll have to write a small wrapper script that invokes Aushape with the appropriate options, and specify that script to Audispd as the program to run.

If you would like your audit events converted to JSON and sent to syslog, one event per message, you can write this wrapper and put it, for example, into /usr/bin/aushape-audispd-plugin:

#!/bin/sh
exec /usr/bin/aushape -l json --events-per-doc=none --fold=all -o syslog

Don't forget to make it executable.

If you'd like to also log the original audit messages, add the --with-text option. If you'd like to limit the logged event message sizes, add the --max-event-size=SIZE option, e.g. --max-event-size=4k for a four-kilobyte limit.

You can then add it to Audispd configuration by putting this into /etc/audisp/plugins.d/aushape.conf:

active = yes
direction = out
path = /usr/bin/aushape-audispd-plugin
type = always
format = string

After Auditd is restarted, the events should be logged to syslog with the "authpriv" facility and "info" priority (you can change these with further aushape command-line options). Besides the systemd journal, if you also run rsyslog with its default configuration, they will end up in /var/log/secure on Fedora and RHEL, and in /var/log/auth.log on Debian-based systems.

NOTE: Some audit events can be large. For example, execve events can be on the order of megabytes for very long command lines. Most logging servers silently drop long messages. Make sure your audit configuration only logs events which are not too long, limit the maximum logged event size with the --max-event-size=SIZE option so oversized events get cropped, and/or configure your logging server to accept longer messages.

Forwarding to Elasticsearch

Once aushape messages hit the syslog(3) interface, whether it is provided by journald or another logging service, they can be forwarded to Elasticsearch for storage and analysis. Several logging services can do that, including Logstash, Fluentd, and rsyslog. Since rsyslog is included in most Linux distros, we'll use it as the example.

First of all, increase the maximum message size rsyslog can handle to be a bit more than the message sizes you expect to see from aushape. If you decided that 16kB is enough, then put this before any network setup in rsyslog.conf (the top of the file is safest):

$MaxMessageSize 16k

Then load the Elasticsearch output module:

$ModLoad omelasticsearch

Filtering out aushape messages

Before we can feed aushape messages to Elasticsearch, we need to strip them of the syslog data to get pure JSON, using a template:

template(name="aushape" type="list") {
    constant(value="{")
    property(name="msg"
             regex.expression="{\\(.*\\)"
             regex.submatch="1")
    constant(value="\n")
}
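To make the template's effect concrete, here is a small Python sketch that mimics what it does: keep everything from the first "{" of the message onward (the regex.submatch="1" capture), re-prepend the "{" constant, and append a newline. The sample message content is hypothetical; in reality rsyslog applies the regex itself:

```python
import re

def strip_to_json(msg: str) -> str:
    # Mimic the rsyslog "aushape" template: constant "{", then
    # submatch 1 of the regex "{\(.*\)" applied to the message,
    # then a trailing newline.
    m = re.search(r"\{(.*)", msg)
    return "{" + (m.group(1) if m else "") + "\n"

# A hypothetical syslog msg property carrying an aushape event:
line = ' {"serial":123,"host":"example.org"}'
print(strip_to_json(line), end="")  # {"serial":123,"host":"example.org"}
```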

Next, you'll likely want to filter out aushape messages to put them into a separate Elasticsearch index. You can set up an action condition that filters by the logging program name; Aushape logs with the program name aushape.

However, since any program can log under any program name, that is prone to log message spoofing. If you'd like to protect against it, you'll need to also filter by something harder to spoof, like the UID of the logging program, which will be zero for aushape running under auditd and audispd. However, the filtering needs to be done differently depending on where rsyslog receives aushape messages from.

If rsyslog serves the syslog(3) socket itself, then you'll need to make sure the corresponding imuxsock module has its Annotate and ParseTrusted options enabled, e.g. like this:

module(load="imuxsock" SysSock.Annotate="on" SysSock.ParseTrusted="on")

Then you can use this condition in your filtering action:

if $!uid == "0" and $programname == "aushape" then {
    # ... actions ...
}

If rsyslog receives aushape messages from journald, then no extra setup is needed, and the filtering condition can be this:

if $!_UID == "0" and $programname == "aushape" then {
    # ... actions ...
}

Note that the above only works with rsyslog v8.17.0 and later, due to an issue in earlier versions that prevented parsing variable names starting with an underscore.

Sending the messages

Once your rule condition is established, you can add the actual action sending aushape messages to Elasticsearch:

action(name="aushape-elasticsearch"
       type="omelasticsearch"
       server="localhost"
       searchIndex="aushape-rsyslog"
       searchType="aushape"
       bulkmode="on"
       template="aushape")

The action above sends messages formatted with the aushape template described above to an Elasticsearch server running on localhost and the default port, and puts them into the aushape-rsyslog index with type aushape, using the bulk interface.

Add the following action if you want to also send aushape messages to a dedicated file for debugging:

action(name="aushape-file"
       type="omfile"
       file="/var/log/aushape.log"
       fileCreateMode="0600"
       template="aushape")

Further, if you don't want aushape messages delivered anywhere else, you can add the discard action (~) after both of those (in newer rsyslog versions the "stop" statement is the preferred equivalent):

~

If you'd like to exclude aushape messages from all other logs, remember to put this rule before any other rules in rsyslog.conf.

Here is a complete example of a rule matching messages arriving from aushape via journald. It sends them to Elasticsearch running on localhost with the default port, puts them into the aushape-rsyslog index with type aushape using the bulk interface, stores them in the /var/log/aushape.log file, and then stops processing, not letting them get anywhere else.

if $!_UID == "0" and $programname == "aushape" then {
	action(name="aushape-elasticsearch"
		   type="omelasticsearch"
		   server="localhost"
		   searchIndex="aushape-rsyslog"
		   searchType="aushape"
		   bulkmode="on"
		   template="aushape")
	action(name="aushape-file"
		   type="omfile"
		   file="/var/log/aushape.log"
		   fileCreateMode="0600"
		   template="aushape")
	~
}

Other

See the aushape --help output and experiment!

Contributing

Feel free to open issues, submit pull requests and write to the author directly. All contributions are welcome!

Issues

Implement reporting conversion errors in-band

Various errors can occur during conversion, such as unknown records/fields, invalid field/record format, unexpected duplicated record types, etc.

Since aushape is supposed to run reliably under auditd, and can't simply stop processing the log, it needs to handle and report those errors somewhere.

Output events that failed to parse as a special type of event, containing the raw records and a description of the failure.

Output raw representation as array of lines

At the moment the raw lines are concatenated together, each terminated with a newline, which is hard to read when viewing human-oriented output.

Instead output each raw line in its own (array) element.

Ignore EOE events

Ignore events consisting of a single EOE record, which auparse sometimes produces.

Differentiate between continuous and discrete outputs

Make the converter differentiate between, and act differently for, continuous and discrete outputs, e.g. a file vs. a syslog output. Continuous outputs can receive data in arbitrary pieces; discrete outputs can only receive complete documents.
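The distinction can be sketched like this (the class and function names are illustrative, not aushape's C API): a continuous sink accepts arbitrary chunks, while a discrete sink must be handed complete documents, so a writer targeting it has to buffer a whole document first:

```python
# Illustrative sketch: a continuous sink (e.g. a file) accepts any chunk;
# a discrete sink (e.g. syslog, one document per message) must be given
# complete documents, so the writer buffers until a document is finished.
class ContinuousSink:
    def __init__(self):
        self.data = ""
    def write(self, chunk):
        self.data += chunk              # any piece, any size

class DiscreteSink:
    def __init__(self):
        self.documents = []
    def write(self, document):
        self.documents.append(document)  # one complete document

buf = []
def emit(sink, piece, done):
    if isinstance(sink, ContinuousSink):
        sink.write(piece)               # stream as-is
    else:
        buf.append(piece)               # buffer for the discrete sink
        if done:
            sink.write("".join(buf))
            buf.clear()

syslog = DiscreteSink()
emit(syslog, '[{"serial": 1}', done=False)
emit(syslog, ']', done=True)
print(syslog.documents[0])  # [{"serial": 1}]
```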

Don't assume there are only two formats

Make sure the conv code never assumes that there are only two possible formats, i.e. doesn't do something like:

if (format == AUSHAPE_FORMAT_XML) {
    /* Format is XML */
} else {
    /* *Assume* format is JSON */
}

Instead do this:

if (format == AUSHAPE_FORMAT_XML) {
    /* Format is XML */
} else if (format == AUSHAPE_FORMAT_JSON) {
    /* Format *is* JSON */
}

This way, if another format is added, there is no possibility of output in a mixed format.

Consider improving formatting code structure

Consider improving formatting code structure. E.g. make an entity output code not care about entity separators, let the invoking code deal with that. Look for other logic failures.

Handle repeated record types

Since JSON objects can't contain duplicate keys, and arrays of objects are hard to use in ElasticSearch, figure out what to do with repeated records of the same type in one event.

At the moment repeated execve records are stitched together. There are other repeated record types as well: AVC (in permissive mode), PATH, and OBJ_PID (if a signal is sent to multiple processes), at the least.

One option is to aggregate them, similarly to execve, but more complicated records would still have a problem of array of objects and ElasticSearch flattening.

Another option is to multiply events with repeated records, outputting each event with a single record from the sequence.

A third option is to simply output the records in an array, but this would be hit hardest by ElasticSearch's array flattening, and would be hard to access.
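The second option, multiplying events, can be sketched as follows (the dict shapes and names are illustrative, not aushape's actual code, which is written in C):

```python
import copy

# Sketch of the "multiply events" option: for an event whose record of a
# given type repeats (modeled here as a list of record dicts), emit one
# copy of the event per repeated record, each carrying a single record.
def multiply(event, rec_type):
    for rec in event["data"][rec_type]:
        out = copy.deepcopy(event)
        out["data"][rec_type] = rec
        yield out

event = {
    "serial": 1,
    "data": {"path": [{"name": "/etc"}, {"name": "/etc/passwd"}]},
}
split = list(multiply(event, "path"))
print(len(split))                        # 2
print(split[1]["data"]["path"]["name"])  # /etc/passwd
```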

Implement support for delivery to ElasticSearch

To be able to decide on the specific JSON schema to use, implement the necessary code to support log delivery to ElasticSearch. This includes running under audispd and accepting input from it #14, outputting one message per line #5, and logging to syslog(3) #7.

Consider having separate executables for streaming and converting

At the moment aushape has a single executable, which is used both for streaming the audit log to syslog (and possibly other targets) and for single-shot conversion. This results in a somewhat complicated interface, which might be confusing and difficult to understand for new users.

Consider making two separate programs using the same library: one for single-shot conversion, another for streaming.

The benefits would be a simpler interface and a clearer separation of purpose. The downside would be either the inability to stream an already-saved file, or the streaming program's interface ending up about as complex.

Ignore or warn about event not being trimmed to the required maximum

As event size can be inflated arbitrarily by users specifying arbitrary indent sizes and long hostnames, it is not possible to guarantee a minimum event size (unless we calculate it from the other settings, which would be complicated to implement and use).

Therefore, don't fail an assertion on failing to trim; instead, produce a warning somewhere, or just ignore it.

Limit event size

Provide an option to limit event size. Events exceeding the size can be replaced with an event with a special attribute saying event was truncated. This can be a good start. Later adaptive truncation can be implemented, such as truncating some records, perhaps with a separate record size limit, or truncating execve record argument list, also with a separate limit.
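The simple form of this can be sketched as follows (the "truncated" attribute name and the serialized-size check are assumptions for illustration, not a committed design):

```python
import json

# Sketch of the proposed limit: if an event's serialized form exceeds
# the maximum, replace it with a stub carrying a truncation marker.
# The "truncated" attribute name is hypothetical.
def limit_event(event, max_size):
    if len(json.dumps(event)) <= max_size:
        return event
    return {"serial": event.get("serial"), "truncated": True}

big = {"serial": 7, "data": {"execve": {"a0": ["x" * 100]}}}
small = {"serial": 8, "data": {}}
print(limit_event(big, 64))    # {'serial': 7, 'truncated': True}
print(limit_event(small, 64))  # {'serial': 8, 'data': {}}
```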

Move raw record output to the event level

Since one input record can affect several output records, put the raw records directly under the event level. This has the nice benefit of removing the need for a "fields" container in JSON, and also simplifies the record collectors.
