
m3047 / shodohflo

Stars: 14 · Watchers: 2 · Forks: 0 · Size: 541 KB

Pure Python netflow and DNS correlation, with reusable Frame Streams, DnsTap and Protobuf implementations

License: Apache License 2.0

Languages: Python 96.59%, HTML 2.58%, CSS 0.74%, Shell 0.09%
Topics: protobuf, fstrm, dnstap, python3, netflow, asyncio, dns-traffic, frame-streams

Contributors: m3047
shodohflo's Issues

Show port numbers with IP addresses

The packet sniffer captures the remote port as well as remote address, and this is recorded as part of the flow key in Redis. Surface this in the UI.

How should it be presented: on rollover (hover), or simply displayed after the address? Should address + port combinations be distinct entries in the UI, or grouped by address (DNS A and AAAA records don't distinguish port numbers)?
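Both presentations are straightforward to produce from (address, port) pairs. A minimal Python sketch; the function names are hypothetical, and it assumes the flow keys have already been parsed out of Redis into (address, port) tuples (the actual key format isn't shown here).

```python
import ipaddress
from collections import defaultdict


def format_endpoint(address: str, port: int) -> str:
    """Render an address/port pair; IPv6 literals are bracketed as in URLs (RFC 3986)."""
    if ipaddress.ip_address(address).version == 6:
        return f"[{address}]:{port}"
    return f"{address}:{port}"


def group_ports_by_address(endpoints):
    """Collapse (address, port) pairs so each address appears once with its ports.

    Grouping by address matches how A/AAAA records are keyed (no port numbers).
    """
    grouped = defaultdict(set)
    for address, port in endpoints:
        grouped[address].add(port)
    return {address: sorted(ports) for address, ports in grouped.items()}
```

The grouped form keeps the address as the unit that lines up with DNS data, with ports as secondary detail (for example in a rollover); the distinct form would simply call format_endpoint() per pair.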

Incorrectly decoded packets in pcap_agent

I'm seeing this sporadically:

  ERROR:asyncio:Exception in callback Server.process_data()
  handle: <Handle Server.process_data()>
  Traceback (most recent call last):
    File "/usr/lib64/python3.6/asyncio/events.py", line 145, in _run
      self._callback(*self._args)
    File "/usr/local/share/shodohflo/agents/pcap_agent.py", line 315, in process_data
      if pkt.p == socket.IPPROTO_TCP and pkt.data.flags & dpkt.tcp.TH_RST:
  AttributeError: 'bytes' object has no attribute 'flags'

and I'm adding some code to the fwm branch to assess how often it's occurring and test a possible mitigation.
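The traceback suggests that pkt.data sometimes holds raw, undecoded bytes rather than a parsed TCP object (dpkt appears to leave the payload as bytes when it cannot unpack the transport header, for example from a truncated capture). One possible guard, sketched below with hypothetical names, checks for the attribute before using it and counts the failures so their frequency can be assessed. When dpkt is imported, isinstance(pkt.data, dpkt.tcp.TCP) would be an equivalent check.

```python
import socket
from collections import Counter

TH_RST = 0x04  # TCP RST flag bit (the same value dpkt exposes as dpkt.tcp.TH_RST)

undecoded = Counter()  # hypothetical tally of packets whose TCP header did not decode


def is_tcp_reset(pkt) -> bool:
    """Return True only for a decoded TCP segment with RST set.

    If the transport header could not be decoded, pkt.data is plain bytes
    (which has no 'flags' attribute); count it instead of raising.
    """
    if pkt.p != socket.IPPROTO_TCP:
        return False
    flags = getattr(pkt.data, "flags", None)
    if flags is None:
        undecoded["tcp"] += 1
        return False
    return bool(flags & TH_RST)
```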

Asyncio implementations of fstrm, dns and pcap agents

Hello. I'm currently working on versions of the DNS and pcap agents which use asyncio. As part of this, the Frame Streams implementation, shodohflo/fstrm.py, needs to support asyncio as well.

shodohflo.fstrm will remain backwards compatible.

A new branch will appear in the repository, bringing the total of more or less permanent branches to three:

  • synchronous will contain the fully synchronous version, essentially a snapshot of the repo today.
  • fwm remains my working copy, and will contain the updates until they're tested and soaked to my satisfaction.
  • master remains the development branch and will ultimately reflect the changes needed to implement asyncio.
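For reference, the framing an asyncio Frame Streams reader has to handle is small: a data frame is a 32-bit big-endian length followed by the payload, and a zero length is an escape introducing a control frame (32-bit control frame length, 32-bit control type, then control fields). The sketch below is not shodohflo.fstrm's API; the function names are hypothetical, and the READY/ACCEPT/START/STOP/FINISH handshake logic is omitted. It reads from an asyncio.StreamReader, such as the one handed to an asyncio.start_unix_server() callback.

```python
import asyncio
import struct

# Control frame types defined by the Frame Streams protocol.
CONTROL_TYPES = {1: "ACCEPT", 2: "START", 3: "STOP", 4: "READY", 5: "FINISH"}


async def read_frame(reader: asyncio.StreamReader):
    """Read one frame; returns ("data", payload) or ("control", (type, fields))."""
    (length,) = struct.unpack(">I", await reader.readexactly(4))
    if length:
        return "data", await reader.readexactly(length)
    # A zero length is the escape sequence that introduces a control frame.
    (control_length,) = struct.unpack(">I", await reader.readexactly(4))
    body = await reader.readexactly(control_length)
    (control_type,) = struct.unpack(">I", body[:4])
    return "control", (CONTROL_TYPES.get(control_type, control_type), body[4:])


async def frames_from_bytes(data: bytes):
    """Usage helper: decode every frame in an in-memory buffer."""
    reader = asyncio.StreamReader()
    reader.feed_data(data)
    reader.feed_eof()
    frames = []
    while not reader.at_eof():
        frames.append(await read_frame(reader))
    return frames
```

Because readexactly() suspends until enough bytes arrive, partial frames spread across socket reads need no explicit buffering in the caller.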

Separate Dnstap and DNS agents

The present implementation lacks flexibility and only supports 1:1 telemetry capture.

The current dns_agent combines two functionalities:

  • Reads Dnstap protocol telemetry from a unix socket.
  • Writes filtered data to a Redis database.

The Dnstap protocol as architected is not network-aware; the BIND implementation can only write either to the unix socket or to (rotating) files.

It is desirable to support many-to-one, one-to-many, and many-to-many capture modalities for redundancy and failover, as well as for additional uses. For instance, work is underway to alter Rear View RPZ so that it can optionally ingest telemetry data via UDP datagrams transmitted in the format envisioned here.

The anticipated future architecture is:

  • A Dnstap agent which reads from the unix socket, filters relevant information, and sends UDP datagrams.
  • A DNS agent which receives UDP datagrams, filters relevant information from them, and updates the Redis database.
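The receiving half of that split could be an asyncio DatagramProtocol. A minimal sketch, assuming the JSON datagram format proposed under "UDP as the transmission modality" (address, chain, and client keys); TelemetryReceiver is a hypothetical name and sink is a hypothetical callable standing in for the Redis update.

```python
import asyncio
import json


class TelemetryReceiver(asyncio.DatagramProtocol):
    """DNS-agent side: decode one observable event per datagram and pass it on.

    `sink` is a hypothetical callable standing in for the Redis update.
    """

    def __init__(self, sink):
        self.sink = sink
        self.rejected = 0

    def handle(self, data: bytes, addr=None) -> bool:
        try:
            event = json.loads(data)
            address, chain = event["address"], event["chain"]
        except (ValueError, KeyError, TypeError):
            # Not JSON, or not an event dictionary: count it and drop it.
            self.rejected += 1
            return False
        self.sink(address, chain, event.get("client"), addr)
        return True

    def datagram_received(self, data, addr):
        self.handle(data, addr)
```

It would be bound with loop.create_datagram_endpoint(lambda: TelemetryReceiver(update), local_addr=(host, port)), where update, host, and port are placeholders. Since each datagram is self-contained, a dropped or malformed one affects only that event.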

This issue exists to inform the community of the anticipated change and to solicit feedback.

UDP as the transmission modality

UDP datagrams are anticipated as the transmission modality, with each datagram encapsulating one observable event.

An observable event is defined as a (potential) CNAME chain ending in a single IP address; if the underlying Dnstap event resolves the (single) CNAME chain to multiple addresses, then one observable event is generated for each address.
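Splitting one resolution into observable events could look like this Python sketch. The inputs are assumptions: cnames is a hypothetical pre-parsed mapping from each CNAME owner to its target, and addresses holds the A/AAAA data at the end of the chain. The chain is reversed to match the datagram format described under "Content of the telemetry data".

```python
def observable_events(qname, cnames, addresses):
    """Split one resolution into observable events, one per final address.

    cnames maps each CNAME owner to its target and addresses holds the A/AAAA
    data at the end of the chain (both are hypothetical, pre-parsed inputs).
    """
    chain = [qname]
    # Follow the CNAME chain, stopping if a target repeats (a CNAME loop).
    while chain[-1] in cnames and cnames[chain[-1]] not in chain:
        chain.append(cnames[chain[-1]])
    chain.reverse()  # end of the chain first
    return [{"address": address, "chain": list(chain)} for address in addresses]
```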

UDP is chosen because it supports not only one-to-one and many-to-one, but also one-to-many and many-to-many (multicast addresses). Additionally, UDP is connectionless and so recovery / tolerance for network or receiver outages is much simpler to build as well as understand.

UDP has an absolute 64K byte limit on datagram size; on the other hand, the efficient datagram size is determined by the path MTU and can be considerably smaller (the typical Ethernet MTU is 1500 bytes). Datagrams larger than the path MTU are dealt with via fragmentation, which increases the possibility of data loss and requires packet reassembly at the receiving end.

Note that the maximum size of a DNS query or response tracks the UDP absolute limit on datagram size.
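The arithmetic behind these limits, as a small Python helper (the name is hypothetical; it assumes minimum IP headers, i.e. no IPv4 options and no IPv6 extension headers):

```python
from typing import Optional

# Fixed header sizes in bytes (no IPv4 options, no IPv6 extension headers).
IP_HEADER = {4: 20, 6: 40}
UDP_HEADER = 8


def max_payload(ip_version: int, mtu: Optional[int] = None) -> int:
    """Largest UDP payload in one datagram, or in one unfragmented packet if mtu is given."""
    if mtu is not None:
        return mtu - IP_HEADER[ip_version] - UDP_HEADER
    if ip_version == 4:
        # The IPv4 total-length field (max 65535) includes the IP header.
        return 65535 - IP_HEADER[4] - UDP_HEADER
    # The IPv6 payload-length field (max 65535, without jumbograms) excludes the fixed header.
    return 65535 - UDP_HEADER
```

On a 1500-byte Ethernet path this leaves 1472 bytes of payload over IPv4 and 1452 over IPv6; at the IPv6 minimum MTU of 1280 bytes it is 1232.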

Content of the telemetry data

Dnstap telemetry can capture different versions of a DNS query or response (stub resolver to caching resolver or caching resolver to authoritative, request or response) as well as additional metadata. Since a single DNS message itself can theoretically reach the absolute limit on datagram size, curation of data is required in order to reliably use UDP as a transport.

The datagram content is envisioned as a JSON dictionary with four keys:

  • id: a monotonically increasing serial number for the datagram, reset to zero when the Dnstap agent restarts
  • address: the address at the "end" of the CNAME chain; both IPv4 and IPv6 are supported
  • chain: the CNAME chain in reverse order, starting with the name that owns the address and ending with the queried name
  • client: the address from which the query was sent

Given the DNS data:

www.example.com.    IN CNAME server.example.com.
server.example.com. IN A     10.0.0.1

a sample (and prettified) datagram payload might look like this:

{   "id": 1,
    "address": "10.0.0.1",
    "chain": ["server.example.com.", "www.example.com."],
    "client": "10.43.11.48"
}
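A payload like the sample could be produced by something like the following Python sketch. The class name, the default size budget, and the choice to start ids at 1 (so the first datagram after a reset matches the sample's "id": 1) are assumptions, as is consuming an id only when a datagram is actually produced.

```python
import json
from typing import List, Optional


class DatagramEncoder:
    """Serialize observable events into compact JSON datagram payloads.

    A sketch: an id is only consumed when a datagram is actually produced,
    so a gap seen by a receiver points to loss rather than an oversized event.
    """

    def __init__(self, max_payload: int = 1472):  # 1500-byte MTU minus IPv4 + UDP headers
        self.last_id = 0  # reset to zero whenever the agent restarts
        self.max_payload = max_payload
        self.oversized = 0

    def encode(self, address: str, chain: List[str], client: str) -> Optional[bytes]:
        candidate = self.last_id + 1
        payload = json.dumps(
            {"id": candidate, "address": address, "chain": chain, "client": client},
            separators=(",", ":"),  # no whitespace: every byte counts
        ).encode("utf-8")
        if len(payload) > self.max_payload:
            self.oversized += 1
            return None
        self.last_id = candidate
        return payload
```

Unlike the prettified sample, the wire form is compact, e.g. {"id":1,"address":"10.0.0.1",...}.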

Tasks are WeakRefs

It wasn't documented (well) in Python 3.6, but the event loop keeps track of Tasks using a weakref.WeakSet, i.e. it holds only weak references to them. In theory, this can cause tasks to mysteriously disappear during garbage collection.

So, special measures need to be taken to keep a regular reference to the task around while a request is being serviced.

Affected items are:

  • examples/dnstap2json.py
  • shodohflo/fstrm.py
  • agents/dns_agent.py
  • agents/pcap_agent.py
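The usual remedy, and the pattern the asyncio documentation recommends for fire-and-forget tasks, is to hold each task in a set and let a done callback discard it. A minimal sketch with hypothetical names (not the code from the affected files); on Python 3.6, asyncio.ensure_future() or loop.create_task() would stand in for asyncio.create_task().

```python
import asyncio

# The event loop only holds weak references to tasks, so keep strong ones here
# until each task finishes.
_in_flight = set()


def spawn(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    _in_flight.add(task)
    task.add_done_callback(_in_flight.discard)  # drop the reference once done
    return task


async def handle_request(n: int) -> int:
    await asyncio.sleep(0)  # stand-in for real I/O
    return n * 2


async def demo():
    tasks = [spawn(handle_request(n)) for n in range(3)]
    results = await asyncio.gather(*tasks)
    await asyncio.sleep(0)  # give the done callbacks a chance to run
    return results, len(_in_flight)
```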

Python 3.11

Fedora 37 ships with Python 3.11, under which the existing event loop startup built around loop.run_forever() no longer works as written. This was mitigated for trualias in m3047/trualias#6, and the intent is to do something along the same lines here.

This will be a multipart effort involving:

  • shodohflo/fstrm.py
  • agents/pcap_agent.py
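One common replacement for obtaining the loop and calling run_forever() is to put the long-running setup in a coroutine that waits on an asyncio.Event and start it with asyncio.run(). This is a hypothetical sketch, not necessarily what m3047/trualias#6 did; the shutdown_after parameter exists only so the sketch can be exercised.

```python
import asyncio
from typing import Optional


async def serve(shutdown_after: Optional[float] = None) -> str:
    """Run until asked to stop, in place of loop.run_forever().

    Servers and datagram endpoints would be created here with the running
    loop (omitted).
    """
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    if shutdown_after is not None:
        loop.call_later(shutdown_after, stop.set)
    await stop.wait()
    return "stopped"


def main() -> None:
    try:
        asyncio.run(serve())
    except KeyboardInterrupt:
        pass  # asyncio.run() cancels outstanding tasks and closes the loop on the way out
```

Keeping the stop condition in an Event gives signal handlers or control messages a single place to request shutdown.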

Dnstap -> JSON sample program

Create an example program which outputs JSON to a UDP socket. agents/dns_agent.py provides an example of extracting Dnstap information and writing it to Redis using a ThreadPoolExecutor. This new example would write directly to a UDP socket asynchronously.

Rationale for a UDP socket: a UDP socket allows subscribers to connect to and disconnect from the stream independently of the sending application, effectively decoupling them. On the downside, a UDP socket effectively limits the datagram size to (some fraction of) the MTU.

This issue exists to solicit comments on this proposal.
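A minimal asyncio sketch of the sending side, with hypothetical class names. The Collector class and loopback_demo() exist only to show a round trip on 127.0.0.1; in real use the remote address would be wherever subscribers listen.

```python
import asyncio
import json


class JsonSender(asyncio.DatagramProtocol):
    """Fire-and-forget sender: one JSON document per UDP datagram."""

    def connection_made(self, transport):
        self.transport = transport

    def send(self, event: dict) -> None:
        self.transport.sendto(json.dumps(event, separators=(",", ":")).encode("utf-8"))

    def error_received(self, exc):
        # E.g. ICMP port unreachable when no subscriber is listening; keep sending.
        pass


class Collector(asyncio.DatagramProtocol):
    """Stand-in subscriber used only for the demo below."""

    def __init__(self):
        self.received = asyncio.Queue()

    def datagram_received(self, data, addr):
        self.received.put_nowait(json.loads(data))


async def loopback_demo(event: dict) -> dict:
    loop = asyncio.get_running_loop()
    rx, collector = await loop.create_datagram_endpoint(Collector, local_addr=("127.0.0.1", 0))
    port = rx.get_extra_info("sockname")[1]
    tx, sender = await loop.create_datagram_endpoint(JsonSender, remote_addr=("127.0.0.1", port))
    try:
        sender.send(event)
        return await asyncio.wait_for(collector.received.get(), timeout=2)
    finally:
        tx.close()
        rx.close()
```

Ignoring error_received() is what gives the decoupling described above: the sender keeps going whether or not anyone is listening.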
