israel-lugo / netforeman

Making sure your network is running smoothly

License: GNU General Public License v3.0

Python 100.00%
network network-admin automate diagnostics react infrastructure-monitoring routing watchdog

netforeman's Introduction

NetForeman

Making sure your network is running smoothly.

NetForeman is your network's foreman: an advanced programmable network system supervisor. It is meant to run on network infrastructure systems, such as routers or automated probes, for automatic monitoring and reacting to a wide array of situations.

The idea behind this program is to automate issue detection and problem solving whenever possible. You shouldn't have to stare at screens and graphs 24/7 to know that your firewall's connection tracking table is almost full. When that happens, you shouldn't have to manually go check the top 10 flow endpoints, to find out that a single host is sourcing 1,000,000 flows to different IPs.

Likewise, you shouldn't have to check the routing table on 20 routers to find out that your OSPF designated router is failing in a silent way, still sending out hellos but no longer redistributing LSAs (the consequence being that both the DR and BDR have a full table, but every other router is isolated from the network).

NetForeman's mission is to help detect, diagnose and react to these sorts of problems.

Features

NetForeman supports both IPv4 and IPv6.

This is currently a work in progress; the feature set is under development. The following features are planned for now:

Verifications

  • Monitor the routing table (Linux only for now, but designed for extensibility)
    • The number of routes is within a certain range
    • There is a route for a certain prefix
    • The route for a certain prefix is through a specific nexthop
  • Monitor running processes
    • A certain process is running
  • Monitor connection tracking
    • The number of tracked connections doesn't exceed a certain limit
  • Monitor active DHCP servers
    • Only a specific set of IPs respond to DHCP queries

Actions

Triggered by a verification failure:

  • Insert routes
  • Send emails
  • Restart services
  • Execute arbitrary scripts

Installing

NetForeman requires Python 3.

From source

NetForeman can be run directly from the source directory, as long as the requirements are already installed (or copied to the root of the source directory).

The following third-party modules are required:

  • netaddr: package python3-netaddr on Debian or Ubuntu, or dev-python/netaddr on Gentoo.
  • pyroute2: package python3-pyroute2 on Debian or Ubuntu, but it must be version 0.4.0 or greater; no Gentoo package is available.
  • pyhocon: no package is available for Debian, Ubuntu or Gentoo.
  • pyparsing: package python3-pyparsing on Debian or Ubuntu, or dev-python/pyparsing on Gentoo.
  • psutil: package python3-psutil on Debian or Ubuntu, or dev-python/psutil on Gentoo.
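
Before running from the source directory, it can help to confirm that the five modules named above are importable. The following is a minimal sketch, not part of NetForeman; the helper name missing_modules is made up for illustration, and it checks importability only, not the pyroute2 version requirement.

```python
import importlib.util

# Module names as listed in the README. This only checks that they can be
# found for import; it does NOT verify that pyroute2 is 0.4.0 or greater.
REQUIRED_MODULES = ["netaddr", "pyroute2", "pyhocon", "pyparsing", "psutil"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```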

Contact

NetForeman is developed by Israel G. Lugo <[email protected]>. The main repository for cloning, submitting issues and forking is at https://github.com/israel-lugo/netforeman

License

Copyright (C) 2016, 2017 Israel G. Lugo <[email protected]>

NetForeman is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

NetForeman is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with NetForeman. If not, see <http://www.gnu.org/licenses/>.

netforeman's Issues

Return meaningful exit codes

We should use the program exit codes to return useful information to the user.

Example 1: a check failed, and corrective action was taken. What to return?

Example 2: a check failed, and one of the corrective actions failed, but not all. What to return? I think it should be an error, period. Right now we're not even detecting this; FIBModuleAPI._route_check_failed catches the error from the action, logs it, and continues happily. We would need to propagate the error up, but without interrupting the execution of other actions.
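
One way to propagate action failures without interrupting the remaining actions is to count them and derive the exit code at the end. This is only a sketch of that idea: the function names and the exit code values are hypothetical, since the issue leaves the actual codes open, and it does not reflect how FIBModuleAPI._route_check_failed is written.

```python
import logging

logger = logging.getLogger("netforeman.sketch")

# Hypothetical exit codes; the issue does not settle on actual values.
EXIT_OK = 0
EXIT_ACTION_FAILED = 1


def run_actions(actions):
    """Run every action, even if some fail. Return the number of failures."""
    failures = 0
    for action in actions:
        try:
            action()
        except Exception as e:
            # Log and remember the failure, but keep running the others.
            logger.error("action %r failed: %s", action, e)
            failures += 1
    return failures


def exit_code_for(failures):
    """Map a failure count to a process exit code."""
    return EXIT_OK if failures == 0 else EXIT_ACTION_FAILED
```

The caller would then finish with something like sys.exit(exit_code_for(failures)).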

How to deal with actions?

Assuming for example the following configuration:

linux {
    # route-checks is an array, order is important
    route-checks = [
        # make sure we can reach the email relay host
        {
            dest = ${email_relay_host}
            # must lead to somewhere, i.e. not a blackhole
            non_null = true
            # on-error is an array, order is important
            on-error = [
                {
                    action = add_route
                    dest = ${email_relay_host}
                    nexthops = ${core_routers}
                }
                { action = email }
            ]
        }
    ]
}

We have two actions to execute in case of error:

  1. Add a route (in this case so we can reach the email relay)
  2. Send an email

Both actions are supplied by separate modules (in this case, add_route comes from linux and email comes from email). Should we have some kind of action list, into which modules insert when loaded? How do we deal with name conflicts? What if module foo exports action bar, and module bla also exports its own different action bar?

We could go all out and have namespaces per-module. But it looks weird having to type linux.add_route from within the linux configuration. Perhaps we could do relative lookups, where the name is first looked up in the namespace of the module that defines it, and then in the global namespace. Or we could require that relative names only exist within their own module and that's it, no need to go look in the global namespace.
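
The relative lookup idea could look roughly like the sketch below. Everything here is hypothetical (the ActionRegistry class, the GLOBAL key, the dotted-name syntax); it only illustrates the first option discussed, where a bare name is tried in the defining module's namespace and then in the global one.

```python
GLOBAL = None  # key used for actions that are not tied to a module


class ActionRegistry:
    """Maps (module, action name) pairs to callables."""

    def __init__(self):
        self._actions = {}

    def register(self, module, name, func):
        """Register an action; two modules may reuse the same short name."""
        key = (module, name)
        if key in self._actions:
            raise ValueError("duplicate action %r in module %r" % (name, module))
        self._actions[key] = func

    def lookup(self, name, current_module):
        """Resolve `name` as seen from the configuration of `current_module`."""
        if "." in name:
            # Fully qualified, e.g. "linux.add_route"
            module, _, short = name.partition(".")
            candidates = [(module, short)]
        else:
            # Relative: own module first, then the global namespace
            candidates = [(current_module, name), (GLOBAL, name)]
        for key in candidates:
            if key in self._actions:
                return self._actions[key]
        raise KeyError("unknown action %r" % name)
```

With this scheme, module foo and module bla can both export an action bar without conflict, and add_route can be written unqualified inside the linux block.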

Normalize configuration exceptions

We're dying from netaddr exceptions and so on, e.g. if we define a route_check with dest=0.0.0.0/k, we die with:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/capi/netforeman/netforeman/cli.py", line 94, in <module>
    sys.exit(main())
  File "/home/capi/netforeman/netforeman/cli.py", line 81, in main
    dispatcher = dispatch.Dispatch(args.config_file)
  File "/home/capi/netforeman/netforeman/dispatch.py", line 50, in __init__
    ok = self.config.load_modules()
  File "/home/capi/netforeman/netforeman/config.py", line 182, in load_modules
    settings = API.settings_from_pyhocon(config_tree, self)
  File "/home/capi/netforeman/netforeman/config.py", line 65, in settings_from_pyhocon
    return cls._SettingsClass.from_pyhocon(conf, configurator)
  File "/home/capi/netforeman/netforeman/fibinterface.py", line 284, in from_pyhocon
    for subconf in conf.get_list('route_checks', default=[])
  File "/home/capi/netforeman/netforeman/fibinterface.py", line 284, in <listcomp>
    for subconf in conf.get_list('route_checks', default=[])
  File "/home/capi/netforeman/netforeman/fibinterface.py", line 248, in from_pyhocon
    dest = netaddr.IPNetwork(cls._get_conf(conf, 'dest'))
  File "/usr/lib/python3/dist-packages/netaddr/ip/__init__.py", line 933, in __init__
    raise AddrFormatError('invalid IPNetwork %s' % addr)
netaddr.core.AddrFormatError: invalid IPNetwork 0.0.0.0/k

We should catch these kinds of errors, and convert them to a config.ParseError.
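
The conversion could follow the pattern sketched below. To keep the example runnable without third-party packages, it uses the standard library's ipaddress module as a stand-in for netaddr, and defines a local ParseError in place of config.ParseError; in the real code the except clause would presumably catch netaddr.core.AddrFormatError, as seen in the traceback. The function name parse_dest is made up.

```python
import ipaddress


class ParseError(Exception):
    """Local stand-in for NetForeman's config.ParseError."""


def parse_dest(value):
    """Parse a route_check 'dest' setting, raising ParseError if invalid."""
    try:
        # strict=False accepts addresses with host bits set, e.g. 192.0.2.1/24
        return ipaddress.ip_network(value, strict=False)
    except ValueError as e:
        # Re-raise as a configuration error, keeping the original as the cause
        raise ParseError("invalid dest %r: %s" % (value, e)) from e
```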

Move plugin modules to a new subpackage

It would be a good idea to separate the plugin modules (e.g. email, linuxfib) from the core framework. Place them in a new subpackage called modules.

Open question: what to do with moduleapi? It's not a plugin module, but defines the API for them. And what about abstract modules such as fibinterface? Perhaps rename these to _moduleapi and _fibinterface and place them inside modules too.

Implement email sending

We need some way to send emails. The configuration file (issue #1) already models the required settings. Now we just need to make it do stuff.

We can use the Python email module to create a message, and the smtplib module to send it.
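
A minimal sketch of that combination follows. The function names, the default server and port, and the addresses used with it are placeholders, not NetForeman settings. It uses email.message.EmailMessage, which is stable from Python 3.6 onwards.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipients, subject, body):
    """Create a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, server="localhost", port=25):
    """Send the message through an SMTP relay (placeholder defaults)."""
    with smtplib.SMTP(server, port) as smtp:
        smtp.send_message(msg)
```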

linuxfib: deal with blackhole/unreachable routes

We get a NetlinkError(22, "Invalid argument") when trying to do a LinuxFIBInterface.get_route_to() and the route happens to be a blackhole. Deal with this somehow.

iproute2 has the same behavior, seems to be netlink-related. However, iproute2 has the ip route list match operator, which can show us routes that match even if they're blackholed. See how they do that. Can't use IPRoute.get_routes(dst=), that fails with a NetlinkError too.

route_check: implement non_null check

Now that LinuxFIBInterface.get_route_to() is capable of dealing with blackhole/unreachable routes (see #5), we can implement this. Remember to check the rest of the code to make sure other places don't break with null routes.

Detect errors in the entire config tree at parse time

Right now, we're not parsing the entire tree before running modules in Dispatch.run(). For example, error handler conf subtrees are only parsed if we enter the error. This may result in configuration errors only being detected at inappropriate times, e.g. precisely when a failure occurred and one needed to fix it.

A simple example. Action foo does not exist, but this will only be detected at runtime, if and when we enter the on_error handler:

modules = [
    linuxfib
]

linuxfib {
    route_checks = [
        {   
            dest = 198.51.100.5
            nexthops_any = [ 192.0.2.129, 192.0.2.143 ]
            on_error = [
                {   
                    action = foo
                    arg = 1
                }
            ]
        }
    ]
}
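
A parse-time pass over the whole tree could catch the unknown action in this example before any module runs. The sketch below uses plain dicts as a stand-in for the parsed pyhocon tree, a local ParseError in place of config.ParseError, and a hypothetical validate_actions function; it only covers route_checks and on_error, the two keys from the example.

```python
class ParseError(Exception):
    """Local stand-in for NetForeman's config.ParseError."""


def validate_actions(conf, known_actions):
    """Check every on_error handler up front; raise ParseError if unknown."""
    for module_name, module_conf in conf.items():
        if not isinstance(module_conf, dict):
            continue  # e.g. the top-level 'modules' list
        for i, check in enumerate(module_conf.get("route_checks", [])):
            for handler in check.get("on_error", []):
                action = handler.get("action")
                if action not in known_actions:
                    raise ParseError(
                        "%s.route_checks[%d]: unknown action %r"
                        % (module_name, i, action)
                    )
```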

linuxfib: getting a route to default doesn't work

Doing an ipr.route("get", dst='0.0.0.0/0') results in the following:

{'attrs': [('RTA_TABLE', 254),
           ('RTA_DST', '0.0.0.0'),
           ('RTA_OIF', 1),
           ('RTA_PREFSRC', '127.0.0.1'),
           ('RTA_CACHEINFO', {'rta_tsage': 0, 'rta_used': 0, 'rta_id': 0, 'rta_lastuse': 0, 'rta_clntref': 1, 'rta_ts': 0, 'rta_error': 0, 'rta_expires': 0})],
 'dst_len': 32,
 'event': 'RTM_NEWROUTE',
 'family': 2,
 'flags': 2147484160,
 'header': {'error': None,
            'flags': 0,
            'length': 96,
            'pid': 882,
            'sequence_number': 289,
            'type': 24},
 'proto': 0,
 'scope': 0,
 'src_len': 0,
 'table': 254,
 'tos': 0,
 'type': 2}

That is, a 0.0.0.0/32 (local) dev lo proto unspec route. The nexthop is completely missing, and can't be checked against anything. Furthermore, the destination length is 32 instead of 0.

Same thing happens for IPv6.

This will break route_check for 0.0.0.0/0 (or ::/0) with nexthops_any, as the nexthops are nowhere to be found.
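
One possible workaround, which the issue itself does not propose, is to dump the routing table and pick the entries whose destination length is 0, rather than asking the kernel for a route lookup. The sketch below works on plain dicts shaped like the message shown above, so it runs without pyroute2; the function name is made up, and it assumes a single-path default route that carries an RTA_GATEWAY attribute (multipath routes are not handled).

```python
def find_default_gateways(routes, family=2, table=254):
    """Return the gateways of default routes found in a route dump.

    `routes` is an iterable of dicts shaped like the message above.
    family=2 is AF_INET on Linux and table=254 is the main table,
    matching the values in that message.
    """
    gateways = []
    for route in routes:
        if route["family"] != family or route["table"] != table:
            continue
        if route["dst_len"] != 0:
            continue  # not a default route
        attrs = dict(route["attrs"])
        if "RTA_GATEWAY" in attrs:
            gateways.append(attrs["RTA_GATEWAY"])
    return gateways
```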

New module to monitor services/processes

Services crash. Bad things happen. We need to create a module that makes sure a certain process is running. Ideally, it should know about services, so it can do things like restart automatically. Perhaps two separate check types, service_check and process_check.

The module should have actions like restartservice, startservice and executecmd (which just runs an arbitrary shell command).
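
A process_check could be built on psutil, which is already a requirement. The sketch below is an assumption about how such a check might look, not the module's design: both function names are made up, and matching is by exact process name only. The set of names can be passed in, which keeps the check itself independent of psutil.

```python
def running_process_names():
    """Return the names of all currently running processes."""
    import psutil  # third-party; already listed as a NetForeman requirement

    return {proc.info["name"] for proc in psutil.process_iter(["name"])}


def process_check(name, names=None):
    """Return True if a process called `name` is running."""
    if names is None:
        names = running_process_names()
    return name in names
```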

New module to detect rogue DHCP servers

Try using pydhcplib or something else. Send out DHCP requests and check responses against a whitelist of authorized servers.

Suggested actions would be email.sendmail or even other more complex things such as logging in somewhere and shutting down a port :)
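
Once the responding servers have been collected (by whatever DHCP library ends up being used), the whitelist comparison itself is a set difference. A minimal sketch, with a hypothetical function name and documentation-range addresses:

```python
import ipaddress


def rogue_servers(responders, authorized):
    """Return the set of responding addresses that are not authorized."""
    allowed = {ipaddress.ip_address(a) for a in authorized}
    seen = {ipaddress.ip_address(r) for r in responders}
    return seen - allowed
```

A non-empty result would count as a verification failure and trigger the configured actions.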
