ciscocsirt / netsarlacc
License: BSD 3-Clause "New" or "Revised" License
We need to get our README and other docs up to date. We should add an AUTHORS file and settle on a license too. Examples of how to run the sinkhole and how to tune a system for good performance should be written as well.
Right now each worker does a read on the socket to get the HTTP request, parses the request, and then builds a response to send back.
Building the response is very CPU intensive so we don't want too many workers going in parallel or everything slows down. Testing shows that maximum performance is achieved roughly when the number of workers matches the number of physical CPU cores.
However, reading on the socket can tie a worker up doing nothing for however long the client-read-timeout is (by default 300 ms). This means with just a small number of bogus requests at a time, all the workers can be tied up for 300 ms. It's easy to starve legitimate requests by tying up workers in this way.
One possible solution is to have a very large pool of workers that just read from the sockets and then a smaller set of workers that parse the requests and respond. The easiest thing to do here is to take out the reading code and put it in its own worker pool and then feed the read results to the existing (now modified) workers.
So instead of ACCEPT -> WORKER we'd have ACCEPT -> READ -> WORKER where there are many more routines implementing READ. The more routines that are dedicated to just reads, the harder it is to tie up all of the reading routines waiting for a timeout.
Right now there is no "version", build date, or any other information integrated into the binary when built. It would be nice, at a minimum, to integrate a version number. A build date would be nice too. The git commit hash, build host, and build user are also options.
Possible ways to do this:
https://www.atatus.com/blog/golang-auto-build-versioning/
https://stackoverflow.com/questions/11354518/golang-application-auto-build-versioning
There are lots of names, values, strings, and path info that are hardcoded all over the place. We should pull these out into variables and then, ideally, expose them via a configuration file.
Right now src_ip and dest_ip contain sub fields:
"src_ip":{"IP":"127.0.0.1","Port":54660,"Zone":""}
"dest_ip":"127.0.0.1:3333"
We should have separate src_ip, src_port, dst_ip, and dst_port fields. We may actually consider not having dst_ip at all since it'll be the same for the sinkhole. Instead we should probably include a sinkhole instance name / ID in the JSON so that if we're running more than one sinkhole we can tell them apart in the logs.
dest_name is the "Host:" header provided by the client
Rework the worker pool design
We need to be able to daemonize and interact cleanly with init scripts or similar. This also means we need a way of gracefully stopping the daemon non-interactively. Presumably via a signal handler, but there may be a better "Go" way.
Multiple copies of netsarlacc may stomp on each other's logs. We should use advisory locking via flock() to watch out for this.
Right now we have a number of things hardcoded. It would be nice to create a configuration struct that specifies where we log, what ports we listen on, what protocols, and the TLS certs for any TLS-wrapped protocols. Then we could have a JSON configuration file to populate this struct.
Before reading from a connection we set a read deadline. When the read returns an error we report it with fmt.Println("Error reading:", err.Error())
But then later we try to write to the socket to tell the client it was an I/O timeout: work.Connection.Write([]byte("Error I/O timeout. \n"))
However, there are other reasons the read could fail. For example, the client could close the connection before we call read. If we try to write to a socket that is in an error state we could just compound the trouble. Instead we should just move on without trying to send anything to the client.
Right now there are multiple prints that go to stdout, which are fine for testing but won't work for production. We should clean up the output. If we still need output for debugging purposes we should support a debug (or verbose) flag and send that output to stderr instead.
Example:
2017/03/02 22:27:30 {"timestamp":"2017-03-02 22:27:30.00813237 +0000 UTC","bytes_client":"144","http_method":"GET","url_path":"/test/path/here.txt?crap","http_version":"HTTP/1.1","http_user_agent":"curl/7.52.1","dest_name":"127.0.0.1:3333","http_referer":"\u003cscript\u003ealert(\"pwned\");\u003c/script\u003e","src_ip":{"IP":"127.0.0.1","Port":58328,"Zone":""},"dest_ip":"127.0.0.1:3333","raw_data":"474554202f746573742f706174682f686572652e7478743f6372617020485454502f312e310d0a486f73743a203132372e302e302e313a333333330d0a557365722d4167656e743a206375726c2f372e35322e310d0a4163636570743a202a2f2a0d0a526566657265723a203c7363726970743e616c657274282270776e656422293b3c2f7363726970743e0d0a0d0a"}
And:
2017/03/03 22:29:18 {"raw_data":"476554202f746869735f4745545f7761735f47655420485454502f312e300d0a0d0a"}
Maybe we should just roll our own logging library? That would eliminate the dependency on an external lib like Lumberjack.
To avoid any issues, we should start using branches and submitting pull requests instead of committing to the master branch.
Right now log file names are based on the moment the sinkhole was started and rotate every 10 minutes after that. This produces names like:
sinkhole-2017-05-30-22-14-43.log
Instead we should snap logging into files that fall on 10-minute boundaries within the hour, like:
12:00:00
12:10:00
12:20:00
....
This will require a minor re-work of the timer to instead be a loop checking time.Now() and doing some basic rounding / modular arithmetic. Then we could just drop seconds from the filename altogether. The code protected by the mutex that closes the old file and opens a new one doesn't really have to change, just the name creation code needs to change.
It would be nice to record the TLS version used by clients.
As mentioned in issue #2 , we should add a sinkhole instance name/ID instead of the dst IP.
The way header parsing is done right now isn't particularly efficient and it's a bit error-prone. Splitting on ":" can produce many fields if the user-controlled value itself contains ":". Also, the space after the header's ":" is not part of the value, yet the current parsing includes that space in the value. Example (note the leading space):
" curl/7.52.1"
The template code goes and fetches the template file every time. The template fetching and filling out should be moved into its own function and that function should try to re-use work so that it doesn't have to fetch the file every time.
Right now sockets only get one read() call:
// Make enough space to receive client bytes
read.Buffer = make([]byte, 8192)
err = read.Conn.SetReadDeadline(time.Now().Add(time.Millisecond * time.Duration(*ClientReadTimeout)))
But for large requests that don't all fit into one packet, sometimes the kernel will make a subset of the data available on the socket and not all of it will come back in the single read. This causes parsing to see a truncated request.
It seems like a good option here would be to call read() with an initial timeout and then exponentially decrease the timeout value for subsequent calls to read() until either a minimum is reached (maybe 50ms) or a timeout occurs because no more data is available.
The danger here is letting a client trickle data to the server so that the read loop never ends. An absolute cap on total read time needs to be set. The current ClientReadTimeout could be re-worked to be a maximum time to spend reading and any time left over from the first read could be used for a second, third, fourth, etc.
The logger should know how to stop itself instead of relying on netsarlacc.go to do a lot of the stopping and error checking work itself.
We have a number of places in the code where we check for an error that shouldn't ever happen and if the error does happen, it's not clear that the code can actually recover. One such example is in opening / closing log files in the logger code. If an error happens on open or write or close we're pretty much hosed and we should do our best to just shut everything down after we've sent some details about the error to syslog.
I'm not sure yet of the best way to signal all the goroutines to gracefully stop but we need to look into it.
Right now all of the file-based operations (logging, PID file, daemonization) assume paths are relative to '.', which isn't adequate for production. The daemonization code can't chdir to / right now for this reason.
It should be possible to specify a pidfile path, a logging directory path, and a template file path. The code should also be able to continue functioning after a chdir to /.
The dispatcher.go file handles starting workers. It should also handle more of the stopping of the workers instead of relying on netsarlacc.go to fully track the stopping. This will probably mean moving the stop channels over to it too.
These two functions are basically identical but use different channels. Instead make a meta function that we can use to pass the channels to as arguments. This eliminates code duplication and any chance the two functions could accidentally diverge from each other.
It would be great if netsarlacc had a protocol handler that just listens for whatever the client sends and records it. Maybe listen for a second or so before closing the connection. This would give people some flexibility to handle other protocols where the client speaks first, without actually having to handle any of the protocol details.
Go doesn't seem to provide a way to control the TCP backlog setting on a socket but if we create the socket with syscalls we should be able to.
Right now the HTTP (and to a lesser extent SMTP) protocol handling is just built straight into the code and isn't modular at all. It would be nice if all the stuff needed to handle a protocol were put together into a file with a clearly defined API to implement reading, interacting, and logging of the activity.
The main benefit of this would be at a code organization level. It might also lower the bar to adding support for more interactive protocols or extending the interactivity of an existing protocol like SMTP.
The code doesn't currently support TLS. We should probably support listening on multiple different sockets, with a flag specifying which sockets are TLS. That way we can listen on 80, 443, 8000, 8080, and 8443.
Go seems to support SNI https://golang.org/pkg/crypto/tls/#ClientHelloInfo
If the client uses a server name for TLS it would be nice to record it in the logs.
To quote the go documentation https://golang.org/pkg/text/template/ "To generate HTML output, see package html/template, which has the same interface as this package but automatically secures HTML output against certain attacks."
It is possible to bind to the sockets and then drop privileges and start a daemon and pass it the file handles. This would fit in well with the existing daemonization code.
We haven't been careful about timezones. I've only been testing on a machine running in UTC so any timezone bugs would not have shown up.
Ultimately the code should log in UTC by default but support a local timezone with a cmdline flag / config file option.
When a client sends bogus data and generates an error nothing about the client or the time is written:
2017/03/03 22:29:18 {"raw_data":"476554202f746869735f4745545f7761735f47655420485454502f312e300d0a0d0a"}
Right now the sinkhole is built assuming HTTP only but people may want to add other protocols like SMTP, POP, IRC, etc. The code should at least be ready for these additional protocols to be added without having to restructure too much code.
Right now when a worker handles a connection, a header struct is filled out with information about the request. Assuming the request goes well, that header struct is then encoded into a LoggedRequest object and returned with a nil error:
validConnLogging := LoggedRequest{Timestamp: time.Now().UTC().String(), Header: req_header, SourceIP: sourceIP, SourcePort: sourcePort, Destination: allHeaders["host"], EncodedConn: raw}
return validConnLogging, nil
If something goes wrong, we instead return an empty LoggedRequest along with the error:
return LoggedRequest{}, err
The trouble with this is that in (almost?) all cases we want to log at least some basic information about the client that caused the error. I think we should change the Header struct into an information-log struct where we store all the information we can log about a client. As soon as we get any bytes from the client we can put them in the raw_data field of the information struct. The same goes for when we learn the client IP and port. Then on either success or error we encode what we've recorded in the information struct into a LoggedRequest.
We could create a function that takes an information log struct and reads all the non-nil fields and fills out and returns a LoggedRequest struct. Then returning on error would look more like:
return BuildLogRequest(client_info), err
And success would look pretty much the same:
return BuildLogRequest(client_info), nil
This also gives us the opportunity to set the error message in the JSON we log. A log could look like:
{"error":"true", "error_message":"Request header failed regex validation", "src_ip":"1.2.3.4", "src_port":"5678", "raw_data":"476554202f746869735f4745545f7761735f47655420485454502f312e300d0a0d0a"}
Right now any worker that produces logs calls straight into the logging routines. This opens up the possibility for race conditions, especially during log rotation. We should create a logging goroutine and pass all log contents over a channel to eliminate any chance of races.
netsarlacc should be able to handle IPv6 in the same way it handles v4.
Write test cases
It would be nice to provide init scripts / service files for the various common distributions.
A client doesn't have to provide a Referer or User-Agent header. We have both of these fields in a table in the template. When a client doesn't provide them the value cell in the table is empty. It would be nice if there was a conditional check that could prevent the whole table row from being printed.
There are many global configuration variables just mixed into the netsarlacc namespace. It would be nice to throw these into a struct and then maybe pass a reference to the struct around instead of accessing them as global vars across files.