umm-csci-systems / log-processing
Extract and plot failed login info from system logs
License: MIT License
This write-up is really long, which makes it intimidating and hard to scan for particular pieces of information. It would probably be a lot more readable if we split it into multiple pages, with the "main" README linking out to those pages.
This would help the students and be useful for the TAs.
If we're going to keep this as a two-week lab, with different parts due in consecutive weeks, we should have students create releases (or at least tags) after the first part is done, so it's easier for us to check out the first part separately from the second. Otherwise it's really hard to figure out where their first half "ends" and their work on the second half "begins". If they create releases or tags, we can check out the appropriate release/tag and grade that without their work on the second part muddying things.
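A tag-based hand-off could look something like the minimal sketch below. The tag name part1 and the function names are hypothetical conventions, not anything the lab currently defines; a GitHub release would work similarly, since releases are built on tags.

```shell
#!/usr/bin/env bash
# Sketch of a tag-based hand-off between the two parts of the lab.
# The tag name "part1" and these function names are hypothetical.

# Student side: run from the repo root once part 1 is committed.
tag_part1() {
  # An annotated tag records who created it and when.
  git tag -a part1 -m "Part 1 submission"
  # A plain `git push` does not send tags, so students also need:
  #   git push origin part1
}

# Grader side: check out exactly the tagged commit (detached HEAD),
# ignoring any part 2 commits made after the tag.
checkout_part1() {
  # In a fresh clone, `git fetch --tags` first if the tag isn't present.
  git checkout --quiet part1
}
```

Graders can then switch back to the latest work with a normal checkout of the default branch when it's time to grade part 2.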
Instead of having all the confusion about submodules caused by bats, we should make a setup-bats.sh script (or similar) that runs the desired commands. Then people can run that once when they first clone the repo. Thanks to @biruk741 for the suggestion!
This should happen in earlier labs, like the command line intro and its pre-lab, and continue across the subsequent labs that use bats.
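The setup-bats.sh idea could be as small as the sketch below. It assumes the bats-related submodules are already recorded in the repo's .gitmodules (so it only fetches them, it doesn't add them), and the function name setup_bats is hypothetical. For a fresh clone, git clone --recurse-submodules should have the same effect in one step.

```shell
#!/usr/bin/env bash
# setup-bats.sh (hypothetical): run once from the repo root after cloning.
# Assumes the bats submodules are already listed in .gitmodules.
set -euo pipefail

setup_bats() {
  # Fail early if we're not inside a git working tree.
  git rev-parse --is-inside-work-tree >/dev/null
  # Clone and check out every submodule (and any nested submodules)
  # at the commits recorded in this repo. Safe to re-run.
  git submodule update --init --recursive
  echo "bats submodules are ready"
}

# The real script would end with a call to the function:
#   setup_bats
```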
This order is incorrect:
The test actually wants to see:
The two parts of this lab are collectively worth nearly 100 points, which is a bit out of proportion to the value of the other labs. The Segmented File Server is only worth 24, i.e., a quarter of this, and it really should be more like 50-75 points compared to this.
There are user names with _ and - in the full data set, but none in the test data until you get to the final test. It might be good to have examples in the partial tests that include those so people get a heads-up sooner.
Right now, process_client_logs.bats and process_logs.bats each take the actual result, sort it, and diff it with the expected result, ignoring whitespace.
# process_client_logs.bats
sort data/discovery/failed_login_data.txt > data/discovery_sorted.txt
sort tests/discovery_failed_login_data.txt > data/target_sorted.txt
run diff -wbB data/target_sorted.txt data/discovery_sorted.txt
assert_success
# process_logs.bats
sort tests/summary_plots.html > targets/target.txt
sort failed_login_summary.html > targets/sorted.txt
run diff -wbB targets/target.txt targets/sorted.txt
assert_success
There's a problem, though: if a line has extra whitespace, it might get sorted differently than the canonical, whitespace-normalized version!
So, to compare two files in a whitespace-insensitive way, we need to normalize whitespace and then sort :)
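One way to do that is sketched below; the helper names are hypothetical. The awk idiom $1 = $1 rebuilds each line with single spaces, which collapses runs of blanks and strips leading/trailing whitespace, and the NF guard drops blank lines. Sorting happens only after that, with LC_ALL=C so the order doesn't depend on the grader's locale.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: normalize whitespace FIRST, then sort.

normalize_and_sort() {
  # NF is 0 on blank/whitespace-only lines, so those are skipped.
  # Assigning $1 = $1 rebuilds the line with single-space separators.
  awk 'NF { $1 = $1; print }' "$1" | LC_ALL=C sort
}

# Exit status 0 if the files match after normalization, non-zero otherwise.
same_ignoring_whitespace_and_order() {
  diff <(normalize_and_sort "$1") <(normalize_and_sort "$2")
}
```

In a bats test this could replace the two sort lines, e.g. run same_ignoring_whitespace_and_order tests/summary_plots.html failed_login_summary.html followed by assert_success, assuming the helper is defined or loaded in that test file. Once both sides are normalized, the -wbB flags shouldn't be needed.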
https://dev.to/afrodevgirl/replacing-master-with-main-in-github-2fjf
Make sure to update any existing pull requests as well.
The F21 students have shared links to a bunch of potentially useful resources in Perusall. We should move some of these into the write-up.
The assemble_report.bats test looks for test/failed_login_summary.html, which doesn't exist, causing the test to fail.
Log-processing/test/assemble_report.bats, line 44 in 8aef05a
We need to convert from the old-style bats to the new bats-core. That's a fair bit of work because there are a lot of tests in this repo, all of which will have to be updated. We'll also need to mention the git submodule add commands in the README.
Test files to update:
process_client_logs.bats
create_username_dist.bats
create_hours_dist.bats
create_country_dist.bats
assemble_report.bats
process_logs.bats
Other tasks:
shellcheck
git submodule add commands
In the Write create_country_dist.sh section, I say:
After you've converted IP addresses to country codes, you can extract the country codes, count their occurrences (like we counted usernames before), and generate the necessary data.addRow lines, which again look like data.addRow(['04', 87]);. Remember to then wrap those with the appropriate header and footer, and you're done with this part.
The example is presumably a copy/paste from the hours section. The string '04' should be changed to a country code like 'FR'.
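For reference, the counting-and-formatting step from that quote can be sketched in a few lines of shell. This assumes the country codes have already been extracted one per line (the IP-to-country step isn't shown), the function name country_rows is hypothetical, and the header/footer wrapping is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read country codes (one per line) on stdin and emit
# one data.addRow line per distinct code, e.g. data.addRow(['FR', 2]);
country_rows() {
  # sort | uniq -c gives lines like "      2 FR" (count, then code).
  # The single quote is passed in as q to avoid awkward shell quoting.
  sort | uniq -c | awk -v q="'" '{ printf "data.addRow([%s%s%s, %d]);\n", q, $2, q, $1 }'
}
```

Usage: printf 'FR\nUS\nFR\n' | country_rows should print data.addRow(['FR', 2]); followed by data.addRow(['US', 1]);.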
Every semester students are confused by the diff output from the bats tests, and TBH I typically have to reconstruct what's going on there each time.
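To make that output concrete, here's a tiny demo of diff's "normal" output format, which is what shows up when a bats test runs diff and assert_success fails. The file contents are made up for illustration.

```shell
#!/usr/bin/env bash
# Made-up demo of normal-format diff output.
diff_demo() {
  local dir
  dir=$(mktemp -d)
  printf '%s\n' alice bob carol dave erin > "$dir/expected.txt"
  printf '%s\n' bob CAROL dave erin frank > "$dir/actual.txt"
  # diff exits with status 1 when the files differ; that's expected here.
  diff "$dir/expected.txt" "$dir/actual.txt" || true
  rm -r "$dir"
}

diff_demo
# Prints:
# 1d0
# < alice
# 3c2
# < carol
# ---
# > CAROL
# 5a5
# > frank
```

Reading it: 1d0 means line 1 of the first file was deleted (it would have come after line 0 of the second file); 3c2 means line 3 of the first file was changed into line 2 of the second, with < lines from the first file, --- as a separator, and > lines from the second; 5a5 means line 5 of the second file was added after line 5 of the first. In process_client_logs.bats and process_logs.bats the target file is diff's first argument, so < lines are expected output that's missing and > lines are output that wasn't expected.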
Maybe we should document that output in the write-up (or a separate document), including:
XdY, XcY, etc., lines
< and > lines
I think that if they include --recurse-submodules when they do their git clone (presumably on the command line), then it will avoid all the submodule weirdness. I'm not 100% sure, though, so I'm going to leave this be for now.
There's a nifty solution to process_client_logs.sh that uses NF to count backwards with syntax like $(NF-5). This avoids the need for two patterns in the awk call here. Maybe mention that? (Or maybe not add that complication?)
v1.0.1 now fails because of changes in GitHub; v1.2.0 seems to work fine, though.