cavejay / strippy

Use this PowerShell script to sanitise your logs of configured patterns before handing them off to someone else (like your support team).

License: MIT License

PowerShell 100.00%
log sanitisation wip

strippy's People

Contributors

cavejay, joelstuart, mzball-dt

strippy's Issues

Memory Consumption Issue

Strippy can consume a large amount of memory when sanitising.

This is especially evident when processing a collector (DT CAG) that is used as a RUM beacon. Strippy can consume 11-13 GB of memory, causing PowerShell to crash with an out-of-memory error.

Version ~2.4 (To be confirmed.)

Speeeeed.

Strippy struggles with tons of unique keys because it checks each one to ensure it's new. Strippy is also unable to focus on more than one thing at once at the moment (Super Synchronous).

Options for more power:

  • We could do away with the unique check entirely, and that might help with performance.
  • Fork for key finding, consolidate and fork again for sanitisation.
  • Reducing complexity.

Some mixture of the above would work well; I find the forking solution (and a nice-looking TUI) particularly appealing.
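
The fork / consolidate / fork idea above can be sketched roughly as follows. This is Python purely for illustration (Strippy itself is PowerShell), and the rule, the function names and the "Address" alias are assumptions made up for the example, not Strippy's code. Threads are used only to show the shape of the two phases.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule for illustration: capture whatever follows "addr=".
RULE = re.compile(r"addr=(\S+)")

def find_keys(text):
    """Phase 1 worker: collect the sensitive values found in one input."""
    return set(RULE.findall(text))

def sanitise(text, aliases):
    """Phase 2 worker: swap every known value for its alias."""
    # Longest values first, so "10.0.0.10" is replaced before "10.0.0.1".
    for value in sorted(aliases, key=len, reverse=True):
        text = text.replace(value, aliases[value])
    return text

def run(texts, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fork: scout every input independently.
        found = list(pool.map(find_keys, texts))
        # Consolidate: one set union stands in for the per-key "is it new?" check.
        unique = sorted(set().union(*found))
        aliases = {value: f"Address{n}" for n, value in enumerate(unique, 1)}
        # Fork again: sanitise every input against the shared alias table.
        cleaned = list(pool.map(lambda t: sanitise(t, aliases), texts))
    return cleaned, aliases
```

Because uniqueness is resolved once, in the consolidation step, the workers never need to share or lock a key list while scouting.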

More Order plz

The Keylist isn't ordered in any way and this upsets me.

plsfix

Supplemental post execution sanity check

Optionally provide a run-through of some basic, less specific regexes to identify where the sanitisation process may have missed lines. Here false positives are better than false negatives.

e.g.

  • Simple regex for picking anything that looks like an IP, without any text context.
  • Simple regex for picking any single word (space delimited) which contains letters AND numbers, but is not all letters or all numbers – find server/workstation names.
  • Simple regex for picking any single word (space delimited) which contains letters and optional numbers, delimited by ‘.’ – find any DNS names.
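
A minimal sketch of what such a post-run check might look like, in Python for illustration only (Strippy is PowerShell). The three patterns are one possible reading of the descriptions above and are deliberately loose; the names CHECKS and sanity_check are hypothetical.

```python
import re

# Deliberately broad patterns: false positives are acceptable here.
CHECKS = {
    # Four dot-separated groups of 1-3 digits, no surrounding context needed.
    "ip": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    # A space-delimited word with at least one letter AND at least one digit.
    "host": re.compile(
        r"(?<!\S)(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]+(?!\S)"
    ),
    # A space-delimited word made of labels separated by '.'.
    "dns": re.compile(
        r"(?<!\S)[A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z][A-Za-z0-9-]*)+(?!\S)"
    ),
}

def sanity_check(lines):
    """Return (line number, check name, matched text) for every suspicious hit."""
    hits = []
    for number, line in enumerate(lines, 1):
        for name, pattern in CHECKS.items():
            for match in pattern.finditer(line):
                hits.append((number, name, match.group()))
    return hits
```

Note that the "host" pattern would also flag aliases such as Username1 in already-sanitised output, so a real check would probably need to skip known aliases.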

Support for PowerShell v2, v3 and v4

While Strippy runs on the latest and greatest, it's important to support older OSes and versions of PowerShell as well, especially when that's all some customers have access to.

Not sure if this should be a separate release, potentially as a cut-down version to begin with, or if it should (or can?) be implemented in a one-for-all way.

Match aliases are being 'found' during the sanitisation stage

Currently if you have a rule that replaces a string with 'abcdefg' and another that replaces 'cde' with 'memes', you could end up with abmemes123fg4 (one alias embedded inside another), which is obviously an undesirable outcome.

In order to prevent this, the 'cde' -> 'memes' sanitisation should occur before the 'xx' -> 'abcdefg' sanitisation.

To resolve this bug please either:

  • add a warning to the config file about this behaviour with steps to prevent it (order the rules in the config file to avoid this behaviour)
  • automatically resolve the ordering of sanitisation at run time using magic (unwritten code)
  • add a config entry for sanitisation ordering
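
For comparison, here is a sketch of a different approach that is not one of the options listed above: do all replacements in a single pass, so text that has already been replaced is never scanned again. It is written in Python for illustration only, and both function names are hypothetical.

```python
import re

def sanitise_sequential(text, replacements):
    """One replacement after another: later rules can hit earlier aliases."""
    for original, alias in replacements.items():
        text = text.replace(original, alias)
    return text

def sanitise_once(text, replacements):
    """Single pass over the input: inserted aliases are never rescanned."""
    # Longest originals first so a short key cannot pre-empt a longer one.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group()], text)

rules = {"secret-host": "abcdefg", "cde": "memes"}
print(sanitise_sequential("login to secret-host with cde", rules))
# login to abmemesfg with memes
print(sanitise_once("login to secret-host with cde", rules))
# login to abcdefg with memes
```

The trade-off is that every original value ends up in one large alternation, which may or may not scale to very large key lists.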

Strippy is fast but fat.

Strippy needs to lose weight.

I'm watching it process 2.3 GB of log files right now and the main process is using a whopping 13 GB and climbing. This is a bit of an issue. This is happening during the scout-stripper routine, so there must be something we can do in there to improve garbage collection or otherwise cut usage, as that is a ridiculous amount of memory to consume.

plz fix

Support for unknown length 'lists' of sensitive data

Currently the only way to capture more than one piece of sensitive information from the same line is to explicitly capture each piece with a new rule. This works well for cases that are known length, say splitting domain\user into a domain and user.

Sometimes, however, a log line contains a list of servers or URLs that are best captured individually. If the number of items in this list changes between log sets then support for 'lists' of this data becomes valuable.


Suggested Solution

Rule aliases that end in 'List' represent lists of sensitive data. Rather than the [String1]=[String2] format of other rules, list rules are represented in the conf file with a [String1]=[String2],[String3] format.

  • [String1] - Regex Rule
  • [String2] - Rule Alias
  • [String3] - List delimiter

Implementation

This will require updating the config parsing code and the merging process.
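
As a rough illustration of the matching side of this proposal (Python here, not Strippy's PowerShell): the regex captures the whole list in one group and the delimiter splits it into individual items. The function names and the sample log line are invented for the example.

```python
import re

def is_list_rule(alias):
    """Per the proposal, an alias ending in 'List' marks a list rule."""
    return alias.endswith("List")

def extract_list_items(line, pattern, delimiter):
    """Capture the whole list with group 1, then split it into single items."""
    match = re.search(pattern, line)
    if match is None:
        return []
    return [item.strip() for item in match.group(1).split(delimiter) if item.strip()]

line = "Cluster members: web01, web02, db01; status ok"
print(extract_list_items(line, r"Cluster members: (.*?);", ","))
# ['web01', 'web02', 'db01']
```

Each returned item would then be treated like any other captured value, so the number of items per line no longer matters.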

Better support '$' regex matches.

Rules that use the end-of-line anchor currently fail to match anything. BUT matched strings that need to be sanitised and are found at the end of a line include a '...' at the end of the token.

This is strange and things aren't working as they should. Please fix.

Name                           Value
----                           -----
SanitisationSuccess1           dumbtest
BAD_STRING!!!!!2               null…
BAD_STRING!!!!!1               ignorethis…
BAD_STRING!!!!!3               "pleaseignore"…

Places to start:

  • scout-stripper
    a. This is either a symptom of the way the files are loaded,
    b. or of the current regex method we use. It's possible the switch to compiled regex will fix this.

How even do measure as far go like?

Currently when sanitising a large number of files there's no way to be sure how far through the key finding or sanitising processes we are.

I would like to have a Write-Progress bar that shows either progress for these 2 areas or the overall progress of the job.

Change the way Jobs are named and reported on just slightly.

When running Strippy in a new PowerShell session, errors about job IDs appear and make it look broken:

ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.
ERROR: ParentActivityId cannot be the same as the ActivityId.

This should be fixed for a more confident/secure EUE.

-makeconfig broken in 2.1.3

-makeconfig is ignored if there's no -file param, and the -help output is shown instead.

Currently users need to work around this by running with a dummy -file flag.

A possible solution is to use parameter sets.

Code Execution from conf file

This should be addressed post-haste

An independent reviewer has shown that the conf lines that deal with SanitisedFileFirstLine and KeyListFirstLine can both be used to execute arbitrary code.

This is probably occurring due to some part of Strippy either invoking or executing the string to pull out its 'final form'. This had been the MO for those configuration lines in previous versions of Strippy before I moved away from that design specifically to avoid this type of problem.

It appears I haven't scrubbed this behaviour from the code completely as the following conf line demonstrates a code execution vulnerability: ')";calc.exe;"This keylist was created at {0}.'

Probable Fix:

line 367: $out = Invoke-Expression "$($str -f $(get-date).ToString())"

The fix will be to do a manual replacement for /{\d}/ rather than using Invoke-Expression.
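
The manual replacement could look something like this sketch, shown in Python for illustration (the real fix would be in Strippy's PowerShell). The function name fill_placeholders is hypothetical. The point is that the template is only ever treated as text, never evaluated.

```python
import re

def fill_placeholders(template, values):
    """Replace {0}, {1}, ... with values; everything else stays literal text."""
    def substitute(match):
        index = int(match.group(1))
        # Unknown indexes are left untouched rather than raising.
        return str(values[index]) if index < len(values) else match.group(0)
    return re.sub(r"\{(\d)\}", substitute, template)

hostile = ')";calc.exe;"This keylist was created at {0}.'
print(fill_placeholders(hostile, ["20/03/2019 12:48:48 PM"]))
# )";calc.exe;"This keylist was created at 20/03/2019 12:48:48 PM.
```

With this approach the hostile conf line from above just produces an odd-looking header instead of launching anything.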

Processing insight

I would like the ability to create some type of graphic or output file that shows how each of the rules were hit and perhaps even where.

I'd just like a ton of metadata and other information that would help improve the script.

Don't modify the format of sanitised files

Strippy breaks JSON file parsing by adding the sanitised header, e.g.:

This file was Sanitised at 20/03/2019 12:48:48 PM.
==

{"columnHeader":["

It also adds an additional CR/LF to the last line.

Surely the change to the file name is sufficient notice that the file has been sanitised.

'Dumb' rules - replace static text

There's a use case for removing known-bad strings. This can currently be done by creating a rule that is only a group, but a known-bad string does not need to be scouted for, just removed, so creating a group-only rule is a waste of processing.

This could be done by creating a new config area or extending the current Rules config area. I'd like to take feedback but so few people use the tool I'm usually left to just make up my own mind.

Options:

  1. New configuration area in strippy.conf [KnownStrings]
  2. Extend Rules area and make all rules without an explicit group a 'static' rule.
    a. this is not a bad solution
  3. Extend Rules area and make users include a symbol or similar to mark the rule as a static rule.

This is likely to be a post 3.0 feature - but depending on pain felt on-site could move up in priority

Get-Help Examples are out of date.

These need to be updated. Some examples use arguments that are not valid at the moment.

For anyone that learns by example this is a bad pitfall. Please update immediately.

Better handling of Keylist

The keylist file that details what sensitive information was replaced with is currently being exported only at the end of the script. This can be problematic when the script hits an error while sanitising (like that detailed in #31) or if sanitising otherwise takes too long.

Strippy should be dumping the keylist to file as soon as it has it, rather than waiting till the very end. This would also allow a user to salvage a session by starting a new session with the generated keylist even if the original sanitisation step fails.

Invoke-Strippy the module

At the moment using the script is great and works how I want it to, but it's frustrating having to find the script file and string everything together using absolute paths. I end up copying the script to the parent folder of the file/folder I want to sanitise, and that's an annoying step to have to take every time I want to sanitise something.

I realise this can't be added to the PATH like an executable could, but is there some other way that I can make it globally reachable? I suppose the strippy.exe release is a step in the right direction.

What about using PowerShell profiles?

Increase performance of Keylist merging

This could be done by using an ordered list of key values to compare against and then only generating key names after all unique values are found.

This should actually be easy to implement
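
One way to read this, sketched in Python for illustration only: collect the unique values first, then hand out key names in a single pass at the end. Note the sketch swaps the "ordered list" for a hash-based lookup (a dict used as an insertion-ordered set); merge_keylists and the input shape are assumptions, not Strippy's actual data structures.

```python
def merge_keylists(per_file_matches):
    """per_file_matches: one list of (alias, value) pairs per scouted file."""
    unique = {}
    for matches in per_file_matches:
        for alias, value in matches:
            # A dict keyed by value acts as an ordered set: duplicates cost
            # one hash lookup instead of a scan over every known key.
            unique.setdefault(alias, {})[value] = None

    # Only now, with every unique value known, generate the key names.
    keylist = {}
    for alias, values in unique.items():
        for number, value in enumerate(values, 1):
            keylist[value] = f"{alias}{number}"
    return keylist
```

The sketch assumes a given value only ever appears under one alias; a value matched by two different rules would need a tie-break.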

handle unpacking of .zip archives

This should work both for a targeted .zip file and for any .zip files found inside it.

The best way to approach this is likely to be unpacking any found archives during the file discovery phase, with the functionality hidden behind a config setting or switch.
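
A sketch of what unpacking during discovery might look like, using Python's zipfile module for illustration (Strippy would do this in PowerShell). unpack_archives is a hypothetical name, and extracting each archive next to itself is just one possible layout.

```python
import zipfile
from pathlib import Path

def unpack_archives(root):
    """Extract every .zip under root, including archives found inside archives,
    then return the regular files that are left to sanitise."""
    root = Path(root)
    seen = set()
    while True:
        pending = [p for p in root.rglob("*.zip") if p.is_file() and p not in seen]
        if not pending:
            break
        for archive in pending:
            seen.add(archive)
            target = archive.with_suffix("")  # logs.zip -> logs/
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix != ".zip")
```

The loop repeats until a pass finds no new archives, which is what handles .zip files nested inside other .zip files.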

Tests

Strippy is a beast. The beast has a lot of ability but sometimes doesn't actually do what was expected or coded for and as it grows will require challenges and prodding to ensure it grows correctly.

While unit testing is nice and all, I'm more interested in black-box, known input = expected output testing.

Pre-processing of files

There are times when a file type (I'm looking at you NPM CAS <.<) produces many many instances of a simple string. Something like an IP resolution/lookup saying IP 0.0.0.0 resolved to localhost.

These events in the logs are entirely useless but also incredibly insecure when dealing with private systems, as they give away both IPs and URLs. There needs to be a way to deal with these strings without bloating out the rules and making it take longer to check whether we already have a record of that IP.

Pre-processing of files to remove this type of line/information will speed up the processing as well. Each key we find increases the amount of time it takes to check that the next key is new. When we have 100k addresses in a log file the time increases dramatically, and there aren't many solutions.

This feature will go in after the new config file is implemented.
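
A minimal sketch of such a pre-processing pass, in Python for illustration only: lines matching a "noise" pattern are dropped before any key finding happens. drop_noise and the sample pattern are invented for the example.

```python
import re

def drop_noise(lines, noise_patterns):
    """Yield only the lines that match none of the noise patterns."""
    compiled = [re.compile(pattern) for pattern in noise_patterns]
    for line in lines:
        if not any(regex.search(line) for regex in compiled):
            yield line

log = ["IP 0.0.0.0 resolved to localhost", "User bob logged in"]
print(list(drop_noise(log, [r"^IP \S+ resolved to \S+$"])))
# ['User bob logged in']
```

Since dropped lines never reach the rules, none of their addresses are added to the key list in the first place.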

Log shuffling deletes shuffled log.

As per title.

I've seen Strippy run with log output that replaces a shuffled log file with a message saying it's been shuffled, with no other content.

I believe this message should have been prepended and the prepend has failed.

Please update or remove the message so that logging of Strippy continues normally.

Non-standard Config

Strippy's config needs better readability and less forced structure, so I'd like to eventually move it to a custom parsed format similar to ini.

This will remove the reliance on JSON or any other (comparatively) bloated data format while hopefully making it less scary for people to edit.

Compile regex on startup

Compile the regex rules upon loading. For large inputs this should provide a performance boost.
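
The general shape, sketched in Python for illustration: build the regex objects once at load time and reuse them for every line. In PowerShell this would presumably mean creating [regex] objects with the Compiled option, but that is an assumption about the eventual implementation. compile_rules and scout are hypothetical names.

```python
import re

def compile_rules(rules):
    """rules: mapping of pattern string -> alias. Compile once, fail early."""
    compiled = []
    for pattern, alias in rules.items():
        try:
            compiled.append((re.compile(pattern), alias))
        except re.error as exc:
            raise ValueError(f"bad rule {pattern!r}: {exc}") from exc
    return compiled

def scout(lines, compiled):
    """Apply every pre-compiled rule to every line; return (alias, value) pairs."""
    found = []
    for line in lines:
        for regex, alias in compiled:
            for match in regex.finditer(line):
                found.append((alias, match.group(1)))
    return found
```

A side benefit of compiling at startup is that a broken rule is reported before any file is touched.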

Config strings aren't being eval'd

I don't think this was ever actually working, or that the config file's eval was ever actually used. Until recently, that is, with the major work put in by #4.

Config-in-script

For a while now I've disliked that strippy requires a separate config file to be properly usable. I had originally (and the code still reflects this I believe) intended to support a 'single file mode' that stored the config in the script itself.

I've had a brain wave that this could be done more easily now. Rather than needing to compile the config into PowerShell we could just have a delimited and commented-out area in the script that would contain the config. The idea would be that when the script looks for configuration it would check a variable before looking for the default config file. If the variable is present the script would then read the delimited area of its own source and process that with the same logic that's used for the config file. This should be straightforward to implement.

Example of delimited comment in code

#Strippy code 

<#configstart
; Strippy Config file
; Developed for a 12.4.15 release of DCRUM
;Recurse=true
;InPlace=false
;Silent=false
;MaxThreads=5

[ Config ]
IgnoredStrings="/0:0:0:0:0:0:0:0", "0.0.0.0", "127.0.0.1", "name", "applications", ""

; These settings can use braces to include dynamic formatting: 
; {0} = Date/Time at processing
; #notimplemented {1} = Depends on context. Name of specific file being processed where relevant otherwise it's the name of the Folder/File provided to Strippy 
SanitisedFileFirstLine="This file was Sanitised at {0}.`r`n==`r`n`r`n"
KeyListFirstLine="This keylist was created at {0}."
;KeyFileName="Keylist.txt"
;AlternateKeyListOutput=".\keylist.txt"
;AlternateOutputFolder=".\SanitisedOutput"

[ Rules ]
;"Some Regex String here"="Replacement here"
"((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))[^\d]"="Address"
"addr=(.*?)[,&]"="Address"
"\d\sUser (\w+?) "="Username"
"Machine : (.*?); "="Hostname"
"Key User Report : section (.*?) - "="Username"
"Key User Report : section .*? - (.*?) - IP: "="Hostname"
"Key User Report : section .*? - .*? - IP: (.*?)"="Address"
"Received update event \(member (.*?),"="Address"

configend#>
#more strippy code here

The above should work really well provided the config used does not include #>. This idea needs to be tested.
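
Pulling the delimited block back out of the script source could be as simple as the following sketch (Python for illustration; Strippy would read its own .ps1 file). embedded_config is a hypothetical name, and the markers are the configstart/configend ones from the example above.

```python
import re

# Everything between the "<#configstart" line and the "configend#>" line.
CONFIG_BLOCK = re.compile(r"<#configstart\r?\n(.*?)\r?\nconfigend#>", re.DOTALL)

def embedded_config(script_source):
    """Return the embedded config text, or None if the script has no block."""
    match = CONFIG_BLOCK.search(script_source)
    return match.group(1) if match else None
```

The returned text could then be handed to the same parser that reads the standalone config file, with None meaning "fall back to the default config file".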

Make better debug logs

Do something like listing all the rules at the start with generated nicknames, and then reference the nickname when the rule is being used throughout the process.

Bug: Sanitisation-Stripper stalls and doesn't reap child-jobs

While sanitizing large selections of files the script stalls with the progress bar at 98%. This does not resolve even when left for significant periods of time.

CTRL+D escapes the script, and a Get-Job afterwards shows a job waiting to be reaped for each of the processed files.

It appears as though the reaping process is not adequate? Should look into what happens inside the sanitising-stripper function and determine how this could be improved and/or have a failout after a time limit has been exceeded.

Folder name sanitizing

Sanitizing of folder names
e.g. the folders in rumconsole\cva\workspace\logs\rest are named after the CAS IPs which could be sensitive information.

Aliases should fill 0's to prevent duplicate matches post sanitisation

Example:

Instead of Username1 and Username14
Use Username01 and Username14

Problem

Currently searching post-sanitisation (for fixes or otherwise) for 'Username1' will match 'Username1*' as well. Filling the number with zeros as necessary will ensure this doesn't happen.

This step will need to be done for each alias (or overall?) and then applied prior to the actual sanitisation step.
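
A small sketch of the padding step, in Python for illustration: the width is taken from the largest number that will be handed out for that alias, so it has to be computed after all unique values are known. padded_aliases is a hypothetical name.

```python
def padded_aliases(alias, values):
    """Number the values 1..n, zero-padded to the width of n."""
    width = len(str(len(values)))
    return {
        value: f"{alias}{number:0{width}d}"
        for number, value in enumerate(values, 1)
    }

users = [f"user{i}" for i in range(14)]
aliases = padded_aliases("Username", users)
print(aliases["user0"], aliases["user13"])
# Username01 Username14
```

Computing the width per alias keeps short lists short (Username1 to Username9), while an overall width would make every alias type the same length.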
