Coder Social home page Coder Social logo

Comments (13)

icanhazstring avatar icanhazstring commented on July 26, 2024 2

We could provide a list of predefined regexes (is that the plural? 😅). But I think I would make them opt-in.

So for example

{
    "regex": [
        "CaptainHook\\...\\RegexA", 
        "CaptainHook\\...\\RegexB" 
    ] 
} 

This way we could provide an interface that people can implement to provide even more custom tests that will trigger a critical.

from captainhook.

sebastianfeldmann avatar sebastianfeldmann commented on July 26, 2024 1

Sounds great.
I'm alway open to add more hooks to CaptainHook directly as long as it doesn't result in adding a lot of dependencies to the composer.json.
For features that depend heavily on external libraries I would prefer an extra package.

from captainhook.

icanhazstring avatar icanhazstring commented on July 26, 2024 1

With the GitLab security issue and this tweet https://twitter.com/movetodevnull/status/1124924936030699520 I though about the same.

Just to see if the "conditions" fit to add the action directly into captainhook or to create a separate repo.

as @shochdoerfer pointed out, we would need to following things:

  • whitelist -- possibly as options for the action it self, so we have the in captainhook.json
  • tokens -- list of keywords to search for, so you can add the into your library and share it with the team. Also as option for the action itself

Where to look for critical values?

Now the question of "how" to define if there is something committed that shouldn't be. There are many many possibilities which I guess can be quite "overwhelming".

Just to list the formats:

  • .yml
  • .json
  • .php
  • .env
  • .ini
  • others?

Which all have their respective format of defining values.
Should we parse these files in there respective language? Like using php-parser, yaml reader and so on?

How to evaluate a token as critical

The problem is how to define a critical token. Purely by its present? Or do we need to validate the value as well?

This should be a critical:

// credential.config.production.php
return [
    'password' => '123bvazusdv2'
];

But this?

// credential.config.global.php
return [
    'password' => ''
];

As for me, the "easiest" and "dirtiest" way for now is simply use the whitelist and tokens to search by regex. But that would mean you have to commit using -n which disables everything else or you have to add "problems" into the whitelist as well.

from captainhook.

shochdoerfer avatar shochdoerfer commented on July 26, 2024 1

Where to look for critical values? I'd say basically any non binary file, not sure though if we can figure that out in PHP.

How to evaluate a token as critical? whitelist + tokens sounds good to me. As an alternative we could give the diffrent regexes names and allow the user to configure an ignore-list of those tokens. Depends a bit if we want to provide a fixed set of regex'es or let the user define them on his own.

from captainhook.

sebastianfeldmann avatar sebastianfeldmann commented on July 26, 2024 1

It just took 4 years but version 5.19.0 will feature a BlockSecrets Action.

I finally decided to go with a simple regex approach instead of implementing some fancy entropy calculation that kind of works but takes for ages to run.

Currently it has some basic regex pattern build in to detect AWS keys, Google keys, GitHub tokens, Stripe keys and general passwords.

If you have very specific keys, passwords or tokens in your project you can add your own patterns.
Or implement your own provider by implementing the Regex::patterns():array<string> interface.
You can add dummy or dev passwords to an allow list.

{
  "action": "\CaptainHook\App\Hook\File\Action\BlockSecrets",
  "options": {
    "providers":  [
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Aws",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Github",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Stripe",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Google",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Password"
    ],
    "blocked": [
       "#password\": \"[\S]+\"#i"
    ],
    "allowed": [
      "#my-dummy-token#", "#root#"
    ]
  }
}

from captainhook.

sebastianfeldmann avatar sebastianfeldmann commented on July 26, 2024 1

I'm moving the Interface and default implementations to another Composer package. Maybe other people want to use it differently. So it will take another day or two ;)

from captainhook.

sebastianfeldmann avatar sebastianfeldmann commented on July 26, 2024

Currently still making it work for pre-push and pre-commit at the same time.

from captainhook.

shochdoerfer avatar shochdoerfer commented on July 26, 2024

Pretty cool! Have had this on my "eventually todo list" for a while now but somehow other things have been more important. Thanks for finally tackling this!

from captainhook.

sebastianfeldmann avatar sebastianfeldmann commented on July 26, 2024

Ok, I think I'm happy now.

The Cap'n is not checking the whole files anymore. I'm using git diff to get the actually added content.
This kind of made it possible to do this entropy stuff again. It is now opt in and tries die detect secrets in .php, .yml, .json, and .ini files (sorry no .env "yet")

Cap't will run the regex, if nothing is found he will do the entropy stuff if the threshold is greater 0.
I found 3.6 - 3.8 reasonable values that had no false positives but found tokens and auto generated passwords pretty reliably.

The final configuration should look like this.

{
  "action": "\CaptainHook\App\Hook\Diff\Action\BlockSecrets",
  "options": {
    "entropyThreshold": 3.7,
    "providers":  [
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Aws",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Github",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Stripe",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Google",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Password"
    ],
    "blocked": [
       "#password\": \"[\S]+\"#i"
    ],
    "allowed": [
      "#my-dummy-token#", "#root#"
    ]
  }
}

What do you think?

from captainhook.

shochdoerfer avatar shochdoerfer commented on July 26, 2024

The Cap'n is not checking the whole files anymore. I'm using git diff to get the actually added content.

Nice!

It is now opt in and tries die detect secrets in .php, .yml, .json, and .ini files (sorry no .env "yet")

Would it be possible - maybe in a 2nd step - to make that configurable? Or is there a specific reason you hardcoded this?

Maybe it makes sense to also scan Markdown files?

Pretty cool to see this in the Cap'n. One more thing that sets it apart from the competition ;)

from captainhook.

sebastianfeldmann avatar sebastianfeldmann commented on July 26, 2024

You are right. I limited it to .php, .yml, .json, and .ini files because I could use regex to find variable assignments and only check strings that have risk potential.
But skipping files without that kind of structural advantage is not a good idea.
If you provide the entropyThreshold I will just check everything and do a brute force check for every word in the change set.

Adding a whitelist for checked file types could further improve the performance. I will check after I implemented the brute force check how necessary it might be.

from captainhook.

shochdoerfer avatar shochdoerfer commented on July 26, 2024

Finding variable assignments sounds like a neat idea!

from captainhook.

shochdoerfer avatar shochdoerfer commented on July 26, 2024

I guess the issue can be closed now :)

from captainhook.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.