Comments (13)
We could provide a list of predefined regexes (is that the plural? 😅). But I think I would make them opt-in.
So for example
{
"regex": [
"CaptainHook\\...\\RegexA",
"CaptainHook\\...\\RegexB"
]
}
This way we could provide an interface that people can implement to provide even more custom tests that will trigger a critical.
from captainhook.
Sounds great.
I'm alway open to add more hooks to CaptainHook
directly as long as it doesn't result in adding a lot of dependencies to the composer.json
.
For features that depend heavily on external libraries I would prefer an extra package.
from captainhook.
With the GitLab security issue and this tweet https://twitter.com/movetodevnull/status/1124924936030699520 I though about the same.
Just to see if the "conditions" fit to add the action directly into captainhook
or to create a separate repo.
as @shochdoerfer pointed out, we would need to following things:
- whitelist -- possibly as options for the action it self, so we have the in
captainhook.json
- tokens -- list of keywords to search for, so you can add the into your library and share it with the team. Also as option for the action itself
Where to look for critical values?
Now the question of "how" to define if there is something committed that shouldn't be. There are many many possibilities which I guess can be quite "overwhelming".
Just to list the formats:
- .yml
- .json
- .php
- .env
- .ini
- others?
Which all have their respective format of defining values.
Should we parse these files in there respective language? Like using php-parser, yaml reader and so on?
How to evaluate a token as critical
The problem is how to define a critical token. Purely by its present? Or do we need to validate the value as well?
This should be a critical:
// credential.config.production.php
return [
'password' => '123bvazusdv2'
];
But this?
// credential.config.global.php
return [
'password' => ''
];
As for me, the "easiest" and "dirtiest" way for now is simply use the whitelist and tokens to search by regex. But that would mean you have to commit using -n
which disables everything else or you have to add "problems" into the whitelist as well.
from captainhook.
Where to look for critical values? I'd say basically any non binary file, not sure though if we can figure that out in PHP.
How to evaluate a token as critical? whitelist + tokens sounds good to me. As an alternative we could give the diffrent regexes names and allow the user to configure an ignore-list of those tokens. Depends a bit if we want to provide a fixed set of regex'es or let the user define them on his own.
from captainhook.
It just took 4 years but version 5.19.0 will feature a BlockSecrets
Action.
I finally decided to go with a simple regex approach instead of implementing some fancy entropy calculation that kind of works but takes for ages to run.
Currently it has some basic regex pattern build in to detect AWS keys, Google keys, GitHub tokens, Stripe keys and general passwords.
If you have very specific keys, passwords or tokens in your project you can add your own patterns.
Or implement your own provider by implementing the Regex::patterns():array<string>
interface.
You can add dummy or dev passwords to an allow list.
{
"action": "\CaptainHook\App\Hook\File\Action\BlockSecrets",
"options": {
"providers": [
"\\CaptainHook\\App\\Hook\\File\\Regex\\Aws",
"\\CaptainHook\\App\\Hook\\File\\Regex\\Github",
"\\CaptainHook\\App\\Hook\\File\\Regex\\Stripe",
"\\CaptainHook\\App\\Hook\\File\\Regex\\Google",
"\\CaptainHook\\App\\Hook\\File\\Regex\\Password"
],
"blocked": [
"#password\": \"[\S]+\"#i"
],
"allowed": [
"#my-dummy-token#", "#root#"
]
}
}
from captainhook.
I'm moving the Interface
and default implementations to another Composer package. Maybe other people want to use it differently. So it will take another day or two ;)
from captainhook.
Currently still making it work for pre-push
and pre-commit
at the same time.
from captainhook.
Pretty cool! Have had this on my "eventually todo list" for a while now but somehow other things have been more important. Thanks for finally tackling this!
from captainhook.
Ok, I think I'm happy now.
The Cap'n is not checking the whole files anymore. I'm using git diff
to get the actually added content.
This kind of made it possible to do this entropy stuff again. It is now opt in and tries die detect secrets in .php
, .yml
, .json
, and .ini
files (sorry no .env
"yet")
Cap't will run the regex, if nothing is found he will do the entropy stuff if the threshold
is greater 0.
I found 3.6 - 3.8 reasonable values that had no false positives but found tokens and auto generated passwords pretty reliably.
The final configuration should look like this.
{
"action": "\CaptainHook\App\Hook\Diff\Action\BlockSecrets",
"options": {
"entropyThreshold": 3.7,
"providers": [
"\\CaptainHook\\Secrets\\Regex\\Supplier\\Aws",
"\\CaptainHook\\Secrets\\Regex\\Supplier\\Github",
"\\CaptainHook\\Secrets\\Regex\\Supplier\\Stripe",
"\\CaptainHook\\Secrets\\Regex\\Supplier\\Google",
"\\CaptainHook\\Secrets\\Regex\\Supplier\\Password"
],
"blocked": [
"#password\": \"[\S]+\"#i"
],
"allowed": [
"#my-dummy-token#", "#root#"
]
}
}
What do you think?
from captainhook.
The Cap'n is not checking the whole files anymore. I'm using git diff to get the actually added content.
Nice!
It is now opt in and tries die detect secrets in .php, .yml, .json, and .ini files (sorry no .env "yet")
Would it be possible - maybe in a 2nd step - to make that configurable? Or is there a specific reason you hardcoded this?
Maybe it makes sense to also scan Markdown files?
Pretty cool to see this in the Cap'n. One more thing that sets it apart from the competition ;)
from captainhook.
You are right. I limited it to .php, .yml, .json, and .ini files because I could use regex to find variable assignments and only check strings that have risk potential.
But skipping files without that kind of structural advantage is not a good idea.
If you provide the entropyThreshold
I will just check everything and do a brute force check for every word in the change set.
Adding a whitelist for checked file types could further improve the performance. I will check after I implemented the brute force check how necessary it might be.
from captainhook.
Finding variable assignments sounds like a neat idea!
from captainhook.
I guess the issue can be closed now :)
from captainhook.
Related Issues (20)
- [feature] allow hooks to have a label HOT 3
- Path to the composer autoload bootstrap file is not using a "realpath" HOT 6
- STAGED_FILES greedy when there is conditions HOT 1
- Using $ENV in run-exec config HOT 3
- Support for Relative Path to CaptainHook Executable in Generated Hooks HOT 5
- Allow Staged and Changed files condition to define diff-filter HOT 1
- PHP Fatal error: Argument #2 ($filter) must be of type array HOT 7
- InjectIssueKeyFromBranch does not work as expected HOT 10
- http://captainhook.info is not available anymore? HOT 1
- Support Symfony 7 HOT 2
- $CHANGED_FILES placeholder is empty for first push HOT 1
- [FEATURE-REQUEST] New placeholder `ALL_CHANGED_FILES` HOT 17
- Docker mode not working? HOT 9
- Linting fails when it shouldn't? HOT 3
- Default config file does not work HOT 3
- Error 'The "--input" option does not exist' when rerunning the hooks HOT 1
- captainhook/captainhook-phar failing, prefer captainhook/captainhook in README HOT 3
- Running Hooks in Subrepositories fails HOT 10
- Execute PHAR hook installation without an autoloader HOT 9
- [Feature] Parallel execution of actions HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from captainhook.