broham / providencecrawler
Crawls GitHub looking for sensitive information that has been inadvertently committed to public repos
This will be a static project that can be used to test the functions in the API wrapper. At a minimum it should have:
Since this file will eventually contain sensitive information, we do not want to add it to our GitHub repo. Adding an entry for it to .gitignore will ensure this doesn't happen.
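A small sketch that adds an entry to .gitignore only if it is missing, assuming the file in question is config.py (the issue does not name it here). `with_ignore_entry` and `ensure_ignored` are hypothetical helpers, not existing project code.

```python
from pathlib import Path


def with_ignore_entry(text: str, entry: str) -> str:
    """Return .gitignore contents that list `entry` exactly once."""
    if entry in (line.strip() for line in text.splitlines()):
        return text  # already ignored; leave the contents untouched
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new entry on its own line
    return text + entry + "\n"


def ensure_ignored(entry: str, gitignore: Path = Path(".gitignore")) -> bool:
    """Add `entry` to the .gitignore file; return True if the file changed."""
    old = gitignore.read_text() if gitignore.exists() else ""
    new = with_ignore_entry(old, entry)
    if new != old:
        gitignore.write_text(new)
    return new != old
```

Calling `ensure_ignored("config.py")` from the repo root is enough. Note that .gitignore only affects untracked files: if config.py was ever committed, it also has to be removed from the index with `git rm --cached config.py`.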
Currently we are limited to 60 requests per hour because our requests are unauthenticated. By authenticating, we can raise this to 5,000 requests per hour per account.
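A minimal sketch of an authenticated request using only the standard library, assuming a personal access token is available. GitHub reports the remaining quota in the `X-RateLimit-Remaining` response header; `build_request` and `remaining_requests` are hypothetical helper names, not functions that already exist in github.py.

```python
import urllib.request
from typing import Mapping, Optional

API_ROOT = "https://api.github.com"


def build_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GitHub API request; attach the token when one is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests count against the account's own hourly quota
        # instead of the much smaller unauthenticated one.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(API_ROOT + path, headers=headers)


def remaining_requests(headers: Mapping[str, str]) -> Optional[int]:
    """Read the remaining hourly quota from a response's headers, if present."""
    value = headers.get("X-RateLimit-Remaining")
    return int(value) if value is not None else None
```

For example, passing `build_request("/rate_limit", token)` to `urllib.request.urlopen` and handing the response's `headers` to `remaining_requests` shows how much of the quota is left.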
Issue #4 must be completed before this can be worked on. We will include authentication information for as many accounts as we can use in the config.py file, and then make sure that every request made from the github.py file is authenticated.
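One way to spread requests across several accounts, sketched under the assumption that config.py will expose a list of tokens: rotate through them round-robin so each request carries the next account's `Authorization` header. `TokenRotator` is a hypothetical name.

```python
import itertools
from typing import Dict, Iterable, Iterator


class TokenRotator:
    """Hand out auth headers round-robin across several accounts' tokens."""

    def __init__(self, tokens: Iterable[str]) -> None:
        token_list = list(tokens)
        if not token_list:
            raise ValueError("at least one token is required")
        self._cycle: Iterator[str] = itertools.cycle(token_list)

    def next_headers(self) -> Dict[str, str]:
        """Return headers authenticating the next request as the next account."""
        return {"Authorization": f"token {next(self._cycle)}"}
```

Each request in github.py would then be built with `headers=rotator.next_headers()`. Round-robin keeps usage even but does not notice an account that has hit its limit; checking `X-RateLimit-Remaining` before reusing a token would be a natural next step.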
Several parts of the code do not follow best practices. A complete list of suggestions can be found here:
http://codereview.stackexchange.com/questions/151444/scraping-github-for-security-vulnerabilities
Issue #2 will need to be completed before this can be worked on.
Right now we don't do any error handling when a call to the API fails. This will be a good start.
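A minimal sketch of centralized error handling, assuming github.py fetches JSON with `urllib`. HTTP error statuses (for example a 403 or 429 once the rate limit is exhausted), network failures, and malformed responses are all turned into one hypothetical `GitHubAPIError`, so callers have a single exception type to handle. The `opener` parameter exists only so the function can be tested without network access.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable


class GitHubAPIError(Exception):
    """Raised when a GitHub API call fails for any reason."""


def get_json(url: str, opener: Callable = urllib.request.urlopen,
             timeout: float = 10) -> Any:
    """Fetch `url` and decode its JSON body, wrapping every failure mode."""
    try:
        with opener(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it must be caught first.
        raise GitHubAPIError(f"{url} returned HTTP {err.code}") from err
    except urllib.error.URLError as err:
        raise GitHubAPIError(f"could not reach {url}: {err.reason}") from err
    except json.JSONDecodeError as err:
        raise GitHubAPIError(f"{url} did not return valid JSON") from err
```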
Make sure that issue #7 has been completed before we start this issue. This is important because this issue involves entering sensitive information that should not be uploaded to GitHub.
Once we have the credentials needed to make authenticated API requests, enter them into the config.py file so we can use them later.
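A possible layout for config.py, assuming one personal access token per account (GitHub no longer accepts account passwords for API authentication). The variable name `GITHUB_ACCOUNTS`, the placeholder value, and the `configured_tokens` helper are all hypothetical.

```python
# config.py: hypothetical layout; this file must stay listed in .gitignore.
from typing import Dict, List

PLACEHOLDER = "REPLACE_WITH_TOKEN"

# One entry per account whose quota we want to use.
GITHUB_ACCOUNTS: List[Dict[str, str]] = [
    {"username": "first-account", "token": PLACEHOLDER},
    {"username": "second-account", "token": PLACEHOLDER},
]


def configured_tokens(accounts: List[Dict[str, str]] = GITHUB_ACCOUNTS) -> List[str]:
    """Return only the tokens that have actually been filled in."""
    return [a["token"] for a in accounts
            if a.get("token") and a["token"] != PLACEHOLDER]
```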
Currently we are only pulling content for the most recent commit's SHA. We should create a new function in github.py that returns a list of SHAs representing every commit in the project's history.
Use the getRepoSHA function on line 29 as a base for this new function.
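A sketch of such a function, assuming the REST endpoint `GET /repos/{owner}/{repo}/commits`, which returns commits newest-first in pages of at most 100. `get_all_commit_shas` and `fetch_commits_page` are hypothetical names; the page fetcher is passed in so the pagination logic can be tested offline.

```python
import json
import urllib.request
from typing import Callable, List, Optional

PER_PAGE = 100  # the largest page size the commits endpoint accepts


def fetch_commits_page(owner: str, repo: str, page: int,
                       token: Optional[str] = None) -> list:
    """Fetch one page of commit objects from the GitHub REST API."""
    url = (f"https://api.github.com/repos/{owner}/{repo}/commits"
           f"?per_page={PER_PAGE}&page={page}")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_all_commit_shas(
    owner: str,
    repo: str,
    fetch_page: Callable[[str, str, int], list] = fetch_commits_page,
) -> List[str]:
    """Return the SHA of every commit, newest first, following pagination."""
    shas: List[str] = []
    page = 1
    while True:
        commits = fetch_page(owner, repo, page)
        shas.extend(commit["sha"] for commit in commits)
        if len(commits) < PER_PAGE:
            break  # a short (or empty) page means there are no more commits
        page += 1
    return shas
```

By default this endpoint walks the history of the repository's default branch, so commits reachable only from other branches would need additional calls.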
Initially I'm thinking we should look for these values:
pass
pw
pwd
passwd
key
user
username
secret
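A sketch of scanning file contents for the terms above, assuming files have already been downloaded as text. A term only counts when it is not embedded in a longer word, so `keyboard` is skipped while `API_KEY` and `db_pass` are still reported; this boundary rule is an assumption, not part of the issue. `SEARCH_TERMS` and `scan_text` are hypothetical names.

```python
import re
from typing import List, Tuple

SEARCH_TERMS = ["pass", "pw", "pwd", "passwd", "key", "user", "username", "secret"]

# A letter on either side disqualifies a match (case-insensitively), while
# underscores, punctuation, and whitespace are allowed as boundaries.
_PATTERNS = {
    term: re.compile(rf"(?<![a-z]){re.escape(term)}(?![a-z])", re.IGNORECASE)
    for term in SEARCH_TERMS
}


def scan_text(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, term, stripped line) for every term hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for term, pattern in _PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, term, line.strip()))
    return hits
```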
Once we have analyzed a number of files we can go back and see which terms are actually finding vulnerabilities.
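Once hits have been reviewed by hand, the per-term hit rate shows which terms are worth keeping. The sketch assumes each reviewed hit is recorded as a `(term, is_real)` pair; `term_precision` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def term_precision(reviewed: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of each term's hits that were confirmed as real leaks."""
    total: Counter = Counter()
    confirmed: Counter = Counter()
    for term, is_real in reviewed:
        total[term] += 1
        if is_real:
            confirmed[term] += 1
    return {term: confirmed[term] / total[term] for term in total}
```

Terms with a consistently low ratio are candidates for removal or for a stricter match rule.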
This file should be added to the .gitignore file, as it will eventually contain sensitive information.