commonspeak's Introduction

Commonspeak is a wordlist generation tool that leverages public datasets from Google's BigQuery platform. By performing queries on large datasets that are updated frequently, commonspeak is able to generate wordlists that are "evolutionary", in the sense that they reflect the newest trends on the internet.

Commonspeak was made to generate content discovery and subdomain wordlists for use in application security testing. More details about this tool can be found here.

Requirements

Instructions

  • Install jq (sudo apt-get install jq or brew install jq)

  • Clone the repository:

    git clone https://github.com/pentester-io/commonspeak

  • Install the Google Cloud SDK, which includes the bq command-line tool that the scripts call

  • Create a Google Cloud project to use with BigQuery (mine was named crunchbox-160315)

  • Change into the directory of the dataset you would like to pull down: cd commonspeak/hackernews

  • Run the bash script for that dataset, passing the project name as the first argument: bash hackernews-subdomains.sh crunchbox-160315

The output will be located in commonspeak/hackernews/output/compiled
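
Before running a script, it can help to confirm that the tools named in the instructions are on the PATH. The following is a minimal sketch, not part of commonspeak: the helper name missing_tools is hypothetical, and the tool list (git, jq, gcloud, bq) is taken from the steps above.

```shell
# Print the name of every given tool that is not on PATH, one per line.
# (Illustrative helper; it is not shipped with commonspeak.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools the instructions rely on: git, jq, and the Cloud SDK's gcloud and bq.
missing="$(missing_tools git jq gcloud bq)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```

If bq is reported as missing even though the Cloud SDK is installed, the SDK's bin directory is probably not on the PATH of the shell that runs the script.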

Features

Commonspeak currently supports the following datasets, each with its own bash scripts and SQL queries:

  • StackOverflow, HackerNews

    • Directories
    • Filenames
    • Subdomains
  • HTTPArchive

    • Directories
    • Filenames
    • Language based directories and filenames
    • Subdomains
  • Certificate Transparency Logs

    • Subdomains
  • Collection of bash scripts that can easily be automated by using cron jobs

  • Easy to modify SQL queries for each separate dataset
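
One way to schedule a script with cron is sketched below. The helper name cron_entry, the weekly schedule (03:00 on Sundays), and the install path are illustrative assumptions, not part of commonspeak; the script name and project name are the ones used in the instructions.

```shell
# Print a crontab entry that runs one commonspeak script weekly.
# Arguments: repository directory, dataset directory, script name, project name.
# (Illustrative helper; the schedule "0 3 * * 0" is an arbitrary choice.)
cron_entry() {
  printf '0 3 * * 0 cd %s/%s && bash %s %s\n' "$1" "$2" "$3" "$4"
}

# Prints a line that can be pasted into `crontab -e`.
cron_entry "$HOME/commonspeak" hackernews hackernews-subdomains.sh crunchbox-160315
```

Cron usually runs jobs with a minimal environment, so the crontab may also need a PATH line that includes the Google Cloud SDK's bin directory so that bq can be found.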

Usage

Extracting the top 1 million unique subdomains from certificate transparency logs:

~/projects/commonspeak/ctldata
⟩ bash ctl-subdomains.sh crunchbox-160315
* Creating new dataset on BigQuery: crunchbox-160315:ctl_2017_12_02
* running bq mk crunchbox-160315:ctl_2017_12_02

Dataset 'crunchbox-160315:ctl_2017_12_02' successfully created.

* Running query to extract all_dns_names to ctl_2017_12_02.all_dns_names
Waiting on bqjob_r5535032cd1a736b2_000001601706601a_1 ... (139s) Current status: DONE
+----------------------------------------------+
|                  dns_names                   |
+----------------------------------------------+
| keralacinfo.com                              |
| decreask.online                              |
| www.tmbworld.com                             |
| www.metroaccess.dk                           |
| ungueskynso.gq                               |
| [...omitted for brevity...]                  |
| webdisk.forbesitservices.com                 |
| develop-cdn01.rockwoolgroup.com              |
| autodiscover.linaproperty.com.my             |
| accountserver.mydevices.thethings.industries |
+----------------------------------------------+

* Cleaning subdomains from all all_dns_names to ctl_2017_12_02.top_1m_all_dns_names
Waiting on bqjob_r236f25aea0828b3a_00000160170897a0_1 ... (657s) Current status: DONE
* Parsing results and saving to output/compiled/ctl_2017_12_02.subdomains.txt

* Compiled top 1000000 subdomains
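
Once a wordlist is compiled, it can be turned into candidate hostnames for a target domain. The sketch below assumes the compiled file holds one subdomain label per line, which the output above does not show; the helper name candidates and the domain example.com are hypothetical.

```shell
# Turn the first N wordlist entries into candidate hostnames for a domain.
# Arguments: wordlist file, domain, number of entries (default 1000).
# Assumes one subdomain label per line in the wordlist file.
candidates() {
  head -n "${3:-1000}" "$1" | sed "s/\$/.$2/"
}

# Example call (the path is the one shown in the run above):
#   candidates output/compiled/ctl_2017_12_02.subdomains.txt example.com 100
```

Limiting the number of entries with head keeps the candidate list small for a first pass; the full file can be used by passing a larger count.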

Follow the pentester.io team on Twitter.

commonspeak's People

Contributors

infosec-au

commonspeak's Issues

ctl-lists BigQuery dataset is not available

Hi,
It seems that the Certificate Transparency Logs dataset in BigQuery ("ctl-lists:ctl_data.cert_data" in the tool) is no longer available for public access. Executing a query returns this error message: "Error: Access Denied: Table ctl-lists:ctl_data.cert_data: The user <user name> does not have permission to query table ctl-lists:ctl_data.cert_data."

Error: bq command not found

Hey,

I have installed bq, but running the script shows:

* Running query to extract all_dns_names to ctl_2017_12_05.all_dns_names
ctl-subdomains.sh: line 17: bq: command not found

Please let me know the solution.

BigQuery error?

Hi,
I can't run this script. The error is:

BigQuery error in query operation: Invalid value stackoverflow_2018_09_08.urls for destination_table: Cannot determine table described by stackoverflow_2018_09_08.urls
