zarcolio / wwwordlist

Wwwordlist is a wordlist generator for pentesters and bug bounty hunters. It extracts words from HTML, URLs, JS/HTTP/input variables, quoted text and mail files in order to generate wordlists.

License: MIT License

Python 94.19% Shell 5.81%
bugbounty pentest hacking bruteforce wordlist wordlist-generator wordlists pentesting penetration-testing infosec

WWWordList is a wordlist generator: it reads input from stdin and extracts words from HTML (parsed with BS4), URLs, JS/HTTP/input variables, quoted text found in the supplied input, and mail files. It isn't a scraper or spider, so use WWWordList in conjunction with a tool that downloads HTML, for example wget.
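
The README doesn't show how the quoted and jsvars extraction works internally. As a rough idea, here is a minimal Python sketch (Python being the project's own language) that pulls words out of quoted strings and collects JavaScript variable names. The regular expressions and the function names quoted_words and js_vars are hypothetical, not taken from wwwordlist's source.

```python
import re

# Assumed patterns, for illustration only (not wwwordlist's own):
# a string between double quotes or between single quotes, on one line.
QUOTED_RE = re.compile(r'"([^"\n]*)"|\'([^\'\n]*)\'')
# A JavaScript declaration such as `var x`, `let y` or `const z`.
JSVAR_RE = re.compile(r"\b(?:var|let|const)\s+([A-Za-z_$][\w$]*)")
# What counts as a "word" here: letters, digits, dashes and underscores.
WORD_RE = re.compile(r"[A-Za-z0-9_-]+")


def quoted_words(text: str) -> list[str]:
    """Return the words found inside quoted strings, in order of appearance."""
    words = []
    for double, single in QUOTED_RE.findall(text):
        # Only one of the two groups took part in the match; the other is "".
        words.extend(WORD_RE.findall(double or single))
    return words


def js_vars(text: str) -> list[str]:
    """Return the names declared with var, let or const."""
    return JSVAR_RE.findall(text)


if __name__ == "__main__":
    js = "var apiKey = \"admin-panel\"; let userId = '42';"
    print(quoted_words(js))  # ['admin-panel', '42']
    print(js_vars(js))       # ['apiKey', 'userId']
```

A pattern this simple will mis-pair apostrophes in ordinary prose (as in "don't"), so treat it as a starting point only.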

Why use WWWordList?

Because, as people on Twitter keep pointing out, you should use good wordlists based on the content of the target. This is my attempt at creating a wordlist generator that supports this.

Install

WWWordList should run on a default Kali Linux installation, provided BS4 is installed. To install WWWordList including BS4:

git clone https://github.com/Zarcolio/wwwordlist
cd wwwordlist
sudo bash install.sh

When running the installer in an automated environment, use the following command for an unattended installation:

sudo bash install.sh -auto

If you run into trouble using WWWordList, please open an issue and I'll try to fix it :)

Usage

usage: wwwordlist [-h] [-type <type>] [-case <o|l|u>] [-excl <file>] [-iwh <length>] [-iwn <length>]
                  [-ii] [-idu] [-min <length>] [-max <length>] [-mailfile]

Use WWWordList to generate a wordlist from input.

optional arguments:
  -h, -help      show this help message and exit
  -type <type>   Analyze HTTP, input or JS variables, the text between HTML tags, the URLs found,
                 quoted text or the full text. Choose between httpvars|inputvars|jsvars|html|urls|quoted|full.
                 Defaults to 'full'.
  -case <o|l|u>  Apply original, lower or upper case. If no case is specified, lower case is the
                 default. If another case is specified, lower case must be listed explicitly to be
                 included. Separate multiple cases with commas.
  -excl <file>   Leave out the words found in this file.
  -iwh <length>  Ignore values containing a valid hexadecimal number of this length. Don't use low
                 values, as ordinary words made up of the letters a-f will be filtered too.
  -iwn <length>  Ignore values containing a valid decimal number of this length.
  -ii            Ignore words that are a valid integer number.
  -idu           Ignore words containing a dash or underscore, but break them into parts.
  -min <length>  Defines the minimum length of a word to add to the wordlist, defaults to 3.
  -max <length>  Defines the maximum length of a word to add to the wordlist, defaults to 10.
  -mailfile      Decode quoted-printable input first. Use this option when supplying an email body.
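
To make the filtering options more concrete, the sketch below shows one way -mailfile, -idu, -min/-max, -ii, -iwn, -iwh and -excl could be applied to candidate words. The functions read_input and filter_words are hypothetical, and reading "a number of this length" as a run of at least that many consecutive (hex) digits is an assumption; wwwordlist's actual rules may differ.

```python
import quopri
import re


def read_input(raw: bytes, mailfile: bool = False) -> str:
    """Decode stdin bytes; with mailfile=True, undo quoted-printable first."""
    if mailfile:
        # "=3D" becomes "=", and soft line breaks ("=" at line end) are joined.
        raw = quopri.decodestring(raw)
    return raw.decode("utf-8", errors="replace")


def filter_words(words, min_len=3, max_len=10, ignore_int=False,
                 ignore_dec_len=None, ignore_hex_len=None,
                 split_dash_underscore=False, exclude=frozenset()):
    """Apply -idu, -min/-max, -ii, -iwn, -iwh and -excl style filters.

    Assumption: "a number of this length" means a run of at least that
    many consecutive digits (or hex digits) anywhere in the word.
    """
    kept = []
    for word in words:
        # -idu: drop the joined word but keep its dash/underscore-separated parts.
        parts = re.split(r"[-_]+", word) if split_dash_underscore else [word]
        for part in parts:
            if not min_len <= len(part) <= max_len:  # -min / -max
                continue
            if ignore_int and re.fullmatch(r"-?\d+", part):  # -ii
                continue
            if ignore_dec_len and re.search(r"\d{%d}" % ignore_dec_len, part):  # -iwn
                continue
            if ignore_hex_len and re.search(r"[0-9a-fA-F]{%d}" % ignore_hex_len, part):  # -iwh
                continue
            if part.lower() in exclude:  # -excl (exclude holds lowercase words)
                continue
            kept.append(part)
    return kept


if __name__ == "__main__":
    text = read_input(b"admin-panel 12345 deadbeef xy login", mailfile=True)
    print(filter_words(text.split(), split_dash_underscore=True,
                       ignore_int=True, ignore_hex_len=8))
    # ['admin', 'panel', 'login']
```

With ignore_hex_len=4, ordinary words such as "face" or "decade" are dropped as well, which illustrates the warning against low -iwh values.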

Examples

If you want to build a wordlist based on the text between the HTML tags, simply run the following command and let the wordlist generation begin:

cat index.html|wwwordlist -type html
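
For the html type, the README says the text between tags is extracted with BS4. The following sketch approximates that step with Python's standard-library html.parser instead, so it runs without extra packages. Skipping <script> and <style> content is a choice of this sketch, not documented wwwordlist behavior; TextCollector and html_words are hypothetical names.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text between tags, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_words(markup: str) -> list[str]:
    """Return the words found in the text between the tags of an HTML document."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return re.findall(r"[A-Za-z0-9_-]+", " ".join(collector.chunks))


if __name__ == "__main__":
    page = ("<html><head><title>Admin Portal</title><script>var x = 1;</script>"
            "</head><body><p>Reset password</p></body></html>")
    print(html_words(page))  # ['Admin', 'Portal', 'Reset', 'password']
```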

If you want to build a wordlist based on links inside a file, simply run:

cat index.html|wwwordlist -type urls
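
A minimal sketch of what extracting words from links could look like: find http(s) URLs, then split the path into segments and the query string into parameter names and values. The URL pattern, the decision to skip host names, and the function url_words are assumptions for illustration only.

```python
import html
import re
from urllib.parse import parse_qsl, unquote, urlsplit

# Assumed URL pattern: http(s) URLs up to whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""")
WORD_RE = re.compile(r"[A-Za-z0-9_-]+")


def url_words(text: str) -> list[str]:
    """Return words from URL paths plus query parameter names and values."""
    words = []
    # Undo HTML escaping first so "&amp;" in an href separates parameters.
    for url in URL_RE.findall(html.unescape(text)):
        parts = urlsplit(url)
        # Path segments, e.g. /api/v2/users -> api, v2, users
        words.extend(WORD_RE.findall(unquote(parts.path)))
        # Query string, e.g. ?id=7&debug=true -> id, 7, debug, true
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            words.extend(WORD_RE.findall(name))
            words.extend(WORD_RE.findall(value))
    return words


if __name__ == "__main__":
    link = '<a href="https://example.com/api/v2/users?id=7&amp;debug=true">Users</a>'
    print(url_words(link))  # ['api', 'v2', 'users', 'id', '7', 'debug', 'true']
```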

If you want to build a wordlist based on the text between the HTML tags, but you want it to be quite small, simply run:

cat index.html|wwwordlist -type html -iwh 4 -idu -max 8

If you want to build a wordlist based on the text between the HTML tags, but you want it to be really big, simply run:

cat index.html|wwwordlist -type html -iwh 4 -case o,l,u
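
The -case o,l,u option is one reason this wordlist grows: each word can appear in up to three spellings. A sketch of that expansion, assuming duplicates are removed while keeping first-seen order (apply_case is a hypothetical helper, not wwwordlist's API):

```python
def apply_case(words, modes="l"):
    """Expand words into the cases listed in modes ("o", "l", "u", comma-separated).

    Lower case is the default, matching the -case help text; duplicates
    are dropped while keeping first-seen order (an assumption of this sketch).
    """
    selected = {m.strip() for m in modes.split(",")}
    seen = set()
    result = []
    for word in words:
        variants = []
        if "o" in selected:
            variants.append(word)
        if "l" in selected:
            variants.append(word.lower())
        if "u" in selected:
            variants.append(word.upper())
        for variant in variants:
            if variant not in seen:
                seen.add(variant)
                result.append(variant)
    return result


if __name__ == "__main__":
    print(apply_case(["Login", "api"], "o,l,u"))
    # ['Login', 'login', 'LOGIN', 'api', 'API']
```

Note that, as the help text says, asking for only "o" leaves lower case out unless "l" is listed too.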

If you want to build a wordlist based on the text from a webpage, simply run:

wget -qO - example.com|wwwordlist -type html

If you want to build a big wordlist based on a whole website and run it through ffuf, try:

wget -nd -r example.com -q -E -R woff,jpg,gif,eot,ttf,svg,png,otf,pdf,exe,zip,rar,tgz,docx,ico,jpeg
cat *.*|wwwordlist -iwh 4 -case o,l,u -max 10 -type full|ffuf -recursion -w - -u https://example.com/FUZZ -r

Want to throw waybackurls in the mix? Combine it with parallel, xargs and urlcoding (warning: this will take a lot of time):

cat domains.txt | waybackurls | urlcoding -e | parallel -pipe xargs -n1 wget -T 2 -qO - | wwwordlist -iwh 4

Got a Git repo cloned locally? Try the following command inside the clone folder:

find . -type f -exec strings  {} +|wwwordlist

Contribute?

Do you have some useful additions to WWWordList?

  • PRs welcome
  • Twitter
