Coder Social home page Coder Social logo

averagesecurityguy / scrape Goto Github PK

View Code? Open in Web Editor NEW
68.0 6.0 11.0 82 KB

Extensible paste site scraper written in Golang.

Go 77.44% Shell 8.26% CSS 6.02% HTML 7.59% JavaScript 0.70%
pastebin golang scraper gists hacking-tool pentest osint osint-tool

scrape's Introduction

Scrape

Scrape finds interesting data in text files using keyword searches and regular expressions. Scrape pulls text files from Pastebin and Github Gists. In addition, Scrape can parse text files in a local directory. The search terms are user configurable and are stored in the config.json file. Scrape can run in the background as a service or it can run on demand.

Sources

Pastebin

To use scrape without getting blacklisted at Pastebin.com you will need to get a Lifetime Pro membership and whitelist your IP address. Scrape implements Pastebin's recommended scraping logic, which is defined at https://pastebin.com/api_scraping_faq.

Gists

To use scrape with Github Gists, you will need to create a read-only Github API key. Scrape gets the 100 most recent gists using the API endpoint described at: https://developer.github.com/v3/gists/#list-all-public-gists. At this time, no attempt is made to download truncated files or truncated content.

Local Files

To use scrape to parse files in a local directory, define the directory in the config.json file. Scrape will parse the files in batches of 100 by default. The batch size is configurable in the config.json file. Keep in mind, that after a file is processed it will be deleted from the directory.

Installation

You will first need to clone the Git repository with git clone https://github.com/averagesecurityguy/scrape. Once you have downloaded the repository, run the setup.sh script from the repository with sudo permissions. This will generate a new user called scrape and install the service.sh init script. If you already have a service account you want to use on your machine, modify the setup.sh script to disable creating the new account and modify service.sh to use the account you want..

Viewing Gathered Data

While scrape is running you can visit https://127.0.0.1:5000 to view the data that has been gathered. You will need to create a TLS certificate and key and define their locations in the config.json file. When scrape is not runnig you can use the view tool in the install directory to view scrape data.

View Command Usage

Usage:
    view filename action [arguments]

Actions:
    buckets                       Get a list of buckets.
    read <bucketname> <key>       Get the value of the key in the bucket.
    keys <bucketname>             Get a list of keys in a bucket.
    vals <bucketname>             Get a list of values in a bucket.
    search <bucketname> <string>  Get a list of keys from the bucket where the
                                  value contains the given string.

scrape's People

Contributors

averagesecurityguy avatar lukeeckley avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.