
gotham-grabber's Introduction

gotham grabber

gotham-grabber is a set of scripts originally written to take the URL of a writer page on a site in the Gothamist/DNAinfo network and produce a collection of attractive PDFs of each article. It was created after the sites were abruptly shut down on Thursday, November 2, 2017. The former editor-in-chief of LAist, one of the sites in the Gothamist network, has written about the significance of that shutdown.

Since the project's inception, the scripts have been expanded to support author pages from the following news sites:

  • Gothamist (and other sites in the -ist network)
  • DNAinfo
  • LA Weekly
  • Newsweek
  • Kinja

An outer Python script, gothamgrabber.py, takes an author page URL as an argument with the flag --url, creates a directory in the out subfolder where it runs, and saves a list of article URLs. (If that list of URLs already exists, gothamgrabber.py can take it as input, using the -t or --textfile option.) It then invokes grabber.js, a Node.js script that drives a headless Chrome instance to capture and format articles as PDFs.
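As a rough illustration of that command-line interface, here is a minimal Python sketch. It is not the project's actual code. Only the option names --url, -t, and --textfile and the out subfolder come from the description above. The function names build_parser and author_outdir, the choice to make the two inputs mutually exclusive, and the rule for naming the per-author directory are all assumptions made for the example.

```python
import argparse
from pathlib import Path
from urllib.parse import urlparse


def build_parser():
    """Command-line interface exposing the two input options named above."""
    parser = argparse.ArgumentParser(
        description="Collect article URLs for one author."
    )
    # Assumption: the two inputs are alternatives, so exactly one is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", help="author page URL")
    source.add_argument(
        "-t", "--textfile", help="existing list of article URLs"
    )
    return parser


def author_outdir(author_url, base="out"):
    """Hypothetical naming rule: out/<last path segment of the author URL>."""
    path = urlparse(author_url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1] or "unknown"
    return Path(base) / slug
```

With these definitions, build_parser().parse_args(["--url", "https://example.com/author/jane-doe"]) yields a namespace whose url attribute can be passed to author_outdir to pick the output directory. The example.com address is a placeholder.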

grabber.js can also be invoked on its own. It requires a --url argument and accepts an optional --outdir argument.
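One way to drive grabber.js from Python is sketched below, using only the --url and --outdir flags described above. This is an illustration under stated assumptions, not the project's implementation: it assumes node is on the PATH, that grabber.js sits in the current directory, that the URL list holds one URL per line, and that grabber.js is run once per article. The helper names read_article_urls, grabber_command, and grab_all are hypothetical.

```python
import subprocess
from pathlib import Path


def read_article_urls(textfile):
    """Return the non-empty lines of a URL list (assumed one URL per line)."""
    lines = Path(textfile).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def grabber_command(article_url, outdir=None):
    """Build the argument list for a single grabber.js run."""
    command = ["node", "grabber.js", "--url", article_url]
    if outdir is not None:
        # --outdir is optional, so it is only added when a directory is given.
        command += ["--outdir", str(outdir)]
    return command


def grab_all(textfile, outdir=None):
    """Run grabber.js once per listed URL, stopping at the first failure."""
    for url in read_article_urls(textfile):
        subprocess.run(grabber_command(url, outdir), check=True)
```

Passing the command as a list rather than a single shell string avoids quoting problems with URLs that contain characters such as & or ?.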

Both scripts have dependencies that must be installed first. To install them, clone this repo and run:

npm install
pip install -r requirements.txt

The scripts should then be ready to run.

gotham-grabber's People

Contributors

dependabot[bot], thisisparker


gotham-grabber's Issues

PR

I added support for a couple of websites and wanted to open a PR to merge those changes into master, but I was denied permission. Is there any way I can get access to this repo?
