iq-scm / congress

This project forked from unitedstates/congress

Public domain data collectors for the work of Congress, including legislation, amendments, and votes.

Home Page: https://github.com/unitedstates/congress/wiki

License: Creative Commons Zero v1.0 Universal

Shell 0.71% Python 98.79% Dockerfile 0.50%

unitedstates/congress

This is a community-run project to develop Python tools that collect bills, amendments, roll call votes, and other core data about the U.S. Congress into simple-to-use structured data files.

The tools include:

  • Downloading the official bulk bill status data from Congress, the official source of information on the life and times of legislation, and converting the data to an easier-to-use format.

  • Scrapers for House and Senate roll call votes.

  • A document fetcher for GovInfo.gov, which holds bill text, bill status, and other official documents. The fetcher downloads only newly updated files.

  • A defunct THOMAS scraper for presidential nominations in Congress.

Read about the contents and schema in the documentation in the GitHub project wiki.

This repository was originally developed by GovTrack.us and the Sunlight Foundation in 2013 (see Eric's blog post) and is currently maintained by GovTrack.us and other contributors. For more information about data in Congress, see the Congressional Data Coalition.

Setting Up

This project is tested using Python 3.

System dependencies

On Ubuntu, you'll need wget, pip, and some support packages:

sudo apt-get install git python3-dev libxml2-dev libxslt1-dev libz-dev python3-pip python3-venv

On OS X, you'll need developer tools installed (Xcode), and wget:

brew install wget

Python dependencies

It's recommended you use a virtualenv (virtual environment) for development. Create a virtualenv for this project:

python3 -m venv env
source env/bin/activate

Finally, with your virtual environment activated, install the package, which will automatically pull in the Python dependencies:

pip install .

Collecting the data

The general form to start the scraping process is:

usc-run <data-type> [--force] [other options]

where data-type names the collector to run, such as govinfo or bills.

To get data for bills, resolutions, and amendments, run:

usc-run govinfo --bulkdata=BILLSTATUS
usc-run bills

The bills script will output bulk data into a top-level data directory, organized by Congress number, bill type, and bill number. Two data output files will be generated for each bill: a JSON version (data.json) and an XML version (data.xml).
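Once the bills script has run, the JSON output can be read back with a few lines of Python. The following is a minimal sketch, not part of this project: the function name iter_bill_records is hypothetical, and the search is recursive so that it does not depend on the exact nesting of the Congress number, bill type, and bill number directories.

```python
import json
from pathlib import Path


def iter_bill_records(data_dir="data"):
    """Yield (path, parsed JSON) for every data.json found under data_dir.

    Hypothetical helper for illustration; it is not provided by this project.
    """
    root = Path(data_dir)
    if not root.is_dir():
        # Nothing has been collected yet, so there is nothing to yield.
        return
    # Search recursively rather than hard-coding the directory depth.
    for json_path in sorted(root.rglob("data.json")):
        with json_path.open(encoding="utf-8") as fh:
            yield json_path, json.load(fh)


if __name__ == "__main__":
    for path, record in iter_bill_records():
        # Show where each record lives and a few of its top-level keys.
        keys = sorted(record) if isinstance(record, dict) else []
        print(path.parent, keys[:5])
```

The field names inside each record are described in the project wiki, so the sketch only lists keys rather than assuming any of them.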

Common options

Debugging messages are hidden by default. To include them, run with --log=info or --debug. To hide even warnings, run with --log=error.

To get emailed with errors, copy config.yml.example to config.yml and fill in the SMTP options. The script will automatically use the details when a parsing or execution error occurs.

The --force flag applies to all data types and suppresses use of the cache for network-retrieved resources.
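When the collectors are driven from a Python script (for example, a scheduled job), it can help to assemble the command line in one place. The sketch below is an illustration only: build_usc_run_command is a hypothetical helper, and it uses only the options named in this README (--force, --log=, and --bulkdata=).

```python
import shlex


def build_usc_run_command(data_type, force=False, log_level=None, extra_options=()):
    """Return the argument list for one usc-run invocation.

    Hypothetical helper for illustration; it is not provided by this project.
    """
    argv = ["usc-run", data_type]
    if force:
        # Bypass the cache for network-retrieved resources.
        argv.append("--force")
    if log_level is not None:
        # For example "info" to show more messages, "error" to hide warnings.
        argv.append(f"--log={log_level}")
    argv.extend(extra_options)
    return argv


if __name__ == "__main__":
    steps = [
        build_usc_run_command("govinfo", extra_options=["--bulkdata=BILLSTATUS"]),
        build_usc_run_command("bills", log_level="info"),
    ]
    for argv in steps:
        # Print the command; pass argv to subprocess.run(argv, check=True) to execute it.
        print(shlex.join(argv))
```

Building a list instead of a single string avoids shell quoting problems if the list is later handed to subprocess.run.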

Data Output

The script will cache downloaded pages in a top-level cache directory, and output bulk data in a top-level data directory.

Two bulk data output files will be generated for each object: a JSON version (data.json) and an XML version (data.xml). The XML version attempts to maintain backwards compatibility with the XML bulk data that GovTrack.us has provided for years. Add the --govtrack flag to get fully backward-compatible output using GovTrack IDs (otherwise the source IDs for legislators are used).
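Because each object is written in both formats, a consumer can load whichever one it needs from the same directory. This is a rough sketch using only the Python standard library; load_outputs is a hypothetical name, and the code makes no assumption about the element or field names inside the files.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def load_outputs(object_dir):
    """Load data.json and data.xml from one output directory.

    Returns (json_data, xml_root); either value is None if its file is absent.
    Hypothetical helper for illustration; it is not provided by this project.
    """
    object_dir = Path(object_dir)
    json_path = object_dir / "data.json"
    xml_path = object_dir / "data.xml"

    json_data = None
    if json_path.is_file():
        with json_path.open(encoding="utf-8") as fh:
            json_data = json.load(fh)

    xml_root = None
    if xml_path.is_file():
        # getroot() returns the document's top-level element.
        xml_root = ET.parse(str(xml_path)).getroot()

    return json_data, xml_root
```

Calling load_outputs on a single object's directory returns the parsed JSON alongside the XML root element, which makes it straightforward to compare the two representations while reading the schema documentation in the wiki.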

See the project wiki for documentation on the output format.

Contributing

Pull requests with patches are awesome. Unit tests are strongly encouraged (example tests).

The best way to file a bug is to open a ticket.

Running tests

To run this project's unit tests:

./test/run

Public domain

This project is dedicated to the public domain. As spelled out in CONTRIBUTING:

The project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

Contributors

acxz, boblannon, camisatx, connorjoleary, crdunwel, dcloud, divergentdave, dwillis, gphemsley, hugovk, jamesa, jamesturk, jonathanstrong, joshdata, konklone, lorien, michaelblyons, paultag, plantfansam, richardbx, ryparker, s4njee, stevesdawg, trentmercer, treymo, tribble, willvanwazer, wilson428
