
stevesdawg / govstat


Project that uses unitedstates/congress code to show a daily status of what happened in the government today. Also shows budget info.

Home Page: http://govstat.us

License: GNU Affero General Public License v3.0

Python 72.80% CSS 1.84% HTML 25.36%
flask python congress government government-data congress-data

govstat's Introduction

govstat

Source code for GovStat.us website.


Webserver is implemented using Flask on Python 3.10.

Install locally using pip install .

Dependencies:

Python dependencies will be pulled in automatically by pip.

Data Sources

Notes on where this webapp's data comes from. Congress data on bills and votes comes from scrapers in the unitedstates/congress repo. Budget data comes from Excel files published by the White House Office of Management and Budget (OMB).

To obtain congress data, do the following:

From the root of this repo, run:

usc-run votes --congress=XXX --session=YYYY --force=True --fast=True
usc-run govinfo --bulkdata=BILLSTATUS --congress=XXX
usc-run bills

where XXX is the Congress number and YYYY is the session, written as a four-digit year in the example below.

For example,

usc-run votes --congress=117 --session=2022 --force=True --fast=True
usc-run govinfo --bulkdata=BILLSTATUS --congress=117
usc-run bills
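The three commands above can also be assembled programmatically. The following is a minimal sketch, not part of the repo: the helper name `build_usc_commands` is hypothetical, and it only reuses the flags shown in the README.

```python
import shlex


def build_usc_commands(congress: int, session: int) -> list:
    """Return the three usc-run invocations from the README as argument lists.

    Hypothetical helper; only the flags shown in the README are used.
    """
    return [
        ["usc-run", "votes", f"--congress={congress}", f"--session={session}",
         "--force=True", "--fast=True"],
        ["usc-run", "govinfo", "--bulkdata=BILLSTATUS", f"--congress={congress}"],
        ["usc-run", "bills"],
    ]


# Print the commands as shell lines for inspection.
for cmd in build_usc_commands(117, 2022):
    print(shlex.join(cmd))
```

Each argument list could be handed to `subprocess.run(cmd, check=True)` from the repo root, so that a failing scraper stops the sequence.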

Budget data is carried in this repo via git-lfs.

Flask MySQL DB Creation

Start a MySQL Server.

Simple start for MariaDB:

sudo mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
sudo systemctl start mariadb.service

Add Configuration Information:

cp app/cfg/config.sample.json app/cfg/config.json

and edit in the appropriate values to app/cfg/config.json.

Initialize and Start flask.

flask db init
flask db migrate -m "initial migration"
flask db upgrade

Populate DB

After creating the Flask MySQL DB, run the following commands to populate it:

python vote_loader.py
python bill_loader.py
python budget_loader.py

Webapp Entrypoint

To launch the webapp:

gunicorn -b localhost:5000 -w 4 govstat:app
  • Host name (localhost)
  • Port number (5000)
  • Number of worker processes (4)
  • Flask app module and entrypoint (govstat:app)
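The same options can be kept in a gunicorn configuration file, which is itself Python. This is a sketch only: the file name `gunicorn.conf.py` is an assumption (the repo may not ship one), and the `wsgi_app` setting requires gunicorn 20.1.0 or newer.

```python
# gunicorn.conf.py (hypothetical file, not confirmed to exist in the repo)

# Equivalent of: -b localhost:5000
bind = "localhost:5000"

# Equivalent of: -w 4 (number of worker processes)
workers = 4

# Equivalent of the positional argument govstat:app (module:callable)
wsgi_app = "govstat:app"
```

With this file in the working directory, `gunicorn -c gunicorn.conf.py` should start the app with the same settings as the command line above.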

Gunicorn, NGINX, and Supervisor Configuration

  • See above to run gunicorn.
  • Configure NGINX port permissions and forwarding for HTTP and HTTPS requests in /etc/nginx/sites-enabled/.
  • Configure supervisor to run the gunicorn app in /etc/supervisor/conf.d/.
  • Create SSL certificates.

Directory Structure

congress/
+-- govstat/
    +-- app/
        +-- Bills.py
        +-- Budget.py
        +-- config.py
        +-- __init__.py [App instantiation, database instantiation, import functions for data loading and retrieval.]
        +-- models.py
        +-- routes.py
        +-- Votes.py
        +-- static/
        +-- templates/
    +-- govstat.py
    +-- setup.py
    +-- bill_loader.py
    +-- vote_loader.py
+-- data/
    +-- 116/
        +-- amendments/
            +-- hamdt/ [House Amendments]
                +-- hamdtN/
                    +-- [JSON and XML files]
            +-- samdt/ [Senate Amendments]
                +-- samdtN/
                    +-- [JSON and XML files]
        +-- bills/
            +-- hconres/
                +-- hconresN/
                    +-- [XML files. After processing, JSON files]
            +-- hjres/
                +-- hjresN/
                    +-- [XML files. After processing, JSON files]
            +-- hr/
                +-- hrN/
                    +-- [XML files. After processing, JSON files]
            +-- hres/
                +-- hresN/
                    +-- [XML files. After processing, JSON files]
            +-- s/
                +-- sN/
                    +-- [XML files. After processing, JSON files]
            +-- sconres/
                +-- sconresN/
                    +-- [XML files. After processing, JSON files]
            +-- sjres/
                +-- sjresN/
                    +-- [XML files. After processing, JSON files]
            +-- sres/
                +-- sresN/
                    +-- [XML files. After processing, JSON files]
        +-- votes/
            +-- 2020/
                +-- hN/
                    +-- [JSON and XML files]
                +-- sN/
                    +-- [JSON and XML files]
            +-- 2021/ [One directory per year]
    +-- 117/ ... [One directory per Congress number]
    +-- hist_fy21/ [Historical data through 2021 from Office of Management and Budget (OMB)]
        +-- [51 XLSX files containing data]
    +-- supplemental/
        +-- [XLSX files containing supplemental budget data]
    +-- upcoming_house_floor/
        +-- [JSON files per week containing bill activities that week]
+-- tasks/
    +-- [PY files for each type of data that can be scraped and delivered]
    +-- [amendments, bills, committees, govinfo, nominations, votes, upcoming, etc.]
+-- scripts/
    +-- [SH scripts to transform raw JSON and XML data into forms usable for govtrack and other utilities.]
+-- cache/
+-- test/
    +-- [Test scripts, not exhaustive]
+-- contrib/

govstat's People

Contributors

acxz, connorjoleary, sseshan7

Forkers

connorjoleary

govstat's Issues

Move away from submodules

Presently, the govstat.us web server needs to be placed inside the congress repo to function correctly. It relies on the nested directory structure to read bill and vote data contained in congress. See issue sseshan7/congress#5.

Opening an issue to decouple these and provide a path to the congress data via an environment variable or the config file mentioned in #4.
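
One way the decoupling could look is sketched below. This is an illustration under assumptions, not the project's implementation: the environment variable name `GOVSTAT_CONGRESS_DATA` and the config key `congress_data_dir` are both hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "GOVSTAT_CONGRESS_DATA"  # hypothetical variable name


def resolve_data_dir(config: Optional[dict] = None, environ=os.environ) -> Path:
    """Find the congress data directory: environment variable first, then config."""
    config = config or {}
    raw = environ.get(ENV_VAR) or config.get("congress_data_dir")  # hypothetical key
    if not raw:
        raise RuntimeError(
            f"Set {ENV_VAR} or 'congress_data_dir' in the config file "
            "to the directory containing the scraped congress data."
        )
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Congress data directory does not exist: {path}")
    return path
```

Checking the environment first lets a deployment override the config file without editing it.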

Integrate logging module for webserver

Use logging module to log errors, warnings, and messages instead of relying on print statements.
Log to a specified file, perhaps one that is specified in config.json.
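
A minimal sketch of what this could look like with the standard `logging` module follows. The logger name `govstat` and the function `configure_logging` are assumptions; the log file path would come from wherever the project decides to store it, such as config.json.

```python
import logging


def configure_logging(log_file: str, level: int = logging.INFO) -> logging.Logger:
    """Attach a file handler to a named logger (hypothetical helper)."""
    logger = logging.getLogger("govstat")  # assumed logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Existing `print` calls could then become `logger.info(...)`, `logger.warning(...)`, or `logger.error(...)` depending on severity.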

Change configuration file strategy

Presently, a local file - config.py - is used to store keys and credentials needed for the web app to function. If others want to install this, they have to provide their own config.py file. This can be improved greatly.

I like @acxz's suggestion to store the creds in a json file, and have config.py read from that file, with sensible defaults, and descriptive error messages to show that creds have not been set.
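
The suggestion can be sketched as follows. The key names (`db_host`, `db_port`, `db_user`, `db_password`, `db_name`) are hypothetical placeholders, since the actual contents of config.sample.json are not shown here; only the file paths come from the README.

```python
import json
from pathlib import Path

# Hypothetical keys; the real config.sample.json may use different names.
DEFAULTS = {"db_host": "localhost", "db_port": 3306}
REQUIRED = ("db_user", "db_password", "db_name")


def load_config(path) -> dict:
    """Read a JSON config, apply defaults, and fail loudly on missing credentials."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Copy app/cfg/config.sample.json to "
            "app/cfg/config.json and fill in your values."
        )
    with path.open() as fh:
        user_values = json.load(fh)
    config = {**DEFAULTS, **user_values}  # user values override defaults
    missing = [key for key in REQUIRED if not config.get(key)]
    if missing:
        raise ValueError(f"Missing required config values in {path}: {', '.join(missing)}")
    return config
```

config.py could then call `load_config("app/cfg/config.json")` once at import time and expose the resulting values.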

Create page for Election Data

There are tons of elections in this country.

Federal:

  • Congressional: Every 2 years. 1/3 of senators are up for election each time.
  • Presidential election: Every 4 years

For every state in the US, can we compile:

  • polling data by county for presidential (minimum), and congressional races (minimum)

At any point in time, can we provide a live list of:

  • Presidential candidates by party
  • Congressional candidates by party

This can be a one-stop shop for users trying to learn about upcoming elections. They will also be able to see the candidates' voting records and sponsored bills through the votes/bills section of the site.

State:

  • Governor (gubernatorial)
  • State legislature

Each state has its own timetable and rules for holding different elections. Can we aggregate that information nationally?
For every state in the US, can we compile:

  • Gubernatorial election dates
  • State legislature election dates
  • Polling data by county for gubernatorial (minimum), and legislature

At any point in time, can we provide a live list of:

  • Gubernatorial candidates by party (minimum)
  • State legislature candidates by party

Local:

  • City mayor
  • Other officials like judges, school board, chief of police, etc.

Once again, each municipality has a different timetable and rules. Different positions appear on the ballot in different cities/counties.
For every county in the US, can we compile:

  • Mayor election dates
  • List of other officials that get elected, and corresponding dates
  • Polling data for mayor (minimum) and other officials

At any point in time, can we provide a live list of:

  • Mayoral candidates by party

Vote numbering resets every year

Database currently assumes that the vote ID (e.g. h450 or s38) is sufficient for uniquely identifying a vote. This is wrong.

The year + vote ID is necessary and sufficient because the numbering resets every year. The session isn't necessary, but can be stored as well.
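
The composite-key idea can be demonstrated in isolation. The sketch below uses an in-memory SQLite database purely so that it is self-contained; the app itself uses MySQL through flask-sqlalchemy, and the table and column names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# (year, vote_id) together form the primary key; session is stored but not part of it.
conn.execute(
    """
    CREATE TABLE votes (
        year     INTEGER NOT NULL,
        vote_id  TEXT    NOT NULL,
        session  TEXT,
        question TEXT,
        PRIMARY KEY (year, vote_id)
    )
    """
)

# The same vote ID in two different years is allowed.
conn.execute("INSERT INTO votes VALUES (2020, 'h450', '2', 'example question A')")
conn.execute("INSERT INTO votes VALUES (2021, 'h450', '1', 'example question B')")

# The same vote ID twice in the same year is rejected.
try:
    conn.execute("INSERT INTO votes VALUES (2021, 'h450', '1', 'duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In a SQLAlchemy model the equivalent would be marking both columns with `primary_key=True`.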

Port to python3+

Now that congress and govstat are decoupled, one can be ported to python3 independent of the other.

Dependencies:

  • gunicorn
  • xlrd
  • pandas
  • numpy
  • requests
  • flask
  • flask-sqlalchemy
  • pymysql
  • flask-migrate
  • flask-wtf

Which version of python3 to port to?

`data/hist_fy21` folder empty

Command:
gunicorn -b localhost:5000 govstat:app

[2022-02-20 10:47:40 -0500] [964739] [INFO] Starting gunicorn 20.1.0
[2022-02-20 10:47:40 -0500] [964739] [INFO] Listening at: http://127.0.0.1:5000 (964739)
[2022-02-20 10:47:40 -0500] [964739] [INFO] Using worker: sync
[2022-02-20 10:47:40 -0500] [964743] [INFO] Booting worker with pid: 964743
[2022-02-20 10:47:41 -0500] [964743] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/acxz/vcs/git/github/acxz/govstat/govstat.py", line 1, in <module>
    from app import app
  File "/home/acxz/vcs/git/github/acxz/govstat/app/__init__.py", line 22, in <module>
    from app import routes, models
  File "/home/acxz/vcs/git/github/acxz/govstat/app/routes.py", line 8, in <module>
    import app.Budget as Budget
  File "/home/acxz/vcs/git/github/acxz/govstat/app/Budget.py", line 13, in <module>
    EXCEL_FILES = sorted(os.listdir(EXCEL_DIR))
FileNotFoundError: [Errno 2] No such file or directory: '/home/acxz/vcs/git/github/acxz/us/data/hist_fy21'
[2022-02-20 10:47:41 -0500] [964743] [INFO] Worker exiting (pid: 964743)
[2022-02-20 10:47:42 -0500] [964739] [INFO] Shutting down: Master
[2022-02-20 10:47:42 -0500] [964739] [INFO] Reason: Worker failed to boot.
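
The traceback shows `os.listdir(EXCEL_DIR)` running at import time, so a missing directory prevents the worker from booting at all. One possible mitigation, sketched here as an assumption rather than the fix that was actually applied, is to list the files inside a function that raises a descriptive error. The helper name `list_excel_files` and the `.xlsx` filter are hypothetical.

```python
import os


def list_excel_files(excel_dir: str) -> list:
    """Return sorted .xlsx file names in excel_dir, with a clear error if none exist."""
    if not os.path.isdir(excel_dir):
        raise FileNotFoundError(
            f"Budget data directory {excel_dir!r} does not exist. "
            "Check that the budget data has been fetched."
        )
    files = sorted(f for f in os.listdir(excel_dir) if f.lower().endswith(".xlsx"))
    if not files:
        raise FileNotFoundError(f"No .xlsx files found in {excel_dir!r}.")
    return files
```

Calling this from the budget loading functions instead of at module level would let the rest of the site start even when budget data is absent.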

Error with accessing ordered_dates from db

Having some more struggles running this locally, requesting live support.

Command: gunicorn -b localhost:5000 govstat:app

Output:

[2022-02-22 16:18:08,212] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/acxz/vcs/git/github/stevesdawg/govstat/app/routes.py", line 47, in index
    latest_votes = Votes.return_sql_json_by_date(
  File "/home/acxz/vcs/git/github/stevesdawg/govstat/app/Votes.py", line 232, in return_sql_json_by_date
    senate_max_date = senate_ordered.first()[-1].date()
TypeError: 'NoneType' object is not subscriptable
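
The `TypeError` arises because SQLAlchemy's `Query.first()` returns `None` when the query matches no rows, which is what happens when the votes table has not been populated. A defensive sketch is shown below; the helper name `latest_date` is hypothetical, and it assumes, as in the traceback, that the last column of the row is a `datetime`.

```python
from datetime import date, datetime
from typing import Optional, Sequence


def latest_date(first_row: Optional[Sequence]) -> Optional[date]:
    """Return the date from the last column of a row, or None if there is no row."""
    if first_row is None:
        return None  # empty table: nothing loaded yet
    return first_row[-1].date()
```

The caller, for example `latest_date(senate_ordered.first())`, can then show an empty state instead of failing when the result is `None`.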

`ModuleNotFoundError: No module named 'config'`

Error:

Traceback (most recent call last):
  File "/home/acxz/venvs/govstat-venv/bin/webapp", line 33, in <module>
    sys.exit(load_entry_point('govstat', 'console_scripts', 'webapp')())
  File "/home/acxz/venvs/govstat-venv/bin/webapp", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 162, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/acxz/vcs/git/github/acxz/govstat/app/__init__.py", line 5, in <module>
    import config
ModuleNotFoundError: No module named 'config'

Command to reproduce:
pip install .
webapp

Missing module `openpyxl`

Error:

acxz@archard ~/vcs/git/github/stevesdawg/govstat (git)-[acxz-run-tmp] % python budget_loader.py
Traceback (most recent call last):
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/pandas/compat/_optional.py", line 126, in import_optional_dependency
    module = importlib.import_module(name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'openpyxl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/acxz/vcs/git/github/stevesdawg/govstat/budget_loader.py", line 5, in <module>
    Budget.load_mysql_all_budget()
  File "/home/acxz/vcs/git/github/stevesdawg/govstat/app/Budget.py", line 216, in load_mysql_all_budget
    load_mysql_deficit_surplus()
  File "/home/acxz/vcs/git/github/stevesdawg/govstat/app/Budget.py", line 62, in load_mysql_deficit_surplus
    dataxls = pd.read_excel(os.path.join(EXCEL_DIR, BUDGET_1), index_col=None)
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
    self._reader = self._engines[engine](self._io, storage_options=storage_options)
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/pandas/io/excel/_openpyxl.py", line 524, in __init__
    import_optional_dependency("openpyxl")
  File "/home/acxz/venvs/govstat-venv/lib/python3.10/site-packages/pandas/compat/_optional.py", line 129, in import_optional_dependency
    raise ImportError(msg)
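
pandas treats `openpyxl` as an optional dependency and only imports it when an .xlsx file is read, which is why the failure appears late. A small pre-flight check, sketched here with a hypothetical helper name `missing_modules`, can surface the problem before any loading starts.

```python
import importlib.util


def missing_modules(*names: str) -> list:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, budget_loader.py could call `missing_modules("openpyxl")` first and exit with an instruction to run `pip install openpyxl` if the list is not empty. Adding `openpyxl` to the package's declared dependencies would address the root cause.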

Add scrapers to pull Budget data from OMB

Follow-on issue created to finish the work raised in #14.

The budget data pipeline is currently completely manual. Automating it would make this repo easier to distribute to users, allowing them to pull their own data, rather than maintainers pushing Excel files to the repo.

License

I may be interested in expanding upon this repo for future projects, but the default license on GitHub means no one may reproduce, distribute, or create derivative works from your work, so I cannot. Could you please add a license to this project? Info about licenses can be found here.

My preference is the GNU GPL, but it's your call: https://choosealicense.com/licenses/gpl-3.0/

Add link to website

It would be great if you added the link to your website in the Description of the repo so that people can easily access it.
