sc3 / cookcountyjail

A Django app that tracks the population of Cook County Jail over time and summarizes trends.

Home Page: http://cookcountyjail.recoveredfactory.net/api/1.0/?format=json

License: Other

Python 99.17% Shell 0.83%

cookcountyjail's Introduction

What is the Supreme Chi-Town Coding Crew?

We are a small group of aspiring web developers and journalists based in Chicago who meet regularly to learn basic web skills and work on data journalism and civic technology projects. Learn about SC3's principles in our manifesto.

How do I join?

Come to OpenHack at FreeGeek Chicago, every Saturday from 1pm-5pm at 3411 W. Diversey. Details available on the FreeGeek Chicago calendar.

Projects

  • Cook County Inmate Tracker - A Django-based web API that provides data about inmates in Cook County Jail by scraping the Sheriff's Inmate Locator. Status: 2.0 version in development. 1.0 version is inefficient and complicated but stable. Both versions are available online.
  • 26th and California - Tarbell-based data visualization site for the Cook County Jail API. Status: In active development.
  • Townsquare 2 - A Django application for tracking volunteer participation. Status: Alpha quality, in development.
  • Chicago Birthrates - A simple application that visualizes birthrates in Chicago based on the city data portal. Status: Stable version 1.0, demo site is broken.
  • Townsquare 1 - A Drupal distribution for tracking volunteer participation. Status: In production at FreeGeek Chicago, not maintained as a public project.
  • Hopper VM - A lightweight virtual machine for app development. Status: Not maintained.

License

All files in this repository are the work of the Supreme Chi-Town Coding Crew and are licensed under a Creative Commons Attribution 4.0 International License.

cookcountyjail's People

Contributors

35thlair, bepetersn, cgcalabria, codersquid, derekeder, eads, fgregg, joegermuska, nwinklareth, randy7771026, sarahelizabeth, sharonelizch, waffle-iron, wilbertom

cookcountyjail's Issues

Add parameter to filter by booking date to scraper functions.

To make scraping quicker, we need to filter for inmate records with booking date greater than some parameter, and also calculate discharge dates against the same set of records.

The handle, process_urls, and calculate_discharge_date methods in scrape_inmates.py will need to get this parameter. handle needs to get it from the command line arguments, and then pass it as a function parameter to the other two functions.
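A minimal sketch of how the parameter could be parsed and applied, outside of Django. The `--booked-after` flag name, the `YYYY-MM-DD` format, and the `filter_by_booking_date` helper are assumptions for illustration; the real `handle`, `process_urls`, and `calculate_discharge_date` would receive the parsed value the same way.

```python
import argparse
from datetime import date, datetime


def parse_booking_date(text):
    """Parse a YYYY-MM-DD string into a date (the format is an assumption)."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def build_parser():
    """Build a command-line parser with a hypothetical --booked-after flag."""
    parser = argparse.ArgumentParser(description="Scrape inmate records.")
    parser.add_argument("--booked-after", type=parse_booking_date, default=None)
    return parser


def filter_by_booking_date(records, booked_after=None):
    """Keep records whose booking date is strictly later than booked_after.

    Each record is assumed to be a dict with a "booking_date" date value.
    With no cutoff, every record is kept.
    """
    if booked_after is None:
        return list(records)
    return [r for r in records if r["booking_date"] > booked_after]
```

Passing the same cutoff to both the scraping step and the discharge calculation keeps the two working on the same set of records.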

Create fabfile

Fairly low priority at the moment -- you should really just look into what fabric is and what it does. We need something, eventually, that can at the least bootstrap the database.

Integrate South?

I kind of hate south. But we probably need it because we're scraping data.

Add housing_history to API

Add full housing_history functionality to the API. Have the reverse relationships with the housing_locations displayed like court_dates.

Housing Location discrepancy

Looking at the API's JSON output and comparing it to the search page, every piece of data is correct except housing_location. @wilbertomorales and I wrote a single-inmate scraper to pull one inmate's data and compare it to the JSON output, and the data is scraped correctly.

So... this happens somewhere between the database and output. Anyone have any idea?

"Poll" mode for inmate scraper

The current scraper process assumes a long, full-blown search of all inmates at once-a-day intervals. But because of the possibility of short-term prison stays, we need some faster, less bandwidth intensive ways of scraping.

I suspect a mix of two techniques could help:

  1. Polling regularly for new inmates using the last known jail id #. Right now, that's http://www2.cookcountysheriff.org/search2/details.asp?jailnumber=2012-1128234 -- if you polled every X minutes for 2012-1128235, you could quickly pick up new inmates as they're added.
  2. Searching the last X days' worth of inmates every hour or two to look for recent dropouts. The value of X should be the length of stay for which we think hour-level precision in the time of discharge matters. For long stays, precise discharge time becomes less important, so those inmates can be excluded.
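The first technique could start from something like the sketch below. It assumes the next jail id is found by adding one to the numeric part after the hyphen, as in the example above; if those digits also encode a date, a real poller would need to handle the rollover to a new day. `next_jail_id` and `details_url` are hypothetical helper names.

```python
DETAILS_URL = "http://www2.cookcountysheriff.org/search2/details.asp?jailnumber={}"


def next_jail_id(jail_id):
    """Return the jail id that follows jail_id by incrementing its numeric part."""
    year, number = jail_id.split("-")
    # Keep the original width so any leading zeros survive the increment.
    return "{}-{:0{width}d}".format(year, int(number) + 1, width=len(number))


def details_url(jail_id):
    """Build the details page URL for a jail id."""
    return DETAILS_URL.format(jail_id)
```

A poller would request `details_url(next_jail_id(last_known_id))` every X minutes and advance `last_known_id` whenever the page returns an inmate record.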

I'm curious if these should be their own management commands or should be specified by a flag to scrape_inmates.

Set up new repo for user interface, clean up this repo

The plan for the January class should be to remove UI stuff from this repository and move it to another project, probably called "fullhouse", which will be ridiculously simple: A static HTML template that anyone with git can clone and work on. Here's a general layout to start from:

  • index.html: Load the javascript, build up the app
  • js/
    • js/config.js: Global configuration variables: URL data API (defaults to fullhouse.recoveredfactory.net).
    • js/app.js: Backbone app.
  • css/
    • css/style.css: Stylez.
  • lib/: Librariez.

This will make environment issues essentially moot -- those who want to build out the data viz side of the project need only a browser and git.

Parsing the location fields

We need to understand the location code in order to figure out whether a person is in jail, on electronic monitoring, AWOL, or otherwise disposed.

We might also want to do further parsing so that we could figure out which building and tier an inmate is in. This could be very nice for visualization purposes, but could also lead to insights about the internal operations of the jail.

These documents are a start: https://docs.google.com/spreadsheet/ccc?key=0AptDdMTTIKEndDdJUEhNMXhWWDhDMHVQejRWZEtoeFE https://docs.google.com/spreadsheet/ccc?key=0AptDdMTTIKEndGs3ZUdKTlpRdFVzMG1YRXFVbkVKeFE

But they don't completely square with how the Sheriff describes the jail: http://www.cookcountysheriff.org/doc/doc_DivisionsOfJail.html

For example, is division 03-* the residential treatment unit or an auxiliary?

Calculate BOOKING_AGE from Date of Birth and Booking date in Scraper

We want to store the person's age, not the date of birth, which means we need to calculate the age from the DOB in the scraper.

The right way to do this is to

1.) Subtract the DOB year from the BOOKING year. This gives BOOKING_AGE
2.) Check to see if the DOB day and month is after BOOKING day and month. If it is after, decrement BOOKING_AGE by one.
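The two steps above can be sketched as follows. The function name `booking_age` is an assumption, and both arguments are assumed to be `datetime.date` values already parsed by the scraper.

```python
from datetime import date


def booking_age(date_of_birth, booking_date):
    """Return the person's age in whole years on the booking date."""
    # Step 1: subtract the DOB year from the booking year.
    age = booking_date.year - date_of_birth.year
    # Step 2: if the birthday falls later in the year than the booking
    # date, the person has not yet had that year's birthday.
    if (date_of_birth.month, date_of_birth.day) > (booking_date.month, booking_date.day):
        age -= 1
    return age
```

Comparing `(month, day)` tuples handles the month and the day in a single check, so a booking on the birthday itself counts the new age.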

Parse Court Locations

It might be a good idea to start parsing the court locations with more detail. The objects currently look like this:
{
"id": "1",
"location": "Criminal C\nCriminal Courts Building, Room:506\n2650 South California Avenue Room: 506\nChicago, IL 60608\n"
},
If we break up the different segments in the models it will provide a better API. For example a model with:

location_name = Criminal C
branch_name = Criminal Courts Building
room_number = 506
address = 2650 South California Avenue,
city = Chicago,
state = IL
zip_code = 60608

Would make it possible to map out all the court dates currently happening at a specific location and give us a better view of the traffic there. We may not need to go so far as to split the city and state, but we should at least separate the zip code. With the zip code in its own field, we would be able to work at a much more abstract level.
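A rough sketch of the parsing, assuming every location string has the same four-line layout as the example above. The function name `parse_court_location` is hypothetical, and other layouts are rejected rather than guessed at.

```python
import re


def parse_court_location(location):
    """Split a four-line court location string into named fields."""
    lines = [line.strip() for line in location.strip().split("\n")]
    if len(lines) != 4:
        raise ValueError("expected 4 lines, got {}".format(len(lines)))
    location_name, branch_line, address_line, city_line = lines

    # "Criminal Courts Building, Room:506" -> branch name and room number.
    branch_name, _, room_number = branch_line.partition(", Room:")

    # Drop the repeated "Room: 506" suffix from the street address.
    address = re.sub(r"\s*Room:\s*\S+$", "", address_line)

    # "Chicago, IL 60608" -> city, state, zip code.
    match = re.match(r"^(?P<city>.+),\s*(?P<state>[A-Z]{2})\s+(?P<zip_code>\d{5})$", city_line)
    if match is None:
        raise ValueError("unrecognized city line: {!r}".format(city_line))

    return {
        "location_name": location_name,
        "branch_name": branch_name.strip(),
        "room_number": room_number.strip(),
        "address": address,
        "city": match.group("city"),
        "state": match.group("state"),
        "zip_code": match.group("zip_code"),
    }
```

Raising on anything unexpected makes it easy to collect the location strings that do not fit the pattern before committing to a model.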

Calculate discharge time

When processing, we'll probably want to compare the jail_ids that are harvested with jail_ids of people in the database without a discharge date. That should give us all the inmates who fell out of the system since that last time we ran the scraper. We can then assign an inferred discharge date to their record.
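The comparison amounts to a set difference. In this sketch, `infer_discharges` is a hypothetical helper, `open_jail_ids` stands for the jail_ids in the database with no discharge date, and the scrape time is used as the inferred discharge date, which is an assumption about how the date would be assigned.

```python
from datetime import datetime


def infer_discharges(harvested_jail_ids, open_jail_ids, scraped_at):
    """Map each jail_id that dropped out of the system to an inferred discharge date.

    harvested_jail_ids: ids seen in the current scrape.
    open_jail_ids: ids in the database without a discharge date.
    scraped_at: datetime of the current scrape.
    """
    # Anyone we still consider in custody but who was not harvested
    # has left the system since the last run.
    missing = set(open_jail_ids) - set(harvested_jail_ids)
    return {jail_id: scraped_at for jail_id in missing}
```

The inferred date is only as precise as the interval between scraper runs.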

Too many locations

There's something funny going on with the locations.

I'm getting things like
| 683 | Maywood
Criminal Courts Building, Room:306
2650 South California Avenue Room: 306
Chicago, IL 60608
|

where the location_id 683 does NOT appear in the countyapi_courtdates table.
Right now about 280 distinct locations appear in the countyapi_courtdate table and
there are about 720 locations in the countyapi_courtlocations table.

Write tests

I need to brush up on writing Django tests.

Set up server (EBS?)

I'm leaning towards Elastic Beanstalk on this, which supports Django + MySQL.

CSV output for courtdates

Through the API, we should be able to request a CSV of just courtdates. This should be in the form of

jail_id, date, court_date_id, location_id
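A sketch of the serialization step using the standard `csv` module. The function name `court_dates_to_csv` and the dict-per-row input are assumptions; how the rows come out of the API or the database is left open.

```python
import csv
import io

FIELDS = ["jail_id", "date", "court_date_id", "location_id"]


def court_dates_to_csv(court_dates):
    """Serialize court date dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        extrasaction="ignore",  # silently drop keys not listed in FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(court_dates)
    return buffer.getvalue()
```

The returned text could then be sent as the body of a response with a `text/csv` content type.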

Generate Inmate Summaries

This is a requested feature, and it should be easy to implement since we already have a table and a command for it; what remains is to fill them in. Keep in mind that all summaries should be a day or more behind, because the Cook County website itself seems to lag.
