
oca's Introduction

NYC Housing Court Filings

The OCA Data Collective regularly receives housing court filings data from the New York State Office of Court Administration (OCA). In this repository we manage the Extract-Transform-Load process for getting raw XML filings data from OCA via SFTP, parsing the nested XML data into a set of tables, and making those CSV files publicly available for download. These data are also now publicly available in XML format on the court system's website.
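As a rough illustration of what "parsing the nested XML data into a set of tables" can look like, here is a minimal sketch using only the Python standard library. The element names (`Extract`, `Index`, `IndexNumberId`, `Appearance`, and so on) and the sample record are hypothetical and are not taken from the real OCA schema; see /docs for the actual structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real OCA extract uses its own element names.
SAMPLE_XML = """
<Extract>
  <Index>
    <IndexNumberId>abc123</IndexNumberId>
    <FiledDate>2020-01-15</FiledDate>
    <Appearances>
      <Appearance><AppearanceDate>2020-02-01</AppearanceDate></Appearance>
      <Appearance><AppearanceDate>2020-03-01</AppearanceDate></Appearance>
    </Appearances>
  </Index>
</Extract>
"""


def flatten(xml_text):
    """Split nested case records into a parent table and a child table."""
    root = ET.fromstring(xml_text.strip())
    index_rows, appearance_rows = [], []
    for case in root.iter("Index"):
        case_id = case.findtext("IndexNumberId")
        index_rows.append(
            {"indexnumberid": case_id, "fileddate": case.findtext("FiledDate")}
        )
        # Each nested child becomes its own row, keyed back to the parent case.
        for appearance in case.iter("Appearance"):
            appearance_rows.append(
                {
                    "indexnumberid": case_id,
                    "appearancedate": appearance.findtext("AppearanceDate"),
                }
            )
    return index_rows, appearance_rows


def to_csv(rows, fieldnames):
    """Serialize a list of row dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `flatten(SAMPLE_XML)` yields one list of rows per table, and `to_csv` turns each list into the text of a CSV file. Carrying the parent key on every child row is what lets the tables be joined again after loading.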

To work with these data, you can use NYCDB to automatically load all of the tables into a PostgreSQL database for analysis. You can also find documentation about the data, including a data dictionary, on the NYCDB wiki.

The OCA Data Collective includes the Right to Counsel Coalition, BetaNYC, the Association for Neighborhood and Housing Development, the University Neighborhood Housing Program, and JustFix. It is also affiliated with the Housing Data Coalition (HDC).

Attribution

When utilizing this work, please use one of the following attributions and links:

Data from the New York State Office of Court Administration via the OCA Data Collective in collaboration with the Right to Counsel Coalition.

Data from the New York State Office of Court Administration via the OCA Data Collective. This data has been obtained and made available through the collaborative efforts of the Right to Counsel Coalition, BetaNYC, the Association for Neighborhood and Housing Development, the University Neighborhood Housing Program, and JustFix.

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


CSV Files

Date Last Updated

About the data

The data we receive from OCA is an extract of all landlord and tenant cases in NYC housing court, without personally identifying information. For more details about the raw data and the final parsed tables, see /docs.

About the code

For details about the various components, see /lib.

Setup

Note that you will only be able to run this yourself if you have HDC's credentials to access the SFTP server used to get the raw data transferred from OCA, and access to the private AWS S3 bucket where those files are stored.

You will need Docker.

Then create an .env file by copying the example one:

cp .env.example .env     # Or 'copy .env.example .env' on Windows

Take a look at the .env file and fill in the AWS S3 credentials.

To run the whole process in the Docker container, run:

docker-compose run app

oca's People

Contributors

austensen, lblok, sraby


oca's Issues

refactor level-2 branch to use remote db for all steps

We want to be able to run this update job as a cron job in Kubernetes, but the way we have it set up currently, with docker-compose running a local Postgres as a second service, complicates that. Rather than trying to use a second k8s job for that, we should just refactor the level-2 branch to do all the initial loading of the parsed data into the same remote AWS db we use for the final data. That way we also don't have to rebuild from pg_dump every time. So the new steps would be something like:

  • upload parsed xml directly into the aws db (maybe under a different schema, though maybe this can just all be in the same)
  • export the aws db tables to csv files directly into the s3 bucket https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/postgresql-s3-export.html
  • download the address csv from s3, perform the geocoding, upload the new csv to s3
  • upload the updated geocoded address csv in the s3 directly into the db (already doing this)
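The export step in the list above could be sketched roughly as follows. This only builds the SQL string for the `aws_s3.query_export_to_s3` function described in the linked AWS page; it does not connect to anything. The helper name `build_s3_export_sql` and the region value in the usage note are assumptions for illustration, and the target database would need the `aws_s3` extension and the right IAM role set up first.

```python
def build_s3_export_sql(table, bucket, key, region):
    """Return a SQL statement that exports one table to S3 as CSV.

    Hypothetical helper: the statement follows the pattern shown in the
    AWS RDS documentation for aws_s3.query_export_to_s3.
    """
    # The table name is interpolated into SQL, so only allow simple identifiers.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT * FROM aws_s3.query_export_to_s3("
        f"'SELECT * FROM {table}', "
        f"aws_commons.create_s3_uri('{bucket}', '{key}', '{region}'), "
        "options := 'format csv'"
        ");"
    )
```

For example, `build_s3_export_sql("oca_index", "oca-2-dev", "public/oca_index.csv", "us-east-1")` produces a statement that could be run from the same job that loads the parsed data, assuming that bucket lives in that region.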

unable to access csv files from s3

When trying to download any of the csv files (e.g. https://oca-2-dev.s3.amazonaws.com/public/oca_index.csv) I get this error response:

<Error>
  <Code>InvalidArgument</Code>
  <Message>Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.</Message>
  <ArgumentName>Authorization</ArgumentName>
  <ArgumentValue>null</ArgumentValue>
  <RequestId>AFVC85FVTCCD1BQM</RequestId>
  <HostId>t08yVTuuMb9nQdn0ar3Jfsxr8Kez6Pd/EWZJoRJneCW2hdkR5biJ+PAg4hY0qmbILYk2q5CPxiM=</HostId>
</Error>

This also means nycdb --download oca does not work.

add id columns for all tables where needed

When splitting out all the data, there are some tables that don't have ID columns to uniquely identify rows. These should be added as serial Postgres columns. If we always parse the raw files in the exact same order these could be consistent over time, but I don't think they should be relied on for that. If we get new historic data then the order will be messed up.

create annual downloads

The csv files are getting really large, so it would be nice to add a series of single-year zip files, each containing all the tables restricted to the cases filed in that year (based on oca_index.fileddate). These will still need to be fully recreated with each update though, since cases can get updated/deleted at any point in the future.
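One possible shape for this, sketched with the standard library only: group the oca_index rows by the year of fileddate, then write each group into its own zip. The sketch assumes fileddate is an ISO-style string such as 2020-01-15, and it only handles oca_index; the other tables would need to be filtered by joining on the case key, which is not shown. The function names are hypothetical.

```python
import csv
import io
import zipfile
from collections import defaultdict


def split_by_filed_year(index_csv_text):
    """Group oca_index rows by the year of their fileddate column."""
    rows_by_year = defaultdict(list)
    reader = csv.DictReader(io.StringIO(index_csv_text))
    for row in reader:
        # Assumes ISO-style dates; rows with no date go into "unknown".
        year = row["fileddate"][:4] or "unknown"
        rows_by_year[year].append(row)
    return reader.fieldnames, dict(rows_by_year)


def build_year_zip(fieldnames, rows):
    """Return the bytes of a zip archive holding one year's oca_index.csv."""
    csv_buf = io.StringIO()
    writer = csv.DictWriter(csv_buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("oca_index.csv", csv_buf.getvalue())
    return zip_buf.getvalue()
```

Because every archive is rebuilt from the full table on each run, updated or deleted cases are picked up without tracking changes between updates.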
