Open Policing Project (OPP)

Simple Bulk Downloads

Install Python 3 if you do not have it, then run the following from the command line:

git clone https://github.com/stanford-policylab/opp.git # clone the repo
cd opp

# Example uses:
./download                            # download all locations with csv files to /tmp/opp_data
./download -h                         # see help for all commands
./download -t csv -l                  # list all locations with csv files
./download -t csv                     # download all locations with csv files to /tmp/opp_data
./download -t shapefiles -l           # list all locations with shapefiles
./download -t shapefiles              # download all locations with shapefiles to /tmp/opp_data
./download -t rds -l                  # list all locations with rds files
./download -t rds                     # download all locations with rds files to /tmp/opp_data
./download -t csv -o ~/Documents/opp  # download all locations with csv files to ~/Documents/opp
./download -t rds -o ~/Documents/opp  # download all locations with rds files to ~/Documents/opp
./download -t csv -s CA               # download all California csvs (state + city) to /tmp/opp_data
./download -c '.*beach.*'             # download all locations that have 'beach' in the city's name to /tmp/opp_data
./download -s CA -c '.*beach.*'       # download all locations in CA that have 'beach' in the city's name
./download -t rds -s CA -c 'Long Beach' -o ~/research/opp # download the Long Beach, CA rds data to ~/research/opp

Getting Started

Install R and clone the repository

git clone https://github.com/stanford-policylab/opp.git

Change into the repository's lib directory

cd opp/lib

Start R. The renv package should be automatically installed if not already available. Then, install the required packages using renv:

renv::restore(rebuild = TRUE)

This may take some time, as all packages must be rebuilt. For more details, see the renv package documentation. (Note that using renv requires overriding your local .Rprofile.)

All of these packages must install successfully before you can load the main library:

source("opp.R")

Set download directory (optional); if you don't set this, it will default to /tmp/opp_data.

opp_set_download_directory('/my/data/directory')

Download some clean data

opp_download_clean_data("wa", "seattle")

Load the clean data

d <- opp_load_clean_data("wa", "seattle")

Explore!
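
A first look at the loaded data might start with its size, column names, and missingness. The sketch below is a minimal, hypothetical helper (first_look is not part of opp.R) that works on any data frame. The toy data stands in for d, and its columns are made up for illustration rather than taken from the OPP schema.

```r
# Hypothetical helper, not part of opp.R: summarize any data frame
# without assuming specific column names.
first_look <- function(d) {
  list(
    n_rows = nrow(d),                                  # number of rows (stops)
    columns = names(d),                                # available columns
    pct_missing = round(100 * colMeans(is.na(d)), 1)   # percent NA per column
  )
}

# Toy stand-in for the result of opp_load_clean_data("wa", "seattle");
# these columns are illustrative assumptions only.
toy <- data.frame(
  date = as.Date(c("2020-01-01", NA)),
  outcome = c("citation", "warning"),
  stringsAsFactors = FALSE
)
str(first_look(toy))
```

With the library loaded, the same helper could be applied to d from the previous step, for example first_look(d)$pct_missing.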

Recreating Analyses

The easiest way to rerun all analyses from the command line is the following:

./run.R --paper

However, for this to work, all the data must be downloaded and available locally. To do this, we recommend setting the data directory to a location with sufficient space and ensuring a stable internet connection while up to 10 GB of data are downloaded. From within R, this can be done with the following:

source('opp.R')
opp_set_download_directory('/my/data/directory')
opp_download_all_clean_data()

Each analysis can also be run independently from the command line by passing one of the following flags:

./run.R --{disparity,marijuana,veil_of_darkness,prima_facie_stats}

They can also be run from within R, where the braces below are shorthand for one function per analysis (for example, opp_run_disparity):

source('opp.R')
opp_run_{paper_analyses,disparity,marijuana_legalization_analysis,veil_of_darkness,prima_facie_stats}

Each of these effectively loads and runs the corresponding analysis script(s): disparity.R, veil_of_darkness.R, marijuana_legalization_analysis.R, or prima_facie_stats.R. disparity.R contains both the outcome and threshold tests, which are also available as independent scripts in outcome_test.R and threshold_test.R. Each analysis saves its results in the opp/results directory. Individual analyses take anywhere from about 20 minutes to several hours to run; running all of them takes about a day on a modern server.

Each of these analyses requires a different subset of the clean data and loads it using the load function defined in eligibility.R. The eligibility script contains all the data filters for each analysis. By default, the load function applies all the filters and builds the filtered dataset from scratch, then automatically saves the result to the opp/cache directory. On later runs, you can call load(<analysis_name>, use_cache = T) to speed up loading, since it reuses the filtered dataset saved by the previous run.
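
The build-then-cache behavior described above can be sketched in a few lines of base R. This is a hypothetical illustration of the pattern, not the load function from eligibility.R: the helper name load_with_cache, its arguments, the .rds file format, and the use of tempdir() in place of opp/cache are all assumptions.

```r
# Hypothetical sketch of a use_cache pattern; NOT the real load() in eligibility.R.
load_with_cache <- function(name, build, cache_dir = tempdir(), use_cache = FALSE) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (use_cache && file.exists(path)) {
    return(readRDS(path))   # reuse the filtered dataset from a previous run
  }
  result <- build()         # apply the (expensive) filters from scratch
  saveRDS(result, path)     # always save the fresh result for later runs
  result
}

# Demo: count how often the expensive build step actually runs.
calls <- 0
build_demo <- function() {
  calls <<- calls + 1
  data.frame(x = 1:3)
}

first <- load_with_cache("demo", build_demo)                     # builds and saves
second <- load_with_cache("demo", build_demo, use_cache = TRUE)  # reads the saved copy
```

In this sketch, the default path always rebuilds, so the cache never goes stale unless use_cache is requested explicitly. The same caution applies to the real cache: it reflects the filters as they were when it was written.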

Reprocessing Data

Each location has its own processing script, located at opp/lib/states/<state>/<city>.R. Each script conforms to a contract that defines two methods: load_raw and clean. load_raw loads and joins all the data while making minimal changes to the raw data, while clean processes and standardizes the data to bring it into compliance with our schema defined in standards.R.
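
To make the two-method contract concrete, here is a toy skeleton of a location script. It is only a sketch: the function signatures, the raw column names (StopDate, Race), the cleaned column names, and the race code mapping are all hypothetical and are not taken from the repository or from standards.R.

```r
# Hypothetical location script skeleton; signatures and columns are assumptions.

# load_raw: read and join the raw files, changing as little as possible.
# Here a small in-memory data frame stands in for the raw files.
load_raw <- function(raw_data_dir) {
  data.frame(
    StopDate = c("01/02/2020", "01/03/2020"),
    Race = c("W", "B"),
    stringsAsFactors = FALSE
  )
}

# clean: translate raw values into standardized columns, keeping the
# original value in a raw_-prefixed column for reference.
clean <- function(raw) {
  race_codes <- c(W = "white", B = "black")   # made-up code mapping
  data.frame(
    date = as.Date(raw$StopDate, format = "%m/%d/%Y"),
    subject_race = unname(race_codes[raw$Race]),
    raw_Race = raw$Race,
    stringsAsFactors = FALSE
  )
}

cleaned <- clean(load_raw("/path/to/raw/data"))
print(cleaned)
```

Keeping load_raw nearly free of transformations means every cleaning decision is in clean, where it can be reviewed and changed in one place.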

Many convenience functions are available, and they can often be found in opp.R, utils.R, standardize.R, or sanitizers.R. Most cleaning scripts end with a call to a standardize function that adds calculated columns, selects only the columns in the schema (including those prefixed with raw_*), enforces data types (as defined in standards.R), and corrects predicates. For example, if contraband found is true but search conducted is false, contraband found is coerced to false, since nothing should be found if a search wasn't conducted. All of these choices can be seen at the bottom of standards.R in the predicated_columns list.
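
The predicate correction described above can be illustrated on toy data. This is a minimal sketch, not the code in standards.R: the helper correct_predicate, the column names search_conducted and contraband_found, and the decision to leave rows untouched when the predicate is NA are all assumptions made for the example.

```r
# Hypothetical sketch of predicate correction; not the implementation in standards.R.
# A column is coerced to FALSE wherever the column it depends on is FALSE.
correct_predicate <- function(d, col, predicate_col) {
  predicate_false <- !is.na(d[[predicate_col]]) & !d[[predicate_col]]
  col_true <- !is.na(d[[col]]) & d[[col]]
  d[[col]][predicate_false & col_true] <- FALSE
  d
}

# Toy data with assumed column names.
stops <- data.frame(
  search_conducted = c(TRUE, FALSE, FALSE, NA),
  contraband_found = c(TRUE, TRUE, FALSE, TRUE)
)

fixed <- correct_predicate(stops, "contraband_found", "search_conducted")
print(fixed$contraband_found)   # TRUE FALSE FALSE TRUE
```

Only the second row changes: it recorded contraband without a search. The fourth row is left alone because the sketch treats an unknown predicate as insufficient grounds for a correction. That is a design choice of this example and may differ from the project's handling.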

If you have access to the raw data, you can modify the script for a location and run ./run.R --process --state <state> --city <city> to reprocess that location with the updated script.

Raw data is available upon request.

Contributors

amyshoe, danjenson, jnu, jgaeb, kenielyao, tigerpaws, 5harad
