jamilmirabito / nypd-stop-question-frisk

Black and Latinx individuals are more likely than White individuals to be arrested in stop-and-frisk incidents. This analysis explores the various factors that lead up to an arrest. It is an exploratory classification analysis that ranks the features most important in predicting the likelihood of arrest from a Stop, Question, and Frisk incident. The analysis was completed entirely in Jupyter Notebooks, with sklearn and matplotlib.pyplot being the most instrumental packages for analysis and visualization.

NYPD-Stop-Question-Frisk

Introduction

Policing in the United States has long been highly controversial given the disproportionate surveillance, arrest, and imprisonment of individuals from communities of color. While these disparities are largely a result of systemic racism, research suggests that individual biases may contribute to disparities in arrests as well. This analysis aims to determine which individual-level factors contribute most to an arrest in the hope that, if biases are present, they may be addressed.

Research Questions:

  • How large are the disparities in arrests resulting from a Stop, Question, and Frisk incident?
  • What are the most common factors resulting in an arrest?

Data

  • NYC Stop, Question, and Frisk Data (2016): 2016 was the most recent year with understandable documentation
  • Sample size: 12,404
  • 113 features (before creating dummy variables)
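
As a rough illustration of what "creating dummy variables" means here, the sketch below one-hot encodes categorical columns with pandas. The toy rows, column names, and category codes are assumptions for illustration only and are not taken from the original notebook.

```python
import pandas as pd

# Toy rows standing in for SQF records; column names and codes are illustrative.
stops = pd.DataFrame({
    "age": [22, 35, 41],
    "race": ["B", "Q", "W"],
    "sex": ["M", "M", "F"],
})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(stops, columns=["race", "sex"])

print(encoded.columns.tolist())
```

Each categorical column expands into one indicator column per category, which is why the feature count grows after encoding.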

Methods

Target Variable: The outcome we are predicting is arrest. For the sake of model development, we grouped "arrest made" and "summons issued" into a single outcome, arrest, given the low number of summonses issued.
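
A minimal sketch of how the combined target could be built with pandas. The column names `arstmade` and `sumissue` and the "Y"/"N" coding are assumptions about the 2016 file layout; the original notebook may use different names.

```python
import pandas as pd

# Toy data; `arstmade` and `sumissue` are assumed column names with "Y"/"N" codes.
stops = pd.DataFrame({
    "arstmade": ["Y", "N", "N", "N"],
    "sumissue": ["N", "Y", "N", "N"],
})

# A stop counts as a positive case if it ended in an arrest OR a summons.
stops["arrest"] = (
    (stops["arstmade"] == "Y") | (stops["sumissue"] == "Y")
).astype(int)

# Share of stops that fall in the combined positive class.
positive_rate = stops["arrest"].mean()
print(stops["arrest"].tolist(), positive_rate)
```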

Sample Description: High-level descriptive analysis of all 2016 Stop, Question, and Frisk incidents

Model Development:

  • Statistical Analysis: Tests of statistical significance for all features, used to trim the feature set down to a more interpretable size
  • Classification model development and training: 14 different iterations of logistic regression and decision tree models
  • Model performance metrics and model selection
  • Exploration of feature importance: 26 features found to be most “important”
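
The README does not name the performance metrics that were compared, so the sketch below assumes the usual classification set (accuracy, precision, recall, F1) from `sklearn.metrics`. The labels are made up for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels (1 = arrest) for ten held-out stops.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because most stops do not end in an arrest, a model that always predicts "no arrest" would already score high on accuracy, so precision and recall on the arrest class are worth reading alongside it.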

Sample Description

Black and Hispanic individuals make up the majority of 2016 SQF incidents

More than 90 percent of incidents in 2016 involved males

The average suspect in SQF incidents is 28 years old

Roughly 75 percent of SQF incidents resulted in neither an arrest nor the issuance of a summons

Model Development

Statistical Analysis:

  • Difference of two means t-test to compare the means of arrested vs not arrested for continuous variables (e.g., age, weight)
  • Difference of two proportions z-test to compare the percentages of positive cases for each feature (e.g., comparing the percentage of individuals carrying a weapon in the arrested group vs the not arrested group)
  • Features that were not statistically significant (p > 0.05) were pruned from later models to make the models easier to interpret
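
The screening step above can be sketched roughly as follows. The data are synthetic, the feature names are hypothetical, and the use of Welch's unequal-variance t-test is an assumption, since the README does not say which variant was run. The z-test is written out by hand with a pooled standard error.

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in proportions, using a pooled SE."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(0)
# Synthetic ages for the two outcome groups (not real SQF data).
age_arrested = rng.normal(30, 10, size=300)
age_not_arrested = rng.normal(28, 10, size=900)

# Continuous feature: difference of two means (Welch's variant is an assumption).
t_stat, t_p = stats.ttest_ind(age_arrested, age_not_arrested, equal_var=False)

# Binary feature: hypothetical counts of stops where a weapon was found,
# 60 of 300 in the arrested group vs 45 of 900 in the not arrested group.
z_stat, z_p = two_proportion_ztest(60, 300, 45, 900)

# Keep only the features whose p-value does not exceed the 0.05 threshold.
alpha = 0.05
p_values = {"age": t_p, "weapon_found": z_p}
kept = [name for name, p in p_values.items() if p <= alpha]
print(kept)
```

With many features tested this way, some will pass the threshold by chance, so the screen is best read as a way to shrink the feature set rather than as proof that each kept feature matters.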

Classification Models:

  • Logistic Regression and Decision Tree models were tested for this analysis given their high level of interpretability for classification problems
  • GridSearchCV was used to tune hyperparameters for both Logistic Regression and Decision Tree Models
  • Recursive Feature Elimination was used to select the most important features from Logistic Regression Models
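
A minimal sketch of the hyperparameter tuning step with `GridSearchCV`, run on synthetic data. The parameter grids, the F1 scoring choice, and the five-fold split are assumptions; the grids used in the original notebook are not given in this README.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded SQF features, with roughly 25% positives.
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical grids for illustration only.
searches = {
    "decision_tree": GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"max_depth": [3, 5, 8], "min_samples_leaf": [1, 10, 25]},
        scoring="f1", cv=5),
    "logistic_regression": GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        scoring="f1", cv=5),
}

for name, search in searches.items():
    # Fit every grid combination with cross-validation, then refit the best one.
    search.fit(X_train, y_train)
    print(name, search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the refit model, which can then be scored on the held-out split.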

Model Performance:

[Screenshot: model performance comparison]

Model 11, a decision tree tuned with grid search, was chosen because it gave the best performance with the smallest number of features. I arrived at 24 features by conducting recursive feature elimination with a logistic regression model and then running an earlier iteration of the decision tree grid search model. I then selected the top 15 features from that decision tree model and combined them with the 15 from the RFECV logistic regression model. I fed the combined features into the final grid search decision tree model for optimal performance.
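
The feature-combination procedure described above might look roughly like the sketch below. It runs on synthetic data with hypothetical feature names, uses plain `RFE` with a fixed target of 15 features rather than `RFECV`, and uses a fixed tree depth in place of a grid search, so it illustrates the idea rather than reproducing the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

TOP_K = 15  # number of features taken from each model, per the description above

X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           random_state=0)
X = StandardScaler().fit_transform(X)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

# 1) Recursive feature elimination with a logistic regression estimator.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=TOP_K).fit(X, y)
rfe_features = set(feature_names[rfe.support_])

# 2) Top features by impurity-based importance from an earlier decision tree.
tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X, y)
top_idx = np.argsort(tree.feature_importances_)[::-1][:TOP_K]
tree_features = set(feature_names[top_idx])

# 3) Union of both lists; overlapping features are counted once.
combined = sorted(rfe_features | tree_features)

# 4) Refit a decision tree on the combined feature set only.
mask = np.isin(feature_names, combined)
final_tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X[:, mask], y)
print(len(combined), "features in the combined set")
```

Because the two lists can overlap, the union of two 15-feature lists contains anywhere from 15 to 30 features.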

Twenty-nine features were found to be most predictive of an arrest. They are grouped below into a few themes in no particular order:

  • Physical characteristics: Age and weight were among the strongest predictors of arrest.
  • Police Use-of-Force: Officers tend to use force in instances where they go on to arrest an individual. In most cases, force is used in response to resistance.
  • Ongoing investigation: Whether the suspect was already being surveilled for involvement in a previous crime or fit the description of someone who committed a crime.
  • Geography: Three precincts (13, 52, and 61) were among the strongest predictors of arrest. It’s likely that racial and socioeconomic factors are tied to these precincts as well.
  • Searched: Whether a suspect was searched and if any contraband was found on their body.
  • Suspicion of Particular Crimes: Four suspected crimes were strong predictors of arrest: theft of services, trespassing, criminal possession of a weapon, and petit larceny.

Age appears similarly distributed for individuals arrested and not arrested, although the difference between the two groups is statistically significant

Among all individuals in SQF cases, the most common police use of force is applying handcuffs

Among suspects in ongoing investigations, roughly 20% - 25% of individuals are arrested

Precincts 13, 52, and 61 were identified as strong predictors of arrest, but only precinct 13 makes the top 30 for total arrests in 2016

  • Precinct 13: Serving a southern portion of Midtown, Manhattan. The precinct features the Peter Cooper Village/Stuyvesant Town residential complex, Gramercy Park, the lower portion of Rosehill, Madison Square Park, and Union Square Park.
  • Precinct 52: Serving a northern portion of the Bronx. The precinct is home to Bedford Park, Fordham, Kingsbridge, Norwood, Bronx Park, and University Heights.
  • Precinct 61: Serving a southern portion of Brooklyn, encompassing Kings Bay, Gravesend, Sheepshead Bay, and Manhattan Beach.

Whether someone is searched is a strong predictor of arrest

Admission by suspect results in the highest likelihood of arrest from a search

Possession of a weapon is a very strong predictor of arrest

Each of the suspected crimes below makes the list of most important features, but criminal trespass and theft of services result in the highest likelihood of arrest

Summary & Recommendations

Summary & Next Steps:

  • Given the data we have access to, it seems that arrests from SQF incidents are based on observable circumstances such as the presence of a weapon, involvement in a crime, or the need for police use of force. That is not to say that bias is absent; it is just hard to measure with this data. There does, however, seem to be suggestive evidence that bias is present in the reason for the stop itself. While this analysis examines the factors resulting in arrest from an SQF incident, bias may play a larger role in who is stopped than in who is arrested after a stop.
  • Future work might investigate bias in which individuals are approached for SQF incidents, incorporating qualitative research as well.

Recommendations:

  • Collecting qualitative data on officer biases via survey materials, focus groups, or interviews
  • Engaging with researchers and community leaders to understand how officers can better serve their communities
