geodacenter / opioid-policy-scan

The Opioid Environment Policy Scan provides access to data at multiple spatial scales to help characterize the multi-dimensional risk environment impacting opioid use in justice populations across the United States.

R 16.13% HTML 43.41% Jupyter Notebook 40.14% Python 0.32%
accessibility covid-19 health opioids public-health spatial-analysis spatial-data

opioid-policy-scan's People

Contributors

a-moosh, angela-li, balcaglayan, bucketteofivy, cavillanueva1, juhe0120, makosak, menghamo, mradamcox, nofurtherinformation, oliang2000, qinyun-lin, rnvigil, spaykin, wam0


opioid-policy-scan's Issues

Standardize shapefiles

A little overhaul of the spatial data here would be good. Some tasks include:

  • Inspect non-join columns, like state name or county name, and perhaps add more if needed. These columns can be appended to joined and exported data, so consistent, readable formatting is important.
  • Standardize GEOIDs across spatial resolutions and time (e.g. the join field for ZCTAs should have the same name in the 2010 shapefile as in the 2018 one); see the sketch below.
  • Update columns in all CSVs as needed to streamline the joins.
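
A minimal sketch of the second bullet, assuming the 2010 and 2018 ZCTA shapefiles currently use different join-field names (ZCTA5CE10 and GEOID10 are guesses) and that Z_Latest.csv is keyed on GEOID:

```r
# Sketch: give both ZCTA vintages the same join-field name before attaching
# CSV attributes. Field and file names are assumptions; check the actual data.
library(sf)
library(dplyr)

zcta_2010 <- st_read("zctas_2010.shp") %>% rename(GEOID = ZCTA5CE10)  # assumed field
zcta_2018 <- st_read("zctas_2018.shp") %>% rename(GEOID = GEOID10)    # assumed field

# Any attribute CSV keyed on the same field can then join either vintage:
z_latest <- readr::read_csv("csv/Z_Latest.csv")
zcta_2018_joined <- left_join(zcta_2018, z_latest, by = "GEOID")
```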

Create all NA values

In a few CSVs, there are NA values with a lot of leading spaces, like `        NA`. These leading spaces should be removed. Ideally, we would be able to find the source of these spaces and fix that (presumably in one of the R scripts). At this point, though, cleaning up the CSVs is the top priority, ahead of making the real v2 release.
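
A minimal cleanup sketch, assuming the CSVs live under data_final; it reads every cell as text, trims whitespace so values like `        NA` become proper NAs, and writes the files back out:

```r
# Cleanup sketch: strip leading/trailing whitespace from every cell and write
# plain NA values back to disk.
library(readr)
library(dplyr)

csv_files <- list.files("data_final", pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)

for (f in csv_files) {
  df <- read_csv(f, col_types = cols(.default = col_character()),
                 trim_ws = FALSE)
  df <- df %>%
    mutate(across(everything(), ~ na_if(trimws(.x), "NA")))
  write_csv(df, f, na = "NA")
}
```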

Fix variable name disagreements

There are a few disagreements between the variable names in data dictionaries and in the CSV files themselves. Creating this ticket to track their resolution ahead of the v2 release.

I ran a validation script against the table definitions (which are generated directly from the data dictionaries) and the CSVs themselves. Here is the output, i.e. the variable names that this ticket should address (a sketch of the check follows the output). Note that some of these have more to do with geometry fields and should be fixed via #68.

VALIDATE INPUT SOURCE: csv/C_1980.csv
WARNINGS ENCOUNTERED: 0

VALIDATE INPUT SOURCE: csv/C_1990.csv
WARNINGS ENCOUNTERED: 2
  1 source columns missing from schema: Age15_24P
  1 schema fields missing from source: A15_24P

VALIDATE INPUT SOURCE: csv/C_2000.csv
WARNINGS ENCOUNTERED: 2
  1 source columns missing from schema: Age15_24P
  1 schema fields missing from source: A15_24P

VALIDATE INPUT SOURCE: csv/C_2010.csv
WARNINGS ENCOUNTERED: 2
  1 source columns missing from schema: VacP
  1 schema fields missing from source: VacantP

VALIDATE INPUT SOURCE: csv/C_Latest.csv
WARNINGS ENCOUNTERED: 1
  1 source columns missing from schema: Unnamed: 0

VALIDATE INPUT SOURCE: csv/S_1980.csv
WARNINGS ENCOUNTERED: 2
  2 source columns missing from schema: STATEFP, Age15_24P
  4 schema fields missing from source: G_STATEFP, GEOID, STUSPS, A15_24P

VALIDATE INPUT SOURCE: csv/S_1990.csv
WARNINGS ENCOUNTERED: 2
  2 source columns missing from schema: STATEFP, Age15_24P
  4 schema fields missing from source: G_STATEFP, GEOID, STUSPS, A15_24P

VALIDATE INPUT SOURCE: csv/S_2000.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: STATEFP, Age15_24P, OccP
  4 schema fields missing from source: G_STATEFP, GEOID, STUSPS, A15_24P

VALIDATE INPUT SOURCE: csv/S_2010.csv
WARNINGS ENCOUNTERED: 1
  4 schema fields missing from source: G_STATEFP, STUSPS, ChildrenP, Age18_64

VALIDATE INPUT SOURCE: csv/S_Latest.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: TotPopE, NoHSP, PrMisuse20
  3 schema fields missing from source: TotPop, NoHsP, PrMsuse20P

VALIDATE INPUT SOURCE: csv/T_1980.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: NoHSP, ChildrenP, OccP
  4 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, NoHsP

VALIDATE INPUT SOURCE: csv/T_1990.csv
WARNINGS ENCOUNTERED: 2
  4 source columns missing from schema: Age15_24P, NoHsp, ChildrenP, OccP
  5 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, A15_24P, NoHsP

VALIDATE INPUT SOURCE: csv/T_2000.csv
WARNINGS ENCOUNTERED: 2
  4 source columns missing from schema: Age15_24P, NoHsp, ChildrenP, OccP
  6 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, A15_24P, NoHsP, PciE

VALIDATE INPUT SOURCE: csv/T_2010.csv
WARNINGS ENCOUNTERED: 2
  2 source columns missing from schema: GiniCoeff, VacP
  7 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, AgeOv18, NonRelFhhP, NonRelNfhhP, VacantP

VALIDATE INPUT SOURCE: csv/T_Latest.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: TotPopE, NoHSP, FqhcMinDis
  3 schema fields missing from source: TotPop, NoHsP, MinDisFqhc

VALIDATE INPUT SOURCE: csv/Z_1980.csv
WARNINGS ENCOUNTERED: 2
  4 source columns missing from schema: ZCTA, Age55_59, Ov65P, PacIsP
  4 schema fields missing from source: GEOID, PacISP, HispP, Ovr65P

VALIDATE INPUT SOURCE: csv/Z_1990.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: ZCTA, Ov65P, PacIsP
  4 schema fields missing from source: GEOID, PacISP, HispP, Ovr65P

VALIDATE INPUT SOURCE: csv/Z_2000.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: ZCTA, Ov65P, PacIsP
  4 schema fields missing from source: GEOID, PacISP, HispP, Ovr65P

VALIDATE INPUT SOURCE: csv/Z_2010.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: PacIsP, MedInc, VacP
  2 schema fields missing from source: PacISP, VacantP

VALIDATE INPUT SOURCE: csv/Z_Latest.csv
WARNINGS ENCOUNTERED: 0
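
For reference, a minimal sketch of this kind of check; the actual validation script may differ, and the Variable column name in the dictionaries is an assumption:

```r
# Hypothetical re-implementation of the column check: compare the columns of a
# CSV against the variable names listed in the matching data dictionary.
library(readr)
library(readxl)

check_columns <- function(csv_path, dictionary_path,
                          name_col = "Variable") {   # assumed dictionary column
  source_cols <- names(read_csv(csv_path, n_max = 0, show_col_types = FALSE))
  schema_cols <- read_excel(dictionary_path)[[name_col]]

  cat("VALIDATE INPUT SOURCE:", csv_path, "\n")
  missing_from_schema <- setdiff(source_cols, schema_cols)
  missing_from_source <- setdiff(schema_cols, source_cols)
  if (length(missing_from_schema))
    cat("  source columns missing from schema:",
        paste(missing_from_schema, collapse = ", "), "\n")
  if (length(missing_from_source))
    cat("  schema fields missing from source:",
        paste(missing_from_source, collapse = ", "), "\n")
}
```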

Fill out remaining 1980, 1990, and 2000 DS01 variables

The variables present in the DS01 historic data do not match the variables listed in the DS01 data table documentation. This seems to be because Social Explorer's "Historic Census Data on 2010 Census Tracts" datasets do not include the counts the DS01 documentation calls for, likely because the historic censuses did not aggregate their data directly into the relevant categories. Since our historic DS01 data files are based on Social Explorer's data, we are missing those categories as well.

However, there does seem to be a workaround for some of the data. The historical censuses appear to have released disaggregated tract-level race, ethnicity, age, and educational attainment data from which most of the missing data can be reconstructed. I'm currently planning to download this data from IPUMS NHGIS and then crosswalk it to 2010 census tracts using weights from the Longitudinal Tract Database (see the sketch after the list below), but I have a few open questions about data comparability, tracked here, that will need to be answered prior to merging these changes. Namely:

  1. The 1980 Census seems to have asked whether respondents were of Spanish origin, as opposed to Hispanic origin, which it started doing in 1990. Is it sufficient to simply note this discrepancy in the documentation, or is there research indicating that the difference in wording heavily changed how respondents interpreted the question? (I don't currently believe this is the case, but it's probably wise to double-check anyhow.)
  2. The 1980 Census also reports "Years of School Completed" with categories such as "High School: 1-3 years" and "High School: 4 years," whereas later censuses report "Educational Attainment" with categories such as "9th to 12th grade, no diploma" and "High School graduate (includes equivalency)." At minimum, this means that any estimate of the percent of the population with less than a high school diploma (for the noHSP variable) will exclude GEDs for the 1980 population but not from 1990 on. Are these sufficiently different that the 1980 Census education variable should be renamed or treated differently, or is it sufficient to just note this discrepancy in the documentation?
  3. Due to "major differences between the disability questions," the US Census Bureau advises against comparing disability data from Censuses taken prior to 2000 with the 2000 Census. As an example of these discrepancies, disability data collected in the 1980 and 1990 Censuses only cover the civilian non-institutionalized population aged 16 years and older, whereas the 2000 Census covers the civilian non-institutionalized population aged 5 years and older. It is probably desirable to make the differences between the 1980/1990 and 2000 disability data as apparent as possible for end users. Toward that end, do we want to separate the 1980 and 1990 disability data into a distinct variable to reflect this difference in collection methodology?
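
A minimal sketch of the planned crosswalk step, assuming NHGIS counts keyed by 1980 tract ID and an LTDB weight table with trtid80, trtid10, and weight columns (all file and column names here are placeholders):

```r
# Hypothetical crosswalk sketch: re-aggregate 1980-tract counts to 2010 tracts
# using Longitudinal Tract Database interpolation weights.
library(dplyr)
library(readr)

nhgis_1980   <- read_csv("nhgis_1980_tracts.csv")          # counts on 1980 tracts
ltdb_weights <- read_csv("ltdb_crosswalk_1980_2010.csv")   # trtid80, trtid10, weight

tracts_2010 <- nhgis_1980 %>%
  inner_join(ltdb_weights, by = c("GEOID" = "trtid80")) %>%
  group_by(trtid10) %>%
  summarise(
    TotPop   = sum(TotPop * weight, na.rm = TRUE),    # hypothetical count columns
    Age15_24 = sum(Age15_24 * weight, na.rm = TRUE),
    .groups  = "drop"
  )
```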

Update homelessness variables

Revise/integrate the metadata .md files a bit more to link the two estimates and related proxies (household type & homeless pop); remove ACS from the point-in-time estimate variable, since that one only uses HUD data.

Proposal: Split explorer branch into new repository

I've been reading through this repo and its accompanying wiki and other documentation (which is thorough and really helpful) in order to understand it better, and I'd like to propose that the explorer branch, which holds the static website and is deployed through Netlify, be split into a separate repository. Here are a few reasons I think this would be a good move:

  • As it is, the explorer branch is completely independent of the main branch: CSV files are manually copied from main to explorer for publication, or fetch calls in explorer access content in main through direct calls to https://github.raw...., so there is no functional logic linking the two (though I may be missing something).
  • Archives of the main branch are published to Zenodo (one per release) and associated with a DOI.
    • This means that release tags on this repo get tied to versions in Zenodo of the same DOI, so tagging releases of the explorer branch isn't really possible, at least without adding a good bit of confusion to the release lineage.
    • This also means that significant functional changes to the main branch, like moving the explorer codebase into it as a subdirectory, seem inappropriate.
  • Similarly, we will need to make changes to the explorer website in the future, at the very least to set Netlify configs like pinning a Node version (for example, the deployment/build process seems to break with Node >= 17 or so, due to an OpenSSL issue; thanks @bucketteOfIvy for finding this), so developing on the main branch of a new repo will be generally less cumbersome and more sustainable.

We should be able to clone the branch independently into a new repo while retaining all relevant commit history; the process would be something like this, not just a copy-paste into a new blank repo. This means that all past contribution history would remain intact (this is a requirement as far as I'm concerned).

I'm thinking the new repo would be something like healthyregions/oeps-explorer.

Anyway, I'm writing this ticket out mostly to get the idea in front of @spaykin and @nofurtherinformation, as you have been the main contributors to this repo, and I don't think a change like this can happen without your input. Like I mentioned, work will definitely need to happen on the explorer at some point this summer, so this is essentially a preparatory step for that.


to complete:

  • resolve explorer-debug branch (merge into explorer, or just document and delete?)
  • create new repo as described above
  • update this repo's readme as needed
  • update this repo's wiki as needed
  • look around for references to the explorer in other literature and update URLs
  • set up the Netlify build from the new repo (this may include a domain change as well)
  • delete explorer branch from this repo

Inspect and merge historic data

The historic datasets in the OEPS-historical-data branch ultimately need to be merged into the main branch. Some updates to the datasets may need to be made, so those may as well take place before the merge is performed.

New metadata files are needed

The following variables in the Census Tract-scale data do not appear in any metadata files. We will have to create new metadata files for them.

SocEcAdvIn
LimMobInd
UrbCoreInd
MicaInd

Homelessness proxy %

Using a doubled-up housing %; may require a literature review. Preferred at census tract scale and higher.

Remove/archive v1 datasets ahead of v2 release, promote new CSVs

To prepare for the upcoming release of the v2 datasets, which have been consolidated by spatial resolution and now include historical tables, we should at least move all of the old CSVs into a new directory, like data_final/v1.1 (must double-check the actual release number here), and move the contents of data_final/consolidated into a more primary location, maybe something like data_final/v2.0/tables and data_final/v2.0/dictionaries.

There is other content in the data_final directory that we'll still need to decide what to do with, but this ticket only concerns the old and new CSVs.

Access to Internet

##B28002_001: Estimate!!Total:
##B28002_002: Estimate!!Total:!!With an Internet subscription
##B28002_012: Estimate!!Total:!!Internet access without a subscription
##B28002_013: Estimate!!Total:!!No Internet access

We can calculate % of households without access to the Internet using B28002_013/B28002_001.
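
A minimal sketch of that calculation with tidycensus (the geography, year, and output variable name are assumptions):

```r
# Sketch: pull the B28002 internet-access table and compute the share of
# households with no Internet access (B28002_013 / B28002_001).
library(tidycensus)
library(dplyr)

internet <- get_acs(
  geography = "county",                        # assumed scale; could be "tract"
  variables = c(total = "B28002_001",
                no_internet = "B28002_013"),
  year = 2019,                                 # assumed ACS 5-year vintage
  output = "wide"
)

internet <- internet %>%
  mutate(NoIntP = 100 * no_internetE / totalE)  # hypothetical output variable name
```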

Reconcile discrepancy between `master` and `main` branches

At some point in this repository's history, a main branch was created to become the new default. However, the old default, master, still exists, and when I preview a pull request from the old master into the new main there is a long list of commits that would be applied. I doubt we should actually apply all of these commits, but I would like to look through them to figure out what they contain and, if relevant, use git cherry-pick to pull them into main.

Ultimately, the master branch should be deleted.

Remove master branch

The master branch of this repo was deprecated in favor of main at some point, but there are a lot of hard-coded URLs that still point to resources on GitHub in the master branch. These should all be replaced with URLs that reference the v1.0 tag.

Update metadata files for 2.0 release

We discussed how to approach the metadata update that is needed now that a lot of the file structures and dictionaries have changed. The approach we agreed upon has two parts, described below. Note: the new data dictionaries are in XLSX format and can be found here: https://github.com/GeoDaCenter/opioid-policy-scan/tree/main/data_final/dictionaries. These will be helpful references for step 1, and in step 2 they will need to be updated. (These steps are a suggestion for how to go about the process, not a required workflow.)

This only concerns the non-geography markdown metadata files.

1. Update all existing metadata markdown files.

The v1 files here: https://github.com/GeoDaCenter/opioid-policy-scan/tree/main/data_final/metadata are still very much relevant after the reorganization, but they need to be updated in the following ways:

  • Update variable names where necessary
  • Update Themes where necessary
    • Match to values in the data dictionaries
    • Edit: On further inspection, these themes are handled elsewhere, outside of the markdown files. So this step will be handled in a different ticket.
  • Update author/modified like so:
    Author: <original author>
    Last Modified: <new date>
    Last Modified By: <updater>
    

2. Add Metadata Location column to the new Data Dictionaries

A new Metadata Location column will be added to each of the new XLSX data dictionaries, with a URL pointing to the GitHub-hosted location of the corresponding Markdown file for each variable row. These should be the "raw" URLs referencing the main branch (we'll update them to the 2.0 tag later, just before creating that release). For example:

https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/metadata/Access_FQHCs_MinDistance.md

This will also offer a good QA/QC opportunity to check whether we are missing any markdown files: each row must have a value for Metadata Location.
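
A minimal QA sketch for step 2, assuming the dictionaries gain a Metadata Location column and the markdown files live in data_final/metadata:

```r
# Hypothetical QA check: flag dictionary rows with no Metadata Location, and
# metadata markdown files that no dictionary row points to.
library(readxl)
library(dplyr)

dict_files <- list.files("data_final/dictionaries", pattern = "\\.xlsx$",
                         full.names = TRUE)
md_files   <- list.files("data_final/metadata", pattern = "\\.md$")

# Read every dictionary as text so the sheets can be stacked together
dictionaries <- bind_rows(lapply(dict_files, read_excel, col_types = "text"))

# Rows still missing a metadata link
missing_link <- dictionaries %>%
  filter(is.na(`Metadata Location`) | `Metadata Location` == "")

# Markdown files never referenced by any dictionary row
referenced      <- basename(dictionaries$`Metadata Location`)
unreferenced_md <- setdiff(md_files, referenced)
```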

Find/create smaller ZCTA 2010 shapefile

The file we've been able to find so far from the Census Bureau is really big, about 800 MB. We don't need geometries with that much detail, so it would be better to have a generalized file to use going forward. I have been asking folks at the Census Bureau whether there is an official generalized file available, as there seems to be for more recent years.

If not, I think it would be acceptable to run a generalization operation on the file (in QGIS or elsewhere), as long as the contiguous boundaries are retained.
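
If we end up generalizing it ourselves, a minimal sketch with rmapshaper (which preserves shared boundaries while simplifying); the keep ratio and file names are assumptions:

```r
# Sketch: simplify the full-detail 2010 ZCTA shapefile while keeping shared
# (contiguous) boundaries consistent between neighboring polygons.
library(sf)
library(rmapshaper)

zcta_2010 <- st_read("zcta510_2010_full.shp")          # hypothetical input path

zcta_simplified <- ms_simplify(zcta_2010,
                               keep = 0.05,            # retain ~5% of vertices
                               keep_shapes = TRUE)     # don't drop small polygons

st_write(zcta_simplified, "zcta510_2010_simplified.shp")
```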

Add county & state-level access metrics for all resources

Update access metrics with county & state level resources.

These will likely have a different methodology (e.g. % of tracts within a 30-minute distance, or average distance across tracts), but that's fine as long as it's documented and integrated within the same file.
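
A minimal sketch of one possible county-level roll-up, assuming a tract table with a character GEOID and a minimum-distance column like MinDisFqhc (exact names should be checked against the dictionaries):

```r
# Sketch: aggregate tract-level minimum distances to the county level via the
# county FIPS prefix of the tract GEOID, reporting an average distance and the
# share of tracts within a threshold of 30 (minutes or miles, depending on the
# metric actually used).
library(dplyr)
library(readr)

tracts <- read_csv("csv/T_Latest.csv",
                   col_types = cols(GEOID = col_character()))

county_access <- tracts %>%
  mutate(county_fips = substr(GEOID, 1, 5)) %>%
  group_by(county_fips) %>%
  summarise(
    AvgDisFqhc    = mean(MinDisFqhc, na.rm = TRUE),
    PctTractsIn30 = 100 * mean(MinDisFqhc <= 30, na.rm = TRUE),
    .groups = "drop"
  )
```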
