geodacenter / opioid-policy-scan

The Opioid Environment Policy Scan provides access to data at multiple spatial scales to help characterize the multi-dimensional risk environment impacting opioid use in justice populations across the United States.

R 16.13% HTML 43.41% Jupyter Notebook 40.14% Python 0.32%
accessibility covid-19 health opioids public-health spatial-analysis spatial-data

opioid-policy-scan's People

Contributors

a-moosh, angela-li, balcaglayan, bucketteofivy, cavillanueva1, juhe0120, makosak, menghamo, mradamcox, nofurtherinformation, oliang2000, qinyun-lin, rnvigil, spaykin, wam0


opioid-policy-scan's Issues

Standardize shapefiles

A little overhaul of the spatial data here would be good. Some tasks include:

  • Inspect non-join columns, like state name or county name, and perhaps add more if needed. These columns can be appended to joined and exported data, so consistent, readable formatting is important.
  • Standardize GEOIDs across spatial resolutions and time (e.g. the join field for ZCTAs should have the same name in the 2010 shapefile as in the 2018 one); see the sketch below.
  • Update columns in all CSVs as needed to streamline the joins.
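
A minimal sketch of the second bullet, assuming the 2010 and 2018 ZCTA shapefiles currently use different join-field names (ZCTA5CE10 and GEOID10 are guesses) and that Z_Latest.csv is keyed on GEOID:

```r
# Sketch: give both ZCTA vintages the same join-field name before attaching
# CSV attributes. Field and file names are assumptions; check the actual data.
library(sf)
library(dplyr)

zcta_2010 <- st_read("zctas_2010.shp") %>% rename(GEOID = ZCTA5CE10)  # assumed field
zcta_2018 <- st_read("zctas_2018.shp") %>% rename(GEOID = GEOID10)    # assumed field

# Any attribute CSV keyed on the same field can then join either vintage:
z_latest <- readr::read_csv("csv/Z_Latest.csv")
zcta_2018_joined <- left_join(zcta_2018, z_latest, by = "GEOID")
```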

Create all NA values

In a few CSVs, there are NA values with a lot of leading spaces, like `        NA`. These leading spaces should be removed. Ideally, we would be able to find the source of these spaces and fix that (presumably in one of the R scripts). At this point, though, cleaning up the CSVs is the top priority, ahead of making the real v2 release.
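
A minimal cleanup sketch, assuming the CSVs live under data_final; it reads every cell as text, trims whitespace so values like `        NA` become proper NAs, and writes the files back out:

```r
# Cleanup sketch: strip leading/trailing whitespace from every cell and write
# plain NA values back to disk.
library(readr)
library(dplyr)

csv_files <- list.files("data_final", pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)

for (f in csv_files) {
  df <- read_csv(f, col_types = cols(.default = col_character()),
                 trim_ws = FALSE)
  df <- df %>%
    mutate(across(everything(), ~ na_if(trimws(.x), "NA")))
  write_csv(df, f, na = "NA")
}
```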

Fix variable name disagreements

There are a few disagreements between the variable names in data dictionaries and in the CSV files themselves. Creating this ticket to track their resolution ahead of the v2 release.

I ran a validation script against the table definitions (which are generated directly from the data dictionaries) and the CSVs themselves. Here is the output, i.e. the variable names that this ticket should address (a sketch of the check follows the output). Note that some of these have more to do with geometry fields and should be fixed via #68.

VALIDATE INPUT SOURCE: csv/C_1980.csv
WARNINGS ENCOUNTERED: 0

VALIDATE INPUT SOURCE: csv/C_1990.csv
WARNINGS ENCOUNTERED: 2
  1 source columns missing from schema: Age15_24P
  1 schema fields missing from source: A15_24P

VALIDATE INPUT SOURCE: csv/C_2000.csv
WARNINGS ENCOUNTERED: 2
  1 source columns missing from schema: Age15_24P
  1 schema fields missing from source: A15_24P

VALIDATE INPUT SOURCE: csv/C_2010.csv
WARNINGS ENCOUNTERED: 2
  1 source columns missing from schema: VacP
  1 schema fields missing from source: VacantP

VALIDATE INPUT SOURCE: csv/C_Latest.csv
WARNINGS ENCOUNTERED: 1
  1 source columns missing from schema: Unnamed: 0

VALIDATE INPUT SOURCE: csv/S_1980.csv
WARNINGS ENCOUNTERED: 2
  2 source columns missing from schema: STATEFP, Age15_24P
  4 schema fields missing from source: G_STATEFP, GEOID, STUSPS, A15_24P

VALIDATE INPUT SOURCE: csv/S_1990.csv
WARNINGS ENCOUNTERED: 2
  2 source columns missing from schema: STATEFP, Age15_24P
  4 schema fields missing from source: G_STATEFP, GEOID, STUSPS, A15_24P

VALIDATE INPUT SOURCE: csv/S_2000.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: STATEFP, Age15_24P, OccP
  4 schema fields missing from source: G_STATEFP, GEOID, STUSPS, A15_24P

VALIDATE INPUT SOURCE: csv/S_2010.csv
WARNINGS ENCOUNTERED: 1
  4 schema fields missing from source: G_STATEFP, STUSPS, ChildrenP, Age18_64

VALIDATE INPUT SOURCE: csv/S_Latest.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: TotPopE, NoHSP, PrMisuse20
  3 schema fields missing from source: TotPop, NoHsP, PrMsuse20P

VALIDATE INPUT SOURCE: csv/T_1980.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: NoHSP, ChildrenP, OccP
  4 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, NoHsP

VALIDATE INPUT SOURCE: csv/T_1990.csv
WARNINGS ENCOUNTERED: 2
  4 source columns missing from schema: Age15_24P, NoHsp, ChildrenP, OccP
  5 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, A15_24P, NoHsP

VALIDATE INPUT SOURCE: csv/T_2000.csv
WARNINGS ENCOUNTERED: 2
  4 source columns missing from schema: Age15_24P, NoHsp, ChildrenP, OccP
  6 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, A15_24P, NoHsP, PciE

VALIDATE INPUT SOURCE: csv/T_2010.csv
WARNINGS ENCOUNTERED: 2
  2 source columns missing from schema: GiniCoeff, VacP
  7 schema fields missing from source: TRACTCE, COUNTYFP, STATEFP, AgeOv18, NonRelFhhP, NonRelNfhhP, VacantP

VALIDATE INPUT SOURCE: csv/T_Latest.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: TotPopE, NoHSP, FqhcMinDis
  3 schema fields missing from source: TotPop, NoHsP, MinDisFqhc

VALIDATE INPUT SOURCE: csv/Z_1980.csv
WARNINGS ENCOUNTERED: 2
  4 source columns missing from schema: ZCTA, Age55_59, Ov65P, PacIsP
  4 schema fields missing from source: GEOID, PacISP, HispP, Ovr65P

VALIDATE INPUT SOURCE: csv/Z_1990.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: ZCTA, Ov65P, PacIsP
  4 schema fields missing from source: GEOID, PacISP, HispP, Ovr65P

VALIDATE INPUT SOURCE: csv/Z_2000.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: ZCTA, Ov65P, PacIsP
  4 schema fields missing from source: GEOID, PacISP, HispP, Ovr65P

VALIDATE INPUT SOURCE: csv/Z_2010.csv
WARNINGS ENCOUNTERED: 2
  3 source columns missing from schema: PacIsP, MedInc, VacP
  2 schema fields missing from source: PacISP, VacantP

VALIDATE INPUT SOURCE: csv/Z_Latest.csv
WARNINGS ENCOUNTERED: 0
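
For reference, a minimal sketch of this kind of check; the actual validation script may differ, and the Variable column name in the dictionaries is an assumption:

```r
# Hypothetical re-implementation of the column check: compare the columns of a
# CSV against the variable names listed in the matching data dictionary.
library(readr)
library(readxl)

check_columns <- function(csv_path, dictionary_path,
                          name_col = "Variable") {   # assumed dictionary column
  source_cols <- names(read_csv(csv_path, n_max = 0, show_col_types = FALSE))
  schema_cols <- read_excel(dictionary_path)[[name_col]]

  cat("VALIDATE INPUT SOURCE:", csv_path, "\n")
  missing_from_schema <- setdiff(source_cols, schema_cols)
  missing_from_source <- setdiff(schema_cols, source_cols)
  if (length(missing_from_schema))
    cat("  source columns missing from schema:",
        paste(missing_from_schema, collapse = ", "), "\n")
  if (length(missing_from_source))
    cat("  schema fields missing from source:",
        paste(missing_from_source, collapse = ", "), "\n")
}
```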

Fill out remaining 1980, 1990, and 2000 DS01 variables

The variables present in the DS01 historic data do not match the variables listed in the DS01 data table documentation. This seems to be because Social Explorer's "Historic Census Data on 2010 Census Tracts" datasets do not include the counts the DS01 documentation calls for, likely because the historic censuses did not aggregate their data directly into the relevant categories. Since our historic DS01 data files are based on Social Explorer's data, we are missing those categories as well.

However, there does seem to be a workaround for some of the data. The historical censuses appear to have released disaggregated tract-level race, ethnicity, age, and educational attainment data from which most of the missing data can be reconstructed. I'm currently planning to download this data from IPUMS NHGIS and then crosswalk it to 2010 census tracts using weights from the Longitudinal Tract Database (see the sketch after the list below), but I have a few open questions about data comparability, tracked here, that will need to be answered prior to merging these changes. Namely:

  1. The 1980 Census seems to have asked whether respondents were of Spanish origin, as opposed to Hispanic origin, which it started doing in 1990. Is it sufficient to simply note this discrepancy in the documentation, or is there research indicating that the difference in wording heavily changed how respondents interpreted the question? (I don't currently believe this is the case, but it's probably wise to double-check anyhow.)
  2. The 1980 Census also reports "Years of School Completed" with categories such as "High School: 1-3 years" and "High School: 4 years," whereas later censuses report "Educational Attainment" with categories such as "9th to 12th grade, no diploma" and "High School graduate (includes equivalency)." At minimum, this means that any estimate of the percent of the population with less than a high school diploma (for the noHSP variable) will exclude GEDs for the 1980 population but not from 1990 on. Are these sufficiently different that the 1980 Census education variable should be renamed or treated differently, or is it sufficient to just note this discrepancy in the documentation?
  3. Due to "major differences between the disability questions," the US Census Bureau advises against comparing disability data from Censuses taken prior to 2000 with the 2000 Census. As an example of these discrepancies, disability data collected in the 1980 and 1990 Censuses only cover the civilian non-institutionalized population aged 16 years and older, whereas the 2000 Census covers the civilian non-institutionalized population aged 5 years and older. It is probably desirable to make the differences between the 1980/1990 and 2000 disability data as apparent as possible for end users. Toward that end, do we want to separate the 1980 and 1990 disability data into a distinct variable to reflect this difference in collection methodology?
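
A minimal sketch of the planned crosswalk step, assuming NHGIS counts keyed by 1980 tract ID and an LTDB weight table with trtid80, trtid10, and weight columns (all file and column names here are placeholders):

```r
# Hypothetical crosswalk sketch: re-aggregate 1980-tract counts to 2010 tracts
# using Longitudinal Tract Database interpolation weights.
library(dplyr)
library(readr)

nhgis_1980   <- read_csv("nhgis_1980_tracts.csv")          # counts on 1980 tracts
ltdb_weights <- read_csv("ltdb_crosswalk_1980_2010.csv")   # trtid80, trtid10, weight

tracts_2010 <- nhgis_1980 %>%
  inner_join(ltdb_weights, by = c("GEOID" = "trtid80")) %>%
  group_by(trtid10) %>%
  summarise(
    TotPop   = sum(TotPop * weight, na.rm = TRUE),    # hypothetical count columns
    Age15_24 = sum(Age15_24 * weight, na.rm = TRUE),
    .groups  = "drop"
  )
```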

Update homelessness variables

Revise/integrate the metadata .md files a bit more to link the two estimates and related proxies (household type & homeless pop); remove ACS from the point-in-time estimate variable, since that one only uses HUD data.

Proposal: Split explorer branch into new repository

I've been reading through this repo and its accompanying wiki and other documentation (which is thorough and really helpful) in order to understand it better, and I'd like to propose that the explorer branch, which holds the static website and is deployed through Netlify, be split into a separate repository. Here are a few reasons I think this would be a good move:

  • As it is, the explorer branch is completely independent of the main branch: CSV files are manually copied from main to explorer for publication, or fetch calls in explorer access content in main through direct calls to https://github.raw...., so there is no functional logic linking the two (though I may be missing something).
  • Archives of the main branch are published to Zenodo (one per release) and associated with a DOI.
    • This means that release tags on this repo get tied to versions in Zenodo of the same DOI, so tagging releases of the explorer branch isn't really possible, at least without adding a good bit of confusion to the release lineage.
    • This also means that significant functional changes to the main branch, like moving the explorer codebase into it as a subdirectory, seem inappropriate.
  • Similarly, we will need to make changes to the explorer website in the future, at the very least to set Netlify configs like pinning a Node version (for example, the deployment/build process seems to break with Node >= 17 or so, due to an OpenSSL issue; thanks @bucketteOfIvy for finding this), so developing on the main branch of a new repo will be generally less cumbersome and more sustainable.

We should be able to clone the branch independently into a new repo while retaining all relevant commit history; the process would be something like this, not just a copy-paste into a new blank repo. This means that all past contribution history would remain intact (this is a requirement as far as I'm concerned).

I'm thinking the new repo would be something like healthyregions/oeps-explorer.

Anyway, I'm writing this ticket out mostly to get the idea in front of @spaykin and @nofurtherinformation, as you have been the main contributors to this repo, and I don't think a change like this can happen without your input. Like I mentioned, work will definitely need to happen on the explorer at some point this summer, so this is essentially a preparatory step for that.


to complete:

  • resolve explorer-debug branch (merge into explorer, or just document and delete?)
  • create new repo as described above
  • update this repo's readme as needed
  • update this repo's wiki as needed
  • look around for references to the explorer in other literature and update URLs
  • set up the Netlify build from the new repo (this may include a domain change as well)
  • delete explorer branch from this repo

Inspect and merge historic data

The historic datasets in the OEPS-historical-data branch ultimately need to be merged into the main branch. Some updates to the datasets may need to be made, so those may as well take place before the merge is performed.

New metadata files are needed

The following variables in the Census Tract-scale data do not appear in any metadata files. We will have to create new metadata files for them.

SocEcAdvIn
LimMobInd
UrbCoreInd
MicaInd

Homelessness proxy %

Using a doubled-up housing %; may require a literature review. Preferred at census tract scale and higher.

Remove/archive v1 datasets ahead of v2 release, promote new CSVs

To prepare for the upcoming release of the v2 datasets, which have been consolidated by spatial resolution and now include historical tables, we should at least move all of the old CSVs into a new directory, like data_final/v1.1 (must double-check the actual release number here), and move the contents of data_final/consolidated into a more primary location, maybe something like data_final/v2.0/tables and data_final/v2.0/dictionaries.

There is other content in the data_final directory that we'll still need to decide what to do with, but this ticket only concerns the old and new CSVs.

Access to Internet

##B28002_001: Estimate!!Total:
##B28002_002: Estimate!!Total:!!With an Internet subscription
##B28002_012: Estimate!!Total:!!Internet access without a subscription
##B28002_013: Estimate!!Total:!!No Internet access

We can calculate % of households without access to the Internet using B28002_013/B28002_001.
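
A minimal sketch of that calculation with tidycensus (the geography, year, and output variable name are assumptions):

```r
# Sketch: pull the B28002 internet-access table and compute the share of
# households with no Internet access (B28002_013 / B28002_001).
library(tidycensus)
library(dplyr)

internet <- get_acs(
  geography = "county",                        # assumed scale; could be "tract"
  variables = c(total = "B28002_001",
                no_internet = "B28002_013"),
  year = 2019,                                 # assumed ACS 5-year vintage
  output = "wide"
)

internet <- internet %>%
  mutate(NoIntP = 100 * no_internetE / totalE)  # hypothetical output variable name
```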

Reconcile discrepancy between `master` and `main` branches

At some point in this repository's history, a main branch was created to become the new default. However, the old default, master, still exists, and when I preview a pull request from the old master into the new main there is a long list of commits that would be applied. I doubt we should actually apply all of these commits, but I would like to look through them to figure out what they contain and, if relevant, use git cherry-pick to pull them into main.

Ultimately, the master branch should be deleted.

Remove master branch

The master branch of this repo was deprecated in favor of main at some point, but there are a lot of hard-coded URLs that still point to resources on GitHub in the master branch. These should all be replaced with URLs that reference the v1.0 tag.

Update metadata files for 2.0 release

We discussed how to approach the metadata update that is needed now that a lot of the file structures and dictionaries have changed. The approach we agreed upon has two parts, described below. Note: the new data dictionaries are in XLSX format and can be found here: https://github.com/GeoDaCenter/opioid-policy-scan/tree/main/data_final/dictionaries. These will be helpful references for step 1, and in step 2 they will need to be updated. (These steps are a suggestion for how to go about the process, not a required workflow.)

This only concerns the non-geography markdown metadata files.

1. Update all existing metadata markdown files.

The v1 files here: https://github.com/GeoDaCenter/opioid-policy-scan/tree/main/data_final/metadata are still very much relevant after the reorganization, but they need to be updated in the following ways:

  • Update variable names where necessary
  • Update Themes where necessary
    • Match to values in the data dictionaries
    • Edit: On further inspection, these themes are handled elsewhere, outside of the markdown files. So this step will be handled in a different ticket.
  • Update author/modified like so:
    Author: <original author>
    Last Modified: <new date>
    Last Modified By: <updater>
    

2. Add Metadata Location column to the new Data Dictionaries

A new Metadata Location column will be added to each of the new XLSX data dictionaries, with a URL pointing to the GitHub-hosted location of the corresponding Markdown file for each variable row. These should be the "raw" URLs referencing the main branch (we'll update them to the 2.0 tag later, just before creating that release). For example:

https://raw.githubusercontent.com/GeoDaCenter/opioid-policy-scan/main/data_final/metadata/Access_FQHCs_MinDistance.md

This will also offer a good QA/QC opportunity to check whether we are missing any markdown files: each row must have a value for Metadata Location.
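
A minimal QA sketch for step 2, assuming the dictionaries gain a Metadata Location column and the markdown files live in data_final/metadata:

```r
# Hypothetical QA check: flag dictionary rows with no Metadata Location, and
# metadata markdown files that no dictionary row points to.
library(readxl)
library(dplyr)

dict_files <- list.files("data_final/dictionaries", pattern = "\\.xlsx$",
                         full.names = TRUE)
md_files   <- list.files("data_final/metadata", pattern = "\\.md$")

# Read every dictionary as text so the sheets can be stacked together
dictionaries <- bind_rows(lapply(dict_files, read_excel, col_types = "text"))

# Rows still missing a metadata link
missing_link <- dictionaries %>%
  filter(is.na(`Metadata Location`) | `Metadata Location` == "")

# Markdown files never referenced by any dictionary row
referenced      <- basename(dictionaries$`Metadata Location`)
unreferenced_md <- setdiff(md_files, referenced)
```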

Find/create smaller ZCTA 2010 shapefile

The file we've been able to find so far from the Census Bureau is really big, about 800 MB. We don't need geometries with that much detail, so it would be better to have a generalized file to use going forward. I have been asking folks at the Census Bureau whether there is an official generalized file available, as there seems to be for more recent years.

If not, I think it would be acceptable to run a generalization operation on the file (in QGIS or elsewhere), as long as the contiguous boundaries are retained.
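
If we end up generalizing it ourselves, a minimal sketch with rmapshaper (which preserves shared boundaries while simplifying); the keep ratio and file names are assumptions:

```r
# Sketch: simplify the full-detail 2010 ZCTA shapefile while keeping shared
# (contiguous) boundaries consistent between neighboring polygons.
library(sf)
library(rmapshaper)

zcta_2010 <- st_read("zcta510_2010_full.shp")          # hypothetical input path

zcta_simplified <- ms_simplify(zcta_2010,
                               keep = 0.05,            # retain ~5% of vertices
                               keep_shapes = TRUE)     # don't drop small polygons

st_write(zcta_simplified, "zcta510_2010_simplified.shp")
```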

Add county & state-level access metrics for all resources

Update access metrics with county & state level resources.

These will likely have a different methodology (e.g. % of tracts within a 30-minute distance, or average distance across tracts), but that's fine as long as it's documented and integrated within the same file.
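
A minimal sketch of one possible county-level roll-up, assuming a tract table with a character GEOID and a minimum-distance column like MinDisFqhc (exact names should be checked against the dictionaries):

```r
# Sketch: aggregate tract-level minimum distances to the county level via the
# county FIPS prefix of the tract GEOID, reporting an average distance and the
# share of tracts within a threshold of 30 (minutes or miles, depending on the
# metric actually used).
library(dplyr)
library(readr)

tracts <- read_csv("csv/T_Latest.csv",
                   col_types = cols(GEOID = col_character()))

county_access <- tracts %>%
  mutate(county_fips = substr(GEOID, 1, 5)) %>%
  group_by(county_fips) %>%
  summarise(
    AvgDisFqhc    = mean(MinDisFqhc, na.rm = TRUE),
    PctTractsIn30 = 100 * mean(MinDisFqhc <= 30, na.rm = TRUE),
    .groups = "drop"
  )
```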
