crahal / nhsspend

A home for the NHSSpend Library

License: GNU General Public License v3.0

Python 15.81% Jupyter Notebook 36.32% HTML 47.87%
civic-tech procurement-data nhs-institutions scraper parser entity-recognition

📊 NHSSpend: Tools and data for NHS procurement 📈

Introduction

This is a library to scrape and reconcile all payments made by a hierarchy of NHS institutions over time. It is the last of three projects on public procurement data (the first two were centgovspend and TSRC-NCVO-CSDP). Code for an interactive dashboard is found at src/dashboard, and an extremely unfinished prototype of the dashboard itself, built with the help of Ian M. Knowles, is at:

http://nhsspend.org/

Open-access versions of the two headline academic papers ("The Role of Non-Profits in Public Health Service Provision: Evidence from 25,338 heterogeneous procurement datasets" with John Mohan and "Is outsourcing healthcare services to the private sector associated with higher mortality rates? An observational analysis of privatisation in England's NHS, 2013-2020" with Ben Goodair and Aaron Reeves) will be hosted on the Open Science Framework (OSF) in due course, and linked here. A full, build-passing notebook for the first of these two papers can be found here. If you would like to collaborate on these or related work, please don't hesitate to get in touch! Two spin-off repositories, one for pdf-parsing and one for institutional data curation, can be found here and here respectively.

Pre-reqs

NHSSpend tries to minimize the number of pre-requisite installations outside of the standard library, and we recommend an Anaconda installation to provide a comprehensive set of basic tools. However, some additional packages are necessary given the magnitude of the undertaking: these are the modules listed in the requirements.txt file (generated by pipreqs). The pdfparser is based on a version of the pdftableparser library, and the Charity Commission data is extracted using the charity-commission-extract library from NCVO. The Elasticsearch functionality is a custom implementation.

Data Origination

The data originates from one of two lists of recognised NHS institutions (Trusts and CCGs) and the main NHS England data provision page. These lists (data/data_support/ccg_list.xlsx and data/data_support/trust_list.xlsx) are used to create mappings to websites and to track the status of the data, with a number of different parameters fed into the scraper (src/NHSscraper.py). The data curation exercise stopped as of April 2020 in order to focus on the analysis of the data, with the compressed datasets found in the data/merged/* subdirectory of this repository. This is also partly due to the Covid-19 pandemic and the restructuring of Clinical Commissioning Groups more generally (where 18 mergers took the number of CCGs from 191 to 136). However, please do raise issues here if you think any of those institutions are mislabelled or outdated. If you want to update this list (and the subsequent scrapers), please raise an issue or get in touch (this is a constant work in progress until there is a centrally convened resource provided by the Government Data Service).
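As a rough illustration of the scraping step described above, the sketch below collects links to spend files from an institution's publication page. It is not the code in this repository: the class name, the helper function, the list of file extensions and the example.org URLs are all assumptions made for the example, and it uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical set of file types worth downloading; the real scraper may differ.
SPEND_EXTENSIONS = (".csv", ".xls", ".xlsx", ".pdf", ".ods")


class SpendLinkParser(HTMLParser):
    """Collect absolute URLs of links that look like spend data files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(SPEND_EXTENSIONS):
            self.links.append(urljoin(self.base_url, href))


def find_spend_links(html, base_url):
    """Return every candidate spend-file URL found in an HTML page."""
    parser = SpendLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Illustrative page content (not real NHS data).
sample_html = """
<a href="/docs/spend-over-25k-april-2019.csv">April 2019</a>
<a href="about-us.html">About us</a>
<a href="https://example.org/files/May2019.PDF?download=1">May 2019</a>
"""
spend_links = find_spend_links(sample_html, "https://example.org/publications/")
```

In practice the HTML would come from each website listed in the institution spreadsheets, and the collected links would then be downloaded and passed to the parsers.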

The procurement data itself is provided under an Open Government Licence (OGL). Guidance for publishing spend over £25,000 is published by HM Treasury.

Reconciliation

The es_configure.md describes the reconciliation approach. These reconciliations are then manually verified and merged back into the procurement data.
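To give a feel for what reconciling supplier names against a register involves, here is a minimal sketch that swaps the Elasticsearch approach for simple string similarity from the Python standard library. It is an illustration under stated assumptions, not the method documented in es_configure.md: the function names, the list of legal suffixes, the 0.85 threshold and the example names are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal suffixes to strip before comparing names.
LEGAL_SUFFIXES = r"\b(limited|ltd|plc|llp|cic)\b"


def normalise(name):
    """Lower-case a name, drop punctuation and common legal suffixes."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    name = re.sub(LEGAL_SUFFIXES, " ", name)
    return " ".join(name.split())


def best_match(supplier, register, threshold=0.85):
    """Return (register_name, score) for the closest entry, or (None, score) below the threshold."""
    target = normalise(supplier)
    best_name, best_score = None, 0.0
    for candidate in register:
        score = SequenceMatcher(None, target, normalise(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Illustrative register entries (invented, not taken from any real register).
register = ["Acme Healthcare Limited", "Example Hospice Trust", "Sample Medical Supplies PLC"]
match, score = best_match("ACME HEALTH CARE LTD.", register)
no_match, low_score = best_match("Unrelated Catering Co", register)
```

Candidate matches produced this way would still need the manual verification step before being merged back into the procurement data.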

Clean, Reconciled Data

It is possible that you are reading this because you are most interested in a copy of the output data! The scraped, parsed, cleaned and reconciled data can be found at NHSSpend/data/data_final. Please see the readme.md in that subdirectory for information on each of the fields.

Structure

Repo structure is based on the tree utility.

├── readme.md
├── es_configure.md
├── requirements.txt
├── src
│   ├── analysis
│   │   ├── charity_analysis_notebook.ipynb
│   │   ├── general_analysis_functions.py
│   │   ├── helper_functions.py
│   │   └── charity_analysis_functions.py
│   ├── scrape_and_parse_ccgs.py
│   ├── scrape_and_parse_trusts.py
│   ├── scraping_tools.py
│   ├── generate_output.py
│   ├── ingest_everything.py
│   ├── merge_and_evaluate_tools.py
│   ├── NHSSpend.py
│   ├── parsing_tools.py
│   ├── pdf_table_parser.py
│   └── preconciliation.py
├── dashboard
├── data
│   ├── data_support/*
│   ├── data_cc/*
│   ├── data_ch/*
│   ├── data_dashboard/*
│   ├── data_final/*
│   ├── data_masteringest/*
│   ├── data_merge/*
│   ├── data_nhsccgs/*
│   ├── data_nhsdigital/*
│   ├── data_nhsengland/*
│   ├── data_nhstrusts/*
│   ├── data_reconciled/*
│   ├── data_shapefiles/*
│   └── data_summary/*
├── papers
│   ├── corporate_networks
│   ├── figures
│   ├── tables
│   └── third_sector
├── logging
│   ├── nhsspend.log
│   └── eval_logs
└── tokens

Acknowledgements

This work was primarily funded by the British Academy. In addition, generous funding was provided by John Mohan for a 'data audit' undertaken by Steve Barnard. An earlier 'proof of concept' of the project was funded by ESRC Grant ES/M010392/1 (PI John Mohan) and undertaken at the Third Sector Research Centre. Additional thanks are due to Max Hattersly, Ben Goodair and Yu Pei for all of their work on data verification.

Licensing

This code is made available under the GNU General Public License v3.0.

TODO:

  • More docstrings
  • Publish related academic papers

Last updated: 2021-07-01

Contributors

crahal, dependabot[bot], ianknowles
