opal-data-analysis's Introduction

A collection of useful materials for dealing with OPAL data extracts

How to analyse the data (R edition)

(We assume you're using RStudio.)

Source the opal.R file

Use the console window to set the directory in which the extract you're currently exploring lives.

OPALDATA <<- "~/Downloads/ohc.2015-10-26/"

You can now start using the various utility functions to explore the data - for example:

view_table(demographics)
age_distribution()
common_diagnoses()
common_tests()

opal-data-analysis's People

Contributors

davidmiller, michaeledwardmarks

opal-data-analysis's Issues

Work out the different lines each person has used

Done in STATA 13.1
This involves a collapse step, so preserve is strongly recommended.
We are using the line.csv file

* Import data*
clear
import delimited "line.csv", varnames(1)

** Generate a 0/1 column for each type of catheter. This currently includes a few extra lines because the data is not perfectly clean. In the long term this will be fixed by making line type a drop down not a look-up list **

gen leaderflex = 0
replace leaderflex =1 if strpos(line_type,"Leader")
replace leaderflex =1 if strpos(line_type,"leder")
gen midline = 0
replace midline =1 if strpos(line_type,"Midline")
gen PICC = 0
replace PICC =1 if strpos(line_type,"PICC")
gen Peripheral = 0
replace Peripheral =1 if strpos(line_type,"Peripheral")
gen Portacath = 0
replace Portacath =1 if strpos(line_type,"Portacath")

** We are going to collapse the data by episode. This gives a score of 1 for each type of line the person used at any point during the episode. Because collapse replaces the data in memory, preserve is recommended. **
preserve
collapse (max) leaderflex midline PICC Peripheral Portacath,by(episode_id)

** Work out how many different types of line each person used **
gen numberofways = leaderflex + midline + PICC + Peripheral + Portacath

** Tabulate the result, then restore the original dataset **
tab numberofways
restore
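
For anyone following the R edition, the same flag-and-collapse idea can be sketched in base R. This is a rough equivalent under assumptions, not part of opal.R: the function name line_types_per_episode and the example data frame are made up for illustration, and only the episode_id and line_type column names come from the Stata code above.

```r
# Hypothetical helper (not part of opal.R): count the distinct line types
# recorded for each episode. `lines` is a data frame with the line.csv
# columns episode_id and line_type.
line_types_per_episode <- function(lines) {
  # Substrings searched for in line_type, mirroring the strpos() calls
  patterns <- list(
    leaderflex = c("Leader", "leder"),
    midline    = "Midline",
    PICC       = "PICC",
    Peripheral = "Peripheral",
    Portacath  = "Portacath"
  )

  # One 0/1 column per line type: 1 if any of its substrings occurs
  flags <- data.frame(episode_id = lines$episode_id)
  for (type in names(patterns)) {
    hit <- rep(FALSE, nrow(lines))
    for (p in patterns[[type]]) {
      hit <- hit | grepl(p, lines$line_type, fixed = TRUE)
    }
    flags[[type]] <- as.integer(hit)
  }

  # Per episode, take the maximum of each flag (1 if the type was ever used)
  per_episode <- aggregate(flags[names(patterns)],
                           by = list(episode_id = flags$episode_id),
                           FUN = max)

  # Add up the flags to get the number of different line types
  per_episode$numberofways <- rowSums(per_episode[names(patterns)])
  per_episode
}

# Made-up example rows, only to show the shape of the input
example <- data.frame(
  episode_id = c(1, 1, 1, 2, 3),
  line_type  = c("PICC", "Midline", "PICC", "Leaderflex", "Hickman"),
  stringsAsFactors = FALSE
)
result <- line_types_per_episode(example)
print(result)
```

Unlike the Stata version, this leaves the original data frame untouched, so there is no preserve/restore step to remember.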

Summarise Line Types and Data

Done in STATA 13.1
Note this involves a collapse step so preserve is recommended

** Import Data **
import delimited "line.csv", clear
tab line_type
** This will be improved once we move line_type to a dropdown **

* Convert Date-Time to a STATA Date by extracting the date part and then converting it*
gen inserted_on = substr(insertion_datetime,1,10)
replace inserted_on = "." if inserted_on=="None"
gen inserted_date = date(inserted_on,"YMD")
gen removed_on = substr(removal_datetime,1,10)
replace removed_on = "." if removed_on=="None"
gen removed_date = date(removed_on,"YMD")
gen line_duration = removed_date - inserted_date

* Summarise the data by line type - NOTE PRESERVE STEP*
preserve
collapse (iqr) iqr=line_duration (p50) median=line_duration (min) min=line_duration (max) max=line_duration (mean) average=line_duration,by(line_type)

** Data table now shows the summary statistics for line duration for each line type **

** Restore Dataset **
restore
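
A base R sketch of the same summary may be useful alongside the R utilities. It is an approximation under assumptions rather than a drop-in replacement: the function name line_duration_summary and the example rows are invented, and the date-time strings are assumed to start with YYYY-MM-DD (with "None" for a missing value), as the substr() and date() calls above imply. R's IQR() uses a different quantile convention from Stata by default, so the iqr column may differ slightly on small groups.

```r
# Hypothetical helper (not part of opal.R): summary statistics of line
# duration in days for each line type. `lines` is a data frame with the
# line.csv columns line_type, insertion_datetime and removal_datetime.
line_duration_summary <- function(lines) {
  # Keep the first 10 characters (the date part); "None" becomes NA
  inserted <- as.Date(substr(lines$insertion_datetime, 1, 10), format = "%Y-%m-%d")
  removed  <- as.Date(substr(lines$removal_datetime, 1, 10), format = "%Y-%m-%d")
  duration <- as.numeric(removed - inserted)

  # Drop lines without both dates, then group the durations by line type.
  # Line types with no complete durations do not appear in the result.
  complete <- !is.na(duration)
  by_type  <- split(duration[complete], as.character(lines$line_type[complete]))

  data.frame(
    line_type = names(by_type),
    iqr       = sapply(by_type, IQR),
    median    = sapply(by_type, median),
    min       = sapply(by_type, min),
    max       = sapply(by_type, max),
    average   = sapply(by_type, mean),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example rows, only to show the shape of the input
example <- data.frame(
  line_type = c("PICC", "PICC", "PICC", "Midline", "Midline"),
  insertion_datetime = c("2015-01-01 10:00:00", "2015-01-10 09:30:00",
                         "2015-02-01 12:00:00", "2015-03-01 08:00:00",
                         "2015-03-05 08:00:00"),
  removal_datetime   = c("2015-01-11 10:00:00", "2015-01-30 09:30:00",
                         "None", "2015-03-08 08:00:00",
                         "2015-03-10 08:00:00"),
  stringsAsFactors = FALSE
)
summary_table <- line_duration_summary(example)
print(summary_table)
```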

Antibiotic Days Per Drug for OPAT

Done in STATA 13.1

This involves a collapse step so a preserve step is advised

** Import Data and drop non-OPAT drugs **
import delimited "antimicrobial.csv", clear
drop if delivered_by=="Inpatient Team"
drop if delivered_by==""

** Generate days per prescription **
gen start = date(start_date,"YMD")
gen end = date(end_date,"YMD")
gen duration = end-start

** Sum all the durations by drug **
bysort drug: egen totaldays = total(duration)

* Collapse the data to give the summary for each drug - NOTE PRESERVE STEP*
preserve
collapse (max) totaldays, by(drug)

** The data table now lists each drug and the total number of days it was prescribed across the whole OPAT dataset **

** Uncollapse the dataset **
restore
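
The same totals can be sketched in base R for use with the R utilities. Treat this as an illustration under assumptions: days_per_drug is a made-up name, the example rows are invented, and the date strings are assumed to be in YYYY-MM-DD form, as the "YMD" mask above suggests. Prescriptions with a missing start or end date are left out of the totals.

```r
# Hypothetical helper (not part of opal.R): total prescribed days per drug
# for non-inpatient prescriptions. `antimicrobial` is a data frame with the
# antimicrobial.csv columns drug, delivered_by, start_date and end_date.
days_per_drug <- function(antimicrobial) {
  # Drop inpatient prescriptions and those with a blank delivered_by
  keep <- !(antimicrobial$delivered_by %in% c("Inpatient Team", ""))
  opat <- antimicrobial[keep, ]

  # Days per prescription; unparseable dates give NA
  start <- as.Date(opat$start_date, format = "%Y-%m-%d")
  end   <- as.Date(opat$end_date, format = "%Y-%m-%d")
  opat$duration <- as.numeric(end - start)

  # Sum the durations by drug (rows with NA duration are omitted)
  totals <- aggregate(duration ~ drug, data = opat, FUN = sum)
  names(totals)[names(totals) == "duration"] <- "totaldays"
  totals
}

# Made-up example rows, only to show the shape of the input
example <- data.frame(
  drug         = c("Ceftriaxone", "Ceftriaxone", "Teicoplanin",
                   "Ceftriaxone", "Teicoplanin"),
  delivered_by = c("OPAT Clinic", "Self", "District Nurse",
                   "Inpatient Team", ""),
  start_date   = c("2015-01-01", "2015-02-01", "2015-03-01",
                   "2015-04-01", "2015-05-01"),
  end_date     = c("2015-01-08", "2015-02-15", "2015-03-11",
                   "2015-04-20", "2015-05-30"),
  stringsAsFactors = FALSE
)
totals <- days_per_drug(example)
print(totals)
```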

Length of time receiving IV antibiotics from OPAT

This is my process.

The data is in the antimicrobial.csv file.

I did this in STATA 13.1, but it could equally be done in R.

This includes a collapse step so a preserve step is advised.

** Import data **
import delimited "antimicrobial.csv", clear
** Remove non OPAT drugs **
** Records where delivered by is blank are thought to be drugs imported from inpatient records where the route of administration isn't recorded **
drop if delivered_by=="Inpatient Team"
drop if delivered_by==""
** Drop drugs where route of administration == Oral **
drop if route=="Oral"
drop if route=="PO"

** Convert the date strings into dates **
gen start = date(start_date,"YMD")
gen end = date(end_date,"YMD")

** Collapse data across episode_id keeping the earliest start date and the latest finish date. NOTE PRESERVE STEP **
preserve
collapse (min)start (max)end,by(episode_id)

** Work out overall duration of non-Inpatient IV **
gen duration = end-start

** Get summary statistics **
sum duration,detail

** Uncollapse the dataset **
restore
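
Since this could also be done in R, here is a minimal base R sketch of the steps above. It is a sketch under assumptions, not a tested port: iv_duration_by_episode is a made-up name, the example rows are invented, and the dates are assumed to be YYYY-MM-DD strings. The returned episode_id is a character column because it is taken from the group labels.

```r
# Hypothetical helper (not part of opal.R): overall span in days of
# non-inpatient, non-oral antimicrobials for each episode. `antimicrobial`
# is a data frame with the antimicrobial.csv columns episode_id,
# delivered_by, route, start_date and end_date.
iv_duration_by_episode <- function(antimicrobial) {
  # Remove inpatient/blank delivered_by rows and oral routes
  keep <- !(antimicrobial$delivered_by %in% c("Inpatient Team", "")) &
    !(antimicrobial$route %in% c("Oral", "PO"))
  opat <- antimicrobial[keep, ]

  # Dates as day counts; unparseable values become NA
  start <- as.numeric(as.Date(opat$start_date, format = "%Y-%m-%d"))
  end   <- as.numeric(as.Date(opat$end_date, format = "%Y-%m-%d"))

  # Min/max that ignore NA and return NA when a group has no dates at all
  min_or_na <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
  max_or_na <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

  # Earliest start and latest end per episode
  first_start <- tapply(start, opat$episode_id, min_or_na)
  last_end    <- tapply(end, opat$episode_id, max_or_na)

  data.frame(
    episode_id = names(first_start),
    duration   = as.numeric(last_end - first_start),
    stringsAsFactors = FALSE
  )
}

# Made-up example rows, only to show the shape of the input
example <- data.frame(
  episode_id   = c(1, 1, 2, 2, 2, 3),
  delivered_by = c("OPAT Clinic", "Self", "Inpatient Team",
                   "District Nurse", "Self", ""),
  route        = c("IV", "IV", "IV", "IV", "PO", "IV"),
  start_date   = c("2015-01-01", "2015-01-05", "2015-02-01",
                   "2015-02-03", "2015-02-08", "2015-03-01"),
  end_date     = c("2015-01-04", "2015-01-15", "2015-02-03",
                   "2015-02-08", "2015-02-20", "2015-03-05"),
  stringsAsFactors = FALSE
)
durations <- iv_duration_by_episode(example)
print(summary(durations$duration))
```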

Work out which routes each individual person received drugs

Done in STATA
Requires COLLAPSE so PRESERVE is recommended.

** We are using the antimicrobial dataset **
clear
import delimited "antimicrobial.csv"

** Drop drugs prescribed by inpatient team / where the delivered by field is blank - these are thought to also be inpatient prescriptions **
drop if delivered_by=="Inpatient Team"
drop if delivered_by==""

** Clean the delivered by data - this section needs reviewing based on a tab delivered_by. This will be fixed when Delivered_by becomes a dropdown not a lookup list **
drop if delivered_by=="in patient"
replace delivered_by="Self" if delivered_by=="self"
replace delivered_by ="Carer" if delivered_by=="Carer / DN"

** Data is currently long. We therefore look at each entry and generate a score of 1 for each category based on each individual prescription **
gen carer =0
replace carer =1 if strpos(delivered_by,"Carer")
gen DN =0
replace DN =1 if strpos(delivered_by,"District Nurse")
gen GP =0
replace GP =1 if strpos(delivered_by,"GP")
gen OPAT =0
replace OPAT =1 if strpos(delivered_by,"OPAT Clinic")
gen Self =0
replace Self =1 if strpos(delivered_by,"Self")
gen UCLHatHome =0
replace UCLHatHome =1 if strpos(delivered_by,"UCLH@Home")

* We collapse the data across episode_id. This gives a score of 1 or 0 for each patient for each of the different ways they could have received drugs, e.g. 1 if any of the prescriptions were delivered by a district nurse. Note the preserve/collapse step *

preserve
collapse (max) carer DN GP OPAT Self UCLHatHome,by(episode_id)

** Add up the flags to give the total number of different ways each person received drugs **
gen numberofways = carer + DN + GP + OPAT + Self + UCLHatHome

** Tabulate the result, then restore the original dataset **
tab numberofways
restore
