Comments (7)
Hi Jack,
I just worked through this issue of handling POA (present on admission) diagnoses. I've pasted my code below in case anyone else comes across it and finds it useful.
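library(tidyr)  # gather(), extract()
library(dplyr)  # %>%, arrange(), left_join(), filter(), select()
# icd9ComorbidElix() comes from this package, which must also be loaded
# Vectors naming the diagnosis and POA columns
diags <- c("diag_p", paste("odiag",1:24,sep=""))
odiags <- c(paste("odiag",1:24,sep=""))
opoas <- c(paste("opoa",1:24,sep=""))
# Calculate the total number of listed ICD9 diagnoses per patient
pt$totaldiags <- apply(pt[,diags], 1, function(x) sum(!is.na(x)))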
#####
# Assign Elixhauser Comorbidity
# Subset the "Other Diagnoses" (everything except the principal diagnosis), and their corresponding POA fields
elix <- pt[,c(odiags, opoas)]
# Convert factors to characters, combine with visitIds
elix <- as.data.frame(lapply(elix, as.character), stringsAsFactors = F)
elix <- cbind(visitId = pt$visitId, elix)
# Need to drop all "Other Diagnoses" that were NOT Present on Admission
# Convert from wide to long format, identify and drop all diagnoses that were not present on admission
# Then add principal diagnosis and calculate Elixhauser before merging with main data
# Convert wide to long and rename columns
# Key column is "var", value column is "value"; visitId is kept as the identifier
elix <- gather(elix, var, value, -visitId)
# Split the odiag1-24 and opoa1-24 columns into two so that I can identify the number associated with opoa==no
elix <- elix %>%
  extract(var, c('diag', 'number'),
          '([a-z]+)([0-9]+)') %>%
  arrange(visitId, number)
# Made a DF of just the visitId and numbers associated with diagnoses NOT Present on Admission
temp <- elix[elix$value=="No",c("visitId","number")]
# drop NAs
temp <- temp[!is.na(temp$number),]
# Assign a flag for drops
temp$drop <- TRUE
# Merge working list of ICD9s with temp DF to identify rows to drop, drop them, and simplify DF, prep for icd9ComorbidElix()
elix <- elix %>%
  left_join(temp, by = c("visitId", "number")) %>%
  filter(is.na(drop)) %>%
  filter(diag=="odiag") %>%
  select(visitId, icd9=value) %>%
  filter(!is.na(icd9))
# diag_p: bring in primary diagnoses
diag_p <- pt[,c("visitId", "diag_p")]
colnames(diag_p) <- c("visitId", "icd9")
elix <- rbind(elix, diag_p)
# based on ICD9s for each patient/admission, excluding ICDs NOT Present on Admission,
# make matrix of all 30 Elixhauser categories (T/F)
elix <- as.data.frame(icd9ComorbidElix(elix, visitId="visitId", icd9Field="icd9"))
# add visitId as index and drop rownames
elix$visitId <- rownames(elix)
row.names(elix) <- NULL
# Sum the total number of positive Elixhauser categories per patient, add at end of DF)
elix <- cbind(elix, elixsum = rowSums(elix[-length(elix)]))
# Merge with main data
pt <- left_join(pt, elix)
from icd.
Thanks so much for your contribution, @anobel. Looks like you're using sqldf and tidyr? I've scanned the code, but will need to take a bit more time to understand it. I was thinking that, if the data for diagnoses and POA was in wide format, they would likely be all in the same row representing a single hospital admission, thus logic (e.g. POA == "N") could be applied to the POA matrix, and the resulting logical matrix could then mask in or out the diagnoses in the diagnoses matrix.
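The masking idea can be sketched in base R as follows. This is only an illustration on made-up data: the toy `pt` data frame, the three-slot layout, and the "Yes"/"No" POA coding are assumptions chosen to mirror the column names used earlier in this thread, not anything provided by icd.

```r
# Toy data (hypothetical): two admissions, three "other diagnosis" slots
pt <- data.frame(
  visitId = c("v1", "v2"),
  odiag1 = c("4280", "25000"),
  odiag2 = c("5849", NA),
  odiag3 = c(NA_character_, NA_character_),
  opoa1 = c("Yes", "Yes"),
  opoa2 = c("No", NA),
  opoa3 = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)
odiags <- paste0("odiag", 1:3)
opoas <- paste0("opoa", 1:3)

# Diagnoses and POA flags as two character matrices of identical shape
dx <- as.matrix(pt[, odiags])
poa <- as.matrix(pt[, opoas])

# TRUE where a diagnosis is explicitly flagged as NOT present on admission
not_poa <- !is.na(poa) & poa == "No"

# Mask those diagnoses out of the diagnosis matrix
dx[not_poa] <- NA

# Collapse what remains to long format (one row per visit/code pair)
keep <- !is.na(dx)
long <- data.frame(
  visitId = pt$visitId[row(dx)[keep]],
  icd9 = dx[keep],
  stringsAsFactors = FALSE
)
```

Because the mask is applied to the whole matrix at once, no reshaping is needed before the non-POA codes are dropped; only the surviving codes are converted to long format.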
I see your goal is to sum the total number of positive Elixhauser categories. I think this could be achieved more simply following the example of the Charlson and Van Walraven scores, but counting 1 for everything, instead of weighting.
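As a rough sketch of that unweighted count: given a logical comorbidity matrix with one row per visit, the count is just a row sum. The matrix below is hand-built for illustration, and its category labels are placeholders; in the thread it would come from the Elixhauser comorbidity call.

```r
# Hypothetical logical comorbidity matrix: rows are visits, columns are categories
comorbid <- matrix(
  c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  nrow = 2,
  dimnames = list(c("v1", "v2"), c("CHF", "Renal", "DM"))
)

# Unweighted score: every positive category counts as 1
elixsum <- rowSums(comorbid)

# Keep the individual logical fields alongside the total
result <- data.frame(
  visitId = rownames(comorbid),
  comorbid,
  elixsum = elixsum,
  stringsAsFactors = FALSE,
  row.names = NULL
)
```

This keeps every logical category column and adds the total as one extra column.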
I like your use of (I think) tidyr for wide to long conversion. I wrote icd9WideToLong before tidyr existed, but found at the time that alternatives like dplyr were a bit cumbersome, and, as I know the data structure of the input data, it was quicker to write ICD-specific functions, which also ran faster. The other thing is that the future ICD-10 code will optionally label the data as being ICD-9, ICD-10, ICD-10-CM, etc., and by using my own wide to long conversion, I can preserve this metadata.
Hi @jackwasey. I just used tidyr. The POA and DIAGs are all on the same row, one for each hospitalization, so using a matrix could work. I was interested in summing Elixhauser but also keeping all 30 logical fields.
I have run into some efficiency issues. I posted the comment above after running it on a sample of 1,000 rows. However, the full data set I'm working with has 13 million rows, and when I attempted to apply the code to the full data set, performance issues made this approach impossible (it was taking about 6-8 seconds per 1k rows, and 13 million hospitalizations x 50 Diag/POA fields led to ~630 million rows during reshaping).
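A back-of-the-envelope check of those figures, assuming the 24 odiag plus 24 opoa columns from the posted code and assuming the run time scales linearly with the number of input rows (which may not hold in practice):

```r
n_visits <- 13e6   # hospitalizations in the full data set
n_fields <- 48     # 24 odiag + 24 opoa columns in the posted code

# Rows produced by the wide-to-long reshape
rows_long <- n_visits * n_fields          # 624 million

# Estimated run time at 6 to 8 seconds per 1,000 input rows
secs <- n_visits / 1000 * c(low = 6, high = 8)
hours <- secs / 3600                      # roughly 22 to 29 hours
```

So the reshape alone would yield about 624 million long rows, in line with the ~630 million quoted, and the whole run would take on the order of a day.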
I posted to StackOverflow and got some good feedback:
http://stackoverflow.com/questions/34230184/tidyr-wide-to-long-repeated-measures-and-efficiency
This is something I would like to optimize, and a general purpose data manipulation tool will never be as good at it as some custom code. I think it is probably a common data layout. Maybe tidyr or similar will end up being fast enough for your use cases. data.table seems to be the fastest general tool, but has a bizarre syntax. Did you try that?
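For reference, a data.table version of the reshape might look like the sketch below. It is untested on data of the size discussed here, the toy table and the "Yes"/"No" POA coding are assumptions, and it is not necessarily what any of the StackOverflow answers proposed. It relies on `melt()` accepting several column groups through `patterns()`, so each diagnosis stays on the same row as its POA flag.

```r
library(data.table)

# Toy data (hypothetical): two admissions, two "other diagnosis" slots
pt <- data.table(
  visitId = c("v1", "v2"),
  odiag1 = c("4280", "25000"),
  odiag2 = c("5849", NA),
  opoa1 = c("Yes", "Yes"),
  opoa2 = c("No", NA)
)

# Melt both column groups at once: one row per visit and slot,
# with the diagnosis and its POA flag side by side
long <- melt(
  pt,
  id.vars = "visitId",
  measure.vars = patterns("^odiag", "^opoa"),
  variable.name = "number",
  value.name = c("icd9", "poa")
)

# Drop empty slots and diagnoses explicitly flagged as not present on admission
kept <- long[!is.na(icd9) & (is.na(poa) | poa != "No"), .(visitId, icd9)]
```

Pairing the columns during the melt avoids the later self-join on visit and slot number.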
I tried all the solutions people posted, and it turns out the step causing most of the speed problems was the regular expression that splits the column names. Instead, I built a data frame of column names and numbers (they were predictable) and joined it back to the core data. On my system this process took a few minutes.
I think it could be generalized into a function that takes the diagnosis and POA field names, along with the number of fields, as arguments.
# create vector listing just the fields with diagnosis codes
diags <- c("diag_p", paste("odiag",1:24,sep=""))
odiags <- c(paste("odiag",1:24,sep=""))
opoas <- c(paste("opoa",1:24,sep=""))
# Calculate the total number of listed ICD9 diagnoses per patient
pt$totaldx <- apply(pt[,diags], 1, function(x) sum(!is.na(x)))
# Subset the "Other Diagnoses" (everything except the principal diagnosis), and their corresponding POA fields
elix <- pt[,c(odiags, opoas)]
# Convert factors to characters, combine with visitIds
elix <- as.data.frame(lapply(elix, as.character), stringsAsFactors = F)
elix <- cbind(visitId = pt$visitId, elix)
# Need to drop all "Other Diagnoses" that were NOT Present on Admission
# Convert from wide to long format, identify and drop all diagnoses that were not present on admission
# Then add principal diagnosis and calculate Elixhauser before merging with main data
# Convert wide to long and rename columns
# Key column is "var", value column is "value"; visitId is kept as the identifier
elix <- gather(elix, var, value, -visitId, na.rm = TRUE)
elix$value <- factor(elix$value)
# Split the odiag1-24 and opoa1-24 columns into two so that I can identify the number associated with opoa==no
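# Lookup table: one row per column name, with its type (odiag/opoa) and slot number
colsplit <- rbind(
  data.frame(var = paste("odiag", 1:24, sep = ""), type = "odiag", number = 1:24, stringsAsFactors = FALSE),
  data.frame(var = paste("opoa", 1:24, sep = ""), type = "opoa", number = 1:24, stringsAsFactors = FALSE)
)
# Join elix data with split column names
elix <- elix %>%
  left_join(colsplit, by = "var") %>%
  select(-var) %>%
  rename(var = type)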
rm(colsplit)
# Made a DF of just the visitId and numbers associated with diagnoses NOT Present on Admission
temp <- elix[elix$value=="No",c("visitId","number")]
# drop NAs
temp <- temp[!is.na(temp$number),]
# Assign a flag for drops
temp$drop <- TRUE
# Merge working list of ICD9s with temp DF to identify rows to drop, drop them, and simplify DF, prep for icd9ComorbidElix()
elix <- elix %>%
  left_join(temp, by = c("visitId", "number")) %>%
  filter(is.na(drop)) %>%
  filter(var=="odiag") %>%
  select(visitId, icd9=value) %>%
  filter(!is.na(icd9))
rm(temp)
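# diag_p: bring in primary diagnoses
# Note: if the .rda file stores an object named pt, loading it replaces the
# in-memory pt, including the pt$totaldx column added above
load(file="rao_workingdata/pt.rda")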
diag_p <- pt[,c("visitId", "diag_p")]
colnames(diag_p) <- c("visitId", "icd9")
elix <- rbind(elix, diag_p)
rm(diag_p)
# based on ICD9s for each patient/admission, excluding ICDs NOT Present on Admission,
# make matrix of all 30 Elixhauser categories (T/F)
elix <- as.data.frame(icd9ComorbidElix(elix, visitId="visitId", icd9Field="icd9"))
# add visitId as index and drop rownames
elix$visitId <- rownames(elix)
row.names(elix) <- NULL
# Sum the total number of positive Elixhauser categories per patient, add at end of DF)
elix <- cbind(elix, elixsum = rowSums(elix[-length(elix)]))
# Merge with main data
pt <- left_join(pt, elix)
# Clean Up Environment
rm(elix, diags, odiags, opoas)
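A possible shape for the generalized function mentioned above is sketched here in base R. The function name `poa_filter_long`, its arguments, and the default "No" coding are all hypothetical; it builds column names from a prefix and a count, as in the lookup-table approach, rather than parsing them with a regular expression.

```r
# Hypothetical helper: long-format ICD codes with non-POA diagnoses removed.
# x           data frame in wide format
# visit_name  name of the visit identifier column
# diag_prefix prefix of the diagnosis columns (e.g. "odiag")
# poa_prefix  prefix of the matching POA columns (e.g. "opoa")
# n           number of diagnosis/POA column pairs
# not_poa     value that marks a diagnosis as not present on admission
poa_filter_long <- function(x, visit_name, diag_prefix, poa_prefix, n,
                            not_poa = "No") {
  diag_cols <- paste0(diag_prefix, seq_len(n))
  poa_cols <- paste0(poa_prefix, seq_len(n))
  stopifnot(all(c(visit_name, diag_cols, poa_cols) %in% names(x)))

  # Build one long block per slot, pairing each diagnosis with its POA flag
  pieces <- lapply(seq_len(n), function(i) {
    data.frame(
      visitId = x[[visit_name]],
      icd9 = as.character(x[[diag_cols[i]]]),
      poa = as.character(x[[poa_cols[i]]]),
      stringsAsFactors = FALSE
    )
  })
  long <- do.call(rbind, pieces)

  # Keep non-missing codes that are not explicitly flagged as not POA
  keep <- !is.na(long$icd9) & (is.na(long$poa) | long$poa != not_poa)
  long[keep, c("visitId", "icd9")]
}

# Usage on made-up data
toy <- data.frame(
  visitId = c("v1", "v2"),
  odiag1 = c("4280", "25000"), odiag2 = c("5849", NA),
  opoa1 = c("Yes", "Yes"), opoa2 = c("No", NA),
  stringsAsFactors = FALSE
)
res <- poa_filter_long(toy, "visitId", "odiag", "opoa", n = 2)
```

The principal diagnosis could then be appended with `rbind()` before the comorbidity call, as in the code above.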
This would be a nice thing to put in a vignette... Would you consider doing that? It would need you to generate some sample data, possibly based on the Vermont or uranium data I include in the package. I'll take the liberty of assigning this issue to you!
I'm going to put this one to rest: I still like the idea, but as we can see, it is possible to use existing R tools to reshape data, and so I think this is out of scope. I'm trying to keep an already fairly big package more tightly focused. Happy to re-open if someone wants to look at using wide_to_long as a template for a Present-on-Admission version.