mit-lcp / mimic-code Goto Github PK

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases

License: MIT License

Shell 0.74% Python 1.22% Makefile 0.20% PLpgSQL 0.55% Jupyter Notebook 63.78% HTML 32.62% TeX 0.72% Visual Basic 6.0 0.17%

mimic-iii critical-care icu physionet

mimic-code's Issues

Patients Urine Output - questions

Dear all,
I have some doubts regarding the codes that should be used to analyse the urine output of the patient. I am interested in study patients with kidney injuries and I need to verify the level of the urine output volume. I need in the mL/kg/hr units. Can I use 51108/51109 from LABEVENTS or should be use another code from IOEVENTS?
Another question, what is the difference between the codes 51108 and 51109? They seem equal, in the value, date and units.
If we have to use the LABEVENTS code, how can we have the urine output in mL/kg/hr? To have in mL/kg it is just needed to divide by the weight of the patient, but what is the sample time rate of the measurements from 51108 and 51109?
Thank you!
Sincerely,
Vanessa Cunha

Oracle build script for noteevents fails due to newlines (\n) in the text

Oracle SQL Loader doesn't appear to support the RFC 4180 format for CSVs, so the control file for loading noteevents fails.

The issue is that the text field contains newline (\n) characters, which Oracle treats as the end of a record. Surprisingly Oracle doesn't seem to offer a simple solution for this.

Remove the MIMIC-II v3.0 repo at https://github.com/mimic2/v3.0

We initially created a MIMIC-II v3.0 repository at: https://github.com/mimic2/v3.0 to hold documentation and web content for the latest version of MIMIC. This repository has been superseded by https://github.com/MIT-LCP/mimic-website

We should:

ensure all of the issues posted at: https://github.com/mimic2/v3.0/issues?utf8=%E2%9C%93&q= are archived
remove the content at https://github.com/mimic2/v3.0 and point people to the new website at http://mimic.physionet.org/

Choosing a favoured database system for the public version of MIMIC

Trying to support multiple flavours of database system creates a fair amount of extra work, mostly unnecessarily. Up until now we have favoured PostgreSQL as our public database system. Switching to MySQL may be worth considering, mainly because it would move us in line with @pszolovits's group.

Some things to consider:

how much does the syntax differ from PostgreSQL? e.g. with clauses are not supported?
which syntax is most similar to Oracle: PostgreSQL or MySQL? Relevant while we continue to use Oracle internally.
are there any important features that would justify choosing one system over another?
which system is preferred by the research community?

Rewritten/Changed/Stopped differences in INPUTEVENTS_MV

Can someone please clarify the roles of CANCELREASON and STATUSDESCRIPTION in the INPUTEVENTS_MV table.
From what I have seen CANCELREASON can be 0, 1 or 2 - what is the difference between 1 and 2?
And STATUSDESCRIPTION can be Rewrtten, Changed or Stopped - what is the difference and why does the AMOUNT field sometimes match the ORIGINALAMOUNT even though the order was rewritten/changed/stopped?

I've also noticed that CANCELREASON and STATUSSDESCRIPTION don't always match. That is rewritten orders can have CANCELREASON of 0, 1 or 2.

expire_flag column in patients table should be numerical data type

The build script for the patients table creates the 'expire_flag' as a varchar(5), probably because the flag used to be a 'y/n' string: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_create_tables.sql

    EXPIRE_FLAG VARCHAR(5) NOT NULL,

It would make sense to change the column to a numerical type to avoid people getting the following error when querying the column in the usual way for a number:

SELECT * 
FROM patients 
WHERE expire_flag = 1;

-- Returns error:
-- ERROR:  operator does not exist: character varying = integer
-- LINE 1: select * from patients where expire_flag = 1;

Duplicated rows in IOEVENTS

It appears that many events are duplicated many times (up to 24 times).

This figure shows the same order, by ORDER ID, listed 24 times with identical fields. I strongly doubt that 24 * 204 mL was actually given. In an even more extreme case 5000mL is given 9 times at the same time.

These are not cases where the order is split over multiple rows and the component volumes total the ordered volume. These and all of the other cases of 2 or 3 duplicates I have inspected have been perfect replicates.

I suggest this issue should be addressed in the next build and release. In the mean time, care should be taken to filter by unique ORDERID before any summation.

hospstay_seq is generated incorrectly in icustay_detail

Mike found a likely bug in the icustay_detail generation (https://github.com/MIT-LCP/mimic-code/blob/master/demographics/postgres/icustay_detail.sql), where the hospstay_seq is generated incorrectly and always gives a 1 for some patients. See discrepancy below.

select count(1) 
from icustay_detail 
where hospstay_seq = 1 
and icustay_seq = 1;

 count 
-------
 57328

select count(distinct(subject_id)) 
from icustay_detail 
where hospstay_seq = 1 
and icustay_seq = 1;

 count 
-------
 46428

select * 
from icustay_detail 
where subject_id = 41976;

 subject_id | hadm_id | icustay_id |      admittime      | hospstay_seq |       intime        | los_icu | icustay_seq 
------------+---------+------------+---------------------+--------------+---------------------+---------+-------------+
... etc
(15 rows)

mysql script has an extra comma

The mysql script has an extra comma on line 74, which causes error when loading data to mysql.

MIMICIII v1.3 - transfers.dbsource - NULL rows

Hello,

Field "transfers.dbsource" has a NOT NULL constraint. However, 241 rows in the data are empty. (eg: row_id = 3831; dbsource = NULL)

Thanks,

The ideal MIMIC-III system

I'm looking for the ideal specs for a system that could be used to run MIMIC-III. The aim is to keep costs down while maximising performance, stability and query speed. As an added bonus I'd like to run various types of analysis software, and to run software on the waveform database.

Specific to the MIMIC-III database, I'm interested in knowing the ideal:

operating system
SQL software
RAM
CPU

The main reason I ask is because I've tried importing the dataset in to Postgres on my laptop with windows and it's crashed a few times. I tried installing an Ubuntu virtual system on various computers but have run in to issues with crashing/speed. I'm looking to put together a dedicated system for MIMIC-III and other large datasets, but if something like a macbook is equally as useful I'll even consider purchasing one.

47 diagnoses_icd.sequence are null. Why?

I see 47 diagnoses_icd.hadm_id are null:

SELECT count(*) FROM  mimiciii.diagnoses_icd  WHERE diagnoses_icd.seq_num is null;

Bug or intended?

Creating "icustay_detail" on Mimic III

How can I create icustay_detail view/table on mimic III? Is there a script that can generate that?

Thanks!

Error in HOSPITAL_EXPIRE_FLAG column linking to DOD_HOSP in PATIENTS table

Loading the raw PATIENTS_DATA_TABLE.csv into RStudio, sorting by DOD_HOSP and filtering by HOSPITAL_EXPIRE_FLAG == 'N' finds many, many cases (~4160) where the DOD_HOSP is specified but wrongly linked to the FLAG field.

Examples shown below:

Bug-fix for the next build.

MIMIC website has wrong information about diagnoses_icd

On this page, https://mimic.physionet.org/mimictables/diagnoses_icd/, it is said that table diagnoses_icd has a column "description" of type VARCHAR(30); it is missing from the actual csv file, but still referred to by the indexing script posted here: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_add_indexes.sql

When running the indexing script, the error message shown by Postgres is as follows:
psql:postgres_add_indexes.sql:29: ERROR: column description does not exist

Since d_icd_diagnoses includes both short and long title of each icd-9 code used in the diagnoses_icd table, I recommend that modifying the MIMIC website page to remove the description entry.

Date-shifting masking number of alcoholic beverages per week in discharge summary

I have found cases where the data-shifting algorithm masks the number of alcoholic beverages per week that may be of some clinical significance in the discharge summaries. Low priority issue.
An example (with added spaces to prevent markdown styling):

She has approximately [* * 11-26 * *] alcoholic beverages per week, does not smoke..."

HADM_ID = 120167

ICUSTAY_IDs with set tidal volume recording but no MV billing code. Why?

Why are billing codes more reliable? At least 29% of the ICU stays I have TV data for do not have a ventilation billing code recorded during the duration of the ICU stay. Does it make sense to exclude these ICU stays?

Please double check this. This is not good at all if accurate. There are two possibilities I can think of to explain this is that the patients with TV data but no billing codes for MV were on NIV. The first is that MV in the operating room has a different billing code and this was not given to us. Patients who had open heart surgery would have this billing code and not the one that we have.
The other possibility is that sometimes, TV is recorded when patients are on NIV even though these measurements are highly inaccurate. Can we check if they have billing codes for NIV?
If billing codes indeed do not capture if patients were on MV, we will have to abandon it. We should review the discharge summaries of a sample of the patients with TV but no billing codes for MV.
We should have a conference call once we've done these checks.

I rechecked my work, and 3857/14759 (26.1%) of ICUSTAY_IDs with a set TV recording have no billing code for IMV. Including NIV billing codes, 3701/14759 (25.1%) of ICUSTAY_IDs with a set TV are still unaccounted for. Next, if I subset the set TV recordings by AC/CMV only, a smaller proportion - 1424/8749 (16.3%) - of ICUSTAY_IDs are missing MV billing codes. On the flip side, every ICUSTAY_ID with a MV billing code has at least 1 set TV recording.

I've mapped billing codes to ICUSTAY_IDs based on whether the billing CHARTDATE was made on or after INTIME and on or before OUTTIME of the ICU stay. I've lost 4% of the MV billing codes this way, but it's not enough to explain these results.

Here you may review the discharge notes of the patients with a set TV recording but no IMV billing code. There's a handful of duplicate notes, for multiple ICU stays with MV within the same hospital admission.

Billing codes only have a time resolution of days, while we discussed a time resolution of hours for recording TV. To calculate duration of MV, I think it might make sense to validate the MV through the billing code and calculate duration starting from the first TV recording to avoid this unnecessary loss of resolution. Else, should I take the time difference between the CHARTTIME for TV and the CHARTDATE for the first MV billing code?

The way we calculate duration of MV previously is based on the first and last recording of MV settings. The first recording of MV settings is reliable as the start of MV but the last recording is questionable as MV settings may remain the same during the last few days of MV and may not be recorded.
For the patients who have billing codes for MV, can we please check if the last day of billing for MV correspond to the last day MV settings were recorded?
We should have a conference call once we've done these checks.

For patients who have billing codes for MV, the last day of billing for MV corresponds to the last day MV settings were recorded for 8303/11058 (75.1%) of the ICUSTAY_IDs. Of those that do not correspond, 2457/11058 (22.2%) have a set TV recording after the last day of MV billing, and 298/11058 (2.7%) have a set TV recording before the last day of MV billing.

To get a sense of the TV data, I've calculated duration of MV as suggested above. Here are the set TV recordings, named by ICUSTAY_ID.txt. The TIME column is the number of hours after the first MV recording. Let me know if this looks good and I can reformat it to what we discussed.

Merge content from the MIMIC Cookbook repository

There is an oldish MIMIC Cookbook repository at:
https://github.com/MIT-LCP/mimic-cookbook

We should

merge the content if it hasn't been merged already
delete the cookbook repository.

D_ITEMS contains entries without any corresponding instances in the database

There are many itemids in D_ITEMS that supposedly link to the chartevents table, yet there are no corresponding entries in chartevents with those itemids.

select distinct(d.label)
from mimiciii.d_items d
where linksto like 'chartevents'
and itemid not in (select distinct itemid from mimiciii.chartevents);

Per discussion with @tompollard, some of these itemids are not yet in chartevents because their raw data contains PHI that has not yet been deidentified. However, there are many others that are simply 'junk' entries.

Two options for resolving this:

remove both PHI and 'junk' entries for now, re-introducing the PHI entries as we clean the data
remove only the 'junk' entries, and add some sort of flag to the PHI entries to indicate these may be available at some point in the future

Styleguide to encourage consistent SQL

We should consider adding a brief styleguide to the repository to encourage consistent formatting and syntax of SQL. It needn't be detailed - just setting out guidance on case, spacing etc should help to improve readability.

Couple of relevant links for reference:

Add MySQL build scripts to test framework

Pushing to the code repository triggers a script that attempts to create a new instance of MIMIC using the postgres build scripts: https://github.com/MIT-LCP/mimic-code/tree/master/tests

It would be good to update the tests to run against the new MySQL scripts too: #44

\COPY doesn't support variables

The following merge has introduced some errors into the postgres build scripts:
#18. One issue is that \COPY and COPY are different commands, and \COPY doesn't appear to support variables:

Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

27883 ioevents.hadm_id are null. Why?

I see 27883 ioevents.hadm_id are null:

SELECT count(*) FROM  mimiciii.ioevents WHERE ioevents.hadm_id is null;

Bug or intended?

Encoding issue for CHARTEVENTS_DATA_TABLE.csv's uom column?

When I open CHARTEVENTS_DATA_TABLE.csv I see from lines 520 to 524:

327,23,124321,234044,220179,21-OCT-2157 21:40:00,21-OCT-2157 21:43:00,19050,"100",100,"mmHg",0,0,"",""
1,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"Â¿F",0,0,"",""
2,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"Â¿F",0,0,"",""
3,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"Â¿F",0,0,"",""
4,23,124321,234044,220045,21-OCT-2157 12:06:00,21-OCT-2157 12:08:00,16978,"77",77,"bpm",0,0,"",""

Â¿F looks like some encoding issues.

In Postgresql, in which I imported MIMIC using the tutorial, I see ¿F:

Initial SAPS score.

Thank you very much for this database!

I use mimic III (haven't used Mimic II before). One thing that makes understanding the data difficult for me is that I don't see an accessible indicator of disease severity. It looks like mimic ii had the Saps score precalculated.

Is this possible with mimic III too (any other score, sofa, Apache would also work).

Thank you very much for your help.
Veit

Columns in INPUTEVENTS CV and MV

I'm migrating from v1.1 to v1.2 and the website says that the columns are the same for both INPUTEVENTS_CV and MV tables but from what I've seen they aren't.

vincents-mbp:raw vincent$ head -1 INPUTEVENTS_CV_DATA_TABLE.csv 
"ROW_ID","SUBJECT_ID","HADM_ID","ICUSTAY_ID","CHARTTIME","ITEMID","AMOUNT","AMOUNTUOM","RATE","RATEUOM","STORETIME","CGID","ORDERID","LINKORDERID","STOPPED","NEWBOTTLE","ORIGINALAMOUNT","ORIGINALAMOUNTUOM","ORIGINALROUTE","ORIGINALRATE","ORIGINALRATEUOM","ORIGINALSITE"
vincents-mbp:raw vincent$ head -1 INPUTEVENTS_MV_DATA_TABLE.csv 
"ROW_ID","SUBJECT_ID","HADM_ID","ICUSTAY_ID","STARTTIME","ENDTIME","ITEMID","AMOUNT","AMOUNTUOM","RATE","RATEUOM","STORETIME","CGID","ORDERID","LINKORDERID","ORDERCATEGORYNAME","SECONDARYORDERCATEGORYNAME","ORDERCOMPONENTTYPEDESCRIPTION","ORDERCATEGORYDESCRIPTION","PATIENTWEIGHT","TOTALAMOUNT","TOTALAMOUNTUOM","ISOPENBAG","CONTINUEINNEXTDEPT","CANCELREASON","STATUSDESCRIPTION","COMMENTS_EDITEDBY","COMMENTS_CANCELEDBY","COMMENTS_DATE","ORIGINALAMOUNT","ORIGINALRATE"

I might be good to state that MV has all the columns but CV doesn't have x, y, and z for example.

MonetDB import issues - chartevents.value - backslashes in the text

Hello,

Some "chartevents.value" values contains backslashes. (eg: rowid = 43077585; itemid= 807; value= "\113"; I haven't count all occurences)
Some database (at least monetDB) must double backslashes in order to bulk load data.
Moreover this seems to disturb "chartevents.valuenum" to be computed at least for itemid 807 that need numerical values.

For now my workaround is to remove backslashes from csv thanks to "sed -i 's///g' CHARTEVENTS.csv" command.

Thanks,

Weird text in noteevents data

Starting from the line number 50740610 in noteevent file, or in the row with row_id 639042 in noteevents table, the text field is weird.

Fix Docker image after restructuring directory

The scripts used to build a MIMIC Docker image have been moved to a new directory to fix some issues with the merged code (#18): https://github.com/MIT-LCP/mimic-code/tree/master/buildmimic/docker. I haven't checked, but I expect some updates will be needed to get it working again...

Requesting SAPS-II for MIMIC-II

I'm looking for a code to get the SAPS-II score over MIMIC2v26.
I know that SAPS-I score is already precalculated, but I need to perform operations using SAPS-II
I'll be really grateful if someone has the code and share it with me.
Thanks!

Ventilator modes not found for metavision patients

I am using MIMIC III v1.1 and was unable to find information regarding ventilator modes from the Metavision data source (specifically, ITEMID== 223849 in CHARTEVENTS). Is this information available elsewhere?

Update build scripts for updated v1.2 schema

The scripts for building MIMIC will need to be updated to account for the changes between version 1.1 and version 1.2. These include:

the icustayevents table has been renamed to icustays
ioevents has been replaced with inputevents_mv, inputevents_cv, outputevents
renamed columns in DIAGNOSES_ICD and PROCEDURES_ICD tables

Why are the heights and weights missing when subject_id > 32810?

I use https://github.com/MIT-LCP/mimic-code/blob/master/demographics/postgres/HeightWeightQuery.sql to get heights and weights in MIMIC-III v1.1.

Why are the heights and weights missing for all patients when subject_id > 32810?

SELECT MAX(subject_id) FROM mimiciii.heightweight; returns 32810.

Flags

Which exp_flg are available without calculation.
Cheers
Sven

Inconsistent naming of input and output fluids 'amount' and 'value'

It appears that both INPUTEVENTS tables have AMOUNT and AMOUNTUOM columns whereas the OUTPUTEVENTS table uses VALUE and VALUEUOM.

I propose the use of a consistent name, probably VALUE to also be consistent with CHARTEVENTS, LABEVENTS and DATETIMEEVENTS.

APACHE score or similar

I'm looking to estimate disease severity using some kind of mortality prediction or comorbidity score. The trouble that I've run into is that there are very few recorded APACHE II, IV or SOFA scores or even the defined component items that constitute the scores.

Before I try to write something to search through chartevents, labevents and discharge summaries for the APACHE components, I was wondering if there is any other kind of comorbidity score that is widely prevalent in the database. I would assume that the contributing hospital would use some kind of score but I can't find it.

Platelets, RBC

Looking for transfusion with platelets and or RBC. Which approach is best? ICD code or IO ITEMID?
Cheers
Sven

Integrate the code repository with Travis CI and Jenkins

We should set up the MIMIC Code Repository (https://github.com/MIT-LCP/mimic-code) so that updates trigger test scripts to run on Travis CI against a testing version of MIMIC.

We first need to create a testing version of MIMIC (see private repo https://github.com/MIT-LCP/mimic-iii-building/issues/170).

Key is not present in table "d_icd_diagnoses" or "d_icd_procedures"

I was trying to add constraints to the clinical mimic iii datasets using the script provided here: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_add_constraints.sql
and Postgres raised two issues,

psql:postgres_add_constraints.sql:162: ERROR: insert or update on table "diagnoses_icd" violates foreign key constraint "diagnoses_icd_fk_icd9"
DETAIL: Key (icd9_code)=(V451) is not present in table "d_icd_diagnoses".

psql:postgres_add_constraints.sql:349: ERROR: insert or update on table "procedures_icd" violates foreign key constraint "procedures_icd_fk_icd9"
DETAIL: Key (icd9_code)=(3601) is not present in table "d_icd_procedures".

I checked and confirmed that those keys were actually not present in the corresponding tables. Could someone help me with this? Thank you in advance!

Potential issue in date shifting - PATIENTS table

It appears that six rows of the PATIENTS table have the real, 20th century, patient DOB. I assume these values have not been data shifted with the corresponding DODs because otherwise they yield ages at admission of 200+ years.

Six cases identified below:

Bug fix for the next build.

Propagating indexes from CHARTEVENTS table to its inherited childrens

I just realized that when we run the current postgres_add_indexes.sql script. The 4 indexes will only be created on the mother table CHARTEVENTS. This can be meaningless, as there is no data in the mother table. The indexes will not automatically propagate to the children tables. Guess we may need to amend the script and create the indexes for each individual children table.

SOFA score

@alistairewj
In the recent SOFA score code by Alistair, I think it should use lowest BP (MeanBP_Min) not MeanBP_Max.

Also, what happens if patient is missing PaO2 or FIO2??

Some patients will have O2 flow in Liter/min instead of FiO2. there are some conversion tables, e.g 2L =28% FiO2 but those are just approximate, also there's a modified SOFA score that uses SPO2/FiO2 instead of PaO2/FiO2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3811929/

Actually would be nice if anyone could create a code to calculate the modified SOFA in addition to the orginal one

Create a Makefile to simplify the Postgres build process

Currently several scripts need to be run to build MIMIC in a local Postgres database:
https://github.com/MIT-LCP/mimic-code/tree/master/buildmimic/postgres.

A set of instructions for running these scripts could be added to a Makefile, allowing MIMIC to be built with a single command (e.g. make mimic.

Is there an easy way to get symptoms for each admission?

I am trying to extract symptoms from 'noteevents' but it doesn't seem to be a straight forward task. Is there an easy way of populating this information for each admission?

Endoscopy reports

I was wondering if there is any way to find the endoscopy reports. I can see endoscopy done in the procedure ICD9 tables but I can't find notes in the notes events table. Any input?

Thanks

ITEMID within IOEVENTS incorrectly linked to D_ITEMS

The minimum ITEMID defined in D_ITEMS to reference the IOEVENTS table is 30001 however, the IOEVENTS table regularly contains events with ITEM_ID values with nchar() equal to 1, 2 or 3 and therefore cannot match with the correct defined item.

Common examples such as 13, 18, and 56 are either not defined anywhere in D_ITEMS (13 and 18), or wrongly correspond to something from CHARTEVENTS (56 --> Atrial Pacemaker [Sense]) which leads me to believe they actually correspond to 30013 (D5W), 30018 (0.9% Normal Saline), and 30056 (Po Intake) which are correctly described with ORIGINALROUTE as intravenous push, intravenous push and oral respectively within the IOEVENTS table.

I suggest that the documentation is updated as a temporary fix to avoid further confusion and the bug is fixed in the next database build and release.

ventilator mode recordings with unknown value

In CHARTEVENTS, there are 5484 ventilator mode recordings (ITEMID in (720, 223849)) that have blank values and another 7709 recordings that have uninformative values ("Other/Remarks")

Negative (and excessive) quantities in IOEVENTS table

I have found instances (> 5,500 in a subgroup of ~1 million rows) of negative VOLUME fields within the IOEVENTS table. An example is 250 mL of free water being ordered (ORDERID = 1757275) and the actual volume is -7250 mL - I cannot think of any reason for such a difference in volume or the different sign.

A very common ITEMID is 225183 which represents the current goal of dialysis which makes up 1000s of cases by itself.

I think that every in or out should be positive and distinguishable, as IN or OUT, in some other manner. I've been using the existence of an ORDERID to represent INs. Perhaps the negative values can be corrected in the next release.

Add ventdurations dependency for severity scores

The ventdurations table from ventilation-durations.sql is referenced in SOFA.sql, but not called out in README.md as a dependency.

Eliminating "_DATA_TABLE" from names of .csv files

The current naming convention of calling the .csv files things like ADMISSIONS_DATA_TABLE.csv seems redundant, and also causes extra work to edit the output of my heuristic structure finder (https://github.com/mitmedg/csv2mysql), which thus generates scripts like
CREATE TABLE ADMISSIONS_DATA_TABLE ...

Why not just change them all by eliminating _DATA_TABLE from the name of the file?

mit-lcp / mimic-code Goto Github PK

mimic-code's Issues

Recommend Projects

Recommend Topics

Recommend Org