mit-lcp / mimic-code Goto Github PK
View Code? Open in Web Editor NEWMIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
Home Page: https://mimic.mit.edu
License: MIT License
MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
Home Page: https://mimic.mit.edu
License: MIT License
Dear all,
I have some doubts regarding the codes that should be used to analyse the urine output of the patient. I am interested in study patients with kidney injuries and I need to verify the level of the urine output volume. I need in the mL/kg/hr units. Can I use 51108/51109 from LABEVENTS or should be use another code from IOEVENTS?
Another question, what is the difference between the codes 51108 and 51109? They seem equal, in the value, date and units.
If we have to use the LABEVENTS code, how can we have the urine output in mL/kg/hr? To have in mL/kg it is just needed to divide by the weight of the patient, but what is the sample time rate of the measurements from 51108 and 51109?
Thank you!
Sincerely,
Vanessa Cunha
Oracle SQL Loader doesn't appear to support the RFC 4180 format for CSVs, so the control file for loading noteevents fails.
The issue is that the text field contains newline (\n) characters, which Oracle treats as the end of a record. Surprisingly Oracle doesn't seem to offer a simple solution for this.
We initially created a MIMIC-II v3.0 repository at: https://github.com/mimic2/v3.0 to hold documentation and web content for the latest version of MIMIC. This repository has been superseded by https://github.com/MIT-LCP/mimic-website
We should:
Trying to support multiple flavours of database system creates a fair amount of extra work, mostly unnecessarily. Up until now we have favoured PostgreSQL as our public database system. Switching to MySQL may be worth considering, mainly because it would move us in line with @pszolovits's group.
Some things to consider:
with
clauses are not supported?Can someone please clarify the roles of CANCELREASON
and STATUSDESCRIPTION
in the INPUTEVENTS_MV table.
From what I have seen CANCELREASON
can be 0, 1 or 2 - what is the difference between 1 and 2?
And STATUSDESCRIPTION
can be Rewrtten, Changed or Stopped - what is the difference and why does the AMOUNT
field sometimes match the ORIGINALAMOUNT
even though the order was rewritten/changed/stopped?
I've also noticed that CANCELREASON
and STATUSSDESCRIPTION
don't always match. That is rewritten orders can have CANCELREASON
of 0, 1 or 2.
The build script for the patients table creates the 'expire_flag' as a varchar(5), probably because the flag used to be a 'y/n' string: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_create_tables.sql
EXPIRE_FLAG VARCHAR(5) NOT NULL,
It would make sense to change the column to a numerical type to avoid people getting the following error when querying the column in the usual way for a number:
SELECT *
FROM patients
WHERE expire_flag = 1;
-- Returns error:
-- ERROR: operator does not exist: character varying = integer
-- LINE 1: select * from patients where expire_flag = 1;
It appears that many events are duplicated many times (up to 24 times).
This figure shows the same order, by ORDER ID, listed 24 times with identical fields. I strongly doubt that 24 * 204 mL was actually given. In an even more extreme case 5000mL is given 9 times at the same time.
These are not cases where the order is split over multiple rows and the component volumes total the ordered volume. These and all of the other cases of 2 or 3 duplicates I have inspected have been perfect replicates.
I suggest this issue should be addressed in the next build and release. In the mean time, care should be taken to filter by unique ORDERID before any summation.
Mike found a likely bug in the icustay_detail generation (https://github.com/MIT-LCP/mimic-code/blob/master/demographics/postgres/icustay_detail.sql), where the hospstay_seq is generated incorrectly and always gives a 1 for some patients. See discrepancy below.
select count(1)
from icustay_detail
where hospstay_seq = 1
and icustay_seq = 1;
count
-------
57328
select count(distinct(subject_id))
from icustay_detail
where hospstay_seq = 1
and icustay_seq = 1;
count
-------
46428
select *
from icustay_detail
where subject_id = 41976;
subject_id | hadm_id | icustay_id | admittime | hospstay_seq | intime | los_icu | icustay_seq
------------+---------+------------+---------------------+--------------+---------------------+---------+-------------+
... etc
(15 rows)
The mysql script has an extra comma on line 74, which causes error when loading data to mysql.
Hello,
Field "transfers.dbsource" has a NOT NULL constraint. However, 241 rows in the data are empty. (eg: row_id = 3831; dbsource = NULL)
Thanks,
I'm looking for the ideal specs for a system that could be used to run MIMIC-III. The aim is to keep costs down while maximising performance, stability and query speed. As an added bonus I'd like to run various types of analysis software, and to run software on the waveform database.
Specific to the MIMIC-III database, I'm interested in knowing the ideal:
The main reason I ask is because I've tried importing the dataset in to Postgres on my laptop with windows and it's crashed a few times. I tried installing an Ubuntu virtual system on various computers but have run in to issues with crashing/speed. I'm looking to put together a dedicated system for MIMIC-III and other large datasets, but if something like a macbook is equally as useful I'll even consider purchasing one.
I see 47 diagnoses_icd.hadm_id are null:
SELECT count(*) FROM mimiciii.diagnoses_icd WHERE diagnoses_icd.seq_num is null;
Bug or intended?
How can I create icustay_detail view/table on mimic III? Is there a script that can generate that?
Thanks!
On this page, https://mimic.physionet.org/mimictables/diagnoses_icd/, it is said that table diagnoses_icd has a column "description" of type VARCHAR(30); it is missing from the actual csv file, but still referred to by the indexing script posted here: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_add_indexes.sql
When running the indexing script, the error message shown by Postgres is as follows:
psql:postgres_add_indexes.sql:29: ERROR: column description does not exist
Since d_icd_diagnoses includes both short and long title of each icd-9 code used in the diagnoses_icd table, I recommend that modifying the MIMIC website page to remove the description entry.
I have found cases where the data-shifting algorithm masks the number of alcoholic beverages per week that may be of some clinical significance in the discharge summaries. Low priority issue.
An example (with added spaces to prevent markdown styling):
She has approximately [* * 11-26 * *] alcoholic beverages per week, does not smoke..."
HADM_ID = 120167
- Why are billing codes more reliable? At least 29% of the ICU stays I have TV data for do not have a ventilation billing code recorded during the duration of the ICU stay. Does it make sense to exclude these ICU stays?
Please double check this. This is not good at all if accurate. There are two possibilities I can think of to explain this is that the patients with TV data but no billing codes for MV were on NIV. The first is that MV in the operating room has a different billing code and this was not given to us. Patients who had open heart surgery would have this billing code and not the one that we have.
The other possibility is that sometimes, TV is recorded when patients are on NIV even though these measurements are highly inaccurate. Can we check if they have billing codes for NIV?
If billing codes indeed do not capture if patients were on MV, we will have to abandon it. We should review the discharge summaries of a sample of the patients with TV but no billing codes for MV.
We should have a conference call once we've done these checks.
I rechecked my work, and 3857/14759 (26.1%) of ICUSTAY_IDs with a set TV recording have no billing code for IMV. Including NIV billing codes, 3701/14759 (25.1%) of ICUSTAY_IDs with a set TV are still unaccounted for. Next, if I subset the set TV recordings by AC/CMV only, a smaller proportion - 1424/8749 (16.3%) - of ICUSTAY_IDs are missing MV billing codes. On the flip side, every ICUSTAY_ID with a MV billing code has at least 1 set TV recording.
I've mapped billing codes to ICUSTAY_IDs based on whether the billing CHARTDATE was made on or after INTIME and on or before OUTTIME of the ICU stay. I've lost 4% of the MV billing codes this way, but it's not enough to explain these results.
Here you may review the discharge notes of the patients with a set TV recording but no IMV billing code. There's a handful of duplicate notes, for multiple ICU stays with MV within the same hospital admission.
- Billing codes only have a time resolution of days, while we discussed a time resolution of hours for recording TV. To calculate duration of MV, I think it might make sense to validate the MV through the billing code and calculate duration starting from the first TV recording to avoid this unnecessary loss of resolution. Else, should I take the time difference between the CHARTTIME for TV and the CHARTDATE for the first MV billing code?
The way we calculate duration of MV previously is based on the first and last recording of MV settings. The first recording of MV settings is reliable as the start of MV but the last recording is questionable as MV settings may remain the same during the last few days of MV and may not be recorded.
For the patients who have billing codes for MV, can we please check if the last day of billing for MV correspond to the last day MV settings were recorded?
We should have a conference call once we've done these checks.
For patients who have billing codes for MV, the last day of billing for MV corresponds to the last day MV settings were recorded for 8303/11058 (75.1%) of the ICUSTAY_IDs. Of those that do not correspond, 2457/11058 (22.2%) have a set TV recording after the last day of MV billing, and 298/11058 (2.7%) have a set TV recording before the last day of MV billing.
To get a sense of the TV data, I've calculated duration of MV as suggested above. Here are the set TV recordings, named by ICUSTAY_ID.txt. The TIME column is the number of hours after the first MV recording. Let me know if this looks good and I can reformat it to what we discussed.
There is an oldish MIMIC Cookbook repository at:
https://github.com/MIT-LCP/mimic-cookbook
We should
There are many itemids in D_ITEMS that supposedly link to the chartevents table, yet there are no corresponding entries in chartevents with those itemids.
select distinct(d.label)
from mimiciii.d_items d
where linksto like 'chartevents'
and itemid not in (select distinct itemid from mimiciii.chartevents);
Per discussion with @tompollard, some of these itemids are not yet in chartevents because their raw data contains PHI that has not yet been deidentified. However, there are many others that are simply 'junk' entries.
Two options for resolving this:
We should consider adding a brief styleguide to the repository to encourage consistent formatting and syntax of SQL. It needn't be detailed - just setting out guidance on case, spacing etc should help to improve readability.
Couple of relevant links for reference:
Pushing to the code repository triggers a script that attempts to create a new instance of MIMIC using the postgres build scripts: https://github.com/MIT-LCP/mimic-code/tree/master/tests
It would be good to update the tests to run against the new MySQL scripts too: #44
The following merge has introduced some errors into the postgres build scripts:
#18. One issue is that \COPY
and COPY
are different commands, and \COPY
doesn't appear to support variables:
Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.
I see 27883 ioevents.hadm_id are null:
SELECT count(*) FROM mimiciii.ioevents WHERE ioevents.hadm_id is null;
Bug or intended?
When I open CHARTEVENTS_DATA_TABLE.csv
I see from lines 520 to 524:
327,23,124321,234044,220179,21-OCT-2157 21:40:00,21-OCT-2157 21:43:00,19050,"100",100,"mmHg",0,0,"",""
1,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"¿F",0,0,"",""
2,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"¿F",0,0,"",""
3,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"¿F",0,0,"",""
4,23,124321,234044,220045,21-OCT-2157 12:06:00,21-OCT-2157 12:08:00,16978,"77",77,"bpm",0,0,"",""
¿F
looks like some encoding issues.
In Postgresql, in which I imported MIMIC using the tutorial, I see ¿F
:
Thank you very much for this database!
I use mimic III (haven't used Mimic II before). One thing that makes understanding the data difficult for me is that I don't see an accessible indicator of disease severity. It looks like mimic ii had the Saps score precalculated.
Is this possible with mimic III too (any other score, sofa, Apache would also work).
Thank you very much for your help.
Veit
I'm migrating from v1.1 to v1.2 and the website says that the columns are the same for both INPUTEVENTS_CV and MV tables but from what I've seen they aren't.
vincents-mbp:raw vincent$ head -1 INPUTEVENTS_CV_DATA_TABLE.csv
"ROW_ID","SUBJECT_ID","HADM_ID","ICUSTAY_ID","CHARTTIME","ITEMID","AMOUNT","AMOUNTUOM","RATE","RATEUOM","STORETIME","CGID","ORDERID","LINKORDERID","STOPPED","NEWBOTTLE","ORIGINALAMOUNT","ORIGINALAMOUNTUOM","ORIGINALROUTE","ORIGINALRATE","ORIGINALRATEUOM","ORIGINALSITE"
vincents-mbp:raw vincent$ head -1 INPUTEVENTS_MV_DATA_TABLE.csv
"ROW_ID","SUBJECT_ID","HADM_ID","ICUSTAY_ID","STARTTIME","ENDTIME","ITEMID","AMOUNT","AMOUNTUOM","RATE","RATEUOM","STORETIME","CGID","ORDERID","LINKORDERID","ORDERCATEGORYNAME","SECONDARYORDERCATEGORYNAME","ORDERCOMPONENTTYPEDESCRIPTION","ORDERCATEGORYDESCRIPTION","PATIENTWEIGHT","TOTALAMOUNT","TOTALAMOUNTUOM","ISOPENBAG","CONTINUEINNEXTDEPT","CANCELREASON","STATUSDESCRIPTION","COMMENTS_EDITEDBY","COMMENTS_CANCELEDBY","COMMENTS_DATE","ORIGINALAMOUNT","ORIGINALRATE"
I might be good to state that MV has all the columns but CV doesn't have x, y, and z for example.
Hello,
Some "chartevents.value" values contains backslashes. (eg: rowid = 43077585; itemid= 807; value= "\113"; I haven't count all occurences)
Some database (at least monetDB) must double backslashes in order to bulk load data.
Moreover this seems to disturb "chartevents.valuenum" to be computed at least for itemid 807 that need numerical values.
For now my workaround is to remove backslashes from csv thanks to "sed -i 's///g' CHARTEVENTS.csv" command.
Thanks,
Starting from the line number 50740610 in noteevent file, or in the row with row_id 639042 in noteevents table, the text field is weird.
The scripts used to build a MIMIC Docker image have been moved to a new directory to fix some issues with the merged code (#18): https://github.com/MIT-LCP/mimic-code/tree/master/buildmimic/docker. I haven't checked, but I expect some updates will be needed to get it working again...
I'm looking for a code to get the SAPS-II score over MIMIC2v26.
I know that SAPS-I score is already precalculated, but I need to perform operations using SAPS-II
I'll be really grateful if someone has the code and share it with me.
Thanks!
I am using MIMIC III v1.1 and was unable to find information regarding ventilator modes from the Metavision data source (specifically, ITEMID== 223849 in CHARTEVENTS). Is this information available elsewhere?
The scripts for building MIMIC will need to be updated to account for the changes between version 1.1 and version 1.2. These include:
I use https://github.com/MIT-LCP/mimic-code/blob/master/demographics/postgres/HeightWeightQuery.sql to get heights and weights in MIMIC-III v1.1.
Why are the heights and weights missing for all patients when subject_id > 32810?
SELECT MAX(subject_id) FROM mimiciii.heightweight;
returns 32810.
Which exp_flg are available without calculation.
Cheers
Sven
It appears that both INPUTEVENTS tables have AMOUNT
and AMOUNTUOM
columns whereas the OUTPUTEVENTS table uses VALUE
and VALUEUOM
.
I propose the use of a consistent name, probably VALUE
to also be consistent with CHARTEVENTS, LABEVENTS and DATETIMEEVENTS.
I'm looking to estimate disease severity using some kind of mortality prediction or comorbidity score. The trouble that I've run into is that there are very few recorded APACHE II, IV or SOFA scores or even the defined component items that constitute the scores.
Before I try to write something to search through chartevents, labevents and discharge summaries for the APACHE components, I was wondering if there is any other kind of comorbidity score that is widely prevalent in the database. I would assume that the contributing hospital would use some kind of score but I can't find it.
Looking for transfusion with platelets and or RBC. Which approach is best? ICD code or IO ITEMID?
Cheers
Sven
We should set up the MIMIC Code Repository (https://github.com/MIT-LCP/mimic-code) so that updates trigger test scripts to run on Travis CI against a testing version of MIMIC.
We first need to create a testing version of MIMIC (see private repo https://github.com/MIT-LCP/mimic-iii-building/issues/170).
I was trying to add constraints to the clinical mimic iii datasets using the script provided here: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_add_constraints.sql
and Postgres raised two issues,
psql:postgres_add_constraints.sql:162: ERROR: insert or update on table "diagnoses_icd" violates foreign key constraint "diagnoses_icd_fk_icd9"
DETAIL: Key (icd9_code)=(V451) is not present in table "d_icd_diagnoses".
psql:postgres_add_constraints.sql:349: ERROR: insert or update on table "procedures_icd" violates foreign key constraint "procedures_icd_fk_icd9"
DETAIL: Key (icd9_code)=(3601) is not present in table "d_icd_procedures".
I checked and confirmed that those keys were actually not present in the corresponding tables. Could someone help me with this? Thank you in advance!
I just realized that when we run the current postgres_add_indexes.sql script. The 4 indexes will only be created on the mother table CHARTEVENTS. This can be meaningless, as there is no data in the mother table. The indexes will not automatically propagate to the children tables. Guess we may need to amend the script and create the indexes for each individual children table.
@alistairewj
In the recent SOFA score code by Alistair, I think it should use lowest BP (MeanBP_Min) not MeanBP_Max.
Also, what happens if patient is missing PaO2 or FIO2??
Some patients will have O2 flow in Liter/min instead of FiO2. there are some conversion tables, e.g 2L =28% FiO2 but those are just approximate, also there's a modified SOFA score that uses SPO2/FiO2 instead of PaO2/FiO2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3811929/
Actually would be nice if anyone could create a code to calculate the modified SOFA in addition to the orginal one
Currently several scripts need to be run to build MIMIC in a local Postgres database:
https://github.com/MIT-LCP/mimic-code/tree/master/buildmimic/postgres.
A set of instructions for running these scripts could be added to a Makefile, allowing MIMIC to be built with a single command (e.g. make mimic
.
I am trying to extract symptoms from 'noteevents' but it doesn't seem to be a straight forward task. Is there an easy way of populating this information for each admission?
I was wondering if there is any way to find the endoscopy reports. I can see endoscopy done in the procedure ICD9 tables but I can't find notes in the notes events table. Any input?
Thanks
The minimum ITEMID defined in D_ITEMS to reference the IOEVENTS table is 30001 however, the IOEVENTS table regularly contains events with ITEM_ID values with nchar() equal to 1, 2 or 3 and therefore cannot match with the correct defined item.
Common examples such as 13, 18, and 56 are either not defined anywhere in D_ITEMS (13 and 18), or wrongly correspond to something from CHARTEVENTS (56 --> Atrial Pacemaker [Sense]) which leads me to believe they actually correspond to 30013 (D5W), 30018 (0.9% Normal Saline), and 30056 (Po Intake) which are correctly described with ORIGINALROUTE as intravenous push, intravenous push and oral respectively within the IOEVENTS table.
I suggest that the documentation is updated as a temporary fix to avoid further confusion and the bug is fixed in the next database build and release.
In CHARTEVENTS, there are 5484 ventilator mode recordings (ITEMID in (720, 223849)) that have blank values and another 7709 recordings that have uninformative values ("Other/Remarks")
I have found instances (> 5,500 in a subgroup of ~1 million rows) of negative VOLUME
fields within the IOEVENTS table. An example is 250 mL of free water being ordered (ORDERID
= 1757275) and the actual volume is -7250 mL - I cannot think of any reason for such a difference in volume or the different sign.
A very common ITEMID
is 225183 which represents the current goal of dialysis which makes up 1000s of cases by itself.
I think that every in or out should be positive and distinguishable, as IN or OUT, in some other manner. I've been using the existence of an ORDERID
to represent INs. Perhaps the negative values can be corrected in the next release.
The ventdurations
table from ventilation-durations.sql is referenced in SOFA.sql, but not called out in README.md as a dependency.
The current naming convention of calling the .csv files things like ADMISSIONS_DATA_TABLE.csv seems redundant, and also causes extra work to edit the output of my heuristic structure finder (https://github.com/mitmedg/csv2mysql), which thus generates scripts like
CREATE TABLE ADMISSIONS_DATA_TABLE ...
Why not just change them all by eliminating _DATA_TABLE from the name of the file?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.