Coder Social home page Coder Social logo

mit-lcp / mimic-code Goto Github PK

View Code? Open in Web Editor NEW
2.3K 135.0 1.5K 19.86 MB

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases

Home Page: https://mimic.mit.edu

License: MIT License

Shell 0.74% Python 1.22% Makefile 0.20% PLpgSQL 0.55% Jupyter Notebook 63.78% HTML 32.62% TeX 0.72% Visual Basic 6.0 0.17%
mimic-iii critical-care icu physionet

mimic-code's Introduction

MIMIC Code Repository DOI

The MIMIC Code Repository is intended to be a central hub for sharing, refining, and reusing code used for analysis of the MIMIC critical care database. To find out more about MIMIC, please see: https://mimic.mit.edu. Source code for the website is in the mimic-website GitHub repository.

You can read more about the code repository in the following open access paper: The MIMIC Code Repository: enabling reproducibility in critical care research.

Cloud access to datasets

The various MIMIC databases are available on Google Cloud Platform (GCP) and Amazon Web Services (AWS). To access the data on the cloud, simply add the relevant cloud identifier to your PhysioNet profile. Then request access to the dataset for the particular cloud platform via the PhysioNet project page. Further instructions are available on the MIMIC website.

Navigating this repository

This repository contains code for five databases on PhysioNet:

  • MIMIC-III - critical care data for patients admitted to ICUs at the BIDMC between 2001 - 2012
  • MIMIC-IV - hospital and critical care data for patients admitted to the ED or ICU between 2008 - 2019
  • MIMIC-IV-ED - emergency department data for individuals attending the ED between 2011 - 2019
  • MIMIC-IV Waveforms (TBD) - this dataset has yet to be published.
  • MIMIC-CXR - chest x-ray imaging and deidentified free-text radiology reports for patients admitted to the ED from 2012 - 2016

The repository contains one top-level folder containing community developed code for each datasets:

  • mimic-iii - build scripts for MIMIC-III, derived concepts which are available on the physionet-data.mimiciii_derived dataset on BigQuery, and tutorials.
  • mimic-iv - build scripts for MIMIC-IV, derived concepts which are available on the physionet-data.mimic_derived dataset on BigQuery, and tutorials.
  • mimic-iv-cxr - code for loading and analyzing both dicom (mimic-iv-cxr/dcm) and text (mimic-iv-cxr/txt) data. In order to clearly indicate that MIMIC-CXR can be linked with MIMIC-IV, we have named this folder mimic-iv-cxr, and any references to MIMIC-CXR / MIMIC-IV-CXR are interchangeable.
  • mimic-iv-ed - build scripts for MIMIC-IV-ED.
  • mimic-iv-waveforms - TBD

Each subfolder has a README with further detail regarding its content.

Launch MIMIC-III in AWS

MIMIC-III is available on AWS (and MIMIC-IV will be available in the future). Use the below Launch Stack button to deploy access to the MIMIC-III dataset into your AWS account. This will give you real-time access to the MIMIC-III data in your AWS account without having to download a copy of the MIMIC-III dataset. It will also deploy a Jupyter Notebook with access to the content of this GitHub repository in your AWS account. Prior to launching this, please login to the MIMIC PhysioNet website, input your AWS account number, and request access to the MIMIC-III Clinical Database on AWS.

To start this deployment, click the Launch Stack button. On the first screen, the template link has already been specified, so just click next. On the second screen, provide a Stack name (letters and numbers) and click next, on the third screen, just click next. On the forth screen, at the bottom, there is a box that says I acknowledge that AWS CloudFormation might create IAM resources.. Check that box, and then click Create. Once the Stack has complete deploying, look at the Outputs tab of the AWS CloudFormation console for links to your Juypter Notebooks instance.

cloudformation-launch-stack

Other useful tools

  • Bloatectomy (paper) - A python based package for removing duplicate text in clinical notes
  • Medication categories - Python script for extracting medications from free-text notes
  • MIMIC Extract (paper) - A python based package for transforming MIMIC-III data into a machine learning friendly format
  • FIDDLE (paper) - A python based package for a FlexIble Data-Driven pipeLinE (FIDDLE), transforming structured EHR data into a machine learning friendly format

Acknowledgement

If you use code or concepts available in this repository, we would be grateful if you would:

@article{johnson2018mimic,
  title={The MIMIC Code Repository: enabling reproducibility in critical care research},
  author={Johnson, Alistair E W and Stone, David J and Celi, Leo A and Pollard, Tom J},
  journal={Journal of the American Medical Informatics Association},
  volume={25},
  number={1},
  pages={32--39},
  year={2018},
  publisher={Oxford University Press}
}

Contributing

Our team has worked hard to create and share the MIMIC datasets. We encourage you to share the code that you use for data processing and analysis. Sharing code helps to make studies reproducible and promotes collaborative research. To contribute, please:

We encourage users to share concepts they have extracted by writing code which generates a materialized view. These materialized views can then be used by researchers around the world to speed up data extraction. For example, ventilation durations can be acquired by creating the ventdurations view in concepts/durations/ventilation_durations.sql.

License

By committing your code to the MIMIC Code Repository you agree to release the code under the MIT License attached to the repository.

Coding style

Please refer to the style guide for guidelines on formatting your code for the repository.

mimic-code's People

Contributors

a-chahin avatar aegis301 avatar ajb5d avatar alexmbennett2 avatar alistairewj avatar armando-fandango avatar briangow avatar cosgriffc avatar denbergvanthijs avatar ektar avatar erinhong avatar etheleon avatar hans0124sg avatar haobibo avatar jackiemium avatar jeblundell avatar kiendang avatar lbulgarelli avatar parisni avatar pedrogemal avatar pszolovits avatar qinyu-allen-zhao avatar rsravan91 avatar schu avatar shu98 avatar sichengh avatar stefanhgm avatar tmward avatar tompollard avatar yongfanbeta avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

mimic-code's Issues

Remove the MIMIC-II v3.0 repo at https://github.com/mimic2/v3.0

We initially created a MIMIC-II v3.0 repository at: https://github.com/mimic2/v3.0 to hold documentation and web content for the latest version of MIMIC. This repository has been superseded by https://github.com/MIT-LCP/mimic-website

We should:

  1. ensure all of the issues posted at: https://github.com/mimic2/v3.0/issues?utf8=%E2%9C%93&q= are archived
  2. remove the content at https://github.com/mimic2/v3.0 and point people to the new website at http://mimic.physionet.org/

Eliminating "_DATA_TABLE" from names of .csv files

The current naming convention of calling the .csv files things like ADMISSIONS_DATA_TABLE.csv seems redundant, and also causes extra work to edit the output of my heuristic structure finder (https://github.com/mitmedg/csv2mysql), which thus generates scripts like
CREATE TABLE ADMISSIONS_DATA_TABLE ...

Why not just change them all by eliminating _DATA_TABLE from the name of the file?

Rewritten/Changed/Stopped differences in INPUTEVENTS_MV

Can someone please clarify the roles of CANCELREASON and STATUSDESCRIPTION in the INPUTEVENTS_MV table.
From what I have seen CANCELREASON can be 0, 1 or 2 - what is the difference between 1 and 2?
And STATUSDESCRIPTION can be Rewrtten, Changed or Stopped - what is the difference and why does the AMOUNT field sometimes match the ORIGINALAMOUNT even though the order was rewritten/changed/stopped?

I've also noticed that CANCELREASON and STATUSSDESCRIPTION don't always match. That is rewritten orders can have CANCELREASON of 0, 1 or 2.

Key is not present in table "d_icd_diagnoses" or "d_icd_procedures"

I was trying to add constraints to the clinical mimic iii datasets using the script provided here: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_add_constraints.sql
and Postgres raised two issues,

psql:postgres_add_constraints.sql:162: ERROR: insert or update on table "diagnoses_icd" violates foreign key constraint "diagnoses_icd_fk_icd9"
DETAIL: Key (icd9_code)=(V451) is not present in table "d_icd_diagnoses".

psql:postgres_add_constraints.sql:349: ERROR: insert or update on table "procedures_icd" violates foreign key constraint "procedures_icd_fk_icd9"
DETAIL: Key (icd9_code)=(3601) is not present in table "d_icd_procedures".

I checked and confirmed that those keys were actually not present in the corresponding tables. Could someone help me with this? Thank you in advance!

Potential issue in date shifting - PATIENTS table

It appears that six rows of the PATIENTS table have the real, 20th century, patient DOB. I assume these values have not been data shifted with the corresponding DODs because otherwise they yield ages at admission of 200+ years.

Six cases identified below:
image

Bug fix for the next build.

Requesting SAPS-II for MIMIC-II

I'm looking for a code to get the SAPS-II score over MIMIC2v26.
I know that SAPS-I score is already precalculated, but I need to perform operations using SAPS-II
I'll be really grateful if someone has the code and share it with me.
Thanks!

SOFA score

@alistairewj
In the recent SOFA score code by Alistair, I think it should use lowest BP (MeanBP_Min) not MeanBP_Max.

Also, what happens if patient is missing PaO2 or FIO2??

Some patients will have O2 flow in Liter/min instead of FiO2. there are some conversion tables, e.g 2L =28% FiO2 but those are just approximate, also there's a modified SOFA score that uses SPO2/FiO2 instead of PaO2/FiO2
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3811929/

Actually would be nice if anyone could create a code to calculate the modified SOFA in addition to the orginal one

ventilator mode recordings with unknown value

In CHARTEVENTS, there are 5484 ventilator mode recordings (ITEMID in (720, 223849)) that have blank values and another 7709 recordings that have uninformative values ("Other/Remarks")

Negative (and excessive) quantities in IOEVENTS table

I have found instances (> 5,500 in a subgroup of ~1 million rows) of negative VOLUME fields within the IOEVENTS table. An example is 250 mL of free water being ordered (ORDERID = 1757275) and the actual volume is -7250 mL - I cannot think of any reason for such a difference in volume or the different sign.

image

A very common ITEMID is 225183 which represents the current goal of dialysis which makes up 1000s of cases by itself.

I think that every in or out should be positive and distinguishable, as IN or OUT, in some other manner. I've been using the existence of an ORDERID to represent INs. Perhaps the negative values can be corrected in the next release.

Flags

Which exp_flg are available without calculation.
Cheers
Sven

Encoding issue for CHARTEVENTS_DATA_TABLE.csv's uom column?

When I open CHARTEVENTS_DATA_TABLE.csv I see from lines 520 to 524:

327,23,124321,234044,220179,21-OCT-2157 21:40:00,21-OCT-2157 21:43:00,19050,"100",100,"mmHg",0,0,"",""
1,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"¿F",0,0,"",""
2,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"¿F",0,0,"",""
3,23,124321,234044,223761,21-OCT-2157 12:05:00,21-OCT-2157 12:05:00,16978,"95.8",95.8,"¿F",0,0,"",""
4,23,124321,234044,220045,21-OCT-2157 12:06:00,21-OCT-2157 12:08:00,16978,"77",77,"bpm",0,0,"",""

¿F looks like some encoding issues.

In Postgresql, in which I imported MIMIC using the tutorial, I see ¿F:

image

Patients Urine Output - questions

Dear all,
I have some doubts regarding the codes that should be used to analyse the urine output of the patient. I am interested in study patients with kidney injuries and I need to verify the level of the urine output volume. I need in the mL/kg/hr units. Can I use 51108/51109 from LABEVENTS or should be use another code from IOEVENTS?
Another question, what is the difference between the codes 51108 and 51109? They seem equal, in the value, date and units.
If we have to use the LABEVENTS code, how can we have the urine output in mL/kg/hr? To have in mL/kg it is just needed to divide by the weight of the patient, but what is the sample time rate of the measurements from 51108 and 51109?
Thank you!
Sincerely,
Vanessa Cunha

D_ITEMS contains entries without any corresponding instances in the database

There are many itemids in D_ITEMS that supposedly link to the chartevents table, yet there are no corresponding entries in chartevents with those itemids.

select distinct(d.label)
from mimiciii.d_items d
where linksto like 'chartevents'
and itemid not in (select distinct itemid from mimiciii.chartevents);

Per discussion with @tompollard, some of these itemids are not yet in chartevents because their raw data contains PHI that has not yet been deidentified. However, there are many others that are simply 'junk' entries.

Two options for resolving this:

  • remove both PHI and 'junk' entries for now, re-introducing the PHI entries as we clean the data
  • remove only the 'junk' entries, and add some sort of flag to the PHI entries to indicate these may be available at some point in the future

Update build scripts for updated v1.2 schema

The scripts for building MIMIC will need to be updated to account for the changes between version 1.1 and version 1.2. These include:

  • the icustayevents table has been renamed to icustays
  • ioevents has been replaced with inputevents_mv, inputevents_cv, outputevents
  • renamed columns in DIAGNOSES_ICD and PROCEDURES_ICD tables

The ideal MIMIC-III system

I'm looking for the ideal specs for a system that could be used to run MIMIC-III. The aim is to keep costs down while maximising performance, stability and query speed. As an added bonus I'd like to run various types of analysis software, and to run software on the waveform database.

Specific to the MIMIC-III database, I'm interested in knowing the ideal:

  • operating system
  • SQL software
  • RAM
  • CPU

The main reason I ask is because I've tried importing the dataset in to Postgres on my laptop with windows and it's crashed a few times. I tried installing an Ubuntu virtual system on various computers but have run in to issues with crashing/speed. I'm looking to put together a dedicated system for MIMIC-III and other large datasets, but if something like a macbook is equally as useful I'll even consider purchasing one.

Ventilator modes not found for metavision patients

I am using MIMIC III v1.1 and was unable to find information regarding ventilator modes from the Metavision data source (specifically, ITEMID== 223849 in CHARTEVENTS). Is this information available elsewhere?

Choosing a favoured database system for the public version of MIMIC

Trying to support multiple flavours of database system creates a fair amount of extra work, mostly unnecessarily. Up until now we have favoured PostgreSQL as our public database system. Switching to MySQL may be worth considering, mainly because it would move us in line with @pszolovits's group.

Some things to consider:

  • how much does the syntax differ from PostgreSQL? e.g. with clauses are not supported?
  • which syntax is most similar to Oracle: PostgreSQL or MySQL? Relevant while we continue to use Oracle internally.
  • are there any important features that would justify choosing one system over another?
  • which system is preferred by the research community?

Date-shifting masking number of alcoholic beverages per week in discharge summary

I have found cases where the data-shifting algorithm masks the number of alcoholic beverages per week that may be of some clinical significance in the discharge summaries. Low priority issue.
An example (with added spaces to prevent markdown styling):

She has approximately [* * 11-26 * *] alcoholic beverages per week, does not smoke..."

HADM_ID = 120167

Duplicated rows in IOEVENTS

It appears that many events are duplicated many times (up to 24 times).

image

This figure shows the same order, by ORDER ID, listed 24 times with identical fields. I strongly doubt that 24 * 204 mL was actually given. In an even more extreme case 5000mL is given 9 times at the same time.

image

These are not cases where the order is split over multiple rows and the component volumes total the ordered volume. These and all of the other cases of 2 or 3 duplicates I have inspected have been perfect replicates.

I suggest this issue should be addressed in the next build and release. In the mean time, care should be taken to filter by unique ORDERID before any summation.

Initial SAPS score.

Thank you very much for this database!

I use mimic III (haven't used Mimic II before). One thing that makes understanding the data difficult for me is that I don't see an accessible indicator of disease severity. It looks like mimic ii had the Saps score precalculated.

Is this possible with mimic III too (any other score, sofa, Apache would also work).

Thank you very much for your help.
Veit

Columns in INPUTEVENTS CV and MV

I'm migrating from v1.1 to v1.2 and the website says that the columns are the same for both INPUTEVENTS_CV and MV tables but from what I've seen they aren't.

vincents-mbp:raw vincent$ head -1 INPUTEVENTS_CV_DATA_TABLE.csv 
"ROW_ID","SUBJECT_ID","HADM_ID","ICUSTAY_ID","CHARTTIME","ITEMID","AMOUNT","AMOUNTUOM","RATE","RATEUOM","STORETIME","CGID","ORDERID","LINKORDERID","STOPPED","NEWBOTTLE","ORIGINALAMOUNT","ORIGINALAMOUNTUOM","ORIGINALROUTE","ORIGINALRATE","ORIGINALRATEUOM","ORIGINALSITE"
vincents-mbp:raw vincent$ head -1 INPUTEVENTS_MV_DATA_TABLE.csv 
"ROW_ID","SUBJECT_ID","HADM_ID","ICUSTAY_ID","STARTTIME","ENDTIME","ITEMID","AMOUNT","AMOUNTUOM","RATE","RATEUOM","STORETIME","CGID","ORDERID","LINKORDERID","ORDERCATEGORYNAME","SECONDARYORDERCATEGORYNAME","ORDERCOMPONENTTYPEDESCRIPTION","ORDERCATEGORYDESCRIPTION","PATIENTWEIGHT","TOTALAMOUNT","TOTALAMOUNTUOM","ISOPENBAG","CONTINUEINNEXTDEPT","CANCELREASON","STATUSDESCRIPTION","COMMENTS_EDITEDBY","COMMENTS_CANCELEDBY","COMMENTS_DATE","ORIGINALAMOUNT","ORIGINALRATE"

I might be good to state that MV has all the columns but CV doesn't have x, y, and z for example.

expire_flag column in patients table should be numerical data type

The build script for the patients table creates the 'expire_flag' as a varchar(5), probably because the flag used to be a 'y/n' string: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_create_tables.sql

    EXPIRE_FLAG VARCHAR(5) NOT NULL,

It would make sense to change the column to a numerical type to avoid people getting the following error when querying the column in the usual way for a number:

SELECT * 
FROM patients 
WHERE expire_flag = 1;

-- Returns error:
-- ERROR:  operator does not exist: character varying = integer
-- LINE 1: select * from patients where expire_flag = 1;

\COPY doesn't support variables

The following merge has introduced some errors into the postgres build scripts:
#18. One issue is that \COPY and COPY are different commands, and \COPY doesn't appear to support variables:

Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

Propagating indexes from CHARTEVENTS table to its inherited childrens

I just realized that when we run the current postgres_add_indexes.sql script. The 4 indexes will only be created on the mother table CHARTEVENTS. This can be meaningless, as there is no data in the mother table. The indexes will not automatically propagate to the children tables. Guess we may need to amend the script and create the indexes for each individual children table.

Platelets, RBC

Looking for transfusion with platelets and or RBC. Which approach is best? ICD code or IO ITEMID?
Cheers
Sven

ICUSTAY_IDs with set tidal volume recording but no MV billing code. Why?

  1. Why are billing codes more reliable? At least 29% of the ICU stays I have TV data for do not have a ventilation billing code recorded during the duration of the ICU stay. Does it make sense to exclude these ICU stays?

Please double check this. This is not good at all if accurate. There are two possibilities I can think of to explain this is that the patients with TV data but no billing codes for MV were on NIV. The first is that MV in the operating room has a different billing code and this was not given to us. Patients who had open heart surgery would have this billing code and not the one that we have.
The other possibility is that sometimes, TV is recorded when patients are on NIV even though these measurements are highly inaccurate. Can we check if they have billing codes for NIV?
If billing codes indeed do not capture if patients were on MV, we will have to abandon it. We should review the discharge summaries of a sample of the patients with TV but no billing codes for MV.
We should have a conference call once we've done these checks.

I rechecked my work, and 3857/14759 (26.1%) of ICUSTAY_IDs with a set TV recording have no billing code for IMV. Including NIV billing codes, 3701/14759 (25.1%) of ICUSTAY_IDs with a set TV are still unaccounted for. Next, if I subset the set TV recordings by AC/CMV only, a smaller proportion - 1424/8749 (16.3%) - of ICUSTAY_IDs are missing MV billing codes. On the flip side, every ICUSTAY_ID with a MV billing code has at least 1 set TV recording.

I've mapped billing codes to ICUSTAY_IDs based on whether the billing CHARTDATE was made on or after INTIME and on or before OUTTIME of the ICU stay. I've lost 4% of the MV billing codes this way, but it's not enough to explain these results.

Here you may review the discharge notes of the patients with a set TV recording but no IMV billing code. There's a handful of duplicate notes, for multiple ICU stays with MV within the same hospital admission.

  1. Billing codes only have a time resolution of days, while we discussed a time resolution of hours for recording TV. To calculate duration of MV, I think it might make sense to validate the MV through the billing code and calculate duration starting from the first TV recording to avoid this unnecessary loss of resolution. Else, should I take the time difference between the CHARTTIME for TV and the CHARTDATE for the first MV billing code?

The way we calculate duration of MV previously is based on the first and last recording of MV settings. The first recording of MV settings is reliable as the start of MV but the last recording is questionable as MV settings may remain the same during the last few days of MV and may not be recorded.
For the patients who have billing codes for MV, can we please check if the last day of billing for MV correspond to the last day MV settings were recorded?
We should have a conference call once we've done these checks.

For patients who have billing codes for MV, the last day of billing for MV corresponds to the last day MV settings were recorded for 8303/11058 (75.1%) of the ICUSTAY_IDs. Of those that do not correspond, 2457/11058 (22.2%) have a set TV recording after the last day of MV billing, and 298/11058 (2.7%) have a set TV recording before the last day of MV billing.

To get a sense of the TV data, I've calculated duration of MV as suggested above. Here are the set TV recordings, named by ICUSTAY_ID.txt. The TIME column is the number of hours after the first MV recording. Let me know if this looks good and I can reformat it to what we discussed.

Endoscopy reports

I was wondering if there is any way to find the endoscopy reports. I can see endoscopy done in the procedure ICD9 tables but I can't find notes in the notes events table. Any input?

Thanks

APACHE score or similar

I'm looking to estimate disease severity using some kind of mortality prediction or comorbidity score. The trouble that I've run into is that there are very few recorded APACHE II, IV or SOFA scores or even the defined component items that constitute the scores.

Before I try to write something to search through chartevents, labevents and discharge summaries for the APACHE components, I was wondering if there is any other kind of comorbidity score that is widely prevalent in the database. I would assume that the contributing hospital would use some kind of score but I can't find it.

ITEMID within IOEVENTS incorrectly linked to D_ITEMS

The minimum ITEMID defined in D_ITEMS to reference the IOEVENTS table is 30001 however, the IOEVENTS table regularly contains events with ITEM_ID values with nchar() equal to 1, 2 or 3 and therefore cannot match with the correct defined item.

Common examples such as 13, 18, and 56 are either not defined anywhere in D_ITEMS (13 and 18), or wrongly correspond to something from CHARTEVENTS (56 --> Atrial Pacemaker [Sense]) which leads me to believe they actually correspond to 30013 (D5W), 30018 (0.9% Normal Saline), and 30056 (Po Intake) which are correctly described with ORIGINALROUTE as intravenous push, intravenous push and oral respectively within the IOEVENTS table.

I suggest that the documentation is updated as a temporary fix to avoid further confusion and the bug is fixed in the next database build and release.

MonetDB import issues - chartevents.value - backslashes in the text

Hello,

Some "chartevents.value" values contains backslashes. (eg: rowid = 43077585; itemid= 807; value= "\113"; I haven't count all occurences)
Some database (at least monetDB) must double backslashes in order to bulk load data.
Moreover this seems to disturb "chartevents.valuenum" to be computed at least for itemid 807 that need numerical values.

For now my workaround is to remove backslashes from csv thanks to "sed -i 's///g' CHARTEVENTS.csv" command.

Thanks,

hospstay_seq is generated incorrectly in icustay_detail

Mike found a likely bug in the icustay_detail generation (https://github.com/MIT-LCP/mimic-code/blob/master/demographics/postgres/icustay_detail.sql), where the hospstay_seq is generated incorrectly and always gives a 1 for some patients. See discrepancy below.

select count(1) 
from icustay_detail 
where hospstay_seq = 1 
and icustay_seq = 1;

 count 
-------
 57328
select count(distinct(subject_id)) 
from icustay_detail 
where hospstay_seq = 1 
and icustay_seq = 1;

 count 
-------
 46428
select * 
from icustay_detail 
where subject_id = 41976;

 subject_id | hadm_id | icustay_id |      admittime      | hospstay_seq |       intime        | los_icu | icustay_seq 
------------+---------+------------+---------------------+--------------+---------------------+---------+-------------+
... etc
(15 rows)

Weird text in noteevents data

Starting from the line number 50740610 in noteevent file, or in the row with row_id 639042 in noteevents table, the text field is weird.

MIMIC website has wrong information about diagnoses_icd

On this page, https://mimic.physionet.org/mimictables/diagnoses_icd/, it is said that table diagnoses_icd has a column "description" of type VARCHAR(30); it is missing from the actual csv file, but still referred to by the indexing script posted here: https://github.com/MIT-LCP/mimic-code/blob/master/buildmimic/postgres/postgres_add_indexes.sql

When running the indexing script, the error message shown by Postgres is as follows:
psql:postgres_add_indexes.sql:29: ERROR: column description does not exist

Since d_icd_diagnoses includes both short and long title of each icd-9 code used in the diagnoses_icd table, I recommend that modifying the MIMIC website page to remove the description entry.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.