amsterdamumc / amsterdamumcdb

AmsterdamUMCdb - Freely Accessible ICU database. Please access our Open Access manuscript at https://doi.org/10.1097/CCM.0000000000004916

Home Page: https://amsterdammedicaldatascience.nl/

License: MIT License

Jupyter Notebook 99.55% Python 0.45%

amsterdamumcdb's People

Contributors

dependabot[bot], patrickthoral, peiyaoli, tariqdam

amsterdamumcdb's Issues

Preferred Citation

I see that there's a manuscript in the GitHub repository. Is there a preferred citation for this database?

Destinations

The destination that a patient is discharged to after the ICU is given only as a number. I would really like a description of each discharge location, as it is important for interpreting the results of my current project. Is it possible to get more information on this? In particular, I am interested in the destinations corresponding to '15', '16', '19', '2', '25', '33', '40', '41', '45' and '9', but especially 15 and 16.

How should I inquire about patients admitted to the hospital for trauma?

I searched for "traum" in the reason view, but this returned only 1216 patients. Given how many patients the UMC database contains, I suspect that my query is incomplete: the MIMIC-III database contains 3112 patients admitted for trauma and the eICU database contains 12009, and although both databases are similar in size to UMC, they hold far more trauma patients.

Apart from searching the reason view, is there another way to find patients admitted to hospital for trauma?

Mental Health Patients

Hello:

I have been trying to find patients with mental diseases in the database, but could not find any. The only information relevant to psychiatry that I could find is patients who received antipsychotics. This is very weird, as Amsterdam University Medical Center has a Department of Psychiatry and the Urban Mental Health Institute.

Could you please confirm this information?

Ethnicity of patients

Hi,
Does AmsterdamUMC DB contain any information about ethnicity? Or is there a way to derive it? Thanks

fewer eosinophils than basophils?

Hi,

I find that there are many more basophil than eosinophil measurements, which is perhaps unusual since they usually come together?

[image: eos]

Thanks,
Drago

"Drukzak"-Item in AMDS Database

Hi,
my team and I are currently working on machine learning applications in the field of acute kidney injury using the AMDS database. First of all: Thank you so much for providing such high quality data to a scientific public.

We're currently working up the different features in the database and one item has caught our eye since it shows some importance in different models we're working with: Item No. 8937, called "Drukzak". This is an item measured in ml over a period of time. To us it is not completely clear though what hides behind this item. We translated it to "pressure infusion". But since in some patients only small amounts of volume (100-200ml) are recorded, we were wondering if we were right here. What can we expect a patient received clinically when "Drukzak" is recorded?

Thank you so much in advance for your kind help.

Kindest regards

Information on AUMC ICUs

I was trying to figure out some stats on the hospital where this dataset originates from. What I gathered so far:

  • in 2018, Amsterdam UMC resulted from a merger of VU Medical Center (VUmc) and Academic Medical Center (AMC). Data was collected 2003 through 2016; I assume from VUmc, is this correct?
  • Adult Intensive Care at VUmc is organized into 3 units: two intensive care units with 22 beds and a 12 bed Medium Care or High Dependency unit, while VUmc as a whole is a 733 bed hospital
  • From the 2018 HSMR report, I gather that there are roughly 20,000 yearly admissions to VUmc. I could not find a number on yearly admissions to ICUs. What would be a ballpark estimate?

@patrickthoral Are my numbers ok? Happy to hear from you in case I got something wrong.

Questions from reason for admission script

In 'concepts/diagnosis/reason_for_admission.ipynb' script:

  1. The following are marked as diagnoses variables, but not surgical. Is this correct? And if so, why are they not surgical?
    --Not surgical: 13141, --D_Algemene chirurgie_Algemeen
    --Not surgical: 16642, --DMC_Algemene chirurgie_Algemeen

  2. Most cultures (as well as antibiotics including metronidazole, co-trimoxazol, co-amoxiclav) are excluded from the definition of sepsis because they are 'routinely used'. These seem relevant to a diagnosis of sepsis, so I was wondering how this was determined? Including these may result in false positives in the data, but excluding them may perhaps lead to many more false negatives. Is it just the case that specificity was deemed more important in this script here than sensitivity?

Questions from sofa script

In the 'concepts/severityscores/sofa.ipynb' script:

  1. Outliers for FiO2 values (>100) are discarded. Before converting to FiO2 %, there are also some entries that have a value between 1 and 20, which might also be outliers? Is it reasonable to assume this and discard them as well? (if not removed, it would give an artificially low PF ratio which is carried forward to the SOFA calculation).

  2. In the SOFA bilirubin score, itemid 6813 (Bili Totaal, 2.7% of the entries) has units umol, while itemid 9945 (Bilirubine (bloed)) has units umol/l. We want the units of everything to be the latter. Is the umol unit in fact umol/l unit? If not, how would they be converted?

  3. For the SOFA cardiovascular score, we retain cardiovascular drugs that are administered with rate >0.1. This excludes those with rate 0.1 (the minimum value). Why is this? Is it to exclude those with rate 0, and if so, should we change this to >=0.1?

Typo in calculation of liver SOFA score

Hi guys,
I noticed while looking through the calculation of liver SOFA scores in AmsterdamUMCdb/concepts/severityscores/sofa.ipynb that the calculation accidentally uses the platelet value. It's in input cell 36:

#calculate SOFA liver score:
sofa_bilirubin.loc[:,'sofa_liver_score'] = 0
sofa_bilirubin.loc[(sofa_bilirubin['value'] >= 20) & (sofa_bilirubin['value'] < 33), 'sofa_liver_score'] = 1
sofa_bilirubin.loc[(sofa_bilirubin['value'] >= 33) & (sofa_platelets['value'] < 102), 'sofa_liver_score'] = 2
sofa_bilirubin.loc[(sofa_bilirubin['value'] >= 102) & (sofa_platelets['value'] < 204), 'sofa_liver_score'] = 3
sofa_bilirubin.loc[(sofa_bilirubin['value'] >= 204), 'sofa_liver_score'] = 4

sofa_bilirubin.head()

You can see in lines 4 and 5 that the upper bounds are checked against sofa_platelets, not sofa_bilirubin.

Hope this helps
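For reference, the intended banding can be sketched as a plain function. This is only an illustration using the thresholds from the quoted cell (20, 33, 102 and 204, assumed to be umol/l); the function name `sofa_liver_score` is hypothetical and is not part of the notebook. In the notebook itself, the fix would presumably be to compare against `sofa_bilirubin['value']` on both sides of each condition.

```python
def sofa_liver_score(bilirubin_umol_l):
    """Map a bilirubin value (assumed umol/l) to a SOFA liver score from 0 to 4.

    Thresholds are copied from the quoted notebook cell; checking from the
    highest band downwards means only the lower bound of each band is needed.
    """
    if bilirubin_umol_l >= 204:
        return 4
    if bilirubin_umol_l >= 102:
        return 3
    if bilirubin_umol_l >= 33:
        return 2
    if bilirubin_umol_l >= 20:
        return 1
    return 0
```

Because every bound refers to the same value, there is no second variable to mix up.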

SQL issue in reason_for_admission.ipynb

I noticed a minor issue with the long SQL query in the reasons-for-admission notebook while rewriting the code to use pandas directly on the data csv files (since my department didn't want me to install postgres). I investigated, and it does not seem to affect very many admissions within the cohort, so it's only a small thing!

I'll explain using the diagnoses part of the SQL query as an example, though I think it may affect other parts of the query that are performed in the same way.

The relevant part of the query is

        ROW_NUMBER() OVER(PARTITION BY admissionid
        ORDER BY 
            CASE --prefer NICE > APACHE IV > II > D
                WHEN itemid = 18671 THEN 6 --NICE APACHEIV diagnosen
                WHEN itemid = 18669 THEN 5 --NICE APACHEII diagnosen                
                WHEN itemid BETWEEN 16998 AND 17017 THEN 4 --APACHE IV diagnosis        
                WHEN itemid BETWEEN 18589 AND 18602 THEN 3 --APACHE II diagnosis
                WHEN itemid BETWEEN 13116 AND 13145 THEN 2 --D diagnosis ICU
                WHEN itemid BETWEEN 16642 AND 16673 THEN 1 --DMC diagnosis Medium Care
            END DESC,
            measuredat DESC) AS rownum

The problem is that diagnoses are ordered by diagnosis type and time, but when multiple entries share the same diagnosis type and time, this ordering is not unique. In that case, SQL returns the tied entries in an arbitrary order, and later in the query only the top entry is retained. However, these entries (sharing the same 'key' in the ordering) may correspond to different diagnoses, and may not all be 'surgical' diagnoses (I noticed this because pandas had ordered these entries differently, and this led to different numbers of admissions in each category in the final table).

I think the solution would be to aggregate across entries with the same sorting 'key'. In pandas you might use something like the following code chunk, instead of only taking WHERE diagnoses.rownum=1 in the SQL query.

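# collapse entries that share the same sort key into a single row
diagnoses = diagnoses.groupby(['admissionid', 'typeid', 'measuredat']).agg(
        # surgical if any of the tied entries is surgical
        surgical=pd.NamedAgg(column='surgical', aggfunc='any'),
        # str() keeps the join working if a column does not hold strings
        diagnosis=pd.NamedAgg(column='diagnosis', aggfunc=lambda x: '; '.join(str(v) for v in x.unique())),
        diagnosis_id=pd.NamedAgg(column='diagnosis_id', aggfunc=lambda x: '; '.join(str(v) for v in x.unique()))
    ).reset_index()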

I'm not experienced in SQL enough to suggest how to modify the query instead.

I hope what I've said here makes sense! As I said, this is only a small issue that affects very few admissions, so I don't know how important it is to rejig the SQL here, but I thought it was worth pointing out.
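The same idea can be sketched without pandas, which may make the tie handling easier to see. This is a minimal illustration under assumptions: rows are plain dicts with the keys `admissionid`, `typeid`, `measuredat`, `surgical` and `diagnosis` (mirroring the pandas chunk above), and `aggregate_ties` is a hypothetical helper name.

```python
from collections import defaultdict


def aggregate_ties(rows):
    """Collapse rows sharing (admissionid, typeid, measuredat) into one row."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["admissionid"], row["typeid"], row["measuredat"])
        groups[key].append(row)

    merged = []
    for key in sorted(groups):
        members = groups[key]
        # dict.fromkeys drops repeated labels while keeping first-seen order
        labels = list(dict.fromkeys(str(m["diagnosis"]) for m in members))
        merged.append({
            "admissionid": key[0],
            "typeid": key[1],
            "measuredat": key[2],
            # surgical if any of the tied entries is surgical
            "surgical": any(m["surgical"] for m in members),
            "diagnosis": "; ".join(labels),
        })
    return merged
```

After this step each sort key occurs once, so picking the top row per admission no longer depends on how tied rows happen to be ordered.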

Some variables related to sepsis

I am working on querying some variables related to sepsis and have found most of them in the code and issue answers, but I am still not sure about the following:

  1. Elixhauser score (premorbid status) and SIRS score
  2. IV fluid intake
  3. Chloride, BUN, calcium, ionized calcium, carbon dioxide, SGOT, SGPT, PTT, PT, INR, PaCO2

Could you please help me identify them?
Thanks very much!

Any possibility for Sepsis study in AmsterdamUMCdb?

Hi, thanks for contributing the interesting AmsterdamUMCdb and this code repository.
I'm currently developing algorithms to detect sepsis under the Sepsis-3 definition, and I'm wondering whether AmsterdamUMCdb is a good choice.
However, I'm not able to find the Glasgow Coma Scale score (required by the SOFA score) or microbiology (culture) information (required for suspected infection) in this database.
Is this information available, or do you have any suggestions for studying sepsis in AmsterdamUMCdb?
Thank you.

InvalidTextRepresentation error reported

Hi, I tried to run the notebook, but it reported an error like this:

---------------------------------------------------------------------------
InvalidTextRepresentation                 Traceback (most recent call last)
<ipython-input-15-647373ae7118> in <module>
     28 
     29 csv = os.path.join('..', config['files']['datapath'], config['files'][table])
---> 30 copy_progress(csv, table) #runs copy_from using a tdqm progress bar

<ipython-input-12-ae2dba2b7e6d> in copy_progress(csv, table)
     57 
     58     pfile = ProgressFile(pbar, csv, 'r') #create a ProgressFile for showing progress
---> 59     cursor.copy_from(pfile, table, null="NULL")
     60 
     61     #close the objects

InvalidTextRepresentation: invalid input syntax for integer: "1,1,1,IC,0,,0,2010-2016,96120000,26,15,Man,60-69,,70-79,Anamnestisch,170-179,Anamnestisch,Cardiochirurgie"
CONTEXT:  COPY admissions, line 1, column patientid: "1,1,1,IC,0,,0,2010-2016,96120000,26,15,Man,60-69,,70-79,Anamnestisch,170-179,Anamnestisch,Cardiochir..."

Locations MC&IC and IC&MC

There are four different locations in the admissions table. As stated in the wiki, they are:
the department the patient has been admitted to, either IC, MC or both (IC&MC or MC&IC)

I have two questions about these locations.

First, does the order of the ward acronyms indicate the patient's journey? E.g., did an IC&MC patient move from an IC ward to an MC ward, or are the two combined labels interchangeable?

I have noticed that the combined locations (IC&MC or MC&IC) are a newer location description: admissions to these wards increased after 2010, as shown in the admission counts by ward and year below.

[Screenshot from 2022-03-28 14-30-15]

So the second question is: are these physically new wards that were added to the hospital at some point?

Best Practice to Extract In-hospital Mortality Label

Hi, I want to work on in-hospital mortality prediction based on a variety of lab measurements provided in the dataset.

For this task, I would like to kindly ask what would be the best practice to extract high-quality in-hospital mortality labels.

I know that dateofdeath and dischargedat in admissions table could be used for such purpose. However, as stated in this issue dateofdeath is coming from a different source. Therefore, I want to ask you about how to handle the time difference between dateofdeath and dischargedat . To make it more concrete, would you recommend the following: if (dateofdeath - dischargedat) is less than one day (i.e. dateofdeath is later but not more than 1 day), it still can be counted as in-hospital mortality?

PS: The time diff (dateofdeath - dischargedat) didn't look to have a clear cut-off point for the above scenario (i.e. 12 hours, 1 day, 2 days etc.). That's why I need your expert opinion before I get to start.

Best regards,
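To make the proposed rule concrete, here is a minimal sketch of a labelling function with an adjustable tolerance. It assumes that dateofdeath and dischargedat are both millisecond offsets on the same time axis and that a missing dateofdeath means no recorded death. The name `died_by_discharge` and the default of one day are hypothetical and are not a recommended cut-off.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # assumption: timestamps are in milliseconds


def died_by_discharge(dateofdeath, dischargedat, tolerance_days=1.0):
    """Return True if death is recorded no later than tolerance_days after discharge.

    dateofdeath may be None when no death is recorded for the patient.
    """
    if dateofdeath is None:
        return False
    return (dateofdeath - dischargedat) <= tolerance_days * MS_PER_DAY
```

Keeping the tolerance as a parameter makes it easy to check how sensitive the label counts are to 12 hours, 1 day or 2 days.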

Where to find culture information for sepsis-3 cohort

Hi,
We are searching for a cohort based on Sepsis-3, which involves the concept of "suspected infection", i.e., patients who had body fluids sampled for culture and received antibiotics. Does AmsterdamUMCdb contain any culture information? I cannot find any by searching for "culture" in all tables. Is there a synonym for culture in AmsterdamUMCdb?

encoding problem

Sorry to bother you. When I set up the database, I struggled with an encoding problem. I had to change SET CLIENT_ENCODING TO 'utf8' for the following 2 cells, otherwise the error below occurred. When I then tried to join two tables that use two different encodings in one SQL query, the conflict and the error occurred again. I don't know how to solve this. Could you help me? Thanks in advance!
[three images]

ventilator weaning

Hi,
I was wondering if there were some indicators of ventilator weaning in this data set, such as the time of weaning, etc. I have searched for the information of ventilation mode in the "listitems" table, but can't find the information about weaning.
Thanks for your help

follow up time

Hi, I want to calculate survival time, such as 90-day or longer mortality, from the date of ICU admission. I didn't find any information about follow-up time after discharge. However, I see that 1-year mortality was calculated in this paper:
Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example.
Could you please clarify this? Thanks a lot!
[image]

Are these numbers reasonable?

I don't know another way to fact-check this, as I am still trying to figure out how the database works. I have a question about mechanical ventilation modes: I am trying to extract NIV patients. In processitems I found "Beademen non-invasief", which according to Google Translate should be patients receiving noninvasive ventilation.

SELECT admissionid ,start ,stop ,duration FROM processitems WHERE itemid = 10740 --Ventilatie Mode (Set)

I get around 1000 patients, is it reasonable or are they stored somewhere else too?

Also, when I try to extract SpO2 for them, seems like 50% of the patients have SpO2 in the first two hours of their NIV ventilation.

WITH spo AS (
    SELECT
        n.admissionid,
        n.itemid,
        n.item,
        n.value,
        CASE
            WHEN NOT n.registeredby IS NULL THEN TRUE
            ELSE FALSE
        END as validated,
        (n.measuredat - a.admittedat)/(1000*60*60) AS time,
    FROM numericitems n
    LEFT JOIN admissions a ON
        n.admissionid = a.admissionid
    WHERE itemid IN (
        12311, --O2-Saturatie (bloed)
        6709 --Saturatie (Monitor)
    )
    and n.admissionid IN UNNEST(@admissionids)
)
SELECT * FROM spo

So, either I am looking at this data wrong, or it is possible that not all of these patients are monitored?

Comorbidities

Hi,
Could you please tell me if there are comorbidities? If yes, where are they stored? Are they encoded with ICD9/ICD10/DRG?

Thanks

"value too large"

I am using PostgreSQL to install UMCdb. When it copies "listitems.csv" and "numericitems.csv", the software reports these error messages: "psql:postgres-load-data-csv.sql:16: error: unable to extract files "listitems.csv" status: value too large" and "psql:postgres-load-data-csv.sql:18: error: unable to extract files "numericitems.csv" status: value too large".
What is going wrong, and what can I do to solve these problems?
Thank you !

Measurements after date of death

I have noticed that there are some patients with measurements after the date of death.

I am a little confused, because from this issue, it seems that dateofdeath should be exact for patients who died in ICU, but from this other issue, it seems that dateofdeath may be approximated.

Should I truncate the inpatient stay to the date of death and treat the measurements as errors? Or should I treat the date of death as an error and match it to the discharge offset?
Thank you in advance!


Reproducible example: admissionid=1013, has two creatinine measurements after dateofdeath.

Urine output and serum creatinine measurements

Hi, I have questions about urine output and serum creatinine measurements in the DB.

I found several itemids related to urine output: UrineCAD, UrineSupraPubic, UrineUP, UrineSpontaan, UrineIncontinence. Should I consider all of them, or just a few, as valid measurements extracted from the patient's catheter? Also, some of the acronyms are unclear to me (e.g. UrineUP, UrineCAD); could you provide more details?

About serum creatinine: could you explain the difference between Kreatinine (bloed), A_Serum_Kreatinine, MCA_Serum_Kreatinine and RA_Serum_Kreatinine? They are all expressed in umol/l, but are all of them measured in blood (apart from the first one, where this is specified)?

Thanks

Duplicate measurements

I have noticed a few duplicate laboratory measures which are as follows:

  • APPT (bloed) is duplicated with slightly different data seems to have an extra space before (bloed). IDs: 17982 & 11944
  • B.E. and B.E. (bloed). IDs: 6807 & 9994
  • Glucose (bloed) and Glucose Astrup. IDs: 9947 & 9557
  • Kalium (bloed) and Kalium Astrup. IDs: 9927 & 9556
  • PH and ph (bloed). IDs: 6848 & 12310
  • Prothrombinetijd (bloed) duplicated seems to have an extra space before (bloed). IDs: 11894 & 11893
  • pCO2 and pCO2 (bloed). IDs: 6848 & 9990
  • Laktaat, Laktaat Astrup & Laktaat (bloed). IDs: 6837, 9580, 10053

As the list shows, some labels seem to have a rogue space in the name and others are slight variations of the name. Looking at the recordings with these labels, some disappear after 2010, presumably because someone combined them.
To pick an example, Laktaat and Laktaat Astrup get included in Lactaat (bloed); see the table below.

[image: missingness_pattern_years]

My question has two parts:
Are these indeed duplicate names for the same blood test or is there another reason for this (eg. conducted in different labs or with different methods)?
Can I simply combine these into one?

Update :

I have looked further into this to check whether combining the measures would cause a data conflict by creating several different entries for the same timestamp.
This is mostly not the case. Duplicated entries for the same timestamp do exist, but the good news is that the vast majority of those have an identical recording, so either of the duplicated entries can be used.
A very small proportion has different entries for the same timestamp; see the table below:
[Screenshot from 2022-03-29 15-11-29]
To put this into context the total numbers of entries for these are:

  • B.E. n=679875
  • Glucose n =841093
  • Kalium n=240869
  • pCO2 n=685155
  • pH n= 685148

Conclusion:
Since the vast majority of data points do not have any conflicts, I assume that the duplicated labels can be used interchangeably. I suggest that anyone running into the same issue combine the labels, drop one record of each duplicated entry if the values are the same, and discard the very few where the values differ.
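The suggested procedure could look roughly like the sketch below. It assumes the measurements from the two item IDs have already been pooled into (admissionid, measuredat, value) tuples; `merge_duplicate_labels` is a hypothetical helper name, and whether two labels really describe the same test still has to be decided per item.

```python
from collections import defaultdict


def merge_duplicate_labels(records):
    """Merge pooled (admissionid, measuredat, value) tuples from two item IDs.

    Identical values at the same timestamp collapse to one record;
    timestamps with conflicting values are discarded entirely.
    """
    values_by_key = defaultdict(set)
    for admissionid, measuredat, value in records:
        values_by_key[(admissionid, measuredat)].add(value)

    merged = []
    for key in sorted(values_by_key):
        values = values_by_key[key]
        if len(values) == 1:
            merged.append((key[0], key[1], next(iter(values))))
        # more than one distinct value: conflicting entries, skip them
    return merged
```

Comparing values for exact equality is the simplest choice; a small tolerance might be preferable if the two sources round differently.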

Diabetes diagnosis

Hi,

I was wondering if it was possible to reliably determine whether patients are diabetic or not?

Thanks!

Apache II bugs

From: @alexfabregat

I found two bugs in the ApacheII determination notebook that, although they may have very little impact on the cohort, you may want to fix.
  1. Regarding the sodium score
The line

sodium.loc[(sodium['value'] >= 120) & (sodium['value'] < 129), 'a2_sodium_score'] = 2

should be

sodium.loc[(sodium['value'] >= 120) & (sodium['value'] < 130), 'a2_sodium_score'] = 2
  2. Regarding the temperature
#cleanup temperatures
#assumes temperatures  > 100 have been entered without a decimal separator
temperature.loc[temperature['value'] > 100, 'value'] = temperature['value']/100

If you divide a value with a missing decimal separator by 100, you will not always get a "corrected" one. For instance, a value of 360 degrees Celsius should be divided by 10, not 100, to obtain a meaningful value, namely 36.
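Both points can be illustrated with a small sketch. The rescaling rule here (divide by 10 until the value is at most 100) is an assumption about how the missing separator could be handled, not the notebook's method, and both function names are hypothetical.

```python
def rescale_temperature(value):
    """Divide by 10 until the value is at most 100.

    Assumes values above 100 were entered without a decimal separator,
    e.g. 360 for 36.0 or 3650 for 36.5.
    """
    while value > 100:
        value = value / 10
    return value


def in_sodium_score_2_band(value):
    """Band proposed in the issue for a2_sodium_score = 2: 120 <= value < 130."""
    return 120 <= value < 130
```

With the original upper bound of 129, a sodium value such as 129.5 would fall outside the band, which is the gap the issue points out.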

Error when admissionid is negative in get_fluidbalance amsterdamumcdb/fluidbalance.py

The check .isnumeric() fails when the int is negative, as can sometimes happen with admissionid.

It is more correct to try to convert to the target type and raise an error if that is impossible, replacing:

    assert str(admissionid).isnumeric(), "admissionid is not a number: %r" % admissionid
    assert str(from_date).isnumeric(), "from_date is not a number: %r" % from_date
    assert str(to_date).isnumeric(), "to_date is not a number: %r" % to_date

with:

    try:
        admissionid = np.int64(admissionid)
    except:
        raise Exception("admissionid is not a number: %r" % admissionid)

    try:
        from_date = np.int64(from_date)
    except:
        raise Exception("from_date is not a number: %r" % from_date)

    try:
        to_date = np.int64(to_date)
    except:
        raise Exception("to_date is not a number: %r" % to_date)

Best regards.
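A short demonstration of the behaviour, with the three repeated blocks folded into one helper. This is only a sketch: `to_int` is a hypothetical name, and it uses the built-in `int` instead of `np.int64` so that it runs without NumPy.

```python
def to_int(value, name):
    """Convert value to int, or raise ValueError naming the offending argument."""
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError("%s is not a number: %r" % (name, value)) from None


# str.isnumeric() rejects the minus sign, so negative ids fail the old check
negative_passes_old_check = str(-5).isnumeric()   # False
negative_converted = to_int(-5, "admissionid")    # -5
```

Catching only TypeError and ValueError avoids hiding unrelated errors, which a bare `except:` would swallow.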

CHF, ESRD and Sepsis

Hi,
I want to evaluate admitted patients with ESRD, CHF, and sepsis. I am not sure whether the values listed below capture all CHF and ESRD patients. Could you please help me with how to identify all ESRD and CHF patients? Is filtering on 'value' a good approach to extract these patients?
I have found different numbers of septic patients. Could you please tell me how many patients with sepsis there are? Is 3136 admissions correct?

Values of CHF: 'Non-operatief Cardiovasculair - Congestief hart falen', 'Non-operative cardiovascular - CHF, congestive heart failure', 'Congestief hart falen', 'CHF, congestive heart failure'.

Values of ESRD: 'Dialyselijn Subclavia', 'DMC_Interne Geneeskunde_Renaal', 'Non-operatief Renaal', 'Dialyselijn', 'Apache II Operatief Renaal', 'Graft for dialysis, insertion of', 'D_Interne Geneeskunde_Renaal', 'Apache II Non-Operatief Renaal','Renaal', 'Post-operative cardiovascular - Graft for dialysis, insertion of'.

Thanks for your kind consideration,

High Resolution Waveforms

Hello,

According to ST Vistesen et al letter in BJA (forthcoming, on VitalDB database in Korea), the AmsterdamUMCdb has vital sign sample resolution of once per minute. I'm looking for high resolution waveform databases at academic medical centres and have found 7 in US, 1 in Canada, and none in Europe. In United States I've seen: U Michigan, U Virginia, UCLA/UCI, Case Western Reserve U, Johns Hopkins U, U Maryland, MIT (MIMIC at BIDMC). In Canada, 'Artemis' system at Toronto Children's, built by U Toronto.

My questions are: 1) Does AmsterdamUMCdb have high-resolution (125 Hz sample rate or higher) ECG and/or invasive BP waveforms? and 2) Are you aware of any other academic medical centres doing high temporal resolution waveform capture and curation?

Thanks

Ebenezer Tolman
Anesthesia Technician Supervisor
Tufts Medical Center
Boston MA USA

Where to find APACHE IV scores

Problem: APACHE IV scores

While looking for APACHE IV scores, we stumbled upon 2 item codes in the 'numericitems' table:

  1. "NICE Apache IV Score" - 326 distinct patients
  2. "A_Apache_Score" - 8327 distinct patients

The value ranges for "NICE Apache IV Score" resemble what is expected for APACHE IV while "A_Apache_Score" has values that resemble APACHE II.

2 questions:

  1. Are APACHE IV scores present for most patients in this db and if so where can we find them?
  2. Which version of APACHE does "A_APACHE_Score" represent?

can't find the blood urea nitrogen values

I am querying the blood urea nitrogen (BUN) in the numericitems.csv table of the AUMCdb database, but I cannot find any itemid for this feature. Can you please tell me the itemid of blood urea nitrogen? I also used amsterdamumcdb.get_dictionary(), but I still could not find it.
Thanks!

Units of measurement

Hi,

I have a couple of questions about various units of measurement in the dataset. I am trying to convert the units so they match the units used in MIMIC/eICU. I have the following questions:

  • base excess (itemids 6807 and 9994)
    [Screen Shot 2020-08-10 at 13 16 56]
    It seems like the distribution is asymmetric, and different from the other 3 large databases. Any thoughts as to why this might be?
  • insulin (itemids 2663 / 4218 / 6929 (not all of these actually appear in the data))
    [Screen Shot 2020-08-10 at 13 17 12]
    The standard unit for insulin is units, and it seems the unit used in AUMC is milligrams. Any thoughts on how best to convert from mg to units?
  • basophils (itemid 14256) and lymphocytes (itemid 14258)
    [Screen Shot 2020-08-10 at 13 21 37]
    The target unit used in most datasets is in fact % (or proportion). The unit used in AUMC is "10^9/L". Is there any hope of converting between the two? Or do you think that I would need all cell counts at the same time to convert between these two ways of measurement?
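On the last point, the arithmetic itself is simple if a total leukocyte count from the same blood sample is available: the percentage is the absolute count divided by the total count, times 100. A minimal sketch, assuming both values are in 10^9/L; `absolute_to_percent` is a hypothetical helper name, and matching the two measurements to the same sample is left out.

```python
def absolute_to_percent(cell_count, leukocyte_count):
    """Convert an absolute cell count (10^9/L) to a percentage of total leukocytes.

    Returns None when the total count is missing or not positive.
    """
    if leukocyte_count is None or leukocyte_count <= 0:
        return None
    return 100.0 * cell_count / leukocyte_count
```

For example, 2.0 x 10^9/L lymphocytes with a total of 8.0 x 10^9/L leukocytes gives 25 %.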

Thank you in advance!

Best,
Drago

ventilatory mode

Do you have more information in the dataset under listitems that gives data on parameters besides ventilatory mode? E.g., ventilator settings, blood/gas parameters, anesthesia? And permission to access this data requires one to follow instructions here, correct? Thanks for your help!

ICD codes

Does AmsterdamUMCdb contain diagnostic ICD codes (preferably ICD-9, as this is what MIMIC contains)? I could not find anything, but this may be down to my Dutch language skills. I could find some "ICD" items in processitems, but I assume this has nothing to do with ICD as in "International Classification of Diseases", right?

Building SOFA on AUMC

Hi,

We are trying to include the aumc dataset in our ricu R-package which currently loads the MIMIC, eICU and HiRID data simultaneously. To this end, we are also trying to construct SOFA scores at continuous time points on the dataset.

There are several questions that came up and I think it would be great if you could help us out with:

  • What would be the best way of determining whether a patient is mechanically ventilated? (The fact that things are in Dutch makes figuring this out a bit complicated.)
  • I found various different item IDs which do not appear in the drugitems table. For instance, norepinephrine has ID 8676, but the following query
> subset(aumc$drugitems, itemid == 8676)
Empty data.table (0 rows and 31 cols): admissionid,orderid,ordercategoryid,ordercategory,itemid,item...

shows there is apparently no norepinephrine data in the table. The drugitems table for me has 4 907 269 rows... is it possible that this is not the complete table?

  • Are the Glasgow Coma Scale (GCS) scores available in the dataset? Also, what do you think would be the best way of determining whether a patient is sedated?

Thank you in advance.

Best,
Drago

Negative Date of Death?

There are a small number of rows where the dateofdeath is negative (suggesting that they died before their first admission?):
SELECT COUNT(*) FROM admissions WHERE dateofdeath < 0
269

Is this a data error? Am I interpreting the field incorrectly?

How do I obtain the patient's diagnosis?

How would I obtain a patient's diagnosis information in the dataset? I'm looking for patients with AMI. I couldn't directly find any diagnosis columns, and running searches (e.g., for "myocardial") on the listitems did not yield any results. If there isn't a "diagnosis" table/column I can directly look into, could you suggest some ways to obtain this cohort?

confused numeric items

Hi, I am working on querying some laboratory test items, but I am not sure whether some of them refer to the same thing. This is partly because I am not familiar with this database, and partly because some of the content in the tables is not in English, which causes some trouble. I have had to explore by guessing, checking correlations and counting records. Here is what I have found but am not sure about; could you please help me identify the correct items? If I missed something, please let me know.

laboratory test items   itemid      label in AUMC        count     similar items
albumin                 9937        Alb.Chem (bloed)     104659
creatinine              9941        Kreatinine (bloed)   197746    Kreatinine
glucose                 9947        Glucose (bloed)      825213
platelet                9964        Thrombo's (bloed)    215873
white blood count       9965        Leuco's (bloed)      192600
bicarbonate             9992        Act.HCO3             660773
lactate                 10053       Lactaat (bloed)      182612
sodium                  9924        Natrium (bloed)      228315    Na (onv.ISE) (bloed)
potassium               9927        Kalium (bloed)       222598    K (onv.ISE) (bloed)
hemoglobin              9960        Hb (bloed)           217560    Hb (onv.ISE) (bloed)
Blood Urea Nitrogen     Not found

Many thanks!

language problem of the database!

DANS only offers a version that is partly in Dutch and partly in English. Is there another version that is completely in English?
Thank you very much!

SOFA platelet calculation

In sofa.ipynb, the platelet score is calculated as follows:

#calculate SOFA coagulation score:
sofa_platelets.loc[:,'sofa_coagulation_score'] = 0
sofa_platelets.loc[(sofa_platelets['value'] < 150) & 
                     (sofa_platelets['value'] >= 100), 'sofa_coagulation_score'] = 1
sofa_platelets.loc[(sofa_platelets['value'] < 100) & 
                     (sofa_platelets['value'] >= 50), 'sofa_coagulation_score'] = 3
sofa_platelets.loc[(sofa_platelets['value'] < 50) & 
                     (sofa_platelets['value'] >= 20), 'sofa_coagulation_score'] = 3
sofa_platelets.loc[(sofa_platelets['value'] < 20), 'sofa_coagulation_score'] = 4

Platelet values between 50 and 100 are now awarded a value of 3, but shouldn't this be a value of 2?
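A sketch of the banding with that correction applied, using the thresholds from the quoted cell (150, 100, 50 and 20). The function name `sofa_coagulation_score` is hypothetical; in the notebook the fix would presumably be to change the value assigned for the 50 to 100 band from 3 to 2.

```python
def sofa_coagulation_score(platelets):
    """Map a platelet count to a SOFA coagulation score from 0 to 4.

    Thresholds follow the quoted cell; the 50 to 100 band scores 2.
    """
    if platelets < 20:
        return 4
    if platelets < 50:
        return 3
    if platelets < 100:
        return 2
    if platelets < 150:
        return 1
    return 0
```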

Recordings labelled as "(Set)"

Some numeric recordings which are not lab results have an addon "(Set)". These look to me as if they are all ventilator settings. Could someone please confirm if this is a correct interpretation?

Issues extracting zip Archive

Is it just me or are other people also having problems extracting the zip archive distributed via the filesender instance at https://filesender.surf.nl? On macOS 10.15.6, using zipinfo v3.00 I get

❯ zipinfo AmsterdamUMCdb-v1.0.2.zip
Archive:  AmsterdamUMCdb-v1.0.2.zip
Zip file size: 9143127113 bytes, number of entries: 7
warning [AmsterdamUMCdb-v1.0.2.zip]:  4848159318 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [AmsterdamUMCdb-v1.0.2.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

I'm sorry if I'm reporting this issue in the wrong place and I'm happy to be redirected.

ITEM dictionary required

Hi, I would suggest building a dictionary of items for numericitems like MIMIC does. Currently, it takes a long time to search for and locate the correct lab test items.
