Hello!
I am trying to reproduce some of the results from your paper. In particular, I would like to produce a plot like the one below, to find out which combinations of modalities justify a multimodal approach compared to a vision-only one.
![haim_roc](https://user-images.githubusercontent.com/98698362/230433663-516c04c4-fb6c-44d0-b7a9-0a3acd69029f.png)
For example, for fracture, the smallest dataset, I was able to get a 5-fold cross-validation test average macro AUROC of about 0.78 for the unimodal model (fusing per-image and multi-image dense visual embeddings). But when I add new (and less informative) modalities to it, the results stay almost the same (sometimes a bit better, sometimes a bit worse), perhaps because XGBoost handles the curse of dimensionality well. Since the number of combinations of input modalities is high (1,023), I only tested a subset, but I could not get close to 0.84 in average macro AUROC.
Could you please share supporting information about the plot above, such as which combination of modalities is considered typical?

I also have a question about the number of experiments performed in the article.
I understand how you got 1023 as the number of possible models for the pathology diagnosis tasks: 1023 = (number of models with 1 modality) + (number of models with 2 modalities) + (number of models with 3 modalities) + (number of models with 4 modalities).
Here, the number of models with 1 modality is calculated from the number of combinations of the corresponding sources:
Tabular: 1
Time series: C(3, 1) + C(3, 2) + C(3,3) = 3 + 3 + 1 = 7
Notes (excluding radiology): C(2,1) + C(2,2) = 2 + 1 = 3
Visual: C(4,1) + C(4,2) + C(4,3) + C(4,4) = 4 + 6 + 4 + 1 = 15
Total: 26
And so on, up to 4 modalities. I also get a total of 1023 experiments.
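For reference, here is how I reproduced the 1023 count (this is my own reconstruction of the arithmetic, not code from the paper; the source counts per modality are the ones listed above):

```python
from math import prod

# Number of sources per modality, as in my breakdown above
# (tabular: 1, time series: 3, notes excluding radiology: 2, visual: 4).
sources = {"tabular": 1, "time_series": 3, "notes": 2, "visual": 4}

# Any non-empty subset of a modality's sources is one configuration
# of that modality: 2^n - 1 such subsets.
configs = {m: 2**n - 1 for m, n in sources.items()}
# configs == {'tabular': 1, 'time_series': 7, 'notes': 3, 'visual': 15}

# Total models: each modality is either absent (+1) or in one of its
# configurations; subtract 1 for the case where all modalities are absent.
total = prod(c + 1 for c in configs.values()) - 1
# total == 1023
```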
However, I don't get the same number of experiments for the 48-hour length-of-stay and mortality prediction tasks, where the difference is that radiology notes are included.
Could you please explain how you arrived at 2047 (2046)?
Thank you!