In 1854 there was a massive cholera outbreak in Soho, London. In three days, more than 120 people died from the disease.
At the time, most people believed that cholera spread through the air. However, John Snow, an English doctor, was convinced that it spread via very small microbes in the water supply. But how could he prove that?
Like most good researchers, he turned to visualization to help him uncover clues. He drew a map of the affected part of London, marking each cholera death as the toll rose.
Thanks to an outlier in his data, he found out that the origin of the epidemic was the contaminated water from the Broad Street water pump.
John Snow's analysis is considered to be the first epidemiological and geographical analysis of disease data. To honor him, today you are going to perform an analysis on cholera.
The main questions that you need to investigate are:
- Does cholera cause dehydration?
- Is the gastric acid index correlated with feces consistency and vomit color?
Feel free to ask yourself more questions and analyze the data any way you find convenient.
To answer the questions above, you are provided with this dataset.
You can find the description of each dataset feature here.
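As a starting point, here is a minimal sketch of how the two main questions might be approached. It uses only the Python standard library and a handful of made-up records. Every column name (`cholera`, `dehydrated`, `gastric_acid_index`, `feces_consistency`) and every value is an assumption for illustration, so replace them with the real feature names from the dataset description. The sketch compares dehydration rates between patients with and without cholera, then computes a Pearson correlation between the gastric acid index and an ordinal encoding of feces consistency.

```python
from math import sqrt

# Toy records standing in for the real dataset. All column names and values
# are hypothetical; swap them for the features described in the dataset docs.
records = [
    {"cholera": 1, "dehydrated": 1, "gastric_acid_index": 2.0, "feces_consistency": 1},
    {"cholera": 1, "dehydrated": 1, "gastric_acid_index": 2.5, "feces_consistency": 1},
    {"cholera": 1, "dehydrated": 1, "gastric_acid_index": 3.0, "feces_consistency": 2},
    {"cholera": 1, "dehydrated": 0, "gastric_acid_index": 3.5, "feces_consistency": 2},
    {"cholera": 0, "dehydrated": 0, "gastric_acid_index": 4.0, "feces_consistency": 3},
    {"cholera": 0, "dehydrated": 0, "gastric_acid_index": 4.5, "feces_consistency": 3},
    {"cholera": 0, "dehydrated": 1, "gastric_acid_index": 5.0, "feces_consistency": 4},
    {"cholera": 0, "dehydrated": 0, "gastric_acid_index": 5.5, "feces_consistency": 4},
]


def dehydration_rate(rows, has_cholera):
    """Share of dehydrated patients within the cholera / non-cholera group."""
    group = [r["dehydrated"] for r in rows if r["cholera"] == has_cholera]
    return sum(group) / len(group)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Question 1: compare dehydration rates between the two groups.
rate_with = dehydration_rate(records, has_cholera=1)
rate_without = dehydration_rate(records, has_cholera=0)
print(f"Dehydration rate with cholera:    {rate_with:.2f}")
print(f"Dehydration rate without cholera: {rate_without:.2f}")

# Question 2: correlate the gastric acid index with feces consistency,
# here assumed to be encoded as an ordered integer scale.
r = pearson(
    [rec["gastric_acid_index"] for rec in records],
    [rec["feces_consistency"] for rec in records],
)
print(f"Pearson r (gastric acid index vs. feces consistency): {r:.3f}")
```

Treating an ordinal scale as numeric is a simplification, and a rank-based correlation may be a better fit for the real data. Vomit color is likely a nominal category, so it would need a different treatment, for example comparing the mean gastric acid index per color. A gap in rates or a strong correlation indicates an association and does not by itself show that cholera causes dehydration.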
You have 3 hours to complete your analysis. Ideally, your schedule should be:
| Duration | Task |
|---|---|
| 30m | Organization & Brainstorming |
| 1h | Data Cleaning |
| 1h30m | Data Analysis |
After the time limit, you will need to present your findings in a 10-minute presentation. The only material you can use as support for your presentation is a clean notebook with your analysis. No slides are allowed.