Demo and assignment materials for the AI4PH course: Developing and Deploying Transparent and Reproducible Algorithms for Public Health.
AI4PH_example.Rmd: The demo code that will be reviewed in class. It builds and evaluates a logistic regression (lr) model on a stroke dataset using tidymodels, then calls the plumber script to generate an API. Students are encouraged to run through this file themselves to make sure that their R environment is set up correctly and all required packages are installed, and to get familiar with the data, tidymodels, and plumber.
- Read in: train_data.rds (included, the harmonized train data set)
- Produce: stroke_lr_workflow.rds (not included, the trained workflow object, including the recipe and the fitted lr model)
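Before running AI4PH_example.Rmd, it may help to confirm that the packages named in this README are installed. The following is a minimal base R sketch, not part of the course materials: the helper name missing_packages is hypothetical, and the demo may load other packages that are not listed here.

```r
# Packages named in this README; the demo file may need others as well.
required_pkgs <- c("tidymodels", "plumber")

# Return the subset of `pkgs` that cannot be loaded from the current library.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

missing <- missing_packages(required_pkgs)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
  message("Install them with install.packages() before knitting the demo.")
} else {
  message("All packages named in the README are installed.")
}
```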
AI4PH_assignment.Rmd: The assignment file. In this assignment, you will validate the stroke model developed in class (AI4PH_example.Rmd) using a different dataset, valid_data.rds. This dataset cannot be used as is: it is raw data that has not been harmonized, so some of its variables differ from those in the harmonized dataset used to train and evaluate the model. Your job is to harmonize the validation data so that it has the same format as the example data used in class (see lines 79-83 in this file). You can refer to train_data_variables.csv to see the format of the harmonized train data.
- Read in: valid_data.rds (included, the unharmonized validation set) and stroke_lr_workflow.rds (not included, generated by running AI4PH_example.Rmd)
- Reference: train_data_variables.csv (metadata of the train set)
- Produce: harmonized_valid_data.rds (the harmonized validation set)
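One way to start the harmonization is to list where the two data sets disagree on column names and types. The sketch below is only an illustration in base R: compare_schema is a hypothetical helper, and the toy columns (age, gender, sex, stroke) are invented for the example and are not taken from the course data.

```r
# Compare column names and classes of two data frames, one row per variable.
compare_schema <- function(train, valid) {
  all_cols <- union(names(train), names(valid))
  # First class of a column, or NA if the column is absent from `df`.
  col_class <- function(df, col) {
    if (col %in% names(df)) class(df[[col]])[1] else NA_character_
  }
  out <- data.frame(
    variable = all_cols,
    train_class = vapply(all_cols, function(v) col_class(train, v), character(1)),
    valid_class = vapply(all_cols, function(v) col_class(valid, v), character(1)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
  # A variable matches only if it exists in both sets with the same class.
  out$matches <- !is.na(out$train_class) & !is.na(out$valid_class) &
    out$train_class == out$valid_class
  out
}

# Toy stand-ins for the real files (hypothetical columns and values).
train_toy <- data.frame(age = c(67, 45),
                        gender = factor(c("Male", "Female")),
                        stroke = factor(c("Yes", "No")))
valid_toy <- data.frame(age = c("67", "45"),
                        sex = c("M", "F"),
                        stroke = c(1L, 0L),
                        stringsAsFactors = FALSE)

print(compare_schema(train_toy, valid_toy))
```

With the course files, the same helper could be called as compare_schema(readRDS("train_data.rds"), readRDS("valid_data.rds")). Rows where matches is FALSE are candidates for renaming or recoding; checking factor levels against train_data_variables.csv would be a separate step.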
stroke_lr_plumber.R: The plumber script, which is used in both the in-class demo and the assignment. No modification is needed.
- Read in: stroke_lr_workflow.rds (not included, generated by running AI4PH_example.Rmd)
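For orientation, a plumber script like this one can usually be served from an R session as sketched below. This is an assumption-laden sketch rather than the way the demo file calls it: run_stroke_api is a hypothetical wrapper, port 8000 is an arbitrary choice, and the check assumes stroke_lr_workflow.rds sits in the working directory.

```r
# Launch the API defined in the plumber script (blocks until interrupted).
run_stroke_api <- function(script = "stroke_lr_plumber.R", port = 8000) {
  # The script reads the trained workflow, so both files must exist first.
  stopifnot(file.exists(script), file.exists("stroke_lr_workflow.rds"))
  api <- plumber::pr(script)         # build a router from the annotated script
  plumber::pr_run(api, port = port)  # start serving on the chosen port
}

# Usage (run AI4PH_example.Rmd first so the workflow file exists):
# run_stroke_api()
```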
train_data.rds: The harmonized train data set, used in AI4PH_example.Rmd to train the model.
valid_data.rds: The unharmonized validation data set, used in AI4PH_assignment.Rmd.
train_data_variables.csv: The data dictionary of the train set. You'll also use this to help you harmonize the validation data in the assignment.
To submit your work on the assignment, please email us the assignment file with your code and all of its output; you can find our email addresses on Canvas. Please rename the file AI4PH_assignment_YourName.Rmd. For example, my name is Juan Li, so I would rename my submission AI4PH_assignment_JuanLi.Rmd.
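If you prefer to create the renamed copy from R, the naming rule above can be sketched as follows. The helper submission_name is hypothetical and simply strips everything except letters and digits from your name.

```r
# Build the submission file name from a full name, e.g. "Juan Li".
submission_name <- function(full_name) {
  compact <- gsub("[^[:alnum:]]", "", full_name)  # drop spaces and punctuation
  paste0("AI4PH_assignment_", compact, ".Rmd")
}

submission_name("Juan Li")
# [1] "AI4PH_assignment_JuanLi.Rmd"

# To create the copy (leaves the original file in place):
# file.copy("AI4PH_assignment.Rmd", submission_name("Juan Li"))
```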
If you have any questions, you are encouraged to join the office hour on February 19th.