Data Science Campus - A study guide for reproducible research using secondary (health care) data
This study guide is a resource for graduate students, PhD candidates, and researchers conducting applied empirical research in economics and management sciences. It focuses on the analysis of health care markets using secondary data, that is, data not originally collected or generated by the researcher for the purpose of the study. Secondary data covers any existing data generated by companies, institutions, and individuals. Many textbook examples use readily available datasets to analyze econometric problems, so students who build their own analysis dataset often find that the important steps leading to the final dataset are missing. Additionally, many resources focus on labor economics problems. Resources that consider processing and generating secondary data are scarce. One reason is that these data sources are often subject to confidentiality and data protection restrictions.
This guide explains the five major steps needed to create a reproducible research project. We introduce important terminology, highlight relevant tasks, and provide key resources in the form of textbooks and websites available via open access. The guide is concise, so that users can easily consult it when starting academic research. Each section takes about 10 to 15 minutes to read. We do not cover any specific data science or econometric method, but point to the relevant resources. Users of the guide should have basic knowledge of statistics, econometrics, and program evaluation, as well as of statistical packages such as R or Stata. For maximum benefit, readers should also have a research idea in mind.
The goal is to set up and carry out a data science project using secondary data. Students will learn all steps, from hypothesis formulation through data generation and analysis to the presentation of empirical results.
After reading and applying the principles introduced in this study guide, you will be able to:
- Recognize the features of using secondary (health care) data in empirical research.
- Execute the steps of a reproducible research project.
- Implement an empirical research project.
- Recall the steps taken to execute a reproducible research project using secondary data.
The study guide consists of five chapters that cover the essential steps of a reproducible research project. Each step is covered in four parts:
- An introduction to the basic concepts and key terminology.
- A resources box that includes textbooks, articles and references to current web resources with emphasis on open access material.
- A checklist for each step of the reproducible research project to follow.
- A showcase example of an empirical project that reproduces the article by Hellerstein, Judith K. 1998. “The Importance of the Physician in the Generic versus Trade-Name Prescription Decision.” The RAND Journal of Economics 29 (1): 108–36. https://doi.org/10.2307/2555818.