Constructing Models to Deal with Missing Data | SciPy 2016 | Deborah Hanus
Most scientists carefully collect data and select data resources. In a perfect world, we would have pristine, complete datasets. Yet we are frequently challenged by incomplete and missing data. We are often taught to "ignore" missing data. In practice, however, ignoring the wrong types of missing data may build biases into our datasets, invalidating our conclusions. Here, we discuss three types of missing data (data missing completely at random, missing at random, and missing not at random) and heuristics for identifying and dealing with each type. Then we delve into an example, where we impute missing data for a simulator that uses reinforcement learning to predict effective HIV treatments. When we finish, you will know how to identify each of the three types of missing data and how to deal with each in your own projects.
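
To make the three categories concrete, here is a minimal sketch that simulates each missingness mechanism on synthetic data. It is not taken from the talk: the variables `age` and `score`, the drop probabilities, and the helper names `simulate` and `observed_mean` are all hypothetical, chosen only for illustration. It uses the Python standard library so it runs without extra packages.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Build a synthetic dataset and three copies of it with missing values.

    All names and probabilities here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # age is always observed; score is the variable that may go missing.
    age = [rng.gauss(50, 10) for _ in range(n)]
    score = [0.5 * a + rng.gauss(0, 5) for a in age]

    # Missing completely at random (MCAR): every value has the same
    # 30% chance of being dropped, regardless of any variable.
    mcar = [None if rng.random() < 0.3 else s for s in score]

    # Missing at random (MAR): the drop chance depends only on the
    # observed variable (age), not on the score itself.
    mar = [
        None if rng.random() < (0.5 if a > 50 else 0.1) else s
        for a, s in zip(age, score)
    ]

    # Missing not at random (MNAR): the drop chance depends on the
    # value that goes missing (high scores are dropped more often).
    mnar = [
        None if rng.random() < (0.5 if s > 25 else 0.1) else s
        for s in score
    ]
    return age, score, mcar, mar, mnar


def observed_mean(values):
    """Mean of the non-missing entries, i.e. what 'ignoring' missing data computes."""
    return statistics.fmean([v for v in values if v is not None])


if __name__ == "__main__":
    age, score, mcar, mar, mnar = simulate()
    print("full data mean:", round(statistics.fmean(score), 2))
    print("MCAR observed mean:", round(observed_mean(mcar), 2))
    print("MAR observed mean:", round(observed_mean(mar), 2))
    print("MNAR observed mean:", round(observed_mean(mnar), 2))
```

In this sketch, simply dropping missing values leaves the mean roughly unchanged under MCAR but pulls it downward under MAR and MNAR. Under MAR the bias can in principle be corrected by conditioning on the observed variable (here, `age`); under MNAR the observed data alone are not enough, and some model of the missingness mechanism is needed.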