The role of generative data models for predictive and causal inference, with examples from electronic health records data

12-1 pm

PSTC Seminar Room 205, Mencoff Hall

Joe Hogan, Professor of Public Health at Brown University

A large part of data science is concerned with generating either predictive or causal inferences from observed data. In the context of electronic health records, using individual-level features to flag those at risk for an adverse outcome is predictive inference. Analysis designed to quantify the impact of a treatment or intervention is causal inference. Although the goals of predictive and causal inference are different, each can be based on the same generative data model. In this talk I will explain what a generative model is and why it can be helpful for drawing principled inferences from large and complex data. I will also argue that conceptualizing inferences in terms of a generative model helps to bring focus back to fundamental statistical questions such as bias, uncertainty quantification, and generalizability, all of which are central to drawing principled inferences, and away from issues of technical implementation (e.g., which machine learning algorithm or causal inference method to use). To keep the discussion grounded, I will illustrate with examples that use data from a large HIV care program in western Kenya.

Joseph Hogan is the Carole and Lawrence Sirovich Professor of Public Health, and Professor and Chair in the Department of Biostatistics at Brown. His research concerns the development and application of statistical methods for large-scale observational data. He is interested in causal inference, missing data, and quantifying uncertainty associated with untestable assumptions. Nearly all of his work is motivated by applications in HIV/AIDS and infectious disease. For the past several years he has co-led an NIH-funded international training program designed to build research capacity in biostatistics at Moi University in Kenya.