Data science lives at the intersection of statistics, computational sciences, and subject-matter knowledge. The Center for Statistical Sciences is heavily invested in health data science through a variety of projects in areas such as computational biology, machine learning, Bayesian statistics, network analysis, causal inference with big data, and analysis of neuroimaging data. The Department of Biostatistics is one of four core departments in Brown's Data Science Initiative. Dr. Hogan serves as the Deputy Director of the Initiative, Dr. De Vito is currently a member of the Data Science Executive Committee, and Dr. Eloyan is a member of the DSI Campus Advisory Board.
Data fusion
In many applications in public health, medicine, and social science, patient characteristics are dispersed over multiple files, platforms, and/or studies. Analysis that links two or more separate data sources is increasingly important as researchers seek to integrate administrative and clinical datasets while adapting to privacy regulations that limit access to unique identifiers. Dr. Gutman has developed novel Bayesian procedures to link units that appear in two datasets by treating the unknown linking as missing data. He is collaborating with health services researchers and clinicians to estimate the effects of policies and interventions as well as predict health outcomes from clinical and demographic variables. Dr. De Vito has also developed novel statistical techniques to integrate multiple studies in a single analysis, concurrently estimating the characteristics shared among all the studies and the study-specific components.
Social network analysis
Statistical and causal inference methods routinely assume that subjects in a dataset are independent of one another. However, this assumption is easily violated when subjects interact with others through network ties in a large, high-dimensional dataset. Groups in the Center for Statistical Sciences have developed new approaches that remain valid even when subjects are interconnected. These methods have been applied in diverse fields, including HIV, alcohol and substance use research, and neuroimaging networks. Furthermore, we are working on how to use network interactions from diverse data sources to improve overall public health outcomes.
Causal inference and big data
Causal inference problems are often challenged by complexities in data from different sources, such as massive online experiments or electronic medical records. To unravel the causal relationships buried in a large dataset, we
- establish the conditions needed for causal identification,
- develop nonparametric methods to estimate meaningful causal quantities flexibly, and
- deliver impactful causal implications for public health from big data.