Join Advance-CTR and the Data Science Initiative at Brown for this 5-part series exploring machine learning, its methodology, and its applications in biomedicine and health. The series is intended as an introduction to machine learning for researchers, clinician scientists, and others who may be interested in using these methods in their research.
Friday, October 8, 2021:
Roberta De Vito, PhD: “Cross-Study Machine Learning Techniques: Reproducibility and Differences Across Studies”
Biostatistics and computational biology increasingly face the urgent challenge of efficiently handling large amounts of experimental data. In particular, high-throughput assays are transforming the study of biology, as they generate a rich, complex, and diverse collection of high-dimensional data sets. Through compelling statistical analysis, these large data sets lead to discoveries, advances, and knowledge that were never accessible before. Building such systematic knowledge is a cumulative process that requires analyses integrating multiple sources, studies, and technologies. The increased availability of ensembles of studies on related clinical populations, technologies, and genomic features raises four categories of important multi-study statistical questions: 1) To what extent is biological signal reproducibly shared across different studies? 2) How can this global signal be extracted? 3) How can we detect and quantify local signals that may be masked by strong global signals? 4) How do these global and local signals manifest differently in different data types? We will answer these four questions by introducing a novel class of methodologies for the joint analysis of different studies. The goal is to separately identify and estimate 1) common factors reproduced across multiple studies and 2) study-specific factors. We present several medical and biological applications, and in all cases we clarify the benefits of a joint analysis compared to standard methods. Our method could accelerate the pace at which we can combine unsupervised analyses across different studies and understand the cross-study reproducibility of signal in multivariate data.
About the Speaker
Roberta De Vito is a statistician with a passion for teaching and for developing statistical tools for cancer research and disorder risk, with a particular focus on epidemiology and genomics. She is currently an Assistant Professor in the Department of Biostatistics and the Data Science Initiative at Brown University. She completed her Ph.D. in Statistical Science at the University of Padua, advised by Giovanni Parmigiani, and developed her thesis work at Harvard University. Her main research interests are latent variable models, Bayesian nonparametrics, variable selection via sparsity priors, machine learning, and big data, with a particular focus on genomics and epidemiology. She was a postdoc at Princeton University in Barbara Engelhardt's group, where she developed Bayesian discrete latent variable models for high-dimensional biological and epidemiological data. Her passion for teaching developed at Princeton University, where she taught several classes and had the opportunity to mentor master's and PhD students. Some of her former mentees are now pursuing successful research careers in biostatistics and data science at universities such as Harvard, Princeton, and MIT. Complete details are available on her website: https://rdevito.github.io/web/