Statistical Theory and Methods

Bioinformatics research includes the development and application of novel statistical methodology for analyzing complex biological data, typically at the molecular level (nucleic acids, proteins and metabolites), often referred to as -omics data. Methods development includes: data preprocessing to obtain more accurate and precise measurements from new technologies; identifying biomarkers associated with various phenotypes of interest; identifying interactions among genetic components; identifying interactions between genetic and environmental factors in development, disease etiology and evolution; and discovering biological networks and their dynamics in biological systems ranging from single cells to large human populations. Examples of bioinformatics research at the Center include novel methods for analyzing -omics data, including the genome, epigenome and transcriptome, as well as collaborative projects involving bioinformatics in cancer, evolution, aging and development.

Lorin Crawford

Roberta DeVito

Fenghai Duan

Zhijin Wu


In diagnostic medicine, biomarker evaluation is the development and application of statistical methods to assess the diagnostic accuracy and predictive values of biomarkers. Prof. Gatsonis has contributed extensively to statistical methods for the evaluation of diagnostic tests and biomarkers. He has published on methodology for ROC analysis for detection and prediction and on broader issues of study design in diagnostic test and imaging biomarker evaluation and validation. Prof. Duan focuses on the development and application of statistical methods to evaluate the performance of various high-throughput biomarkers in clinical cancer studies (e.g., radiogenomics).
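To make the idea concrete, the short sketch below computes an empirical ROC curve and area under the curve (AUC) for a simulated continuous biomarker. It is a generic illustration using scikit-learn, not the specific methodology developed by Profs. Gatsonis and Duan; the data and variable names are hypothetical.

```python
# Illustrative sketch only: empirical ROC curve and AUC for a hypothetical
# continuous biomarker (simulated data, not a real study).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
n = 200
disease = rng.binomial(1, 0.3, size=n)               # true disease status
# Hypothetical biomarker: higher values in diseased subjects
biomarker = rng.normal(loc=1.0 * disease, scale=1.0, size=n)

fpr, tpr, thresholds = roc_curve(disease, biomarker)  # empirical ROC points
auc = roc_auc_score(disease, biomarker)               # area under the curve
print(f"Empirical AUC: {auc:.2f}")
```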

Fenghai Duan

Constantine Gatsonis


Randomized clinical trials are the gold standard for estimating the effects of interventions. However, in many studies in medicine, epidemiology and public health, randomized trials suffer from unintended complications or are infeasible because of financial, ethical or logistical considerations. Complications include low adherence to the intervention, modification of the intervention over time, and unequal follow-up time; Dr. Gutman has been developing statistical methods to address these issues. Another challenge arises when investigating the effects of interventions in non-randomized settings, where assignment to the intervention may be confounded with the outcomes. Dr. Hogan, Dr. Liu and Dr. Gutman are developing statistical methods to estimate the effects of interventions in observational studies while adjusting for confounding.
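As a minimal illustration of confounding adjustment in an observational setting, the sketch below applies generic inverse-probability-of-treatment weighting to simulated data. It is not the specific methodology developed by these faculty, and all variables and numbers are hypothetical.

```python
# Sketch: inverse-probability-of-treatment weighting (IPW) on simulated
# observational data with a single measured confounder. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
confounder = rng.normal(size=n)
# Treatment assignment depends on the confounder (non-randomized)
p_treat = 1 / (1 + np.exp(-confounder))
treatment = rng.binomial(1, p_treat)
# Outcome depends on both treatment (true effect = 2) and the confounder
outcome = 2 * treatment + 1.5 * confounder + rng.normal(size=n)

# Estimate propensity scores and form the IPW (Hajek) estimate
X = confounder.reshape(-1, 1)
ps = LogisticRegression().fit(X, treatment).predict_proba(X)[:, 1]
w = treatment / ps + (1 - treatment) / (1 - ps)
ipw_effect = (np.sum(w * treatment * outcome) / np.sum(w * treatment)
              - np.sum(w * (1 - treatment) * outcome) / np.sum(w * (1 - treatment)))
naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
print(f"Naive difference: {naive:.2f}")
print(f"IPW estimate:     {ipw_effect:.2f}  (true effect = 2)")
```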

Stavroula Chrysanthopoulou

Roee Gutman

Joseph Hogan

Tao Liu

George Papandonatos

Christopher Schmid

In many applications in public health, medicine and social science, patient characteristics are dispersed over multiple files. Analysis that links two or more separate data sources is increasingly important as researchers seek to integrate administrative and clinical datasets while adapting to privacy regulations that limit access to unique identifiers. Dr. Gutman has developed novel Bayesian procedures to link units that appear in two datasets by treating the unknown linking as missing data. He is collaborating with health services researchers and clinicians to estimate the effects of policies and interventions as well as predict health outcomes from clinical and demographic variables.   
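For intuition, here is a toy sketch of linking two files by field-agreement scores in a Fellegi-Sunter spirit. It is not Dr. Gutman's Bayesian procedure; the records, weights, and threshold are all hypothetical.

```python
# Toy sketch of record linkage by field-agreement scores (illustrative only;
# a Fellegi-Sunter-style scoring rule, not the Bayesian method described above).
import itertools

file_a = [{"id": "a1", "last": "smith", "birth_year": 1980, "zip": "02912"},
          {"id": "a2", "last": "jones", "birth_year": 1975, "zip": "02906"}]
file_b = [{"id": "b1", "last": "smith", "birth_year": 1980, "zip": "02912"},
          {"id": "b2", "last": "jonas", "birth_year": 1975, "zip": "02906"}]

# Agreement weights per field (hypothetical values; in practice estimated
# from match and non-match probabilities)
weights = {"last": 4.0, "birth_year": 2.5, "zip": 1.5}
threshold = 5.0  # score above which a pair is declared a link

def score(rec_a, rec_b):
    """Sum agreement weights over the compared fields."""
    return sum(w for field, w in weights.items() if rec_a[field] == rec_b[field])

for rec_a, rec_b in itertools.product(file_a, file_b):
    s = score(rec_a, rec_b)
    if s >= threshold:
        print(f"link: {rec_a['id']} <-> {rec_b['id']} (score {s:.1f})")
```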

Roee Gutman


Data science lives at the intersection of statistics, computational sciences, and subject-matter knowledge. The Center for Statistical Sciences is heavily invested in health data science through a variety of projects in areas such as computational biology, machine learning, and analysis of neuroimaging data. The Department of Biostatistics is one of four core departments in Brown's Data Science Initiative, and Dr. Hogan serves as the Deputy Director of the Initiative.

Roberta DeVito

Stavroula Chrysanthopoulou

Lorin Crawford

Fenghai Duan

Constantine Gatsonis

Joseph Hogan

Tao Liu

Christopher Schmid

Jon Steingrimsson

Zhijin Wu

Loosely speaking, deep learning is a branch of machine learning that uses multi-layer neural networks to build prediction models. The unknown parameters, commonly referred to as weights, are estimated by minimizing a loss function, often subject to some form of regularization. Deep learning has shown promise in many domains, with image-based analysis being a standout example of an application area in which deep learning models excel. Several Center members are working on deep learning-related research, including analyzing medical images, quantifying uncertainty, and improving the interpretability of deep learning models.
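A minimal sketch of the idea described above: a small multi-layer neural network whose weights are estimated by minimizing a regularized loss. The data are simulated, and the architecture and penalty are arbitrary choices made purely for illustration.

```python
# Minimal sketch: a multi-layer neural network fit by minimizing a loss with
# an L2 penalty (the `alpha` parameter). Simulated data, illustrative only.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))                 # 500 samples, 10 features
y = (X[:, 0] ** 2 + X[:, 1] > 1).astype(int)   # nonlinear decision rule

model = MLPClassifier(hidden_layer_sizes=(32, 16),  # two hidden layers
                      alpha=1e-3,                   # L2 regularization strength
                      max_iter=1000,
                      random_state=0)
model.fit(X, y)
print(f"Training accuracy: {model.score(X, y):.2f}")
```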

Lorin Crawford

Fenghai Duan

Constantine Gatsonis

Tao Liu

Christopher Schmid

Jon Steingrimsson


With the rapid advancement of technology, data collection procedures have continuously improved. This has created a crucial need for statistical methods that can handle massive (and often noisy) data sets in many application areas. Center faculty have been at the forefront of developing theory and software that address key challenges when working in such high-dimensional settings. This broadly includes, but is not limited to, dealing with missing data, finding scalable solutions for estimating model parameters, overcoming combinatorial issues when trying to identify nonlinear interactions, effectively modeling non-continuous outcomes (e.g., categorical data), and quantifying uncertainty with novel model validation/calibration techniques.
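As one concrete example of the high-dimensional setting, the sketch below fits a cross-validated lasso to simulated data with many more features than observations. It illustrates the general problem of sparse estimation, not any particular method developed by Center faculty.

```python
# Sketch: sparse (lasso-penalized) regression in a high-dimensional setting
# where the number of features exceeds the sample size. Simulated data.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 100, 500                           # many more features than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]    # only 5 truly relevant features
y = X @ beta + rng.normal(size=n)

# Cross-validated lasso selects the regularization strength and shrinks
# most coefficients exactly to zero.
fit = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print(f"Number of nonzero coefficients: {selected.size}")
```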

Stavroula Chrysanthopoulou

Lorin Crawford

Roberta DeVito

 

Fenghai Duan

Zhijin Wu

Latent variable models link observed (or manifest) variables to unobserved (or latent) constructs. They comprise two parts: a measurement model specifying the relationship between manifest and latent variables, and a structural model delineating the relationships among the latent variables themselves. Both the manifest and the latent variables can be either discrete or continuous in nature. When both are continuous, one obtains the factor analytic models used widely in psychology, e.g., to measure latent constructs such as human intelligence. When both are discrete, one obtains the latent class models used to categorize observations into distinct groups, e.g., to classify individuals as diseased vs. non-diseased according to their constellation of symptoms. Widely used in educational testing are Item Response Theory (IRT) models (also known as Latent Trait models) that relate a group of categorical manifest variables to a continuous latent variable, e.g., using answers to a multiple-choice test to measure mastery of a particular academic subject. Finally, finite mixture models (also known as Latent Profile Analysis) relate a set of continuous manifest variables to underlying categorical constructs, e.g., by partitioning clinical trial participants into homogeneous groups across behavioral and cognitive dimensions of engagement with physical activity interventions. Originally developed for cross-sectional data, latent variable models have more recently been generalized to longitudinal data. For example, Latent Transition Analysis has been used to model movement across stages of change in studies of smoking cessation. An example of latent variable modeling by our faculty is given by the 2-parameter logistic IRT models fit to the DSM-IV criteria for nicotine dependence by Dr. Papandonatos and his students. They uncovered a 2-dimensional structure with two positively correlated latent factors, thus contradicting conventional wisdom that DSM-IV symptoms measure a single dimension of liability to nicotine dependence.
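For reference, a standard formulation of the 2-parameter logistic IRT model mentioned above, where theta_i is person i's latent trait and a_j and b_j are item j's discrimination and difficulty parameters:

```latex
% 2-parameter logistic (2PL) IRT model: probability that person i endorses
% item j, given latent trait \theta_i, discrimination a_j and difficulty b_j.
P(Y_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}
```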

Stavroula Chrysanthopoulou

Lorin Crawford

Roberta DeVito

Roee Gutman

George Papandonatos

Christopher Schmid

Zhijin Wu

Longitudinal data are repeated measurements collected on each unit at several time points during the observation period. Special statistical methods (e.g., Generalized Estimating Equations (GEE) and mixed-effects models) are required to analyze these non-independent data and to appropriately account for the underlying covariance and correlation structures. Multivariate statistical methodology involves detecting, analyzing, and characterizing associations among multidimensional data. Related supervised and unsupervised techniques are mainly concerned with dimension reduction. Center faculty conduct extensive research on novel statistical techniques for analyzing longitudinal and multivariate data.
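A minimal sketch of both approaches on simulated repeated measurements, using the statsmodels package. The data-generating values and model specifications are hypothetical and purely illustrative.

```python
# Sketch: two standard approaches to correlated longitudinal data, fit to
# simulated repeated measurements (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_subjects, n_visits = 50, 4
subject = np.repeat(np.arange(n_subjects), n_visits)
time = np.tile(np.arange(n_visits), n_subjects)
# Subject-specific intercepts induce within-subject correlation
subject_effect = np.repeat(rng.normal(scale=1.0, size=n_subjects), n_visits)
y = 1.0 + 0.5 * time + subject_effect + rng.normal(scale=0.5, size=subject.size)
df = pd.DataFrame({"y": y, "time": time, "subject": subject})

# Mixed-effects model with a random intercept per subject
mixed = smf.mixedlm("y ~ time", data=df, groups=df["subject"]).fit()
print(mixed.params)

# GEE with an exchangeable working correlation structure
gee = smf.gee("y ~ time", groups="subject", data=df,
              cov_struct=sm.cov_struct.Exchangeable(),
              family=sm.families.Gaussian()).fit()
print(gee.params)
```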

Stavroula Chrysanthopoulou

Roberta DeVito

Joseph Hogan

George Papandonatos

Christopher Schmid

Center faculty are leaders in the development and application of methods for meta-analysis, the quantitative combination of results from different studies. Prof. Gatsonis has pioneered the use of hierarchical summary ROC curves for assessing sensitivity and specificity and is developing methods for summarizing the predictive accuracy of diagnostic tests. Prof. Schmid works with the Center for Evidence Synthesis in Health on methods for multivariate meta-analysis, network meta-analysis and software. He heads the Evidence Synthesis Academy whose aim is to promote the wider use and understanding of meta-analysis among decision makers.
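For intuition, the sketch below pools hypothetical study-level estimates with inverse-variance (fixed-effect) weights and with DerSimonian-Laird random-effects weights. This is a textbook illustration, not the hierarchical ROC or network meta-analysis methods described above.

```python
# Sketch: inverse-variance fixed-effect and DerSimonian-Laird random-effects
# pooling of hypothetical study-level effect estimates (illustrative only).
import numpy as np

# Hypothetical effect estimates (e.g., log odds ratios) and their variances
effects = np.array([0.30, 0.10, 0.45, 0.22, 0.05])
variances = np.array([0.04, 0.02, 0.09, 0.03, 0.05])

# Fixed-effect pooled estimate
w_fixed = 1 / variances
pooled_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)

# DerSimonian-Laird estimate of the between-study variance tau^2
Q = np.sum(w_fixed * (effects - pooled_fixed) ** 2)
df = len(effects) - 1
c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - df) / c)

# Random-effects pooled estimate
w_random = 1 / (variances + tau2)
pooled_random = np.sum(w_random * effects) / np.sum(w_random)
print(f"Fixed-effect estimate:   {pooled_fixed:.3f}")
print(f"Random-effects estimate: {pooled_random:.3f} (tau^2 = {tau2:.3f})")
```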

Roberta DeVito

George Papandonatos

Christopher Schmid

Thomas A. Trikalinos


Health Services Research (HSR) is a field in public health that investigates how social factors, policies, insurance systems, organizational structures and processes, health technologies, and personal behaviors influence access to, quality of, and cost of health care. When HSR studies involve comparisons of interventions, they are sometimes referred to as comparative effectiveness studies. Studies in HSR commonly rely on data sources that were not collected for research purposes (e.g., claims, electronic health records (EHR)). Thus, they may suffer from complexities that are mitigated in well-designed prospective studies. For example, EHR data may contain missing values because patients change providers, and chronic conditions may be recorded inaccurately in claims data when a condition does not affect payments. Statistical methods to address these complexities aim to obtain accurate and precise estimates. Dr. Hogan, Dr. Gatsonis, Dr. Liu, Dr. Steingrimsson and Dr. Gutman develop various methods to address the different complexities that arise in such studies.

Stavroula Chrysanthopoulou

Ilana Gareen

Roee Gutman

Christopher Schmid


Statistical methodology research on HIV/AIDS spans a broad spectrum and includes statistical causal inference (e.g., causal pathway analysis of HIV interventions involving behavioral changes); statistical/machine learning methods (e.g., super-learning for risk modeling of treatment failure and prediction); statistical modeling of the treatment continuum; clinical decision making for optimizing HIV treatment in resource-limited settings; and micro-simulation modeling. The collaborative and methodological research of Professors Hogan, Liu, and Chrysanthopoulou has secured substantial research funding from NIAID, NIAAA, NHLBI, NICHD, USAID, and other agencies.

Stavroula Chrysanthopoulou

Joseph Hogan

Tao Liu

Jon Steingrimsson


Missing data are unavoidable in many studies, especially those that collect information on humans. Failure to address missing data may result in misleading conclusions. A dataset may contain missing values for a variety of reasons. For example, survey respondents may refuse to answer questions of a sensitive nature, or patients participating in longitudinal studies may drop out before the study concludes. Center faculty have been at the forefront of developing statistical methods to handle missing data. Specifically, Prof. Hogan has done significant work on missing data in longitudinal studies and sensitivity analysis, and Prof. Gutman has developed various imputation methods for application in health services research.
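As a simple illustration of model-based imputation (not the specific methods developed by Profs. Hogan and Gutman), the sketch below fills in values that are missing in one variable using its relationship with another, via scikit-learn's IterativeImputer. The data and missingness mechanism are simulated.

```python
# Sketch: model-based imputation of missing values with scikit-learn's
# IterativeImputer (illustrative only).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 2 * x1 + rng.normal(scale=0.5, size=n)   # x2 is strongly related to x1
X = np.column_stack([x1, x2])

# Introduce missingness in x2 for a random 30% of rows
missing = rng.random(n) < 0.3
X_obs = X.copy()
X_obs[missing, 1] = np.nan

# Impute x2 from x1 using an iterative regression-based imputer
X_completed = IterativeImputer(random_state=0).fit_transform(X_obs)
error = np.abs(X_completed[missing, 1] - X[missing, 1]).mean()
print(f"Mean absolute imputation error: {error:.2f}")
```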

Stavroula Chrysanthopoulou

Roee Gutman

Joseph Hogan

Tao Liu

Christopher Schmid

Jon Steingrimsson

Predictive models are broadly used in medical decision making, including cost-effectiveness analysis and comparative effectiveness research. Recent advances in computing technology have facilitated the development of increasingly intricate predictive models aimed at describing complex health processes and systems. Depending on their specific characteristics, complex predictive models include, but are not limited to, state transition, discrete event simulation, dynamic transmission, compartmental, microsimulation, and agent-based models. Center faculty have extensive expertise in complex predictive modeling. Dr. Chrysanthopoulou specializes in statistical techniques for the calibration, validation, and predictive accuracy assessment of microsimulation models. She has developed the open-source MIcrosimulation Lung Cancer (MILC) model of the natural history of lung cancer and is involved in collaborative projects with Brown and Boston University on building complex models for cancer, dementia, opioid use disorder, and sexually transmitted diseases.
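A toy sketch of the microsimulation idea: individuals move through hypothetical health states according to an assumed annual transition probability matrix. The states and probabilities are invented for illustration and bear no relation to the MILC model or any other model named above.

```python
# Toy discrete-time microsimulation sketch: individuals move through health
# states "healthy" -> "ill" -> "dead" under hypothetical annual transition
# probabilities (illustrative only).
import numpy as np

rng = np.random.default_rng(6)
states = ["healthy", "ill", "dead"]
# Hypothetical one-year transition probability matrix (rows sum to 1)
P = np.array([[0.90, 0.08, 0.02],    # from healthy
              [0.00, 0.85, 0.15],    # from ill
              [0.00, 0.00, 1.00]])   # dead is absorbing

n_individuals, n_years = 10_000, 20
state = np.zeros(n_individuals, dtype=int)       # everyone starts healthy
for _ in range(n_years):
    # Draw each individual's next state from the row of P for their state
    state = np.array([rng.choice(3, p=P[s]) for s in state])

alive = np.mean(state != 2)
print(f"Proportion alive after {n_years} years: {alive:.2f}")
```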

Stavroula Chrysanthopoulou

Joseph Hogan

Christopher Schmid

Thomas A. Trikalinos


N-of-1 trials are randomized multi-crossover experiments conducted on a single individual in order to determine the personalized relative efficacy of two or more treatments measured repeatedly over time. Prof. Schmid is developing time-series and multilevel methods and software for the meta-analysis of series of N-of-1 trials in order to obtain more informative estimates of individual treatment effects as well as population-average treatment effects. His team serves as the analytic hub for several large national clinical trial series and is collaborating with Computer Science to develop a mobile app that can flexibly set up, run, analyze, and interpret data from one or more N-of-1 trials.
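For intuition, the sketch below simulates a single N-of-1 trial with a randomized treatment order within each cycle and estimates the individual effect with a simple block-level comparison. Actual analyses use the time-series and multilevel models described above; all numbers here are hypothetical.

```python
# Sketch: one simulated N-of-1 trial with alternating treatment blocks (A/B),
# analyzed with a simple block-level comparison. Illustrative only.
import numpy as np

rng = np.random.default_rng(7)
n_cycles = 4                       # each cycle contains one A and one B block
true_effect = 1.5                  # benefit of treatment B over A for this person
block_order = np.array([["A", "B"] if rng.random() < 0.5 else ["B", "A"]
                        for _ in range(n_cycles)]).ravel()

# One outcome measurement per block (hypothetical outcome scale)
outcomes = np.array([rng.normal(loc=true_effect if t == "B" else 0.0, scale=1.0)
                     for t in block_order])

effect = outcomes[block_order == "B"].mean() - outcomes[block_order == "A"].mean()
print(f"Estimated individual treatment effect (B - A): {effect:.2f}")
```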

Christopher Schmid
BACK TO THEORY AND METHODS

We develop novel statistical models and methods for analyzing spatio-temporal data. Our research is primarily motivated by inter-disciplinary collaborations from researchers in neuroscience, psychiatry, epidemiology and public health. The common statistical themes in our research are spectral methods using localized waveforms, dimension reduction, spatio-temporal covariance modeling and Bayesian hierarchical models.

Matthew Harrison


Statistical learning is a framework under the broad umbrella of machine learning that uses techniques from functional analysis to understand data. Statistical learning is often divided into two common categories: (i) supervised and (ii) unsupervised learning. Briefly, supervised learning involves building a predictive model for some response or outcome of interest, while unsupervised learning characterizes relationships and data structure without any supervising output variable. Many faculty in the Center are developing novel statistical learning approaches to tackle specific public health-related problems. Some of these areas include: artificial neural networks for medical imaging, anomaly detection methods for clinical trials, online learning techniques for real-time clinical prognostics, and dimensionality reduction and structured prediction models in genome-wide association studies.
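A compact sketch of the supervised/unsupervised distinction on one simulated dataset, using scikit-learn; it is illustrative only.

```python
# Sketch: supervised vs. unsupervised learning on the same simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # outcome used only for supervision

# Supervised learning: build a predictive model for the outcome y
clf = LogisticRegression().fit(X, y)
print(f"Supervised accuracy: {clf.score(X, y):.2f}")

# Unsupervised learning: summarize structure in X without any outcome
pca = PCA(n_components=2).fit(X)
print(f"Variance explained by two components: {pca.explained_variance_ratio_.sum():.2f}")
```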

Lorin Crawford

Roberta DeVito

Fenghai Duan

Constantine Gatsonis

Matthew Harrison

Joseph Hogan

Tao Liu

Christopher Schmid

Jon Steingrimsson

Zhijin Wu    

Survival analysis is the branch of statistics that deals with analyzing data when the outcome of interest is the time to some event, such as death or disease progression. Such outcomes are commonly only partially observed because participants drop out of the study or have not experienced the event of interest by the end of the study (referred to as censoring). Dr. Steingrimsson works on developing adaptations of machine learning algorithms that can handle the complications arising from time-to-event outcomes. Dr. Chrysanthopoulou works on statistical approaches for predicting time-to-event data, with applications to complex predictive models (e.g., microsimulation models) used in medical decision making and comparative effectiveness research. In addition, several faculty members are involved in interdisciplinary collaborations that involve the analysis of time-to-event outcomes.
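To illustrate how censoring is handled, the sketch below computes the Kaplan-Meier product-limit estimate directly from simulated right-censored data. It is a generic textbook illustration, not the machine learning adaptations or predictive approaches described above.

```python
# Sketch: Kaplan-Meier estimate of the survival function from simulated
# right-censored time-to-event data (illustrative only).
import numpy as np

rng = np.random.default_rng(9)
n = 200
event_time = rng.exponential(scale=10.0, size=n)    # true event times
censor_time = rng.exponential(scale=15.0, size=n)   # independent censoring
time = np.minimum(event_time, censor_time)          # observed follow-up time
event = (event_time <= censor_time).astype(int)     # 1 = event, 0 = censored

# Kaplan-Meier: at each distinct event time, multiply by (1 - deaths/at_risk)
order = np.argsort(time)
time, event = time[order], event[order]
survival = 1.0
for t in np.unique(time[event == 1]):
    at_risk = np.sum(time >= t)
    deaths = np.sum((time == t) & (event == 1))
    survival *= 1 - deaths / at_risk
print(f"Estimated survival probability at the last event time: {survival:.2f}")
```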

Stavroula Chrysanthopoulou

Topological data analysis (TDA) visualizes the “shape” of data from the spatial connectivity between discrete points. Prof. Crawford and his lab group use TDA to summarize complex patterns that underlie high-dimensional biological data. They are particularly interested in the “sub-image” selection problem where the goal is to identify the physical features of a collection of 3D shapes (e.g., tumors and single cell formations) that best explain the variation in a given trait or phenotype. Actively collaborating with faculty in the School of Engineering and the Robert J. & Nancy D. Carney Institute for Brain Science, the Crawford Lab works to develop unified statistical frameworks that generalize the use of topological summary statistics in 3D shape analyses. Current application areas include: radiomics with clinical imaging of brain-based diseases, molecular biology with 3D microscopy of cells, and anthropology with computed tomography (CT) scans of bones.
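As a small illustration of a topological summary (assuming the third-party ripser package is installed), the sketch below computes persistence diagrams for a noisy circle of points. It is not the Crawford Lab's sub-image selection framework, and the data are simulated.

```python
# Sketch: persistence diagrams summarizing the "shape" of a noisy circle of
# points, assuming the `ripser` package is available (illustrative only).
import numpy as np
from ripser import ripser

rng = np.random.default_rng(10)
theta = rng.uniform(0, 2 * np.pi, size=200)
points = (np.column_stack([np.cos(theta), np.sin(theta)])
          + rng.normal(scale=0.05, size=(200, 2)))

# Persistent homology up to dimension 1: H0 tracks connected components,
# H1 tracks loops; a single long-lived H1 feature reflects the circle.
diagrams = ripser(points, maxdim=1)["dgms"]
h1 = diagrams[1]
persistence = h1[:, 1] - h1[:, 0]
print(f"Most persistent loop lives for {persistence.max():.2f} in filtration scale")
```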

Lorin Crawford