Seminar Archive

  • Sep
    20

    Dr. Amanda Mejia, Assistant Professor, Department of Statistics, Indiana University

    Using empirical population priors to provide accurate subject-level insights into functional brain organization through template ICA

    Abstract: A primary objective in resting-state fMRI studies is localization of functional areas (i.e., resting-state networks) and the functional connectivity (FC) between them. These spatial and temporal properties of brain organization may be related to disease progression, development, and aging, making them of high scientific and clinical interest. Independent component analysis (ICA) is a popular tool to estimate functional areas and their FC. However, due to the typically low signal-to-noise ratio and short scan duration of fMRI data, subject-level ICA results tend to be highly noisy and unreliable. Thus, group-level functional areas are often used in lieu of subject-specific ones, ignoring inter-subject variability in functional topology. These group-average maps also form the basis for estimating FC, leading to potential bias in FC estimates given the topological differences in underlying functional areas. An alternative to these two extremes (noisy subject-level ICA and one-size-fits-all group ICA) is Bayesian hierarchical ICA, wherein information shared across subjects is leveraged to improve subject-level estimation of spatial maps and FC. However, fitting traditional hierarchical ICA models across many subjects is computationally intensive. Template ICA is a computationally convenient hierarchical ICA framework using empirical population priors derived from large fMRI databases or holdout data. Template ICA produces more accurate and reliable estimates of subject-level functional areas compared with popular ad-hoc approaches. The flexible Bayesian framework also facilitates incorporating other sources of a priori information. In this talk, I will describe the template ICA framework, as well as two extensions to the baseline model: the first incorporates spatial priors to leverage information shared across neighboring brain locations, and the second incorporates empirical population priors on the FC between functional areas. I will also present recent findings from a study of the effects of psilocybin (the prodrug compound found in “magic mushrooms”) on the organization of the thalamus.
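    The shrinkage idea behind an empirical population prior can be illustrated with a toy, per-location sketch. This is hypothetical code, not the speaker's template ICA implementation (which fits a full hierarchical ICA model): it just shows how a noisy subject map is pulled toward a population template.

```python
import numpy as np

def template_shrinkage(subject_map, template_mean, template_var, noise_var):
    """Precision-weighted posterior mean for one subject's spatial map.

    Treats the population template as an empirical prior
    N(template_mean, template_var) at each location, and the subject's
    noisy ICA map as data with variance noise_var. Illustrative only;
    template ICA is a full hierarchical model, not per-location shrinkage.
    """
    w = template_var / (template_var + noise_var)   # weight on subject data
    post_mean = w * subject_map + (1 - w) * template_mean
    post_var = (template_var * noise_var) / (template_var + noise_var)
    return post_mean, post_var

# The noisier the scan (large noise_var), the more the estimate
# is pulled toward the population template.
template = np.array([0.0, 1.0, 2.0])
subject = np.array([1.0, 1.0, 1.0])
mean, var = template_shrinkage(subject, template, template_var=1.0, noise_var=3.0)
```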
    Bio: Mandy Mejia is an assistant professor in the Department of Statistics at Indiana University. Her research aims to develop statistical techniques to extract accurate individual insights from functional MRI data, which is noisy, big, and complex. Her group pursues this goal in three primary ways: (1) developing computationally efficient Bayesian techniques, which leverage information shared across space and across individuals to produce more accurate estimates at the individual level; (2) developing statistically principled noise-reduction techniques; and (3) analyzing data on the cortical surface and subcortical gray matter to facilitate spatial modeling and improve inter-subject alignment. Her group has developed several software tools to facilitate cortical surface and Bayesian analysis of fMRI data in R.
    More Information Biology, Medicine, Public Health, BioStatsSeminar, Graduate School, Postgraduate Education, Mathematics, Technology, Engineering, Research, Training, Professional Development
  • Sep
    13
    Virtual and In Person
    12:00pm - 12:50pm

    Statistics Seminar Series | Dr. Alyssa Bilinski

    Dr. Alyssa Bilinski, Brown University, Department of Health Services, Policy & Practice

    O Decision Tree, O Decision Tree: Interpretable classification metamodels for health policy (w/Nicolas Menzies, Jeffrey Eaton, John Giardina, and Joshua Salomon)

    Over the past decade, researchers have developed a rich set of metamodeling techniques for complex decision analytic models. These create parsimonious model emulators, improving the tractability of computationally intensive analyses. However, such techniques typically focus on reproducing a full model, requiring high fidelity to the full space of parameters and outcomes, and can be difficult to interpret. In this paper, we use decision tree classifiers to create metamodels of policy-important binary outcomes. We first detail methods to fit and test classifiers optimizing out-of-sample performance, to upsample strategically in regions of high uncertainty, and to develop and test interpretable decision rules for policymakers. We apply these to a previously published agent-based simulation model of COVID-19 transmission in schools, with >99% out-of-sample predictive validity and minimal training data requirements. We compare the identified decision rules to those proposed by policymakers and to output from alternative metamodels. Our approach can reduce the computational and analytic burden of creating a metamodel, optimize performance for decisions of interest and comparability across models, and provide interpretable, easy-to-update summaries for policymakers.
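    The metamodeling workflow described above (sample the expensive simulator, then fit an interpretable classifier to the binary outcome) can be conveyed with a minimal sketch. The simulator below is a hypothetical stand-in, not the COVID-19 school model from the talk, and a single-split stump stands in for the full decision trees tuned out-of-sample in the paper.

```python
import random

def expensive_simulator(transmission_rate, masking):
    """Stand-in for a costly simulation; returns 1 if outbreaks stay small.
    (Hypothetical toy rule, not the agent-based model from the talk.)"""
    return 1 if transmission_rate * (1.0 - 0.5 * masking) < 0.15 else 0

def fit_stump(xs, ys):
    """One-split 'decision tree': pick the threshold on x minimizing
    training errors. A real analysis would fit a deeper tree and
    validate it out-of-sample."""
    best = None
    for t in sorted(set(xs)):
        for left_label in (0, 1):
            errs = sum((left_label if x <= t else 1 - left_label) != y
                       for x, y in zip(xs, ys))
            if best is None or errs < best[0]:
                best = (errs, t, left_label)
    return best[1], best[2]   # threshold, label predicted on the <= side

random.seed(0)
params = [random.uniform(0.0, 0.4) for _ in range(200)]      # sampled inputs
outcomes = [expensive_simulator(r, masking=1.0) for r in params]
threshold, low_label = fit_stump(params, outcomes)
rule = f"predict {low_label} when transmission_rate <= {threshold:.3f}"
```

The recovered rule is the kind of interpretable, easy-to-update summary the abstract describes: a threshold on a policy-relevant input rather than a black-box emulator.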

  • Hyunseung Kang, PhD

    Assistant Professor, Department of Statistics, University of Wisconsin-Madison

    Title:
    Assumption-Lean Analysis of Cluster Randomized Trials in Infectious Diseases for Intent-to-Treat Effects and Spillover Effects Among A Vulnerable Subpopulation
    Abstract:
    Cluster randomized trials (CRTs) are a popular design to study the effect of interventions in infectious disease settings. However, standard analysis of CRTs primarily relies on strong parametric methods, usually normal mixed-effects models to account for the clustering structure, and focuses on the overall intent-to-treat (ITT) effect to evaluate effectiveness. The paper presents two methods to analyze two types of effects in CRTs: the overall and heterogeneous ITT effects, and the spillover effect among never-takers who cannot or refuse to take the intervention. For the ITT effects, we make a modest extension of an existing method where we do not impose parametric models or asymptotic restrictions on cluster size. For the spillover effect among never-takers, we propose a new bound-based method that uses pre-treatment covariates, classification algorithms, and a linear program to obtain sharp bounds. A key feature of our method is that the bounds can become dramatically narrower as the classification algorithm improves, and the method may also be useful for studies of partial identification with instrumental variables. We conclude by reanalyzing a CRT studying the effect of face masks and hand sanitizers on transmission of 2008 interpandemic influenza in Hong Kong. This is joint work with Chan Park (UW-Madison).
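    The assumption-lean flavor of the ITT analysis can be sketched in a few lines: work with cluster-level averages and never fit a parametric mixed model. This is a toy point estimate only; the talk's contribution is valid inference for such estimands without parametric models or restrictions on cluster size.

```python
def cluster_itt_effect(clusters):
    """Design-based ITT estimate: difference in means of cluster-level
    averages between treated and control clusters. No normality or
    random-effects assumptions are invoked."""
    treated = [sum(ys) / len(ys) for arm, ys in clusters if arm == 1]
    control = [sum(ys) / len(ys) for arm, ys in clusters if arm == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Each cluster: (treatment arm, individual outcomes, e.g. infection indicators).
data = [
    (1, [0, 0, 1, 0]),   # treated cluster, 25% infected
    (1, [0, 0]),         # treated cluster, 0%
    (0, [1, 0, 1, 0]),   # control cluster, 50%
    (0, [1, 1]),         # control cluster, 100%
]
effect = cluster_itt_effect(data)
```

Note that each cluster contributes one average regardless of its size, which is one simple way to avoid letting informative cluster sizes distort the estimand.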
  • Sudipto Banerjee, PhD
    Professor and Chair of the Department of Biostatistics
    UCLA Fielding School of Public Health

    Title: Bayesian Finite Population Survey Sampling from Spatial Process Settings

    Abstract:
    We develop a Bayesian model-based approach to finite population estimation accounting for spatial dependence. Our innovation here is a framework that achieves inference for finite population quantities in spatial process settings. A key distinction from the small area estimation setting is that we analyze finite populations referenced by their geographic coordinates (point-referenced data). Specifically, we consider a two-stage sampling design in which the primary units are geographic regions, the secondary units are point-referenced locations, and the measured values are assumed to be a partial realization of a spatial process. Traditional geostatistical models do not account for variation attributable to finite population sampling designs, which can impair inferential performance. On the other hand, design-based estimates will ignore the spatial dependence in the finite population. This motivates the introduction of geostatistical processes that will enable inference at arbitrary locations in our domain of interest. We demonstrate using simulation experiments that process-based finite population sampling models considerably improve model fit and inference over models that fail to account for spatial correlation. Furthermore, the process-based models offer richer inference with spatially interpolated maps over the entire region. We reinforce these improvements and demonstrate scalable inference for spatial big data analysis with millions of locations using Nearest-Neighbor and Meshed Gaussian processes. We will demonstrate our framework with an example of groundwater nitrate levels in the population of California Central Valley wells by offering estimates of mean nitrate levels and their spatially interpolated maps.
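    A minimal sketch of the two ingredients: spatial prediction at unsampled locations, and a finite-population summary that combines observed and predicted units. Inverse-distance weighting below is only a crude stand-in for the Gaussian-process models used in the talk, and the example data are invented.

```python
def idw_predict(obs, target, power=2.0):
    """Inverse-distance-weighted prediction at an unsampled location.
    A crude surrogate for Gaussian-process kriging: nearby observed
    wells get more weight."""
    num = den = 0.0
    for (x, y), value in obs:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            return value                      # exact hit on an observed well
        w = 1.0 / d2 ** (power / 2.0)
        num += w * value
        den += w
    return num / den

def finite_population_mean(observed_values, predictions):
    """Finite-population mean: sampled wells contribute their measured
    values, unsampled wells contribute model predictions at their
    locations (a posterior predictive draw, in the Bayesian version)."""
    vals = list(observed_values) + list(predictions)
    return sum(vals) / len(vals)

# Two sampled wells; predict nitrate at one unsampled well, then combine.
obs = [((0.0, 0.0), 4.0), ((2.0, 0.0), 8.0)]
pred = idw_predict(obs, (1.0, 0.0))                       # equidistant wells
pop_mean = finite_population_mean([4.0, 8.0], [pred])
```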

  • Dr. Feng Liang

    Associate Professor at the Department of Statistics, University of Illinois at Urbana-Champaign

    Title: Learning Topic Models: Identifiability and Rate of Convergence

    Abstract: Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of algorithms have been proposed for topic modeling, little work has been done to study the statistical accuracy of the estimated structures. In this paper, we propose an MLE of latent topics based on an integrated likelihood. We further introduce a new set of conditions for topic model identifiability, which are weaker than conditions that rely on the existence of anchor words. In addition, we study estimation consistency and establish the convergence rate of the proposed estimator. Our algorithm, an application of the EM algorithm, is demonstrated to have competitive performance through simulation studies and a real application.

    This is based on joint work with Yinyin Chen, Shishuang He, and Yun Yang.
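    Since the proposed estimator is computed via EM, a stripped-down example may help fix ideas. The sketch below runs EM on a mixture-of-unigrams model (each document drawn from a single latent topic), which is far simpler than the models in the paper and uses an ordinary rather than an integrated likelihood.

```python
import math
import random

def normalize(row):
    s = sum(row)
    return [v / s for v in row]

def em_mixture_of_unigrams(docs, vocab_size, k=2, iters=100, seed=0):
    """EM for a mixture-of-unigrams model. docs: word-count vectors of
    length vocab_size. Illustrates the E/M alternation only; it is not
    the paper's integrated-likelihood MLE."""
    rng = random.Random(seed)
    pi = [1.0 / k] * k                                    # mixing weights
    beta = [normalize([rng.random() + 0.1 for _ in range(vocab_size)])
            for _ in range(k)]                            # topic-word probs
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each topic per document.
        resp = []
        for counts in docs:
            logs = [math.log(pi[t]) + sum(c * math.log(beta[t][w])
                                          for w, c in enumerate(counts))
                    for t in range(k)]
            m = max(logs)
            resp.append(normalize([math.exp(l - m) for l in logs]))
        # M-step: re-estimate pi and beta from the responsibilities.
        pi = normalize([sum(r[t] for r in resp) for t in range(k)])
        beta = [normalize([sum(r[t] * counts[w]
                               for r, counts in zip(resp, docs)) + 1e-9
                           for w in range(vocab_size)])
                for t in range(k)]
    return pi, beta, resp

# Two clearly separated "topics": words {0,1} versus words {2,3}.
docs = [[10, 8, 0, 0], [9, 11, 0, 0], [0, 0, 10, 9], [0, 0, 8, 12]]
pi, beta, resp = em_mixture_of_unigrams(docs, vocab_size=4)
```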

  • Marina Vannucci, PhD

    Noah Harding Professor of Statistics, Rice University

    Title: Dirichlet-Multinomial Regression Models with Bayesian Variable Selection for Microbiome Data

    Abstract:

    I will describe Bayesian models developed for understanding how the microbiome varies within a population of interest. I will focus on integrative analyses, where the goal is to combine microbiome data with other available information (e.g. dietary patterns) to identify significant associations between taxa and a set of predictors. For this, I will describe a general class of hierarchical Dirichlet-Multinomial (DM) regression models which use spike-and-slab priors for the selection of the significant associations. I will also describe data augmentation techniques to efficiently embed DM regression models into joint modeling frameworks, in order to investigate how the microbiome may affect the relation between dietary factors and phenotypic responses, such as body mass index. I will discuss advantages and limitations of the proposed methods with respect to current standard approaches used in the microbiome community, and will present results on the analysis of real datasets.
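    The building block of these models, the Dirichlet-Multinomial likelihood, is easy to write down. The sketch below evaluates it for one sample's taxa counts, up to the multinomial coefficient, which does not depend on the parameters; the regression and spike-and-slab selection layers are not shown.

```python
from math import lgamma

def dm_loglik(counts, alpha):
    """Dirichlet-Multinomial log-likelihood for one sample's taxa counts,
    up to an additive combinatorial constant free of alpha. In a DM
    regression, log(alpha_j) would be a linear function of covariates
    (e.g. dietary patterns), with spike-and-slab priors selecting which
    covariate-taxon associations are nonzero."""
    n = sum(counts)
    a0 = sum(alpha)
    ll = lgamma(a0) - lgamma(a0 + n)
    for x, a in zip(counts, alpha):
        ll += lgamma(x + a) - lgamma(a)
    return ll

# Counts over four taxa, with a hypothetical concentration vector.
ll = dm_loglik([12, 0, 5, 3], [0.8, 0.2, 1.1, 0.5])
```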

  • Prof. Frauke Kreuter, University of Maryland, Ludwig-Maximilian University of Munich, Germany, and Institute for Employment Research (IAB) Nuremberg, Germany

    Bio: Frauke Kreuter is Co-Director of the Social Data Science Center and Professor in the Joint Program in Survey Methodology at the University of Maryland, Professor of Statistics and Data Science at the Ludwig Maximilian University of Munich, and head of the statistical methods group at the German Institute for Employment Research in Nuremberg. Her research focuses on data quality, privacy, and the combination of surveys and alternative data sources. In her work at JPSM she maintains strong ties to the Federal Statistical System, and has served in advisory roles for the National Center for Education Statistics and the Bureau of Labor Statistics. In addition to her academic work, Dr. Kreuter is the Founder of the International Program for Survey and Data Science, developed in response to the increasing demand from researchers and practitioners for appropriate methods and the right tools to face a changing data environment; Co-Founder of the Coleridge Initiative (coleridgeinitiative.org), whose goal is to accelerate data-driven research and policy around human beings and their interactions for program management, policy development, and scholarly purposes by enabling efficient, effective, and secure access to sensitive data about society and the economy; and Co-Founder of the German-language podcast Dig Deep. For her efforts in education innovation, she was recently awarded the Warren Mitofsky Innovators Award by the American Association for Public Opinion Research.

    Practical challenges with confidentiality and privacy in large- and small-scale data collections. Examples from the Global Facebook-UMD COVID-19 survey and German labor market studies.

    Abstract: How can we collect and analyze data about human beings without harming the privacy of the people being analyzed? This question has grown in importance, given the fast-growing availability of data and the increased need to make research data available. Most recently, “differential privacy” – a simple mathematical definition of when publishing results or datasets can be considered “private” in a specific sense – has gained the attention of statistical offices. This presentation will discuss the possible effects on public health and social science research, should data be available only in a differentially private way. We will discuss the pros and cons of such implementations, and outline different scenarios of social science data use that likely require different solutions to the trade-off between privacy and data accuracy. The presentation will showcase lessons learned from privacy-first approaches used in the Global Facebook-UMD COVID-19 data collection and present results from experiments.

  • Terrance Savitsky, PhD

    Research Mathematical Statistician

    Mathematical Statistics Research Center

    U. S. Bureau of Labor Statistics

    Title:  Pseudo Posterior Mechanism under Differential Privacy

    Abstract:  We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic datasets equipped with a differential privacy (DP) guarantee from any proposed synthesis model. The pseudo posterior mechanism employs a data record-indexed, risk-based weight vector with weights ∈ [0, 1] to surgically downweight high-risk records for the generation and release of record-level synthetic data. The differentially private pseudo posterior synthesizer constructs weights using Lipschitz bounds for a log-pseudo likelihood utility for each data record, which provides a practical, general formulation for weights based on record-level sensitivities that we show achieves dramatic improvements in the DP expenditure compared with the unweighted posterior mechanism. By selecting weights to remove likelihood contributions with non-finite log-likelihood values, we achieve a local privacy guarantee at every sample size. We compute a local sensitivity specific to our Consumer Expenditure Surveys dataset for family income, published by the U.S. Bureau of Labor Statistics, and reveal mild conditions that guarantee its contraction to a global sensitivity result over the space of databases. We further employ a censoring mechanism to lock in a local result with desirable risk and utility performance to achieve a global privacy result, as an alternative to relying on asymptotics. We show that utility is better preserved for our pseudo posterior mechanism compared with the exponential mechanism (EM) estimated on the same non-private synthesizer, due to the use of targeted downweighting. Our results may be applied to any synthesizing model envisioned by the data disseminator in a computationally tractable way that involves only estimation of a pseudo posterior distribution for parameter(s) θ, unlike recent approaches that use naturally bounded utility functions under application of the EM.

    (Joint work with Matthew R. Williams and Jingchen Hu)


    Keywords: Differential privacy, Pseudo posterior, Pseudo posterior mechanism, Synthetic data
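    The core of the weighting scheme can be sketched in a few lines: the pseudo likelihood is an ordinary likelihood with per-record exponents in [0, 1]. This toy code only shows the likelihood surgery; deriving the weights from record-level Lipschitz bounds, and the resulting DP guarantee, are the substance of the paper and are not shown.

```python
def pseudo_posterior_logjoint(log_prior, logliks, weights):
    """Log of the pseudo posterior kernel: the log prior plus a *weighted*
    sum of record-level log-likelihoods, where weights in [0, 1]
    downweight high-risk records (weight 0 removes a record entirely)."""
    assert all(0.0 <= w <= 1.0 for w in weights)
    return log_prior + sum(w * ll for w, ll in zip(weights, logliks))

# Records with extreme (high-risk) values get small weights:
logliks = [-1.2, -0.8, -15.0]    # third record is an outlier
weights = [1.0, 1.0, 0.1]        # surgically downweight it
kernel = pseudo_posterior_logjoint(0.0, logliks, weights)
```

Downweighting the risky record keeps its influence on the synthesizer (and hence the privacy expenditure) small, while all other records contribute their full likelihood.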

    For more information about the Statistics Seminar Series go here.

  • Nov
    16

    Juned Siddique, DrPH

    Associate Professor

    Departments of Preventive Medicine and Psychiatry and Behavioral Sciences

    Northwestern University Feinberg School of Medicine

    Bio:  Dr. Siddique’s research efforts focus on developing statistical methods for handling incomplete or missing data. He applies these methods to a range of problems including rater bias, participant dropout, data harmonization in individual participant data analysis, and measurement error. He collaborates closely with lifestyle intervention researchers and is interested in the analysis of diet and physical activity data.

    Title: Measurement error correction and sensitivity analysis in longitudinal dietary intervention studies using an external validation study

    Abstract: In lifestyle intervention trials, where the goal is to change a participant’s weight or modify their eating behavior, self-reported diet is a longitudinal outcome variable that is subject to measurement error. We propose a statistical framework for correcting for measurement error in longitudinal self-reported dietary data by combining intervention data with auxiliary data from an external biomarker validation study where both self-reported and recovery biomarkers of dietary intake are available. In this setting, dietary intake measured without error in the intervention trial is missing data and multiple imputation is used to fill in the missing measurements. Since most validation studies are cross-sectional, they do not contain information on whether the nature of the measurement error changes over time or differs between treatment and control groups. We use sensitivity analyses to address the influence of these unverifiable assumptions involving the measurement error process and how they affect inferences regarding the effect of treatment. We apply our methods to self-reported sodium intake from the PREMIER study, a multi-component lifestyle intervention trial.
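    A stripped-down version of the imputation step: calibrate self-report against the recovery biomarker in the external validation study, then multiply-impute the error-free intake in the trial. This is hypothetical toy code; the actual framework models longitudinal outcomes, propagates parameter uncertainty, and runs sensitivity analyses over the unverifiable error assumptions.

```python
import random

def fit_simple_ols(x, y):
    """Slope, intercept, and residual SD for y ~ a + b * x, fit to the
    validation study (x = self-report, y = biomarker intake)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (n - 2)) ** 0.5
    return a, b, sd

def impute_true_intake(self_report, a, b, sd, rng, m=5):
    """Draw m imputations of biomarker-scale intake for one trial
    participant. A proper MI would also draw (a, b, sd) from their
    posterior and allow the error model to vary by time and arm."""
    return [a + b * self_report + rng.gauss(0.0, sd) for _ in range(m)]

# Toy validation data with an exactly linear self-report/biomarker relation.
a, b, sd = fit_simple_ols([1.0, 2.0, 3.0, 4.0], [3.5, 5.0, 6.5, 8.0])
imputations = impute_true_intake(10.0, a, b, sd, random.Random(1))
```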


  • Thomas Jaki, PhD

    Professor of Statistics, Department of Mathematics and Statistics

    Lancaster University and     

    Programme Leader at the MRC Biostatistics Unit, University of Cambridge


    Bio: Thomas Jaki is a Professor of Statistics in the Department of Mathematics and Statistics at Lancaster University. His research interests include the design and analysis of clinical trials, early-phase drug development, personalized medicine, and biostatistics.

    Title: An Information-Theoretic Approach for Selecting Arms in Clinical Trials

    Abstract: The question of selecting the “best” amongst different choices is a common problem in statistics. In drug development, our motivating setting, the question becomes, for example: which treatment gives the best response rate, or which dose of a treatment gives an acceptable risk of toxicity? In this talk I will introduce a flexible adaptive experimental design that is based on the theory of context-dependent information measures. I will show that the design leads to reliable selection of the correct arm in the settings of Phase I and Phase II clinical trials.
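    As context for what "selecting an arm" means operationally, here is one simple Bayesian selection criterion: the Monte Carlo posterior probability that each arm is best. This is a generic sketch for illustration, not the talk's information-theoretic design, which instead scores arms with context-dependent information measures reflecting the cost of different selection errors.

```python
import random

def prob_best_arm(successes, trials, rng, draws=4000):
    """Monte Carlo posterior probability that each arm has the highest
    response rate, under independent Beta(1 + s, 1 + n - s) posteriors
    with uniform priors."""
    wins = [0] * len(successes)
    for _ in range(draws):
        samples = [rng.betavariate(1 + s, 1 + n - s)
                   for s, n in zip(successes, trials)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Two arms: 12/30 versus 18/30 responders.
probs = prob_best_arm([12, 18], [30, 30], random.Random(7))
```

An adaptive design would recompute such a criterion after each cohort and allocate or select accordingly; the talk's contribution is a principled, information-based choice of that criterion.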

