Statistics Seminars Series


Brown Statistical Seminars are hosted collaboratively by the Department of Biostatistics and the Center for Statistical Sciences to provide educational and research opportunities to graduate, undergraduate, and medical students, as well as to researchers across the University. The seminars take place on selected Mondays throughout the academic year and feature leading researchers from the US and abroad.

To receive our seminar announcements, contact us.

 

  • Dr. Feng Liang

    Associate Professor at the Department of Statistics, University of Illinois at Urbana-Champaign

    Title: Learning Topic Models: Identifiability and Rate of Convergence

    Abstract: Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of algorithms have been proposed for topic modeling, little work has been done to study the statistical accuracy of the estimated structures. In this paper, we propose an MLE of latent topics based on an integrated likelihood. We further introduce a new set of conditions for topic model identifiability, which are weaker than conditions that rely on the existence of anchor words. In addition, we study the estimation consistency and establish the convergence rate of the proposed estimator. Our algorithm, which is an application of the EM algorithm, is demonstrated to have competitive performance through simulation studies and a real application.

    This is based on joint work with Yinyin Chen, Shishuang He, and Yun Yang.

    Biology, Medicine, Public Health, BioStatsSeminar, Graduate School, Postgraduate Education, Mathematics, Technology, Engineering, Research, Training, Professional Development
  • Sudipto Banerjee, PhD
    Professor and Chair of the Department of Biostatistics
    UCLA Fielding School of Public Health

    Title: Bayesian Finite Population Survey Sampling from Spatial Process Settings

    Abstract:
    We develop a Bayesian model-based approach to finite population estimation accounting for spatial dependence. Our innovation here is a framework that achieves inference for finite population quantities in spatial process settings. A key distinction from the small area estimation setting is that we analyze finite populations referenced by their geographic coordinates (point-referenced data). Specifically, we consider a two-stage sampling design in which the primary units are geographic regions, the secondary units are point-referenced locations, and the measured values are assumed to be a partial realization of a spatial process. Traditional geostatistical models do not account for variation attributable to finite population sampling designs, which can impair inferential performance. On the other hand, design-based estimates ignore the spatial dependence in the finite population. This motivates the introduction of geostatistical processes that enable inference at arbitrary locations in our domain of interest. We demonstrate using simulation experiments that process-based finite population sampling models considerably improve model fit and inference over models that fail to account for spatial correlation. Furthermore, the process-based models offer richer inference with spatially interpolated maps over the entire region. We reinforce these improvements and also demonstrate scalable inference for spatial big data analysis with millions of locations using Nearest-Neighbor and Meshed Gaussian processes. We will demonstrate our framework with an example of groundwater nitrate levels in the population of California Central Valley wells by offering estimates of mean nitrate levels and their spatially interpolated maps.

  • Hyunseung Kang, PhD

    Assistant Professor, Department of Statistics, University of Wisconsin-Madison

    Title:
    Assumption-Lean Analysis of Cluster Randomized Trials in Infectious Diseases for Intent-to-Treat Effects and Spillover Effects Among A Vulnerable Subpopulation
    Abstract:
    Cluster randomized trials (CRTs) are a popular design to study the effect of interventions in infectious disease settings. However, standard analysis of CRTs primarily relies on strong parametric methods, usually Normal mixed-effects models to account for the clustering structure, and focuses on the overall intent-to-treat (ITT) effect to evaluate effectiveness. The paper presents two methods to analyze two types of effects in CRTs: the overall and heterogeneous ITT effects, and the spillover effect among never-takers who cannot or refuse to take the intervention. For the ITT effects, we make a modest extension of an existing method where we do not impose parametric models or asymptotic restrictions on cluster size. For the spillover effect among never-takers, we propose a new bound-based method that uses pre-treatment covariates, classification algorithms, and a linear program to obtain sharp bounds. A key feature of our method is that the bounds can become dramatically narrower as the classification algorithm improves, and the method may also be useful for studies of partial identification with instrumental variables. We conclude by reanalyzing a CRT studying the effect of face masks and hand sanitizers on transmission of 2008 interpandemic influenza in Hong Kong. This is joint work with Chan Park (UW-Madison).