Seminar Archive

  • Terrance Savitsky, PhD

    Research Mathematical Statistician

    Mathematical Statistics Research Center

    U.S. Bureau of Labor Statistics

    Title:  Pseudo Posterior Mechanism under Differential Privacy

    Abstract:  We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic datasets equipped with a differential privacy (DP) guarantee from any proposed synthesis model. The pseudo posterior mechanism employs a data record-indexed, risk-based weight vector with weights ∈ [0, 1] to surgically downweight high-risk records for the generation and release of record-level synthetic data. The differentially private pseudo posterior synthesizer constructs weights using Lipschitz bounds for a log-pseudo likelihood utility for each data record, which provides a practical, general formulation for using weights based on record-level sensitivities that we show achieves dramatic improvements in the DP expenditure as compared to the unweighted posterior mechanism. By selecting weights to remove likelihood contributions with non-finite log-likelihood values, we achieve a local privacy guarantee at every sample size. We compute a local sensitivity specific to our Consumer Expenditure Surveys dataset for family income, published by the U.S. Bureau of Labor Statistics, and reveal mild conditions that guarantee its contraction to a global sensitivity result over the space of databases. We further employ a censoring mechanism to lock in a local result with desirable risk and utility performance to achieve a global privacy result as an alternative to relying on asymptotics. We show that utility is better preserved for our pseudo posterior mechanism as compared to the exponential mechanism (EM) estimated on the same non-private synthesizer due to the use of targeted downweighting. Our results may be applied to any synthesizing model envisioned by the data disseminator in a computationally tractable way that only involves estimation of a pseudo posterior distribution for parameter(s) θ, unlike recent approaches that use naturally-bounded utility functions under application of the EM.

    (Joint work with Matthew R. Williams and Jingchen Hu)


    Keywords: Differential privacy, Pseudo posterior, Pseudo posterior mechanism, Synthetic data

    For more information about the Statistics Seminar Series go here .

    More > Biology, Medicine, Public Health, BioStatsSeminar, Graduate School, Postgraduate Education, Mathematics, Technology, Engineering, Research, Training, Professional Development
  • Juned Siddique, DrPH
    Nov 16

    Juned Siddique, DrPH

    Associate Professor

    Departments of Preventive Medicine and Psychiatry and Behavioral Sciences

    Northwestern University Feinberg School of Medicine

    Bio:  Dr. Siddique’s research efforts focus on developing statistical methods for handling incomplete or missing data. He applies these methods to a range of problems including rater bias, participant dropout, data harmonization in individual participant data analysis, and measurement error. He collaborates closely with lifestyle intervention researchers and is interested in the analysis of diet and physical activity data.

    Title:  “Measurement error correction and sensitivity analysis in longitudinal dietary intervention studies using an external validation study”

    Abstract: In lifestyle intervention trials, where the goal is to change a participant’s weight or modify their eating behavior, self-reported diet is a longitudinal outcome variable that is subject to measurement error. We propose a statistical framework for correcting for measurement error in longitudinal self-reported dietary data by combining intervention data with auxiliary data from an external biomarker validation study where both self-reported and recovery biomarkers of dietary intake are available. In this setting, dietary intake measured without error in the intervention trial is missing data and multiple imputation is used to fill in the missing measurements. Since most validation studies are cross-sectional, they do not contain information on whether the nature of the measurement error changes over time or differs between treatment and control groups. We use sensitivity analyses to address the influence of these unverifiable assumptions involving the measurement error process and how they affect inferences regarding the effect of treatment. We apply our methods to self-reported sodium intake from the PREMIER study, a multi-component lifestyle intervention trial.

  • Thomas Jaki, PhD

    Professor of Statistics, Department of Mathematics and Statistics

    Lancaster University and     

    Programme Leader at the MRC Biostatistics Unit, University of Cambridge

     

    Bio: Thomas Jaki is a Professor in Statistics in the Department of Mathematics and Statistics at Lancaster University. His research interests include design and analysis of clinical trials, early phase drug development, personalized medicine and biostatistics.

    Title: An Information-Theoretic Approach for Selecting Arms in Clinical Trials

    Abstract: The question of selecting the “best” amongst different choices is a common problem in statistics. In drug development, our motivating setting, the question becomes, for example: which treatment gives the best response rate or which dose of a treatment gives an acceptable risk of toxicity. In this talk I will introduce a flexible adaptive experimental design that is based on the theory of context-dependent information measures. I will show that the design leads to a reliable selection of the correct arm in the settings of Phase I and Phase II clinical trials.

  • Aleksandra Slavković, PhD

    Professor, Departments of Statistics and Public Health Sciences

    Associate Dean for Graduate Education, Eberly College of Science

    Title: Valid statistical inference with privacy constraints

    Abstract: Limiting the disclosure risk of sensitive data and statistical analyses is a long-standing problem in statistics. Differential privacy (DP) provides a framework for strong, provable privacy protection against arbitrary adversaries while allowing the release of summary statistics and potentially synthetic data. DP methods and mechanisms require the introduction of randomness, which reduces the utility of the results, especially in finite samples. In this talk we give an overview of statistical data privacy and its links to DP. We also describe a general framework, built on sound statistical principles from measurement error, robustness, and likelihood-based inference, and give specific examples of how to achieve optimal statistical inference under formal privacy, focused on survey and census data.

    Bio:  Aleksandra Slavković is Professor of Statistics and Public Health Sciences at Pennsylvania State University. Her research interests include statistical disclosure limitation, algebraic statistics, characterization of discrete distributions, and application of statistics to social sciences.

  • Jared Murray
    Oct 26

    Jared Murray, PhD

    Assistant Professor

    Information, Risk and Operations Management

    University of Texas at Austin, McCombs School of Business

     

    Title: “Scaling Bayesian Probabilistic Record Linkage with Post-Hoc Blocking: An Application to the California Great Registers”

    Abstract:  Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism for propagating uncertainty due to uncertain links between records (via the posterior distribution). However, computational considerations severely limit the practical applicability of existing Bayesian approaches. We propose a new computational approach, providing both a fast algorithm for deriving point estimates of the linkage structure that properly account for one-to-one matching and a restricted MCMC algorithm that samples from an approximate posterior distribution. Our advances make it possible to perform Bayesian PRL for larger problems, and to assess the sensitivity of results to varying prior specifications. We demonstrate the methods on a subset of an OCR’d dataset, the California Great Registers, a collection of 57 million voter registrations from 1900 to 1968 that comprise the only panel data set of party registration collected before the advent of scientific surveys.

    Bio:  Dr. Murray is an assistant professor of statistics in the Department of Information, Risk, and Operations Management at the McCombs School of Business, University of Texas at Austin. Until July of 2017 he was a visiting assistant professor in the Department of Statistics at Carnegie Mellon University. He completed his Ph.D. in Statistical Science at Duke University with Jerry Reiter. He also holds a B.S. in Interdisciplinary Mathematics (Statistics) from the University of New Hampshire and an M.S. in Statistical Science from Duke University. His current research interests are in developing flexible Bayesian models for heterogeneous and structured data, with applications to causal inference, record linkage, multiple imputation for missing data, and latent variable modeling.

  • Mauricio Sadinle, PhD
    Oct 19

    Mauricio Sadinle, PhD

    Assistant Professor, Department of Biostatistics

    University of Washington, School of Public Health

    Title: Sequentially additive nonignorable missing data modelling using auxiliary marginal information

    Abstract: We study a class of missingness mechanisms, referred to as sequentially additive nonignorable, for modelling multivariate data with item nonresponse. These mechanisms explicitly allow the probability of nonresponse for each variable to depend on the value of that variable, thereby representing nonignorable missingness mechanisms. These missing data models are identified by making use of auxiliary information on marginal distributions, such as marginal probabilities for multivariate categorical variables or moments for numeric variables. We prove identification results and illustrate the use of these mechanisms in an application.

    Article: https://doi.org/10.1093/biomet/asz054

    Bio: Mauricio Sadinle, PhD is an Assistant Professor in the Department of Biostatistics at the University of Washington. Previously, he was a Postdoctoral Associate in the Department of Statistical Science at Duke University and the National Institute of Statistical Sciences, working under the mentorship of Jerry Reiter. He completed his PhD in the Department of Statistics at Carnegie Mellon University, where his advisor was Stephen E. Fienberg. Dr. Sadinle completed his undergraduate studies at the National University of Colombia in Bogota, where he majored in statistics. Dr. Sadinle’s methodological research mainly focuses on (1) record linkage techniques to combine datafiles that contain information on overlapping sets of individuals but lack unique identifiers and (2) nonignorable missing data modeling and the use of auxiliary information to identify nonignorable missing data mechanisms. Dr. Sadinle also has experience working with social network models for valued ties, capture-recapture models in the context of human rights violations, and set-valued classifiers that output sets of plausible labels for ambiguous sample points.

  • Stephanie Shipp, PhD
    Oct 5

    Stephanie Shipp, PhD

    Deputy Director and Professor

    Social and Decision Analytics Division, Biocomplexity Institute

    University of Virginia

    Title: “Ethical Principles and Data Science - Repurposing Administrative & Opportunity Data”

    Abstract: The data revolution is changing the conduct of research as increasing amounts of internet-based and administrative data become accessible for use. At the same time, the new data landscape has created significant tension around data privacy and confidentiality. To bridge this gap, conversations about ethics, privacy, transparency, and reproducibility need to play a prominent role in both research partnerships and policymaking. At the research level, these conversations must be translated to action. We have created a comprehensive framework that forms the foundation for data science problem solving through defining rigorous, flexible, and iterative processes where learning at each stage informs the other stages. Embedded in this framework is close attention to ethics. The Institutional Review Board (IRB) structure is well known in parts of academia and industry, but our public and local government partners are not always aware of these processes. The IRB framework could help them think about informed consent and privacy, as well as ethical considerations around the benefits and risks to individuals and communities under study. Through case studies, these principles are demonstrated.

    Keywords: confidentiality, ethics, trust but verify, data science framework

    Brief Bio: Data scientists have the opportunity to use their skills to influence and improve society, especially vulnerable populations who need champions. Stephanie Shipp enthusiastically works with communities, policy makers and other data scientists who have also taken that challenge to heart.

  • Despina Kontos, PhD
    Sep 28

    Despina Kontos, PhD

    Associate Professor of Radiology

    Department of Radiology

    Perelman School of Medicine

    University of Pennsylvania

    Bio:  Dr. Despina Kontos, Ph.D., is an Associate Professor of Radiology and director of the Computational Biomarker Imaging Group (CBIG) in the Center for Biomedical Image Computing and Analytics (CBICA) at the Radiology Department of the University of Pennsylvania. Dr. Kontos received her C.Eng. Diploma in Computer Engineering and Informatics from the University of Patras in Greece and her MSc and Ph.D. degrees in Computer Science from Temple University in Philadelphia. She completed her postdoctoral training in radiologic physics and biostatistics at the University of Pennsylvania. Her research interests focus on investigating the role of quantitative imaging as a predictive biomarker for guiding personalized clinical decisions in cancer screening, prognosis, and treatment. She is leading several research studies, funded both by the NIH/NCI and private foundations, to incorporate novel quantitative multi-modality imaging measures of breast tumor and tissue composition into cancer risk prediction models.

    Title:  “Radiomic Biomarkers for Deciphering Tumor Heterogeneity”

    Abstract: Breast cancer is a heterogeneous disease, with known inter-tumor and intra-tumor heterogeneity. Established histopathologic prognostic biomarkers generally acquired from a tumor biopsy may be limited by sampling variation. Radiomics is an emerging field with the potential to leverage the whole tumor via non-invasive sampling afforded by medical imaging to extract high-throughput, quantitative features for personalized tumor characterization. Identifying imaging phenotypes via radiomics analysis and understanding their relationship with prognostic markers and patient outcomes can allow for a non-invasive assessment of tumor heterogeneity. In this study, we identified and independently validated intrinsic radiomic phenotypes of tumor heterogeneity for invasive breast cancer that have independent prognostic value when predicting 10-year recurrence. The independent and additional prognostic value of imaging heterogeneity phenotypes suggests that radiomic phenotypes can provide a non-invasive characterization of tumor heterogeneity to augment personalized prognosis and treatment.

  • Jianqing Fan, PhD
    Sep 14

    Jianqing Fan, PhD, Professor of Statistics

    Frederick L. Moore ’18 Professor of Finance

    Princeton University

    Title: Communication-Efficient Accurate Statistical Estimation

    Abstract: When the data are stored in a distributed manner, direct application of traditional statistical inference procedures is often prohibitive due to communication cost and privacy concerns. This paper develops and investigates two Communication-Efficient Accurate Statistical Estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts the aggregated gradient vector to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has a large enough sample size. Moreover, they do not require good initialization and enjoy linear convergence guarantees under general conditions. The contraction rate of optimization errors is derived explicitly, with dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multi-step statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.

    (Joint work with Yongyi Guo and Kaizheng Wang)

     

  • Daniel Almirall, PhD, Associate Professor, Co-Director, Data Science for Dynamic Intervention Decision-making Laboratory (d3lab), Survey Research Center, Institute for Social Research; Department of Statistics, College of Literature, Science, and the Arts, University of Michigan

    Bio: Daniel Almirall is an Associate Professor in the Institute for Social Research and the Department of Statistics at the University of Michigan. He is a methodologist and statistician who develops methods to form evidence-based adaptive interventions. Adaptive interventions can be used to inform individualized intervention guidelines for the on-going management of chronic illnesses or disorders such as drug abuse, depression, anxiety, autism, obesity, or HIV/AIDS. More recently, Dr. Almirall has been developing methods to form adaptive implementation interventions, to inform how best to tailor sequences of organizational-level strategies to improve the implementation of evidence-based practices. His work includes the development of approaches related to the design, execution, and analysis of sequential multiple assignment randomized trials (SMARTs), which can be used to build adaptive interventions, and of clustered SMARTs to build adaptive implementation interventions. He is particularly interested in applications in child and adolescent mental health research.

    Abstract to be posted

    More > Biology, Medicine, Public Health, BioStatsSeminar, Education, Teaching, Instruction, Graduate School, Postgraduate Education, Mathematics, Technology, Engineering, Training, Professional Development