May 9, 12:00pm - 1:00pm
Dr. Yoav Benjamini, Professor Emeritus of Applied Statistics at the Department of Statistics and Operations Research at Tel Aviv University, and a member of the Sagol School of Neuroscience and the Edmond Safra Bioinformatics Center.
Replicability Issues in Medical Research: Science and Politics
Selective inference and irrelevant variability are two statistical issues hindering replicability across science. I will review the first in the context of secondary endpoint analysis in clinical and epidemiological research. This leads us to discuss the debate about p-values and statistical significance and the politics involved. I will present practical approaches that seem to accommodate the concerns of NEJM editors, as reflected in their guidelines.
I shall discuss more briefly the issue of addressing the relevant variability, in the context of preclinical animal experiments, and the implications of this work for assessing replicability in meta-analysis.
Major parts of this work were done jointly with Iman Jaljuli, Orestis Panagiotou and Ruth Heller.
Dr. Yoav Benjamini
Yoav Benjamini is Professor Emeritus of Applied Statistics at the Department of Statistics and Operations Research at Tel Aviv University, and a member of the Sagol School of Neuroscience and the Edmond Safra Bioinformatics Center. He was a visiting professor at the University of Pennsylvania, University of California, Berkeley, Stanford, and Columbia Universities. Yoav is a co-developer of the widely used False Discovery Rate concept and methodology. His other research topics are replicability and reproducibility in science and data mining, with applications in Biostatistics, Bioinformatics, Animal Behavior, Geography, Meteorology, Brain Imaging and Health Informatics. He is a member of the Israel Academy of Sciences and Humanities and the US National Academy of Sciences, and received the Israel Prize in Statistics and Economics and the Founders of Statistics Prize of the International Statistical Institute.
Mar 28, Virtual and In Person, 12:00pm - 1:00pm
Dr. Despina Kontos, Matthew J. Wilson Associate Professor of Research Radiology II, Associate Vice-Chair for Research, and Director of the Computational Biomarker Imaging Group (CBIG) in the Center for Biomedical Image Computing and Analytics (CBICA) at the Radiology Department of the University of Pennsylvania
The Role of Imaging as a Biomarker in Integrated Precision Diagnostics for Cancer Care
As new options for breast cancer screening, early detection and treatment become available, it is essential to provide accurate, clinically relevant methods to identify women who would benefit most from specific approaches. An emerging approach to improve individualized risk assessment in clinical decision making for breast cancer is the incorporation of imaging biomarkers. Our studies with multi-modality breast imaging suggest that imaging can play an important role in personalizing patient care. Quantitative measures of breast density and parenchymal texture can improve the prediction accuracy of breast cancer risk estimation models and potentially help guide personalized breast cancer screening protocols. Tumor phenotypic characteristics, such as shape, morphology, and heterogeneity of contrast enhancement kinetics from magnetic resonance imaging, are indicative of molecular subtypes of breast cancer and correlate with the probability of future recurrence. Such phenotypic tumor imaging markers can also be used as surrogates for treatment response, including neo-adjuvant chemotherapy, and help identify earlier the patients who are most likely to respond to treatment. This emerging evidence therefore suggests a new clinical paradigm that will necessitate integrating multi-modality imaging biomarkers with genomics, histopathology, and clinical risk factors to assess individualized patient risk and help better guide clinical decisions for breast cancer. This talk will provide an overview of investigations currently on-going at our institution that include digital mammography, digital breast tomosynthesis and magnetic resonance imaging biomarkers and their potential clinical utility in guiding personalized screening, prevention, and treatment approaches for breast cancer.
Dr. Despina Kontos, Ph.D., is the Matthew J. Wilson Associate Professor of Research Radiology II, Associate Vice-Chair for Research, and Director of the Computational Biomarker Imaging Group (CBIG) in the Center for Biomedical Image Computing and Analytics (CBICA) at the Radiology Department of the University of Pennsylvania. Dr. Kontos received her C.Eng. diploma in Computer Engineering and Informatics from the University of Patras in Greece and her Ph.D. degrees in Computer Science from Temple University in Philadelphia. Her research interests focus on investigating the role of quantitative imaging as a predictive biomarker for guiding personalized clinical decisions in breast cancer screening, prognosis and treatment. She has been the recipient of the ECOG-ACRIN Young Investigator Award of Distinction for Translational Research and is currently leading several on-going research studies, funded both by the NIH/NCI and private foundations, to incorporate novel quantitative multi-modality imaging measures of breast tumor and normal tissue composition into cancer risk prediction models.
Mar 14, Virtual, 12:00pm - 1:00pm
Dr. Nicholas Petrick, Deputy Director for the Division of Imaging, Diagnostics and Software Reliability at the Center for Devices and Radiological Health, U.S. Food and Drug Administration and member of the FDA Senior Biomedical Research Service
Current regulatory validation methods for artificial intelligence models applied to medical imaging data
Statistical decision making, artificial intelligence and machine learning (AI/ML) methods have a long history of being applied to digital medical image data, with mammography computer-aided detection devices approved back in 1998 by FDA and other quantitative tools/measures approved or cleared even earlier. The number of AI/ML tools applied to medical image data remained relatively consistent until a few years ago. The FDA is currently seeing a substantial increase in the number of submitted AI/ML tools because of recent advances in deep learning methods in other commercial areas, with the potential for these tools to have a much wider impact on clinical decision-making. Some newer medical AI/ML applications include detection and diagnostic tools to aid in disease detection and assessment, triage tools to aid in prioritizing time-sensitive imaging studies, quantitative measurement tools, structural segmentation tools, image reconstruction or denoising tools, and optimization tools to aid in image acquisition, to name a few. In this talk, I will introduce the audience to FDA’s medical device regulatory processes with the goal of demystifying how medical devices are regulated in the U.S. The main focus of my talk will be on the validation methods currently being applied to AI/ML device assessment and a discussion of our ongoing regulatory research developing methods to potentially improve AI/ML algorithm generalizability, robustness analysis as well as AI/ML device performance assessment.
Mar 7, 12:00pm - 1:00pm
Dr. Katherine Heller, Research Scientist at Google
Towards Trustworthy Machine Learning in Medicine and the Role of Uncertainty
As ML is increasingly used in society, we need methods that we can confidently rely on, particularly in the medical domain. In this talk I discuss three pieces of work, the role uncertainty plays in understanding and combating issues with generalization and bias, and particular mitigations that we can take into consideration.
1) Sepsis Watch - I present a Gaussian Process (GP) + Recurrent Neural Network (RNN) model for predicting sepsis infections in Emergency Department patients. I will discuss the benefit of uncertainty given by the GP. I will then discuss the social context in introducing such a system into a hospital setting.
2) Uncertainty and Electronic Health Records (EHR) - I will discuss Bayesian RNN models developed for mortality prediction, and the distinction between population level predictive performance and individual level predictive performance, and its implications for bias.
3) Underspecification and the credibility implications of hyperparameter choices in ML models – I will discuss medical imaging applications and how using the uncertainty of model performance conditioned on choice of hyperparameters can help identify situations in which methods may not generalize well outside the training domain.
Feb 14, Virtual, 12:00pm - 1:00pm
Dr. Kristian Lum, Senior Staff Machine Learning Researcher at Twitter
Closer Than They Appear: A Bayesian Perspective on Individual-level Heterogeneity in Risk Assessment
Risk assessment instruments are used across the criminal justice system to estimate the probability of some future behavior given covariates. The estimated probabilities are then used in making decisions at the individual level. In the past, there has been controversy about whether the probabilities derived from group-level calculations can meaningfully be applied to individuals. Using Bayesian hierarchical models applied to a large longitudinal dataset from the court system in the state of Kentucky, we analyze variation in individual-level probabilities of failing to appear for court and the extent to which it is captured by covariates. We find that individuals within the same risk group vary widely in their probability of the outcome. In practice, this means that allocating individuals to risk groups based on standard approaches to risk assessment, in large part, results in creating distinctions among individuals who are not meaningfully different in terms of their likelihood of the outcome. This is because uncertainty about the probability that any particular individual will fail to appear is large relative to the difference in average probabilities among any reasonable set of risk groups.
Jan 31, Virtual, 12:00pm - 1:00pm
Dr. Amy Herring, PhD, Sara and Charles Ayres Distinguished Professor of Statistical Science and Research Professor of Global Health at Duke University
Informative Priors for Clustering
Based on challenges in a large national study of birth defects, we consider a canonical problem in epidemiology of “lumping” versus “splitting” of groups. In many cases, groups may be unknown in advance, adding the additional challenge of determining group or cluster membership. While there is a very rich literature proposing Bayesian approaches for clustering starting with a prior probability distribution on partitions, most approaches assume exchangeability. Even though there have been some proposals to relax the exchangeability assumption, allowing covariate-dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition itself. For example, we are motivated by an epidemiological application, in which we wish to cluster birth defects into groups and we have prior knowledge of an initial clustering, provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process. Some properties of the CP prior are described, a general algorithm for posterior computation is developed, and we illustrate the methodology through simulation examples and an application to the motivating epidemiology study of birth defects.
Jan 24, Virtual, 12:00pm - 1:00pm
Tamara Broderick, PhD, Associate Professor in the Department of Electrical Engineering and Computer Science at MIT
An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?
One hopes that data analyses will be used to make beneficial decisions regarding people’s health, finances, and well-being. But the data fed to an analysis may systematically differ from the data where these decisions are ultimately applied. For instance, suppose we analyze data in one country and conclude that microcredit is effective at alleviating poverty; based on this analysis, we decide to distribute microcredit in other locations and in future years. We might then ask: can we trust our conclusion to apply under new conditions? If we found that a very small percentage of the original data was instrumental in determining the original conclusion, we might expect the conclusion to be unstable under new conditions. So we propose a method to assess the sensitivity of data analyses to the removal of a very small fraction of the data set. Analyzing all possible data subsets of a certain size is computationally prohibitive, so we provide an approximation. We call our resulting method the Approximate Maximum Influence Perturbation. Our approximation is automatically computable, theoretically supported, and works for common estimators — including (but not limited to) OLS, IV, GMM, MLE, MAP, and variational Bayes. We show that any non-robustness our metric finds is conclusive. Empirics demonstrate that while some applications are robust, in others the sign of a treatment effect can be changed by dropping less than 0.1% of the data — even in simple models and even when standard errors are small.
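As a rough illustration of the idea in this abstract (not the speaker's implementation), the sketch below uses a first-order approximation in the data weights to estimate how far one OLS coefficient could fall when a small fraction of observations is dropped. The function name `approx_max_decrease` and its parameters are hypothetical, and only NumPy is assumed.

```python
import numpy as np


def approx_max_decrease(X, y, coef_index, drop_fraction):
    """First-order estimate of the largest decrease in one OLS coefficient
    obtainable by dropping at most floor(drop_fraction * n) observations.

    Hypothetical helper for illustration only.
    Returns (original_coef, approximate_coef_after_drop, dropped_indices).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Ordinary least squares fit on the full data.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta

    # Derivative of beta[coef_index] with respect to the weight of each
    # observation, evaluated at all weights equal to 1:
    #   d beta / d w_i = (X^T X)^{-1} x_i * residual_i
    xtx_inv = np.linalg.inv(X.T @ X)
    influence = (X @ xtx_inv[:, coef_index]) * residuals

    # Dropping observation i moves its weight from 1 to 0, so the linearized
    # change in the coefficient is -influence[i]. To push the coefficient
    # down as far as possible, drop the points with the largest positive
    # influence, up to the allowed budget.
    budget = int(np.floor(drop_fraction * n))
    order = np.argsort(-influence)[:budget]
    dropped = order[influence[order] > 0]

    approx_coef = beta[coef_index] - influence[dropped].sum()
    return beta[coef_index], approx_coef, dropped


# Usage: an intercept-only model, so the coefficient is the sample mean.
X_demo = np.ones((4, 1))
y_demo = np.array([0.0, 0.0, 0.0, 10.0])
original, approx, dropped = approx_max_decrease(X_demo, y_demo, 0, 0.25)
print(original, approx, dropped)
```

In this toy case the full-data mean is 2.5, the approximation after dropping the single most influential point is about 0.625, and an exact refit without that point gives 0. The linearization is only an approximation, so a flagged subset is best confirmed by refitting without the selected observations.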
Dec 6, Virtual and In Person, 12:00pm - 1:00pm
Dr. Li-Xuan Qin, Associate Member in Biostatistics (PhD in Biostatistics), Memorial Sloan Kettering Cancer Center
Transcriptomics Data Normalization: Let’s Put It into Context
This talk will describe an assessment of transcriptomics data normalization (for removing artifacts due to inconsistent experimental handling in data collection) in the context of downstream analysis. With robustly benchmarked data and novel re-sampling-based simulations, I will illustrate several caveats of data normalization for biomarker discovery, sample classification, and survival prediction. I will then discuss the underlying causes for these caveats and provide alternative approaches that are more effective for dealing with the data artifacts.
Nov 22, Virtual, 12:15pm - 1:00pm
Dr. Jonathan Bartlett, PhD, Reader in Statistics, Department of Mathematical Sciences, University of Bath, England
Hypothetical estimands in clinical trials - a unification of causal inference and missing data methods
In clinical trials events may take place which complicate interpretation of the treatment effect. For example, in diabetes trials, some patients may require rescue medication during follow-up if their diabetes is not well controlled. Interpretation of the intention to treat effect is then complicated if the level of rescue medication is imbalanced between treatment groups. In such cases we may be interested in a so-called hypothetical estimand which targets what effect would have been seen in the absence of rescue medication. In this talk I will discuss estimation of such hypothetical estimands. Currently such estimands are typically estimated using standard missing data techniques after exclusion of any outcomes measured after such events take place. I will define hypothetical estimands using potential outcomes, and exploit standard results for identifiability of causal effects from observational data to describe assumptions sufficient for identification of hypothetical estimands in trials. I will then discuss both ‘causal inference’ and ‘missing data’ methods (such as mixed models) for estimation, and show that in certain situations estimators from these two sets are identical. These links may help those familiar with one set of methods but not the other. They may also identify situations where currently adopted estimation approaches may be relying on unrealistic assumptions, and suggest alternative approaches for estimation.
Nov 1, Virtual, 12:00pm - 1:00pm
Dr. Jean Feng, Assistant Professor, Department of Epidemiology and Biostatistics, University of California, San Francisco (UCSF)
Safe approval policies for continual learning systems in healthcare
The number of machine learning (ML)-based medical devices approved by the US Food and Drug Administration (FDA) has been rapidly increasing. The current regulatory policy requires these algorithms to be locked post-approval; subsequent changes must undergo additional scrutiny. Nevertheless, ML algorithms have the potential to improve over time by training over a growing body of data, better reflect real-world settings, and adapt to distributional shifts. To facilitate a move toward continual learning algorithms, the FDA is looking to streamline regulatory policies and design Algorithm Change Protocols (ACPs) that autonomously approve proposed modifications. However, the problem of designing ACPs cannot be taken lightly. We show that policies without error rate guarantees are prone to “bio-creep” and may not protect against distributional shifts. To this end, we investigate the problem of ACP design within the frameworks of online hypothesis testing and online learning and take the first steps towards developing safe ACPs.