Department of Biostatistics Ph.D. Research Presentations

January 30, 2023 at 12:00pm

Chang Yu

Adviser: Zhijin Wu

Title: Accurate Detection of MicroRNAs from NanoString nCounter with a Latent Mixture Model
Abstract: MicroRNAs (miRNAs) are promising biomarker candidates because of their association with a wide range of diseases and their presence in easy-to-obtain biofluids. The NanoString nCounter platform is a popular platform for measuring miRNAs because it avoids amplification bias. Existing methods for nCounter data processing and analysis rely heavily on a handful of control probes and housekeeping genes for background estimation and/or normalization. Motivated by observations from hundreds of samples compiled from multiple studies, we propose a multi-study joint processing method, multi-study miRNA detection (MMD). MMD is based on a latent mixture model that accounts for both probe-specific and sample-specific effects. The probe effects are estimated jointly from all samples. Sample-specific background and normalization factors are estimated from all probes instead of relying on a few controls. In both simulations and real data comparisons, we demonstrate that MMD outperforms the built-in NanoString method in signal detection and has greater power to identify differentially present miRNAs, which are largely overlooked by alternative methods.

___________________________________________________________________________________________________________________________________

February 6, 2023 at 12:30pm

Esteban Fernandez Morales

Advisers: Arman Oganisian and Youjin Lee

Title: Causal Cost-Effectiveness Analysis of Sequential Treatments: A Bayesian Approach in Continuous Time
Abstract: Observational cost-effectiveness analyses (CEAs) are used in health economics to compare interventions in terms of their effectiveness (usually survival time) and financial cost to inform various policy decisions. In practice, many interventions are sequential, i.e., they consist of a sequence of treatment decisions applied at varying, patient-specific times. CEAs are complex in these settings since subjects may drop out at any point before completing the treatment sequence, which leaves cost and effectiveness measures unobserved. Additionally, in observational data, treatment decisions are not sequentially randomized, making cost-efficacy estimation prone to time-varying confounding. To address these issues, we formulate a Bayesian method that jointly models the waiting times between each treatment in the sequence with the incremental costs accrued during those periods. Under assumptions about the censoring and treatment mechanisms, we use a g-computation procedure to contrast counterfactual cost-effectiveness metrics under different treatment strategies. We present simulation results assessing the frequentist properties of the posterior estimates of these causal contrasts.

___________________________________________________________________________________________________________________________________

February 13, 2023 at 12:00pm

Sarah Voter

Adviser: Joseph Hogan

Title: TBD
Abstract: TBD

___________________________________________________________________________________________________________________________________

February 13, 2023 at 12:30pm

Yimo Zhang

Adviser: Ani Eloyan

Title: Feature Extraction on PET Images Using an ICA-based Method
Abstract: Independent component analysis (ICA) has been widely applied to brain images such as functional MRI to efficiently extract features from the original images. However, little effort has been made to apply similar approaches to PET images. This paper proposes a method to extract features from PET images based on ICA and a variational autoencoder (VAE) under certain statistical assumptions. We ran simulations to demonstrate that the proposed method can accomplish the feature-extraction task in different scenarios. The method will ultimately be used to process tau PET images, and the obtained features can be further utilized in other tasks such as prediction.

___________________________________________________________________________________________________________________________________

February 27, 2023 at 12:00pm

Gauri Kamat

Adviser: Roee Gutman

Title: A Bayesian Record Linkage Approach that Adjusts for Variables in One File
Abstract: In many healthcare applications, information about units is dispersed across multiple datasets. Linking records across datasets becomes necessary when the goal is to estimate associations among variables exclusively appearing in each dataset. Record linkage is a statistical technique that identifies records representing the same entity across multiple datasets when unique identifiers are absent. Common Bayesian record linkage algorithms rely on similarities between variables recorded in both datasets, and do not adjust for relationships between variables that are exclusive to each dataset. We extend existing Bayesian record linkage methods to integrate associations between variables that are exclusive to one dataset. We show analytically, and using simulations, that our method improves the linking process and results in accurate estimates when identifying information is limited. We apply our method to link Meals on Wheels recipients to Medicare Enrollment records, and examine the association between activities of daily living and healthcare utilization among Meals on Wheels recipients.

___________________________________________________________________________________________________________________________________

February 27, 2023 at 12:30pm

Shuo Feng

Adviser: Alyssa Bilinski

Title: Robust inference methods for longitudinal causal inference
Abstract: TBD

___________________________________________________________________________________________________________________________________

March 20, 2023 at 12:00pm

Jerson Cochancela

Adviser: Roee Gutman

Title: Principal Stratification under Diagnostic Testing
Abstract: The National Lung Screening Trial (NLST 2011) enrolled participants at high risk for lung cancer and is the only screening trial to report a reduction in all-cause mortality. Over the course of the NLST screening, significant incidental findings (SIFs) unrelated to lung cancer were detected. We examine whether SIF detection is associated with the reported reduction in all-cause mortality. To estimate this association, we formalize the relationship between screening arms and SIF detection using the potential outcomes framework (Imbens and Rubin, 1997) and principal stratification (PS) (Frangakis and Rubin, 2002). Using this classification of patients, screening effects are estimated within each latent stratum and the heterogeneity across strata provides evidence of the relationship between SIF detection and mortality. Our previous work on PS under diagnostic testing relied on EM and data augmentation algorithms. We now present PS and principal causal effects (PCE) under an entirely Bayesian framework. While some PCE are weakly identified, exhibiting regions of flatness in their posterior distributions, they are nevertheless informative regarding screening strata. The identification and estimation of PS and PCE rely on some assumptions which are untestable. We provide sensitivity analyses for monotonicity (Ding and Lu, 2017) and different combinations of exclusion restrictions (Hirano et al., 2000). Understanding which screening strata have clinical significance is crucial in understanding the cascade of care that follows abnormality detection during diagnostic screening.

___________________________________________________________________________________________________________________________________

March 20, 2023 at 12:30pm

Ruya Kang

Advisers: Constantine Gatsonis and Jon Steingrimsson

Title: TBD
Abstract: TBD

___________________________________________________________________________________________________________________________________

April 10, 2023 at 12:00pm

Blake Hansen

Adviser: Roberta De Vito

Title: Fast Variational Methods for Multi-study Non-negative Matrix Factorization Models
Abstract: Non-negative Matrix Factorization (NMF) is a common latent variable model used to decompose mutation rates in biological data into two components: a matrix of mutational signatures and associated exposures. Many studies have established the importance and replicability of mutational signatures across multiple cancer datasets. Recently, NMF methods have been developed to analyze mutation rates in multi-study settings through a fully Bayesian perspective. However, these methods are computationally expensive and do not scale well in high dimensions. In this talk, we adapt multistudy-NMF models and propose scalable Variational Inference (VI) algorithms to deliver approximate Bayesian inference at fast speeds.

___________________________________________________________________________________________________________________________________

April 10, 2023 at 12:30pm

Nick Lewis

Adviser: Joseph Hogan

Title: Predicting Time of Return to Care for HIV Patients in AMPATH's Clinical system
Abstract: Consistent healthcare is an essential need for people living with HIV (PLWH). Since the advent of antiretroviral therapy (ART), a patient can suppress their viral load to a level low enough that transmission is nearly impossible. This arguably makes retention in HIV care and compliance with ART the most effective tools available for bringing the pandemic to an end. In our work, we aim to use a Bayesian framework to build a longitudinal machine learning model capable of predicting when a patient will return to care, in order to flag those who are at risk of dropping out and keep individuals on their scheduled regimen.

___________________________________________________________________________________________________________________________________

April 24, 2023 at 12:00pm

Rob Zielinski

Adviser: Ani Eloyan

Title: TBD
Abstract: TBD

___________________________________________________________________________________________________________________________________

April 24, 2023 at 12:30pm

Taylor Fortnam

Adviser: Joseph Hogan

Title: A Method for Vaccine Effectiveness Surveillance with Application to the BA.1 and BA.2 sub-lineages of the Omicron Variant
Abstract: Efficient use of existing methods for estimating vaccine effectiveness in public health surveillance is limited by the necessity of conducting a distinct study, which entails data collection from a large cohort or sampling test-negative controls and introducing bias. In the context of COVID-19, new variants arise frequently with different viral properties that can impact the effectiveness of the vaccines. We propose a dynamically-updating method for rapid estimation of vaccine effectiveness against a new variant without the need to conduct a traditional study. It relies on an existing estimate of effectiveness against a previously circulating variant and a measure of relative vaccine effectiveness produced from local surveillance data. We demonstrate the utility of this method on the BA.1 and BA.2 sub-lineages of the Omicron variant. The method produces estimates of vaccine effectiveness comparable to those produced using traditional methods, although with increased standard error. The increase in error, however, is reasonable given a much smaller sample size than other studies, and error ranges of the estimates could be significantly improved by sequencing a larger proportion of identified cases. Our method can be applied using routinely-collected data to produce timely, rigorous VE estimates to alert health departments to potential changes in VE.

___________________________________________________________________________________________________________________________________

May 1, 2023 at 12:00pm

Patrick Gravelle

Adviser: Roee Gutman

Title: Designing an adaptive trial to address noncompliance
Abstract: Randomized controlled trials (RCTs) aim to estimate the effect of an intervention. Estimating this effect becomes difficult when there is noncompliance in the study population, which can lead to misleading conclusions. Numerous methods have been developed to address noncompliance following the collection of data from an RCT during the analysis phase. Alternatively, adaptive designs provide researchers the ability to specify possible modifications during the design phase of a randomized trial. Investigators are able to evaluate interventions as data are accrued and apply any necessary adaptations to the trial. The adaptive design literature has historically addressed issues such as dose-determination, sample size reformulation, and sequential stopping procedures. Limited literature has addressed noncompliance in the design phase of a trial. We propose an adaptive design to address noncompliance within a causal inference framework using Bayesian methods.

___________________________________________________________________________________________________________________________________

May 1, 2023 at 12:30pm

Anthony Sisti

Adviser: Roee Gutman

Title: A Latent Principal Stratification Method to Address Cluster and Individual Noncompliance in Cluster RCTs
Abstract: In pragmatic cluster randomized controlled trials (CRTs), patient and provider noncompliance with the intervention can occur. Poor compliance of providers and patients can impact the estimates of the treatment effect. Some studies track metrics that describe providers’ implementation of the intervention and patients’ adherence to it. The compliance metrics offer insights into the effects of an intervention among providers and the patients who adhere to the intervention. We propose a Bayesian procedure that estimates latent compliance classes of the providers under one-sided noncompliance. Using the latent class model, we impute compliance metrics for providers assigned to control, and classify all providers into compliance strata. Within each provider compliance stratum, we impute the unobserved binary patient compliance status and estimate the treatment effect among complying individuals. This allows for comparison of the effect of the intervention between similar groups of providers defined by their latent compliance strata. We apply the procedure to analyze METRICaLL, a pragmatic CRT studying the effect of music therapy on the behaviors of nursing home residents with dementia.

___________________________________________________________________________________________________________________________________

May 15, 2023 at 12:00pm

Jenn Scodes

Adviser: Roee Gutman

Title: Comparison of methods for analyzing proxy responses in patient-reported outcomes
Abstract: A major challenge in patient-reported outcomes is missingness due to patient non-response. In some cases, when a patient is unable to respond, a proxy such as a family member or clinician will respond on behalf of the patient. However, patients who self-report and patients with proxy reports likely differ in important ways, so these differences must be controlled for when analyzing proxy-reported data. There are currently no standard methods for analyzing these data; therefore, we aim to compare various methods of addressing proxy responses (e.g., substitution, regression adjustments, propensity score matching, predictive mean matching, and item-response-theory-based graded response models) in patient-reported outcome data.

___________________________________________________________________________________________________________________________________

May 15, 2023 at 12:30pm

Ruofan Bie

Adviser: Jon Steingrimsson

Title: Inverse probability weighting for Missing Data in Causally Interpretable Meta-Analysis
Abstract: Meta-analysis has been widely applied in multiple fields. It synthesizes independent primary studies that focus on the same question, which we call trials, and then makes inference about that question in a new dataset, which we call the target. However, in real-world meta-analysis, the covariates collected in the trials might differ because of different study designs, and there might be incomplete observations in both the trials and the target. One commonly used method for dealing with missing data is complete case analysis. However, when the data are missing at random (MAR), complete case analysis can be biased. Another commonly used method is multiple imputation. However, this method also introduces bias when the distribution of the observed data differs from the distribution of the missing data. In this paper, we use inverse probability weighting to tackle the missing data problem and propose an outcome estimator, an IPW estimator, and a doubly robust estimator for the causal average treatment effect in the target. The simulation study showed that our proposed estimators are unbiased when dealing with missing data, in contrast to the complete-case and multiple imputation estimators. The simulation study also showed that the proposed doubly robust estimator is robust to misspecification of either the trial-participation model or the outcome-generating model.

___________________________________________________________________________________________________________________________________