Biostatistics Seminars

The Department of Biostatistics at the University of Michigan is proud to invite leading scholars from around the world to visit Ann Arbor to share their expertise, wisdom, and experience. All are welcome to attend these seminars, which are held in person.

Xiao-Hua Andrew Zhou

Endowed Distinguished Chair Professor of Biostatistics
Peking University


DATE: Thursday, September 5, 2024
TIME: 11:00 a.m.
LOCATION: SPH II, Room M1152

TITLE: Causal Inferences and Statistical Methods for Diagnostic Medicine

ABSTRACT: Two important areas of biostatistics are causal inference and statistical methods in diagnostic medicine. In this talk, I discuss new developments in both areas. In particular, I discuss new developments in statistical methodology for causal inference in clinical trials with concurrent events and machine learning. In addition, I give an overview of new developments in statistical methods for evaluating the accuracy of medical devices.

TOPICS: To be announced


Volodymyr Minin

Professor of Statistics
University of California, Irvine


DATE: Thursday, September 12, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Pathogen Transmission Inference, Nowcasting, and Forecasting Using Multiple Surveillance Data Streams

ABSTRACT: The area of statistical modeling of infectious disease dynamics is actively responding to the challenges and opportunities offered by the increasing abundance of relevant data from electronic surveillance systems, seroprevalence studies, genetic sequencing of pathogens, and wastewater sampling. Determining which combinations of data streams are optimal for particular inferential or forecasting tasks remains an open question. We describe our work in progress on novel statistical methods that combine multiple surveillance data streams to improve both inference, including nowcasting, and forecasting of infectious disease dynamics. We develop a series of semi-parametric Bayesian compartmental models and demonstrate that this class of models can effectively integrate passively collected time series of diagnostic tests, mortality data, seroprevalence data, and wastewater pathogen concentrations. Using retrospective inference on California COVID-19 data sets, we evaluate the utility of each data stream for nowcasting and short-term forecasting. Lastly, we focus on forecasting healthcare demand during epidemic surges of pathogen variants capable of immune escape. We incorporate time series of cases, hospitalizations, ICU admissions, deaths, and genetic sequence counts into a Bayesian model and show that using genetic information leads to superior forecasting performance compared with traditional models.

TOPICS: Bayesian Statistics, COVID-19, Data Integration, Epidemiology and Public Health, Infectious Diseases


Jonathan Bradley

Associate Professor of Statistics
Florida State University


DATE: Thursday, September 19, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Exact MCMC-free Bayesian Inference for Data of Any Size with Application to Joint Spatial Analysis of Fine Particulate Matter and Aerosol Optical Thickness

ABSTRACT: Fine particulate matter and aerosol optical thickness are two variables of interest to scientists for understanding air quality and its various health and environmental impacts. The available data on these variables are extremely large, making uncertainty quantification in a fully Bayesian framework difficult, as traditional implementations with Markov chain Monte Carlo (MCMC) and approximate Bayesian strategies do not scale to the size of the data. We specifically consider 8 million observations obtained from the National Aeronautics and Space Administration (NASA)'s Moderate Resolution Imaging Spectroradiometer (MODIS). To analyze this big dataset in an exact (i.e., without approximate Bayesian strategies) fully Bayesian context, we introduce Scalable Exact Posterior Regression (S-EPR), which combines two recently introduced methodologies: the data subset approach and exact posterior regression (EPR). The "data subset approach" is a new Bayesian method that assumes a parametric model for a low-dimensional training dataset and assumes that the remaining holdout data follow their true non-parametric data-generating mechanism. The split into training and holdout data is treated as a parameter in the Bayesian model, and it is shown that posterior samples of parameters from this model use only (at most) the low-dimensional training data, making Bayesian inference from this model scalable to arbitrary dimensions. We combine the data subset approach with another recently introduced Bayesian hierarchical model that adds terms (called "discrepancy terms") to a spatial generalized linear mixed effects model. Under this model, independent replicates of the discrepancy terms and of the fixed and random effects can be sampled directly from the posterior without MCMC or approximate Bayesian techniques. Because these posterior samples have an efficient projection form, the approach is referred to as EPR.
The combination of the data subset approach with EPR allows one to perform exact Bayesian inference without MCMC for effectively any sample size in generalized linear mixed model contexts. We demonstrate our new S-EPR method on our motivating big remote sensing data application and in several simulations. Our novel S-EPR approach provided dramatic computational gains over EPR, MCMC-based Bayesian implementation, and INLA, while producing similar or better values of metrics that measure the quality of prediction and prediction uncertainty.

TOPICS: Bayesian Statistics, Environmental Health Sciences, High-Dimensional Data, Nonparametric / Semiparametric Modeling, Spatial Statistics


Bret Hanlon

Scientist III
University of Wisconsin, Madison


DATE: Thursday, September 26, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Statistical support for data monitoring committees: the role of the independent biostatistics group

ABSTRACT: The University of Wisconsin Statistical Data Analysis Center (SDAC) promotes statistical practice, applications, and research in the design and analysis of clinical trials. SDAC serves as an independent biostatistics group (IBG), providing interim analyses of accumulating data from ongoing clinical trials for review by independent data monitoring committees (DMCs). We prioritize responsiveness to the DMC and flexibility in the face of changing requirements. Our reports are graphically based, allowing DMC members to easily identify differences between treatment groups and/or changes over time and to review a large amount of information in a short time. In this talk, I describe the IBG’s role in industry-sponsored clinical trials. I highlight the challenges of balancing priorities from multiple stakeholders and of working with “dirty data”: interim clinical trial data can be missing, inconsistent, or incomplete. As an illustrative case study, I discuss supporting an endpoint trial that uses an event classification committee (ECC). Use of an ECC necessarily introduces a delay between the time an event is reported and the time its final classification is known. I provide recommendations for producing maximally informative summaries of the endpoint adjudication process for the DMC, incorporating clinical event data from all relevant sources of information.


Dylan Cable and Bingkai Wang


Assistant Professors
University of Michigan


DATE: Thursday, October 3, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Meet the New Michigan Biostatistics Faculty

Dylan Cable is an assistant professor of biostatistics at the University of Michigan. His research involves developing rigorous statistical modeling approaches for emerging high-throughput genomics technologies, such as spatial transcriptomics and single-cell RNA sequencing. Dr. Cable completed his PhD in computer science at the Massachusetts Institute of Technology and his bachelor's degree in mathematics at Stanford University. Dr. Cable is interested in applying high-throughput genomics technologies to better understand human health and disease, as well as in integrating them with clinical settings and drug discovery pipelines.

Bingkai Wang's research focuses on causal inference based on randomized trials, observational studies, and combinations of both. He is an expert in the design and analysis of complex clinical trials, including covariate-adaptive randomization, cluster/stepped-wedge trials, longitudinal studies, and test-negative designs. He is interested in leveraging modern tools, such as machine learning and large language models, to improve the practice of clinical studies. His recent contributions include developing model-robust and efficient causal inference methods for the trial designs mentioned above.


John Neuhaus

Professor of Biostatistics and Epidemiology
University of California, San Francisco


DATE: Thursday, October 10, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Improved prediction and flagging of extreme/unusual random effects for non-Gaussian outcomes using weighted methods

ABSTRACT: A common objective of investigators is to accurately predict extreme random effects from mixed effects models fitted to longitudinal and clustered data, and to use predicted random effects to identify or “flag” extreme or outlying values, such as poorly performing hospitals or patients with rapid declines in their health. Our recent work with Gaussian outcomes showed that weighted prediction methods can provide substantially reduced mean square error of prediction and appreciably higher correct flagging rates than previously proposed methods for flagging extreme values, while controlling the incorrect flagging rates. While many existing prediction approaches have focused on Gaussian outcomes, predicted random effects for binary, count, and other non-Gaussian outcomes, such as hospital readmission, are often of interest. Closed-form expressions for predicted random effects and for the probabilities of correct and incorrect flagging are not available for the usual non-Gaussian outcomes, and the computational challenges are considerably greater. We therefore develop theory to support algorithms that tune what we call self-calibrated predictors (which control the incorrect flagging rate using very simple flagging rules), along with innovative numerical methods to calculate weighted predictors and evaluate their performance. A comprehensive numerical evaluation shows that the novel weighted predictors for non-Gaussian outcomes have substantially lower mean square error of prediction and considerably higher correct flagging rates than previously proposed methods for flagging extreme values, while controlling the incorrect flagging rates. We illustrate our new methods using data on emergency room readmissions for children with asthma.

TOPICS: Longitudinal / Correlated Data, Predicted Random Effects


2024 Rod Little Lectureship

Amy Herring

Sara & Charles Ayres Distinguished Professor of Biostatistics
Duke University


DATE: Thursday, October 24, 2024
TIME: 4:30 p.m.
LOCATION: Rackham Graduate School Amphitheatre
A reception for all registered attendees will be held following the seminar in the adjacent Rackham Graduate School Assembly Hall.

TITLE: A Little Goes a Long Way: Why Things That Are Unseen Are More Important Than Ever

ABSTRACT: Missing data is a mature subfield of statistics, due in large part to the defining contributions of Michigan’s own Professor Little. The subfield has also aged very well, and the demand and need for principled solutions have never been greater. We will contemplate “things that are unseen” in a wide variety of contexts and consider the detailed example of the search for endotypes of sepsis, a life-threatening medical condition with high mortality worldwide.

RSVP FOR THE 2024 ROD LITTLE LECTURESHIP SEMINAR & RECEPTION


Yuanjia Wang

Professor of Biostatistics
Columbia University


DATE: Thursday, November 7, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Machine Learning Approaches for Optimizing Treatment Strategies

ABSTRACT: Among currently available pharmacological and behavioral interventions for mental disorders, no single therapy is universally effective, and treatment responses are far from adequate. As such, there is an urgent need to optimize treatment responses. Various factors are associated with positive treatment responses, providing evidence that response rates can be improved by incorporating patient-specific characteristics into treatment decisions to achieve precision psychiatry. However, individualized treatment decision making for mental disorders faces the challenges of extensive diagnostic heterogeneity, substantial between-patient variation in biological and clinical disease manifestation, and a mismatch between diagnostic categorization and the underlying pathophysiology. We propose novel machine learning methods, based on probabilistic generative models and reinforcement learning, to address these challenges. We discuss several studies aimed at discovering reliable individualized treatment strategies that factor in a patient’s clinical, psychosocial, and biological markers, and at integrating evidence from multi-domain data sources and multiple studies to increase generalizability and reproducibility. We will also discuss extensions that use real-world data to improve decision making.

TOPICS: Machine Learning, Mental Health, Personalized Medicine, Precision Health


Chengchun Shi

Associate Professor of Statistics
London School of Economics and Political Science


DATE: Thursday, November 14, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Optimal Design for A/B Testing in Two-sided Marketplaces

ABSTRACT: Time series experiments, in which experimental units receive a sequence of treatments over time, are prevalent in technology companies, including ride-sharing platforms and trading companies. These companies frequently employ such experiments for A/B testing, to evaluate the performance of a newly developed policy, product, or treatment relative to a baseline control. Many existing solutions require that the experimental environment be fully observed to ensure that the collected data satisfy the Markov assumption. This condition, however, is often violated in real-world scenarios. This gap between theoretical assumptions and practical realities challenges the reliability of existing approaches and calls for more rigorous investigation of A/B testing procedures. In this work, we study the optimal experimental design for A/B testing in partially observable environments. We introduce a controlled (vector) autoregressive moving average model to capture a rich class of partially observable environments. Within this framework, we derive closed-form expressions, termed efficiency indicators, to assess the statistical efficiency of various sequential experimental designs in estimating the average treatment effect (ATE). A key innovation of our approach is the introduction of a weak signal assumption, which significantly simplifies the computation of the asymptotic mean squared errors of ATE estimators in time series experiments. We then develop two data-driven algorithms to estimate the optimal design: one using constrained optimization and the other using reinforcement learning. We demonstrate the superior performance of our designs using a dispatch simulator and two real datasets from a ride-sharing company.


Leslie McClure

Dean of the College for Public Health & Social Justice; Professor of Epidemiology & Biostatistics
Saint Louis University


DATE: Thursday, November 21, 2024
TIME: 3:30 p.m.
LOCATION: SPH I, Room 1690

TITLE: Measuring Neighborhood Socioeconomic Status in the Diabetes LEAD Network

ABSTRACT: The Diabetes Location, Environmental Attributes, and Determinants (LEAD) Network was implemented to examine how community factors contribute to the risk of Type 2 diabetes across three large cohorts. Over the course of the collaboration, we developed a set of Network aims, evaluated several different exposures, and examined associations between those exposures and new-onset Type 2 diabetes. Of central importance to our scientific questions was the role of neighborhood socioeconomic status in new-onset Type 2 diabetes. In this talk, I will describe the process by which our Network developed a measure of neighborhood socioeconomic status and how we evaluated its suitability to meet our goals. Throughout the talk, I will emphasize the importance and challenges of collaborative research and how collaborations create opportunities for growth as a statistician.

TOPICS: Data Integration, Diabetes, Epidemiology and Public Health, Spatial Statistics, Scientific Collaboration