Department of Biostatistics at Columbia University

Biostatistics is the science of developing and applying statistical methods for quantitative studies in biomedicine, health, and population sciences.

As one of the nation’s premier centers of biostatistical research in clinical trials, brain imaging, cancer, mental health, and more, the Department of Biostatistics at Columbia’s Mailman School offers students many opportunities for advanced study. Faculty in the Department work at the frontier of public health, leading research teams that investigate some of today’s most pressing health issues. Recruited from top universities around the world, the faculty bring to the School a wealth of experience that informs their research and teaching.

10/27/2025

This Thursday, October 30th, Bibhas Chakraborty, PhD of Duke-NUS Medical School will present a Levin Lecture on “Innovative Trial Designs in Mobile Health Using Reinforcement Learning” from 9:00am - 10:15am over Zoom. You can find the link on the Fall 2025 Departmental Lectures page. All are welcome to come learn with us!


10/21/2025

This Thursday, October 23rd, GuanNan Wang, PhD of the College of William & Mary will present a Levin Lecture on “Boosting Biomedical Imaging Analysis via Distributed Functional Regression and Synthetic Surrogates” from 11:45am - 1:00pm over Zoom. You can find the link on the Fall 2025 Departmental Lectures page. All are welcome to come learn with us!

Abstract:
Understanding how scalar covariates influence spatial patterns in medical imaging data, such as neuroimaging or organ-level functional images, is a central challenge in modern biomedical research. The rapid expansion of large-scale imaging studies has heightened the need for statistical frameworks that are both interpretable and computationally scalable. In this talk, I will introduce a new class of domain-aware functional regression models, where spatially varying coefficients link scalar predictors to imaging responses defined over complex 3D domains. Our Distributed Image-on-Scalar Regression framework employs a triangulation-based domain decomposition strategy, enabling efficient parallel estimation with trivariate penalized splines. This design preserves global spatial structure while flexibly accommodating subregion-specific heterogeneity. To address additional challenges posed by incomplete or noisy imaging data, I will also discuss the use of synthetic surrogates generated with modern AI tools. Rather than imputing missing values directly, these synthetic surrogates can serve as auxiliary data that can be jointly analyzed with observed images, improving efficiency while maintaining robustness to imputation error. Together, these advances pave the way for scalable, uncertainty-aware statistical analysis of high-dimensional biomedical imaging.

10/14/2025

This Thursday, October 16th, Andrew An Chen, PhD of the Medical University of South Carolina will present a Levin Lecture on “Methodological Considerations in Applying Brain Charts to New Samples” from 11:45am - 1:00pm in person in ARB Hess Commons. All are welcome to come learn with us!

Abstract:
Multi-site national and international imaging consortia have formed with the goal of precisely characterizing the human brain across the lifespan. These consortia have succeeded in collecting large samples of brain magnetic resonance imaging (MRI) scans to estimate sex-specific trajectories of brain phenotypes across age, often called brain charts. The promise of brain charts is that future researchers and clinicians will be able to assess a new scan for deviations from this healthy trajectory. However, the implementation of these charts in practice is severely limited by differences across study sites, also known as site effects. Here, we first discuss several projects in harmonization of MRI data specifically tailored to this normative modeling setting. Then, we leverage advancements in model uncertainty quantification to propose new ways to calibrate brain charts, as an alternative to harmonizing data. Finally, we apply our approaches to the Lifespan Brain Chart Consortium (LBCC) to assess generalizability to new scans from both healthy individuals and Alzheimer's disease (AD) patients. Based on our findings, we provide methodological recommendations for applying fitted brain charts to new sites.

10/07/2025

This Thursday, October 9th, KC Gary Chan, PhD of the University of Washington School of Public Health will present a Levin Lecture on “Robust and efficient semiparametric inference for the stepped wedge design” from 11:45am - 1:00pm over Zoom. You can find the link on the Fall 2025 Departmental Lectures page. All are welcome to come learn with us!

Abstract:
Stepped wedge designs (SWDs) are increasingly used to evaluate longitudinal cluster-level interventions but pose substantial challenges for valid inference. Because crossover times are randomized, intervention effects are intrinsically confounded with secular time trends, while heterogeneous cluster effects, complex correlation structures, baseline covariate imbalances, and unreliable standard errors from few clusters further complicate statistical inference. We propose a unified semiparametric framework for estimating possibly time-varying intervention effects in SWDs that directly addresses these issues. A nonstandard development of semiparametric efficiency theory is required to accommodate correlated observations within clusters, non-identically distributed outcomes across clusters due to varying cluster-period sizes, and weakly dependent treatment assignments that are hallmarks of SWDs. The resulting estimator of treatment contrast is consistent and asymptotically normal even under misspecification of the covariance structure and control cluster-period means, and achieves the semiparametric efficiency bound when both are correctly specified. To facilitate inference for trials with few clusters, we introduce a permutation-based procedure to better capture finite-sample variability and a leave-one-out correction to mitigate plug-in bias. We further discuss how effect modification can be naturally incorporated, and imbalanced precision variables can be accommodated via a simple adjustment closely related to post-stratification, a novel connection of independent interest. Simulations and application to a public health trial demonstrate the robustness and efficiency of the proposed method relative to standard approaches.
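
For readers new to stepped wedge designs, the confounding with time mentioned above can be seen in a small sketch. This is only an illustration of a standard stepped wedge layout, not the speaker's method: the function names, the choice of one cluster crossing over per period, and the seed are all assumptions made for the example.

```python
import random


def stepped_wedge_schedule(n_clusters, seed=0):
    """Build a toy stepped wedge schedule (illustrative assumption:
    one cluster crosses over per period, and none are treated at baseline).

    Returns a list of rows, one per cluster; entry 1 means the cluster
    is under the intervention in that period, 0 means control.
    """
    n_periods = n_clusters + 1
    order = list(range(n_clusters))
    # Randomize the order in which clusters cross over to the intervention.
    random.Random(seed).shuffle(order)
    schedule = [[0] * n_periods for _ in range(n_clusters)]
    for step, cluster in enumerate(order, start=1):
        for period in range(step, n_periods):
            schedule[cluster][period] = 1
    return schedule


def share_treated_by_period(schedule):
    """Fraction of clusters under the intervention in each period."""
    n_clusters = len(schedule)
    n_periods = len(schedule[0])
    return [sum(row[p] for row in schedule) / n_clusters for p in range(n_periods)]


if __name__ == "__main__":
    schedule = stepped_wedge_schedule(4)
    for row in schedule:
        print(row)
    print(share_treated_by_period(schedule))
```

Whatever the random crossover order, the share of treated clusters rises steadily from the first period to the last, so any secular trend in the outcome moves together with treatment exposure. That is why time has to be modeled explicitly rather than ignored.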

10/01/2025

We are excited to invite our Biostatistics students, faculty, and staff to a Pumpkin Painting Event on Wednesday, October 8th, from 4:00 PM to 5:00 PM in the ARB 6th Floor Lobby. This event will be a fantastic opportunity to unleash your creativity, enjoy some seasonal fun, and connect with fellow students.

All materials will be provided, and no prior painting experience is necessary. Just bring your enthusiasm and a smile!

Please RSVP by October 3rd to ensure we have enough supplies for everyone via the link sent to your Columbia email.

We look forward to seeing you there and celebrating the autumn season together!

09/30/2025

This Thursday, October 2nd, Natalie Dean, PhD of Emory University Rollins School of Public Health will present a Levin Lecture on “Challenges in Estimating Vaccine Effectiveness Against Progression to Severe Disease” from 11:45am - 1:00pm over Zoom. You can find the link on the Fall 2025 Departmental Lectures page. All are welcome to come learn with us!

Abstract:
Vaccines can reduce an individual’s risk of infection and their risk of progression to disease given infection. The latter effect is less commonly estimated but is relevant for risk communication and vaccine impact modeling. Using a motivating example from the COVID-19 literature, we note how vaccine effectiveness against progression can appear to increase over time in settings where true biological strengthening is unlikely. We use mathematical modeling to demonstrate how this phenomenon can occur when there is an underlying vulnerable subpopulation with poor vaccine response against infection and progression. As a result, the earliest infections are among those with the weakest protection against disease. We describe a modeling framework to link underlying immunology and post-vaccination outcomes that we use to further examine this problem. This work highlights methodological challenges in isolating a vaccine’s effect on progression to severe disease after infection.
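
The selection effect described in the abstract can be sketched with a toy two-group model. This is a hedged illustration, not the speaker's framework: the function name `ve_progression`, the constant infection hazards, and every parameter value (baseline hazard, share of vulnerable people, group-specific effectiveness) are assumptions chosen only to show the pattern.

```python
import math


def ve_progression(t, base_hazard=0.01, frac_vulnerable=0.2,
                   ve_infection=(0.9, 0.1), ve_progress=(0.8, 0.2)):
    """Apparent VE against progression among vaccinated people infected at time t.

    Two vaccinated subgroups (all values are illustrative assumptions):
    index 0 = good responders, index 1 = vulnerable poor responders.
    Each subgroup has a constant infection hazard and a fixed, time-invariant
    protection against progression, so nothing biological strengthens over time.
    """
    fracs = (1.0 - frac_vulnerable, frac_vulnerable)
    hazards = [base_hazard * (1.0 - ve) for ve in ve_infection]
    # Rate of new infections at time t contributed by each subgroup:
    # group share * hazard * probability of still being uninfected.
    incidence = [f * h * math.exp(-h * t) for f, h in zip(fracs, hazards)]
    total = sum(incidence)
    # Relative risk of progression versus the unvaccinated, whose progression
    # probability is constant in this toy model and therefore cancels out.
    rel_risk = sum((inc / total) * (1.0 - ve)
                   for inc, ve in zip(incidence, ve_progress))
    return 1.0 - rel_risk


if __name__ == "__main__":
    for day in (0, 90, 180, 365):
        print(day, round(ve_progression(day), 3))
```

In this sketch the vulnerable group is infected fastest, so early infections are dominated by people with weak protection against progression. As that group is depleted, later infections come increasingly from good responders, and the apparent effectiveness against progression rises even though each person's protection never changes.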

09/26/2025

Next week is the start of a new month and that means more Departmental Levin Lectures!

You can view all the abstracts, upcoming lectures in Fall 2025, and find Zoom links on our Departmental Lectures webpage. We hope to see you there!

Thursday, October 2nd: Natalie Dean, PhD - “Challenges in Estimating Vaccine Effectiveness Against Progression to Severe Disease”

Thursday, October 9th: KC Gary Chan, PhD - “Robust and Efficient Semiparametric Inference for the Stepped Wedge Design”

Thursday, October 16th*: Andrew An Chen, PhD - “Methodological Considerations in Applying Brain Charts to New Samples” (*This lecture will be in-person only in ARB Hess Commons)

Thursday, October 23rd: GuanNan Wang, PhD - "Boosting Biomedical Imaging Analysis via Distributed Functional Regression and Synthetic Surrogates"

Thursday, October 30th*: Bibhas Chakraborty, PhD - "Innovative Trial Designs in Mobile Health Using Reinforcement Learning" (*This lecture will begin at 9am ET)

09/23/2025

We are excited for the first FDAWG (Functional Data Analysis Working Group) Meeting today! Join us from 4 - 5pm on Zoom or in ARB room 627 to hear Dr. Johan Vagelius of Uppsala University give a talk titled “Functional mixed models for time-dependent PET.”

Abstract:
The simplified reference tissue model (SRTM) is widely used for PET receptor quantification but assumes constant kinetic parameters. Existing time-varying extensions often impose fixed response shapes or rely on voxelwise fits that forego hierarchical pooling. We propose a functional mixed-effects formulation that models the apparent efflux as a smooth function of time, Image, decomposed into group-level smooths (fixed functional effects) and subject-specific smooth deviations (random functions). Using a common time grid and a Gaussian-process kernel, we formulate the SRTM in a function-on-scalar mixed model that supports direct inference on group differences in Image with principled uncertainty quantification.
We place a Gaussian prior on the fixed-effects coefficients and integrate out the random functional effects to obtain a marginal likelihood. Conditioning on variance/smoothing hyperparameters, the posterior of the fixed effects is Gaussian. Computationally, inference requires inversion of only Image matrices, where Image and Image are the number of covariates and the time-grid size, respectively, rather than the full data covariance. This enables scalable estimation of time-dependent PET processes.

09/16/2025

This Thursday, September 18th, Jingyi Jessica Li, PhD will present a Levin Lecture on “Nullstrap: A Simple, High-Power, and Fast Framework for FDR Control in Variable Selection for Diverse High-Dimensional Models” from 11:45am - 1:00pm over Zoom. You can find the Zoom link on the Fall 2025 Departmental Lectures page on our website. All are welcome to come learn with us!

Abstract:
Balancing false discovery rate (FDR) control with high statistical power is a central challenge in high-dimensional variable selection. Existing methods often degrade data through knockoffs or splitting, leading to power loss. We propose Nullstrap, a framework that controls FDR without altering the original data. Nullstrap generates synthetic null data by fitting a null model under the null hypothesis and applies the same estimation to both original and synthetic datasets. This parallel structure resembles the likelihood ratio test, serving as its numerical analog. A data-driven correction procedure adjusts null estimates, enabling variable selection with theoretical guarantees: asymptotic FDR control at any desired level and power converging to one. Nullstrap is fast, stable, and broadly applicable across linear, generalized linear, Cox, and graphical models. Simulations indicate that Nullstrap maintains robust FDR control and outperforms the knockoff filter and data splitting in power (0.95 vs. 0.50 and 0.70) and efficiency (≈ 30×). While all three methods are randomized, Nullstrap is more stable (Jaccard 0.98 vs. 0 and 0.42). In a triple-omics time-to-labor dataset, the knockoff filter and data splitting fail to identify variables in most of 70 runs with different random seeds, whereas Nullstrap consistently selects predictors, achieves > 90% predictive accuracy, and is three orders of magnitude faster.

09/09/2025

This Thursday, September 11th, Bingxin Zhao, PhD will present a Levin Lecture on “Resampling-based Pseudo-training in Genomic Predictions” from 11:45am - 1:00pm in the ARB 8th Floor Auditorium. This week, the lecture is in-person only. All are welcome to come learn with us!

Abstract:
In this talk, I will present a resampling-based pseudo-training framework for genomic prediction that enables model development using only summary-level data. We show that generating pseudo-training and validation statistics from summary results achieves asymptotic equivalence to conventional training while avoiding the need for individual-level datasets. Simulations and real data applications suggest that pseudo-training performs comparably to standard approaches with large datasets and substantially better when tuning data are limited. We highlight two platforms built on this framework: PennPRS (https://pennprs.org/), a cloud-based computing infrastructure supporting large-scale, no-code polygenic risk score training with purely summary data resources, and GCB-Hub (https://www.gcbhub.org/), which applies pseudo-training to proteome-wide association studies for protein-disease mapping and drug discovery. Together, these advances demonstrate how resampling-based pseudo-training methods can broaden accessibility, scalability, and impact of genomic prediction across diverse biomedical research settings.

09/08/2025

Happy start of the fall semester! We’re celebrating with the first T-Time this Wednesday, September 11th, from 4-5pm.

Come hang out in the department and meet your fellow classmates & faculty! Whether you're a returning student or brand new to campus, this is your chance to connect and set the tone for an amazing semester ahead!

All Biostatistics faculty, staff, and students are welcome and encouraged to come.

09/05/2025

We were excited to welcome our newest students for the first week of classes of Fall 2025! Here's to a great semester!

Address

722 W 168th Street
New York, NY
10032
