01/14/2026
Imputation of missing data without understanding why it's missing is a great example of a lack of critical thinking. Missing data is not just empty cells that can be filled simply by inferring from the data that is present. When Indigenous Australians self-discharge from ICU at four times the rate of other patients, when Black patients with out-of-hospital cardiac arrest never make it into the datasets, when non-English speakers have their vital signs checked less frequently, these aren't statistical phenomena to be imputed away. What's truly missing isn't the data itself. It's the context of how it came to be absent, the provenance of who collected it (or chose not to), under what conditions, with what biases, and for whom. We've obsessed over filling empty cells while ignoring the stories those empty cells tell. The real crisis isn't the missing data itself; it's our collective failure to ask why it's missing, to understand the systemic neglect that renders certain lives less worthy of the care that generates data in the first place. No algorithm, however sophisticated, can rescue insights from datasets that fundamentally misrepresent reality.
This paper is a call for a factory reset in how we build AI models. Understanding data provenance means tracing backwards through the entire pipeline: which hospitals had the resources to store data comprehensively, which communities had fragmented care across institutions, which patients encountered language barriers that made documentation burdensome. The path forward isn't more clever imputation techniques: it's transforming the AI lifecycle to center context before computation, and to recognize that our databases don't just reflect health inequities, they reproduce them at scale. Until we reimagine AI systems as opportunities for repair rather than optimization, we'll continue building technologies that illuminate what we already know too well: that healthcare has never been equally accessible, and our algorithms are learning this lesson perfectly.
Author summary
Healthcare data that is missing, incomplete, or inaccurately documented is often treated as a technical problem to be solved with statistical methods. We emphasize that this perspective overlooks the real issue: the data has been stripped of its context. Missing, incomplete, or inaccu...