– Sr Data Scientist, Flatiron Health, New York, New York, United States
Background: Electronic health record (EHR)-derived databases are commonly subject to left truncation, a type of selection bias that arises because patients must survive long enough to meet certain entry criteria. Standard methods for adjusting for left truncation in survival analyses assume marginal independence between entry and survival times, an assumption that may not hold in practice.
Objectives: To illustrate a novel methodology for unbiased estimation of common survival parameters under the weaker assumption of conditionally independent left truncation.
Methods: We describe conditionally independent left truncation, a property that is easily testable from the observed truncated data. We then show how this assumption makes conditional parameters estimable from a truncated dataset, and makes marginal parameters estimable when reference data with non-truncated information on confounders are available. The latter uses a weighting approach that is complementary to observational causal inference methodology applied to real-world external comparators, and yields estimates of marginal hazard ratios and survival distributions. We apply the proposed methodology to estimate these parameters in simulation studies of left-truncated data in which entry and survival times are marginally dependent but independent conditional on measured confounders. We also illustrate estimation of the marginal survival distribution for patients with metastatic prostate cancer in a real-world clinico-genomic database.
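The two independence assumptions and the weighting idea can be written compactly as follows. This is a sketch in notation assumed here for illustration, not taken from the abstract: T is the survival time, E the entry time, Z the measured confounders, f_ref the density of Z in the non-truncated reference data (assumed to represent the target population), and f_trunc the density of Z in the truncated sample. A patient is assumed to be observed only if T > E, and the weights require f_trunc(z) > 0 wherever f_ref(z) > 0.

```latex
% Assumed notation: T = survival time, E = entry time, Z = measured confounders.
% A patient enters the truncated dataset only if T > E.
\begin{align*}
\text{Marginal independence:} \quad & T \perp E \\
\text{Conditional independence:} \quad & T \perp E \mid Z \\
\text{Density ratio weight:} \quad & w(z) = \frac{f_{\mathrm{ref}}(z)}{f_{\mathrm{trunc}}(z)} \\
\text{Marginal survival:} \quad & S(t) = \int S(t \mid z)\, f_{\mathrm{ref}}(z)\, dz
  = \mathbb{E}_{\mathrm{trunc}}\!\left[ w(Z)\, S(t \mid Z) \right]
\end{align*}
```

The last line is the usual change-of-measure identity: averaging the conditional survival function over the reference distribution of Z is equivalent to a weighted average over the truncated sample's distribution of Z.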
Results: Our simulation studies show unbiased estimation and valid confidence interval coverage for conditional hazard ratios, marginal hazard ratios, and marginal survival distributions. In contrast, standard methods for left truncation show heavy bias and poor coverage when the association between survival and entry times is confounded. In our application, we found evidence of conditionally independent left truncation in a cohort of patients with metastatic prostate cancer who received comprehensive genomic profiling. A standard risk-set-adjusted Kaplan-Meier analysis estimates a median survival time of 17.8 months (95% CI [16.5, 19.3]) after first-line therapy. Using a non-truncated dataset, we estimate density ratio weights to account for the confounders identified. A subsequent weighted analysis estimates a median survival time of 16.5 months (95% CI [15.1, 18.1]), indicating that the standard analysis overestimates survival when conditionally independent left truncation is not properly accounted for.
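For readers unfamiliar with risk set adjustment, the following is a minimal Python sketch of a Kaplan-Meier estimator with delayed entry and optional per-patient weights. It is an illustration under stated assumptions, not the authors' implementation: the function names `risk_set_adjusted_km` and `median_survival` are hypothetical, the weights are assumed to be supplied externally (for example, density ratio weights estimated elsewhere), and the toy data are invented for demonstration only.

```python
from typing import List, Optional, Sequence, Tuple


def risk_set_adjusted_km(
    entry: Sequence[float],
    time: Sequence[float],
    event: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate with delayed entry and optional weights.

    A patient counts as at risk at time t only if entry < t <= time,
    so patients contribute nothing before they enter the cohort.
    Returns a list of (event_time, survival_probability) pairs.
    """
    n = len(time)
    if weights is None:
        weights = [1.0] * n  # unweighted: standard risk-set adjustment

    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, d in zip(time, event) if d})

    surv = 1.0
    curve: List[Tuple[float, float]] = []
    for t in event_times:
        at_risk = 0.0
        deaths = 0.0
        for e, x, d, w in zip(entry, time, event, weights):
            if e < t <= x:  # patient has entered and is still under follow-up
                at_risk += w
                if d and x == t:
                    deaths += w
        if at_risk > 0:
            surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """First event time at which estimated survival is at or below 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


if __name__ == "__main__":
    # Toy data (invented): entry times, follow-up times, event indicators.
    entry = [0, 0, 2, 3]
    time = [1, 4, 5, 6]
    event = [1, 1, 0, 1]
    print(risk_set_adjusted_km(entry, time, event))
    # Hypothetical weights that up-weight the third patient.
    print(risk_set_adjusted_km(entry, time, event, weights=[1, 1, 2, 1]))
```

Passing `weights=None` gives the standard risk-set-adjusted estimate; passing weights reweights both the risk set and the event counts, which is one simple way a weighted analysis of this kind could be computed.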
Conclusions: We developed a novel approach that relaxes the assumptions required for valid survival estimation and inference under left truncation, allowing a broader range of analyses to be conducted.