MP03-13: Identification of Muscle-Invasion Status in Bladder Cancer Patients Using Natural Language Processing and Machine Learning
Friday, May 13, 2022
7:00 AM – 8:15 AM
Location: Room 222
Ruixin Yang, Di Zhu, Lauren Howard, Amanda De Hoedt, Durham, NC, Zachary Klaassen, Augusta, GA, Stephen Freedland, Los Angeles, CA, Stephen Williams, Galveston, TX, Ruixin Yang*, Durham, NC
Introduction: Mortality from bladder cancer (BC) increases exponentially once it invades the muscle. Correctly identifying muscle invasion is critical; however, accurate and efficient delineation of these patients at the population level presents a challenge. We developed a natural language processing (NLP) model for automatically identifying patients with muscle-invasive BC (MIBC) at the population level and validated the model's performance against our gold standard of chart-reviewed data. The model was then applied to a nationwide Veterans Affairs (VA) cohort to estimate the number of patients with MIBC and non-muscle-invasive BC (NMIBC).
Methods: We developed and validated the NLP model using 2,337 full-text clinic reports from 600 unique patients who had undergone transurethral resection of bladder tumor (TURBT). Each report was classified as NMIBC, MIBC, no cancer, or unknown. Results from the NLP model were compared with hand-abstracted invasion status at both the report and patient levels; at the patient level, the worst case across a patient's reports was applied. We subsequently applied the NLP model to patients who received TURBT in the VA system, a nationwide cohort of 71,200 patients with 978,837 reports.
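The patient-level "worst case" rule described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the severity ranking (MIBC above NMIBC above no cancer above unknown), the function name `patient_level_status`, and the toy patient IDs are all assumptions, since the abstract does not state how the four labels are ordered.

```python
# Assumed severity ranking (not given in the abstract): higher means worse.
SEVERITY = {"unknown": 0, "no cancer": 1, "NMIBC": 2, "MIBC": 3}


def patient_level_status(report_labels):
    """Collapse (patient_id, label) pairs to one label per patient.

    For each patient, keep the most severe label seen across all of
    that patient's reports (the "worst case" rule).
    """
    worst = {}
    for patient_id, label in report_labels:
        current = worst.get(patient_id)
        if current is None or SEVERITY[label] > SEVERITY[current]:
            worst[patient_id] = label
    return worst


# Hypothetical report-level labels for three made-up patients.
reports = [
    ("p1", "NMIBC"),
    ("p1", "MIBC"),
    ("p1", "unknown"),
    ("p2", "no cancer"),
    ("p2", "NMIBC"),
    ("p3", "unknown"),
]
statuses = patient_level_status(reports)
print(statuses)
```

Under this assumed ranking, a single MIBC report is enough to label the patient as MIBC, which is one plausible reason patient-level sensitivity can exceed report-level sensitivity.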
Results: The NLP model had an overall accuracy of 88% at the report level and 92% at the patient level. The positive predictive value and specificity for predicting a positive report result were 83% and 70%, respectively. Although the sensitivity for MIBC was 70% at the report level, it rose to 86% at the patient level, where all reports were taken into consideration with the worst case applied. The model performed very well in classifying NMIBC, with 98% sensitivity at both the report and patient levels. Finally, the model was applied to the whole VA TURBT cohort and identified invasion status for 96% of patients at the population level. Among the 71,200 TURBT patients evaluated, 13,642 (19%) were identified as MIBC and 47,595 (66%) as NMIBC.
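For readers less familiar with the metrics reported above, the sketch below shows how sensitivity, specificity, and positive predictive value can be computed for one class against all others. It assumes paired gold-standard and predicted labels; the function name `one_vs_rest_metrics` and the toy labels are hypothetical and do not reproduce the study data.

```python
def one_vs_rest_metrics(truth, predicted, positive="MIBC"):
    """Compute sensitivity, specificity, and PPV for one class vs. the rest."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == positive and t == positive:
            tp += 1  # correctly flagged as the positive class
        elif p == positive:
            fp += 1  # flagged positive, but gold standard disagrees
        elif t == positive:
            fn += 1  # missed a true positive
        else:
            tn += 1  # correctly not flagged

    def ratio(numerator, denominator):
        # Return None instead of dividing by zero when a metric is undefined.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy labels for illustration only.
truth = ["MIBC", "MIBC", "NMIBC", "NMIBC", "no cancer"]
predicted = ["MIBC", "NMIBC", "NMIBC", "MIBC", "no cancer"]
metrics = one_vs_rest_metrics(truth, predicted)
print(metrics)
```

The same function can be run on report-level pairs or on patient-level pairs after worst-case aggregation, which is how the two sets of figures in the Results would differ.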
Conclusions: We developed a novel, highly accurate NLP model for identifying muscle invasion in BC at the population level. After further validation, this NLP model may be a practical and accurate tool for rapidly identifying BC invasion status, thereby aiding population-based BC research.
Source of Funding: DOD Career Development Award (W81XWH1710576)