1189: Natural Language Processing of Chest CT Reports as a Novel Method of Identifying RA-ILD in Real-World Data

Sunday, November 13, 2022

1:00 PM – 3:00 PM Eastern Time

Location: Virtual Poster Hall

Abstract Poster Presenter(s)

Brent Luedders, MD

University of Nebraska Medical Center
Omaha, NE, United States

Brent Luedders¹, Punyasha Roul², Yangyuna Yang¹, Brendan Cope¹, Matt DeVries¹, W. Scott Campbell¹, Daniel Hershberger¹, Jorge Rojas³, Grant Cannon⁴, Brian Sauer⁵, Joshua Baker⁶, Jeffrey Curtis⁷, Ted Mikuls⁸ and Bryant England¹, ¹University of Nebraska Medical Center, Omaha, NE, ²UNMC, Omaha, NE, ³George E. Wahlen Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT, ⁴Retired, Salt Lake City, UT, ⁵Salt Lake City VA/University of Utah, Salt Lake City, UT, ⁶University of Pennsylvania, Philadelphia, PA, ⁷University of Alabama at Birmingham, Hoover, AL, ⁸Division of Rheumatology, University of Nebraska Medical Center, Omaha, NE

Background/Purpose: Algorithms incorporating diagnostic and procedural codes have recently been developed to identify rheumatoid arthritis-associated interstitial lung disease (RA-ILD) in administrative and electronic health record (EHR) data sets for research and clinical purposes. In a single-center EHR, we previously incorporated ILD-related terms from chest computed tomography (CT) reports to improve the positive predictive value (PPV) of such algorithms (Luedders et al., Arthritis Rheumatol [abstract] 2021; 73(suppl 10)). We aimed to externally validate this approach in real-world data collected from multiple centers.

Methods: We selected participants from the multicenter Veterans Affairs Rheumatoid Arthritis registry for record review, using stratified subsampling to enrich the sample for RA-ILD. Record review was performed in a standardized fashion to determine ILD status (reference standard). Administrative algorithms incorporating varying levels of diagnostic and procedural codes from linked administrative data were applied to the cohort (Table 1). Chest CT reports were obtained from a national data warehouse, and ILD-related terms were identified in these reports using automated regular expressions (a natural language processing [NLP] technique). We then added the requirement of an ILD-related term in the CT report to the administrative algorithms, excluding ILD-related terms that had a negative modifier within 40 characters of the term. If no CT report was available, ILD-related terms were considered absent. Algorithm performance was assessed by calculating PPV and sensitivity, accounting for the sampling process.
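The term-detection and combination steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the term and negative-modifier lists are hypothetical, the 40-character window is assumed to apply on both sides of each term (the abstract does not say which side), and a missing CT report is treated as containing no terms. The function names are likewise hypothetical.

```python
import re
from typing import List, Optional

# HYPOTHETICAL term and modifier lists, for illustration only: the abstract
# does not publish the actual ILD-related terms or negative modifiers used.
ILD_TERMS = [
    "interstitial lung disease",
    "usual interstitial pneumonia",
    "nonspecific interstitial pneumonia",
    "honeycombing",
    "traction bronchiectasis",
    "reticulation",
]
NEGATIVE_MODIFIERS = ["no", "without", "negative for", "absence of", "free of"]
WINDOW = 40  # characters checked on each side of a term (assumed symmetric)


def _alternation(phrases):
    # Whole-word, case-insensitive alternation of literal phrases.
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


TERM_RE = _alternation(ILD_TERMS)
NEG_RE = _alternation(NEGATIVE_MODIFIERS)


def affirmed_ild_terms(report: Optional[str]) -> List[str]:
    """ILD terms in a CT report with no negative modifier within WINDOW
    characters. A missing report yields no terms (terms treated as absent)."""
    if not report:
        return []
    terms = []
    for m in TERM_RE.finditer(report):
        before = report[max(0, m.start() - WINDOW):m.start()]
        after = report[m.end():m.end() + WINDOW]
        # Drop the term if a negative modifier appears in either window.
        if not (NEG_RE.search(before) or NEG_RE.search(after)):
            terms.append(m.group(0).lower())
    return terms


def algorithm_with_ct_terms(admin_positive: bool, report: Optional[str]) -> bool:
    """Administrative algorithm result AND at least one affirmed ILD term."""
    return admin_positive and bool(affirmed_ild_terms(report))


def algorithm_t(report: Optional[str]) -> bool:
    """Term-only rule (in the spirit of Algorithm T): no codes required."""
    return bool(affirmed_ild_terms(report))


if __name__ == "__main__":
    print(affirmed_ild_terms("Bibasilar reticulation and traction bronchiectasis."))
    # prints ['reticulation', 'traction bronchiectasis']
    print(affirmed_ild_terms("No honeycombing is seen."))  # prints []
    print(algorithm_with_ct_terms(True, None))  # prints False
```

A symmetric window is conservative: in "Reticulation without honeycombing" it also discards "reticulation". Restricting the check to the text preceding each term is an alternative that the abstract neither confirms nor rules out.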

Results: We studied 536 RA patients (93% male, mean age in the 7th decade, 71% with available chest CT reports) from 12 centers, of whom 203 had RA-ILD by the reference standard. The PPV of administrative algorithms alone improved as algorithm requirements increased, ranging from 53.8% (algorithm 1) to 81.6% (algorithm 3) (Figure 1). Requiring only the presence of at least 1 ILD-related term from NLP of chest CT reports (Algorithm T) achieved moderate sensitivity (75.2%) and PPV (63.8%). Adding ILD-related terms improved the PPV of all administrative algorithms, with the greatest improvements in algorithms that had fewer administrative data requirements (21.1% in algorithm 1 vs. 6.0% in algorithm 3). Combining administrative algorithms with stricter requirements and ILD-related terms from chest CT reports achieved the highest PPV (algorithm 4, 89.2%). Increases in PPV were accompanied by decreases in sensitivity of similar magnitude (range, -3.9% to -19.5%).
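Because the sample was deliberately enriched for RA-ILD, unweighted PPV and sensitivity could misstate the values expected in the full registry. One common way to account for stratified subsampling is inverse-probability weighting, sketched below. The abstract does not state which correction was used, the `Patient` fields are hypothetical, and the example values are toy numbers, not study data.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Patient:
    algorithm_positive: bool  # algorithm flags the patient as RA-ILD
    ild_reference: bool       # RA-ILD by record review (reference standard)
    sampling_prob: float      # assumed selection probability of the patient's stratum


def weighted_ppv_sensitivity(patients: Iterable[Patient]) -> Tuple[float, float]:
    """PPV and sensitivity with each patient weighted by 1 / sampling_prob."""
    tp = fp = fn = 0.0
    for p in patients:
        w = 1.0 / p.sampling_prob  # each sampled patient stands in for w patients
        if p.algorithm_positive and p.ild_reference:
            tp += w
        elif p.algorithm_positive:
            fp += w
        elif p.ild_reference:
            fn += w
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


if __name__ == "__main__":
    toy = [
        Patient(True, True, 0.5),    # weighted true positive (w = 2)
        Patient(True, False, 1.0),   # weighted false positive (w = 1)
        Patient(False, True, 0.25),  # weighted false negative (w = 4)
        Patient(False, False, 0.1),  # true negative, affects neither metric
    ]
    print(weighted_ppv_sensitivity(toy))
```

With these toy weights, PPV is 2 / (2 + 1) and sensitivity is 2 / (2 + 4); an unweighted count over the same four patients would give 1/2 for both, which shows how the sampling design can shift the estimates.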

Conclusion: The inclusion of ILD-related terms acquired from chest CT reports using NLP substantially improves the PPV of administrative-based RA-ILD algorithms, with accompanying decreases in sensitivity of a similar magnitude. These findings in real-world data collected from multiple centers externally validate prior work in a single-center EHR and support the application of these algorithms to identify RA-ILD patients for clinical and research purposes in various real-world data sources.

Figure 1. Positive predictive value (A) and sensitivity (B) of RA-ILD algorithms both without (blue) and with (orange) inclusion of ILD-related terms from chest CT reports.
Delta denotes the change in values between algorithms with and without ILD terms. Error bars indicate 95% confidence intervals. Abbreviations: RA-ILD = rheumatoid arthritis-associated interstitial lung disease, CT = computed tomography.
Disclosures: B. Luedders, None; P. Roul, None; Y. Yang, None; B. Cope, None; M. DeVries, None; W. Campbell, None; D. Hershberger, None; J. Rojas, None; G. Cannon, None; B. Sauer, None; J. Baker, Bristol-Myers Squibb (BMS), RediTrex, Pfizer; J. Curtis, AbbVie/Abbott, Amgen, ArthritisPower, Aqtual, Bendcare, Bristol-Myers Squibb (BMS), CorEvitas, FASTER, GlaxoSmithKline (GSK), IlluminationHealth, Janssen, Labcorp, Eli Lilly, Myriad, Novartis, Pfizer, Sanofi, Scipher, Setpoint, UCB, United Rheumatology; T. Mikuls, Gilead Sciences, Bristol-Myers Squibb, Horizon, Sanofi, Pfizer Inc; B. England, Boehringer-Ingelheim.