University of Nebraska Medical Center, Omaha, NE, United States
Brent Luedders1, Punyasha Roul2, Yangyuna Yang1, Brendan Cope1, Matt DeVries1, W. Scott Campbell1, Daniel Hershberger1, Jorge Rojas3, Grant Cannon4, Brian Sauer5, Joshua Baker6, Jeffrey Curtis7, Ted Mikuls8 and Bryant England1, 1University of Nebraska Medical Center, Omaha, NE, 2UNMC, Omaha, NE, 3George E. Wahlen Department of Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT, 4Retired, Salt Lake City, UT, 5Salt Lake City VA/University of Utah, Salt Lake City, UT, 6University of Pennsylvania, Philadelphia, PA, 7University of Alabama at Birmingham, Hoover, AL, 8Division of Rheumatology, University of Nebraska Medical Center, Omaha, NE
Background/Purpose: Algorithms incorporating diagnostic and procedural codes have recently been developed to identify rheumatoid arthritis-associated interstitial lung disease (RA-ILD) in administrative and electronic health record (EHR) data sets for research and clinical purposes. In a single-center EHR, we previously incorporated ILD-related terms from chest computed tomography (CT) reports to improve the positive predictive value (PPV) of such algorithms (Luedders et al., Arthritis Rheumatol [abstract] 2021; 73(suppl 10)). We aimed to externally validate this approach in real-world data collected from multiple centers.
Methods: We selected participants from the multicenter Veterans Affairs Rheumatoid Arthritis registry for record review, using stratified subsampling to enrich the sample for RA-ILD. Record review was performed in a standardized fashion to determine ILD status (reference standard). Administrative algorithms with varying diagnostic and procedural code requirements, applied to linked administrative data, were used to classify the cohort (Table 1). Chest CT reports were obtained from a national data warehouse, and ILD-related terms were identified in these reports using automated regular expressions (a natural language processing [NLP] technique). We then added to each administrative algorithm the requirement of an ILD-related term in the chest CT report, excluding ILD-related terms with a negative modifier within 40 characters of the term. Terms were considered absent if no CT report was available. Algorithm performance was assessed by calculating PPV and sensitivity, accounting for the sampling design.
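The term-detection step described above can be sketched in Python. This is a minimal illustration, not the study's implementation: the term list, the negative-modifier list, and the function names (has_ild_term, algorithm_with_ct_terms) are hypothetical, and checking for a negative modifier on both sides of the term within the 40-character window is an assumption, since the abstract does not state the direction.

```python
import re

# Hypothetical term lists for illustration only; the abstract does not
# publish the ILD-related terms or negative modifiers used in the study.
ILD_TERMS = [
    "interstitial lung disease",
    "pulmonary fibrosis",
    "honeycombing",
    "traction bronchiectasis",
    "usual interstitial pneumonia",
    "nonspecific interstitial pneumonia",
]
NEGATIONS = ["no", "not", "without", "negative for", "absence of"]

WINDOW = 40  # negation window in characters, as stated in the abstract

TERM_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, ILD_TERMS)) + r")\b", re.IGNORECASE
)
NEG_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, NEGATIONS)) + r")\b", re.IGNORECASE
)


def has_ild_term(report):
    """True if any ILD term appears with no negative modifier within WINDOW
    characters on either side (assumed). A missing report counts as absent."""
    if not report:
        return False
    for match in TERM_RE.finditer(report):
        before = report[max(0, match.start() - WINDOW):match.start()]
        after = report[match.end():match.end() + WINDOW]
        # Join with a space so word boundaries are preserved at the seam.
        if not NEG_RE.search(before + " " + after):
            return True
    return False


def algorithm_with_ct_terms(admin_positive, reports):
    """Hypothetical combination: administrative algorithm positive AND at
    least one unnegated ILD term in any available chest CT report."""
    return admin_positive and any(has_ild_term(r) for r in reports)
```

For example, has_ild_term("No evidence of interstitial lung disease.") returns False because "No" falls inside the window, whereas a patient with no CT report is treated as term-negative, so algorithm_with_ct_terms(True, []) returns False.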
Results: We studied 536 patients with RA (93% male; mean age in the seventh decade; 71% with available chest CT reports) from 12 centers, of whom 203 had RA-ILD by the reference standard. The PPV of administrative algorithms alone increased with more stringent algorithm requirements, ranging from 53.8% (algorithm 1) to 81.6% (algorithm 3) (Figure 1). Requiring only the presence of at least one ILD-related term identified by NLP of chest CT reports (algorithm T) achieved moderate sensitivity (75.2%) and PPV (63.8%). Adding ILD-related terms improved the PPV of all administrative algorithms, with the greatest improvements in algorithms with fewer administrative data requirements (21.1% for algorithm 1 vs. 6.0% for algorithm 3). Combining administrative algorithms with stricter requirements and ILD-related terms from chest CT reports yielded the highest PPV (algorithm 4, 89.2%). These increases in PPV were accompanied by decreases in sensitivity of similar magnitude (range, -3.9% to -19.5%).
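Because the reviewed sample was enriched for RA-ILD, unweighted PPV and sensitivity could be biased. One common way to account for stratified subsampling, assumed here because the abstract does not specify its method, is to weight each reviewed patient by the inverse of their stratum's sampling probability. The sketch below uses a hypothetical Patient record and toy weights, not study data, and does not reproduce the confidence intervals shown in Figure 1.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    weight: float         # hypothetical inverse sampling-probability weight
    reference_ild: bool   # ILD status from record review (reference standard)
    algorithm_pos: bool   # classification by the algorithm being evaluated


def weighted_ppv_sensitivity(patients):
    """Return (PPV, sensitivity) with each patient counted by their weight.
    Returns NaN for a metric whose denominator is zero."""
    true_pos = sum(p.weight for p in patients if p.algorithm_pos and p.reference_ild)
    alg_pos = sum(p.weight for p in patients if p.algorithm_pos)
    ref_pos = sum(p.weight for p in patients if p.reference_ild)
    ppv = true_pos / alg_pos if alg_pos else float("nan")
    sensitivity = true_pos / ref_pos if ref_pos else float("nan")
    return ppv, sensitivity


# Toy records: the two patients missed by the algorithm come from a
# less heavily sampled stratum, so each represents four patients.
toy = [
    Patient(1.0, True, True),
    Patient(1.0, False, True),
    Patient(4.0, True, False),
    Patient(4.0, False, False),
]
ppv, sens = weighted_ppv_sensitivity(toy)
```

In this toy example the weighted PPV is 0.5 and the weighted sensitivity is 0.2, whereas ignoring the weights would give a sensitivity of 0.5, which illustrates why the sampling design must be accounted for.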
Conclusion: Including ILD-related terms extracted from chest CT reports using NLP substantially improves the PPV of administrative data-based RA-ILD algorithms, with accompanying decreases in sensitivity of similar magnitude. These findings in real-world data collected from multiple centers externally validate prior work in a single-center EHR and support the application of these algorithms to identify patients with RA-ILD for clinical and research purposes across various real-world data sources.
Figure 1. Positive predictive value (A) and sensitivity (B) of RA-ILD algorithms without (blue) and with (orange) inclusion of ILD-related terms from chest CT reports. Delta denotes the change in values between algorithms with and without ILD terms. Error bars indicate 95% confidence intervals. Abbreviations: RA-ILD = rheumatoid arthritis-associated interstitial lung disease, CT = computed tomography.

Disclosures: B. Luedders, None; P. Roul, None; Y. Yang, None; B. Cope, None; M. DeVries, None; W. Campbell, None; D. Hershberger, None; J. Rojas, None; G. Cannon, None; B. Sauer, None; J. Baker, Bristol-Myers Squibb (BMS), RediTrex, Pfizer; J. Curtis, AbbVie/Abbott, Amgen, ArthritisPower, Aqtual, Bendcare, Bristol-Myers Squibb (BMS), CorEvitas, FASTER, GlaxoSmithKline (GSK), IlluminationHealth, Janssen, Labcorp, Eli Lilly, Myriad, Novartis, Pfizer, Sanofi, Scipher, Setpoint, UCB, United Rheumatology; T. Mikuls, Gilead Sciences, Bristol-Myers Squibb, Horizon, Sanofi, Pfizer Inc; B. England, Boehringer Ingelheim.