Alexandru Garaiman¹, Carina Mihai¹, Rucsandra Dobrota¹, Cosimo Bruni², Muriel Elhai¹, Suzana Jordan¹, Lea Stamm³, Anna-Maria Hoffmann-Vold⁴, Oliver Distler¹ and Mike Becker¹; ¹Department of Rheumatology, University Hospital Zurich, University of Zurich, Zürich, Switzerland; ²University of Florence, Florence, Italy; ³Department of Rheumatology, University Hospital Zurich, Zürich, Switzerland; ⁴Oslo University Hospital, Oslo, Norway
Background/Purpose: Some gastrointestinal (GI) and nutritional factors are associated with the presence of interstitial lung disease related to systemic sclerosis (SSc-ILD). However, many potential GI risk factors for SSc-ILD remain unexplored, and machine learning (ML) algorithms could help to reveal them. Therefore, the aim of our study was to identify GI-related risk factors for the presence of SSc-ILD using ML based on decision trees (DT).
Methods: We analyzed data from the last follow-up visit of consecutive patients who fulfilled the 2013 ACR/EULAR classification criteria for SSc and were recorded in our local EUSTAR registry.
The study outcome was the presence of SSc-ILD on high-resolution computed tomography.
Two sets of predictors were defined based on their potential association with GI involvement. The first set contained the following variables available in the EUSTAR registry: esophageal symptoms (dysphagia and reflux), stomach symptoms (early satiety, vomiting), intestinal symptoms (diarrhea, bloating and constipation), malabsorption syndrome, body mass index (BMI), proton pump inhibitor (PPI) therapy and calcium channel blocker (CCB) therapy. In the second set, the first three EUSTAR variable groups (esophageal, stomach and intestinal symptoms) were replaced with the scales of the UCLA Gastrointestinal Tract Questionnaire 2.0 (UCLA-GIT). From each set, the most important variable was selected using three different DT-based algorithms: recursive partitioning and regression trees (RPART), random forest (RF) and gradient boosting machines (GBM).
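To illustrate how a tree-based algorithm can rank candidate predictors, the sketch below scores each feature by the largest Gini impurity reduction that a single binary split on that feature achieves. This is a simplified, hypothetical illustration and not the authors' pipeline: the feature names and toy data are invented, not study data, and real RPART, RF and GBM importances aggregate gains over many splits and trees (RPART also credits surrogate splits).

```python
from typing import Dict, List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels coded 0/1."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split_gain(x: Sequence[float], y: Sequence[int]) -> float:
    """Largest weighted Gini decrease over all splits of the form x <= t."""
    parent = gini(y)
    best = 0.0
    # Skip the largest value: splitting there would leave the right node empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        best = max(best, parent - child)
    return best


def rank_features(data: Dict[str, List[float]], y: List[int]) -> List[str]:
    """Feature names sorted from largest to smallest single-split gain."""
    gains = {name: best_split_gain(col, y) for name, col in data.items()}
    return sorted(gains, key=lambda name: gains[name], reverse=True)


# Hypothetical toy data (NOT study data): 1 = SSc-ILD present, 0 = absent.
y = [1, 1, 1, 0, 0, 0]
data = {
    "bmi": [19.0, 20.0, 21.0, 25.0, 26.0, 27.0],
    "reflux": [1, 0, 1, 0, 1, 0],
    "ppi": [1, 1, 0, 0, 1, 0],
}
print(rank_features(data, y))
```

In this toy example a single BMI threshold separates the two classes perfectly, so BMI is ranked first; with real data, the ranking depends on the full fitted models.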
The selected variables were then combined with established predictors of SSc-ILD (diffuse cutaneous subset, anti-Scl-70 positivity, male sex, forced vital capacity [FVC%] and single-breath diffusing capacity of the lung for carbon monoxide [DLCO-SB%]) in final prediction models built with RPART, RF and GBM, respectively. Model performance was evaluated with the C-statistic, and the importance of the newly detected predictor was assessed with variable importance plots (VIPs).
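For a binary outcome such as presence of SSc-ILD, the C-statistic is the proportion of (patient with ILD, patient without ILD) pairs in which the model assigns the higher predicted risk to the patient with ILD; it equals the area under the ROC curve. A minimal sketch, assuming outcomes coded 1 = SSc-ILD present and 0 = absent; the function name and scores are hypothetical, not model output from this study.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Counts (case, non-case) pairs where the case has the higher score;
    tied scores count as half-concordant.
    """
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks for three patients with and three without ILD.
print(c_statistic([0.9, 0.8, 0.3, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]))
```

Here 8 of the 9 case/non-case pairs are correctly ordered (C = 8/9). A value of 0.5 corresponds to chance discrimination and 1.0 to perfect discrimination. The pairwise loop is quadratic, which is acceptable for a cohort of a few hundred patients.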
Results: We included 334 patients. The median age was 61 [IQR: 50-69] years, 59 (17.7%) were male and 266 (79.6%) had limited cutaneous SSc. The median BMI was 23 [IQR: 21-26] kg/m², 133 (39.8%) patients had SSc-ILD, the median FVC% was 93 [IQR: 81-105] and the median DLCO-SB% was 72.5 [IQR: 56-84]. Among the UCLA-GIT scales, distension/bloating had the highest score, 0.50 [IQR: 0-1.24]. Regarding medications, 167 (50%) patients were exposed to PPIs and 39 (11.7%) to CCBs.
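The percentages reported above follow from the counts, with the full cohort of 334 patients as the denominator. A quick arithmetic check, rounding to one decimal place (the variable and function names are illustrative only):

```python
N_PATIENTS = 334  # cohort size reported in the Results

counts = {
    "male sex": 59,
    "limited cutaneous SSc": 266,
    "SSc-ILD": 133,
    "PPI exposure": 167,
    "CCB exposure": 39,
}


def percent(count: int, total: int = N_PATIENTS) -> float:
    """Percentage of the cohort, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for label, n in counts.items():
    print(f"{label}: {n} ({percent(n)}%)")
```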
All three algorithms ranked BMI as the most important predictor of SSc-ILD in both sets of GI-related variables (Fig. 1A-F). The final models, which combined the established risk factors for ILD with BMI, supported the importance of BMI for predicting SSc-ILD (Fig. 1G-I); in the GBM VIP, BMI ranked as the most important predictor. A lower BMI was associated with the presence of SSc-ILD. The C-statistics of the RPART, RF and GBM models were 0.79, 0.70 and 0.76, respectively, corresponding to fair accuracy.
Conclusion: A lower body mass index is a promising novel marker of the presence of SSc-ILD and might be useful for cohort enrichment in the design of future clinical trials.
Figure 1. Feature selection by RPART, RF and GBM from the two lists of predictors. Panels A-C: feature selection from the EUSTAR predictors by RPART, RF and GBM, respectively. Panels D-F: feature selection from the UCLA-GIT predictors by RPART, RF and GBM, respectively. Panels G-I: variable importance in the final models.
Abbreviations: BMI, body mass index; CCB, calcium channel blocker; DLCO-SB, single-breath diffusing capacity of the lung for carbon monoxide; FVC%, forced vital capacity, % predicted; GBM, gradient boosting machines; mRSS, modified Rodnan skin score; PPI, proton pump inhibitor; RPART, recursive partitioning and regression trees; RF, random forest; SSc, systemic sclerosis; SSc-ILD, interstitial lung disease related to systemic sclerosis; UCLA-GIT, UCLA Scleroderma Clinical Trials Consortium Gastrointestinal Tract 2.0.
Disclosures: A. Garaiman, None; C. Mihai, Boehringer-Ingelheim, Mepha, MEDTalks, Roche, Janssen; R. Dobrota, Pfizer, Actelion, Boehringer-Ingelheim; C. Bruni, Boehringer-Ingelheim, Eli Lilly; M. Elhai, Bristol-Myers Squibb (BMS), Janssen; S. Jordan, None; L. Stamm, None; A. Hoffmann-Vold, Boehringer-Ingelheim, Actelion, Janssen, Roche, Merck/MSD, ARXX Therapeutics, Eli Lilly, Medscape, EUSTAR, EULAR, ACR, ERS; O. Distler, AbbVie/Abbott, Amgen, GlaxoSmithKline (GSK), Novartis, Roche, UCB, Kymera, Mitsubishi Tanabe, Boehringer Ingelheim, 4P-Pharma, Acceleron, Alcimed, Altavant Sciences, AnaMar, Arxx, AstraZeneca, Blade Therapeutics, Bayer, Corbus Pharmaceuticals, CSL Behring, Galapagos, Glenmark, Horizon, Inventiva, Lupin, Miltenyi Biotec, Merck/MSD, Prometheus Biosciences, Redx Pharma, Roivant, Sanofi, Topadur, Pfizer, Janssen, Medscape, Patent issued “mir-29 for the treatment of systemic sclerosis” (US8247389, EP2331143), FOREUM Foundation, ERS/EULAR Guidelines, EUSTAR, SCQM (Swiss Clinical Quality Management in Rheumatic Diseases), Swiss Academy of Medical Sciences (SAMW), Hartmann Müller Foundation; M. Becker, Amgen, Bayer, Mepha, Novartis, Vifor, Novartis Foundation.