Introduction: Hydronephrosis (HN) is one of the most frequently detected congenital anomalies in pediatric urology. It can reflect a spectrum of etiologies, from a benign transient state to a serious structural problem requiring antibiotic prophylaxis and/or surgical correction. Current management is guided by various classification systems used to interpret and grade renal ultrasound (US) images. However, these systems are inherently subjective and have shown low inter-rater reliability. Machine learning may offer a way to develop more objective algorithms that classify anomalies and possibly predict outcomes. This pilot study aims to show that machine learning can distinguish low-grade (LG) from high-grade (HG) HN.
Methods: US images from patients aged 0-8 years diagnosed with HN at a single institution were reviewed and collected under a HIPAA-compliant, IRB-approved protocol. Patients were excluded if they had fewer than two US studies prior to surgical intervention, or if they had structural abnormalities, including ectopic ureter, duplicated system, ureterocele, multicystic dysplastic kidney, posterior urethral valves, neurogenic bladder, or prune belly syndrome. US studies were graded by a pediatric radiologist according to the Society for Fetal Urology (SFU) system. LG and HG HN were defined as SFU grades I-II and III-IV, respectively. In total, 74 patients (202 images) were identified. In both transverse and longitudinal views, each kidney was outlined by a urology resident and reviewed by an attending urologist to reach consensus. Twenty-five radiomic features were automatically extracted from each outlined kidney region and used to distinguish LG from HG HN, with a support vector machine (SVM) as the classifier. Receiver operating characteristic (ROC) analysis was conducted to assess the performance of each individual radiomic feature and of the SVM classifier, with the area under the ROC curve (AUC) as the figure of merit.
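The classification and evaluation steps above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes "round-robin" means that all images from one patient are held out together (leave-one-patient-out), and the RBF kernel, C=1.0, feature standardization, the helper names `per_feature_auc` and `round_robin_svm_auc`, and the synthetic data are all hypothetical choices made for illustration. Radiomic feature extraction itself is not shown.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def per_feature_auc(features, labels):
    """ROC AUC of each radiomic feature used directly as a score (hypothetical helper)."""
    return np.array([roc_auc_score(labels, features[:, j])
                     for j in range(features.shape[1])])


def round_robin_svm_auc(features, labels, patient_ids):
    """Leave-one-patient-out SVM scores and their ROC AUC (hypothetical helper).

    labels: 0 = LG (SFU I-II), 1 = HG (SFU III-IV).
    patient_ids: all images of a patient are held out together (assumption).
    """
    # Scaling is fit inside each training fold, so held-out images do not leak into it.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    # Every image is scored by a model trained without any image from its patient.
    scores = cross_val_predict(model, features, labels, groups=patient_ids,
                               cv=LeaveOneGroupOut(), method="decision_function")
    return scores, roc_auc_score(labels, scores)


# Hypothetical synthetic data loosely shaped like the study (74 patients, 25 features).
rng = np.random.default_rng(0)
n_patients = 74
patient_grade = rng.integers(0, 2, size=n_patients)   # 0 = LG, 1 = HG
images_each = rng.integers(2, 4, size=n_patients)     # 2 or 3 images per patient
patient_ids = np.repeat(np.arange(n_patients), images_each)
labels = np.repeat(patient_grade, images_each)
features = rng.normal(size=(labels.size, 25)) + 0.8 * labels[:, None]

feature_aucs = per_feature_auc(features, labels)
scores, auc = round_robin_svm_auc(features, labels, patient_ids)
print(f"best single-feature AUC = {feature_aucs.max():.2f}, SVM AUC = {auc:.2f}")
```

Grouping folds by patient rather than by image is the design choice that matters most here: with several images per patient, leaving out single images would let a near-duplicate view of the same kidney sit in the training set and inflate the AUC.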
Results: ROC analysis yielded an AUC of 0.79 (SE = 0.05) under round-robin evaluation in the task of distinguishing LG from HG HN. In addition, a Mann-Kendall trend test between SVM output and clinical HN grade showed a statistically significant upward trend (p < 0.0001).
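The trend test can be illustrated as follows. The Mann-Kendall statistic S counts concordant minus discordant pairs between an ordering variable and the observations, so its test is Kendall's rank-correlation test against that ordering. This sketch assumes the ordering variable is SFU grade (I-IV, coded 1-4) and uses `scipy.stats.kendalltau`, whose default tau-b corrects for the many tied grades; the helper name `mann_kendall_by_grade` and the synthetic scores are hypothetical, and the study's exact setup is not specified in the abstract.

```python
import numpy as np
from scipy.stats import kendalltau


def mann_kendall_by_grade(grades, svm_scores):
    """Trend of SVM output across ordered SFU grades (hypothetical helper).

    Mann-Kendall's S equals concordant minus discordant pairs, i.e. the numerator
    of Kendall's tau between the ordering variable (SFU grade, assumed) and the
    scores; kendalltau returns tau-b and a two-sided p-value.
    """
    tau, p_value = kendalltau(grades, svm_scores)
    return tau, p_value


# Hypothetical scores that rise with grade plus noise (not study data).
rng = np.random.default_rng(1)
grades = rng.integers(1, 5, size=202)                 # SFU grades I-IV coded 1-4
svm_scores = 0.5 * grades + rng.normal(size=grades.size)
tau, p_value = mann_kendall_by_grade(grades, svm_scores)
print(f"tau = {tau:.2f}, p = {p_value:.1e}")
```

A positive tau with a small p-value corresponds to an upward trend of SVM output with increasing grade.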
Conclusions: In this pilot study, machine learning showed potential for distinguishing LG from HG HN, and the SVM output showed a statistically significant relationship with clinical HN grade.
Source of Funding: Institute for Translational Medicine Core Subsidy Award