Databases
Dan Dauner, PharmD, MSPH (he/him/his)
Graduate Student
University of Minnesota
Duluth, Minnesota, United States
Eleazar Leal, PhD
Assistant Professor
University of Minnesota
Duluth, Minnesota, United States
Terrence Adam, RPh, MD, PhD
Associate Professor
University of Minnesota, Minnesota, United States
Rui Zhang, PhD
Associate Professor
University of Minnesota, Minnesota, United States
Joel F. Farley, PhD
Professor
University of Minnesota College of Pharmacy
Minneapolis, Minnesota, United States
Background: Signal detection algorithms are used to measure disproportionality and identify signals in adverse drug event (ADE) databases. Traditionally, these algorithms give the same weight to information from all products and patients, which may mask signals or produce false positives. Research has shown that logistic regression-based signal detection algorithms are superior to disproportionality analysis because they can account for potential confounders and masking factors. Exploring additional features and machine learning algorithms may further strengthen signal detection.
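For context only, a minimal sketch of a common disproportionality measure, the reporting odds ratio (ROR), computed from a 2x2 contingency table of spontaneous-report counts. This is an illustrative example, not code from the study, and the function name and example counts are assumptions.

```python
# Illustrative sketch (not the study's code): reporting odds ratio (ROR),
# a common disproportionality measure, with an approximate 95% CI.
import math

def reporting_odds_ratio(a: int, b: int, c: int, d: int):
    """a: reports with drug and event; b: drug without the event;
    c: event without the drug; d: neither. Returns (ROR, lower, upper)."""
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(ROR)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper

# Hypothetical counts: 40 drug-event reports, 960 other reports for the drug,
# 200 other reports of the event, 98,800 remaining reports.
print(reporting_odds_ratio(40, 960, 200, 98800))
```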
Objectives: The aim of this study was to compare the signal detection performance of logistic regression, gradient-boosted trees, random forest, and support vector machine models utilizing US Food and Drug Administration Adverse Event Reporting System (FAERS) data.
Methods: The publicly available FAERS quarterly data extract files from October 1, 2017, through December 31, 2020, were used. The Demographic, Drug, Outcome, Reaction, Therapy, and Indication files were included in this study. The files were linked on the primary ID and drug sequence variables, as appropriate. A deduplication procedure was performed by retaining the record with the highest primary ID for each report. Only the primary suspect drug from a report was included in the analysis. Reports missing a primary suspect drug or not listing an ADE were excluded. Generic names were used to identify drugs, and all ADE and drug names were converted to uppercase text for standardization. Twelve features were included in the models. All numeric variables, except for the disproportionality measures, were standardized by subtracting the mean and dividing by the standard deviation. Because the outcome was imbalanced, two training sets were used: one stratified on the outcome variable (the balanced training set) and another created with the Synthetic Minority Oversampling Technique (SMOTE). For each algorithm, a crude model and a model with tuned hyperparameters (HPT) were developed, trained, and tested in Python. Model performance was compared against a reference set using accuracy, precision, F1 score, recall, and the area under the receiver operating characteristic curve (ROCAUC).
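A minimal sketch of the kind of Python workflow described above, shown for logistic regression only and using simulated data. The feature construction, split proportions, and hyperparameter grid are illustrative assumptions, not the study's actual settings; it relies on scikit-learn and imbalanced-learn (for SMOTE).

```python
# Sketch of the training/evaluation workflow: stratified vs. SMOTE training
# sets, crude vs. hyperparameter-tuned (HPT) models, and the listed metrics.
# Data, grid, and split sizes are illustrative, not the study's settings.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, f1_score,
                             recall_score, roc_auc_score)
from imblearn.over_sampling import SMOTE  # third-party: imbalanced-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))                      # 12 features, as in the study
logits = X @ rng.normal(size=12) - 2.0               # simulate an imbalanced outcome
y = (rng.random(5000) < 1 / (1 + np.exp(-logits))).astype(int)

# Hold-out split stratified on the outcome so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Standardize numeric features (subtract mean, divide by standard deviation).
scaler = StandardScaler().fit(X_train)
X_train_std, X_test_std = scaler.transform(X_train), scaler.transform(X_test)

# Alternative training set: oversample the minority class with SMOTE.
X_smote, y_smote = SMOTE(random_state=42).fit_resample(X_train_std, y_train)

for set_name, X_tr, y_tr in [("stratified", X_train_std, y_train),
                             ("SMOTE", X_smote, y_smote)]:
    # Crude model (default settings) and HPT model (grid-searched C).
    crude = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    hpt = GridSearchCV(LogisticRegression(max_iter=1000),
                       param_grid={"C": [0.01, 0.1, 1, 10]},
                       scoring="roc_auc", cv=5).fit(X_tr, y_tr)
    for model_name, model in [("crude", crude), ("HPT", hpt)]:
        pred = model.predict(X_test_std)
        prob = model.predict_proba(X_test_std)[:, 1]
        print(set_name, model_name,
              "accuracy", round(accuracy_score(y_test, pred), 3),
              "precision", round(precision_score(y_test, pred), 3),
              "F1", round(f1_score(y_test, pred), 3),
              "recall", round(recall_score(y_test, pred), 3),
              "ROCAUC", round(roc_auc_score(y_test, prob), 3))
```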
Results: Models trained on the balanced training set had higher accuracy, F1 score, and recall than models trained on the SMOTE training set. Precision was similar for each model regardless of the training set used. When the balanced training set was used, results were similar between the crude and HPT models. Overall, logistic regression had higher performance metrics than the other algorithms (crude; HPT): accuracy (0.71; 0.71), F1 score (0.824; 0.816), precision (0.724; 0.741), and recall (0.955; 0.909). ROCAUC values for models trained on the balanced training set ranged from 0.487 to 0.641 (range, 0.154). The highest ROCAUC for each algorithm was: logistic regression HPT (0.601), gradient-boosted trees crude (0.566), random forest crude (0.641), and support vector machine HPT (0.591).
Conclusions: All models trained on the balanced training set performed similarly according to the performance evaluation metrics. All models except the gradient-boosted trees HPT model had ROCAUC values greater than 0.5. Logistic regression models had higher accuracy, precision, and recall. Incorporating additional information from case reports and the disproportionality analysis results into the models yielded higher performance evaluation metrics than disproportionality analysis alone.