Purpose: To evaluate the benefits of adding temporal context information to an artificial intelligence (AI) system for breast cancer detection.
Materials and Methods: A dataset of 2D screening mammograms acquired between 2006 and 2019 was retrospectively collected from 8,087 patients at one US site and one French site. Only examinations with at least one prior examination available within 6 to 30 months were kept. Cancer-positive examinations were confirmed by biopsy or surgery within 18 months of acquisition, while negatives were confirmed by a 2-year negative follow-up. A further selection kept only cases deemed normal/benign by the reader who had initially read the mammogram but whose prior was deemed suspicious (i.e., cases assessed as stable after the initial recall). The final selection consisted of 509 positive and 355 negative cases. The newly developed AI system (temporal AI) outputs a level of suspicion from a mammogram paired with its prior. Predictions of the temporal AI were compared with those of the reference AI (i.e., the commercially available version of the system). The comparison was made in terms of sensitivity at 100% positive predictive value (PPV), area under the ROC curve (AUC), area under the precision-recall curve (AUCPRC), specificity at a fixed value of sensitivity, false positive rate reduction, and F-score.
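As an illustration of two of the operating-point metrics listed above, the following minimal sketch shows one way sensitivity at 100% PPV and specificity at a fixed sensitivity could be computed from per-case suspicion scores. The function names, the tie-handling convention, and the toy data are assumptions made for this example; they are not the evaluation code or data used in the study.

```python
import math


def recall_at_full_precision(labels, scores):
    """Sensitivity at 100% PPV: share of positives scored strictly above
    the highest-scoring negative (ties are counted as missed)."""
    max_negative = max(s for y, s in zip(labels, scores) if y == 0)
    positives = [s for y, s in zip(labels, scores) if y == 1]
    return sum(s > max_negative for s in positives) / len(positives)


def specificity_at_sensitivity(labels, scores, target=0.75):
    """Specificity at the highest threshold whose sensitivity is at least
    `target`. A case is flagged when its score is >= the threshold."""
    positives = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Number of positives that must be flagged to reach the target sensitivity.
    needed = max(1, math.ceil(target * len(positives)))
    threshold = positives[needed - 1]
    return sum(s < threshold for s in negatives) / len(negatives)


# Toy data (hypothetical): 1 = cancer-positive, 0 = negative.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]

print(recall_at_full_precision(labels, scores))
print(specificity_at_sensitivity(labels, scores, target=0.75))
```

In this toy example, two of the four positives score above every negative, and two of the four negatives fall below the threshold needed to flag three of the four positives. Treating ties as missed detections is a conservative choice; other conventions would give slightly different values on data with tied scores.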
Results: The temporal AI showed better performance than the reference AI. Differences in estimated metrics are reported in Figure 1, where an overall trend toward superiority is visible. Recall at 100% precision (i.e., sensitivity at 100% PPV) was 8.4% (CI: 6.8% to 17.8%) for the reference AI and 19.7% (CI: 17.4% to 38.1%) for the temporal AI (Figure 2). The AUCPRC increased from 81.6% (CI: 78.23% to 85.2%) to 84.4% (CI: 81.81% to 87.51%). Specificity at 75% sensitivity was 53.2% (CI: 44.5% to 63.4%) for the reference AI and 57.7% (CI: 47.3% to 67.3%) for the temporal AI (Figure 3). The false positive rate decreased by 9%.
Conclusion: The introduction of temporal change into an AI system was shown to improve specificity, which today represents the major limitation for autonomous use of AI systems. We showed that the temporal AI can detect about 20% of cancer-positive cases without producing any false positives. These results need to be confirmed by larger studies.
Clinical Relevance Statement: This AI, if integrated into breast cancer screening as a standalone system, could be used to detect high-risk patients for immediate diagnosis, or to prioritize recalls in backlogs that grew due to COVID-19.