Introduction: Timely diagnosis and prognostic assessment remain challenging in prostate cancer (PCa), contributing to a large number of deaths and increasing overall disease risk and treatment costs. Although current clinical testing strategies are effective, recent advances in machine learning suggest that automated pipelines could deliver standardized, objective assessments while reducing costs in time, human capital, and other resources. Artificial intelligence (AI) tools from companies such as PathAI, HTL Clinical, PaigeAI, and Deciplex can identify cancer areas in digital pathology images and differentiate them from non-cancer regions. However, two limitations restrict the clinical applicability of these tools: they 1) consider only histological architecture when grading cancers and 2) have minimal capability to differentiate between proximal cancer areas. Overcoming these limitations is critical to improving the clinical utility and efficacy of AI for automating PCa diagnosis and prognosis, which can ultimately improve PCa patient outcomes. This study uses machine learning to identify proximal cancer areas accurately by integrating pathology with genomic signatures, with the aim of aiding clinical decision making and reducing costs related to repeat follow-ups.
Methods: We downloaded digital pathology images, RNA sequencing data, and cancer grades for 500 patients with primary prostate adenocarcinoma (PRAD) from The Cancer Genome Atlas. We built the model in Python using the Xception network architecture in the Keras software package with a TensorFlow backend. Eighty images scored by a board-certified pathologist served as training images to refine a deep convolutional neural network that automates Gleason grading and identifies cancer regions of interest on test slides. We then used this model to blindly score new, unannotated slides. Additionally, we integrated standardized gene expression data into the AI models to improve their efficiency.
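The integration of standardized gene expression with image-derived features can be illustrated with a minimal sketch. The abstract does not specify how the two data types were combined, so the approach below (z-scoring each gene across patients, then concatenating the result with a per-patient image feature vector) is an assumption, and the function names `standardize_genes` and `fuse_features` are hypothetical.

```python
from statistics import mean, pstdev


def standardize_genes(expression):
    """Z-score each gene (column) across patients (rows).

    Assumption: "standardized gene expression" means per-gene z-scores.
    Genes with zero variance are mapped to 0.0 to avoid division by zero.
    """
    n_genes = len(expression[0])
    stats = []
    for g in range(n_genes):
        column = [row[g] for row in expression]
        sigma = pstdev(column)
        stats.append((mean(column), sigma if sigma > 0 else 1.0))
    return [
        [(row[g] - stats[g][0]) / stats[g][1] for g in range(n_genes)]
        for row in expression
    ]


def fuse_features(image_features, gene_z):
    """Concatenate per-patient image features with standardized gene values."""
    return [img + genes for img, genes in zip(image_features, gene_z)]


# Toy data: 2 patients, 2 image-derived features, 2 genes (all values illustrative).
image_features = [[0.2, 0.8], [0.6, 0.4]]
expression = [[1.0, 10.0], [3.0, 10.0]]

fused = fuse_features(image_features, standardize_genes(expression))
```

The fused vectors could then be passed to a downstream classifier; which classifier and which features were actually used in the study is not stated at this level of detail.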
Results: We screened multiple AI models for their ability to grade tumors, then selected and tuned the most efficient model by augmenting the training dataset with: 1) grading information from three independent pathologists; 2) genomic signatures representing PCa grades without confounding by age, prostate-specific antigen (PSA) levels, or therapy; and 3) a logistic regression classification system to allow screening of proximal cancer stages. After integration of genomic signatures, the model's ability to grade low-grade cancer areas increased significantly: for Gleason 3+3, efficiency increased from 71.% to 78.6%; for 3+4, from 57.1% to 69.4%; and for 4+3, from 68.1% to 70.8%. In general, there was an increase of 7-12% across the board.
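The reported gains can be expressed as percentage-point differences with a short calculation. The sketch below uses only the before and after values stated above for 3+4 and 4+3; the 3+3 grade is left out because its baseline value is incomplete in the text. The helper name `gain_points` is hypothetical.

```python
# Reported efficiency (%) before and after integrating genomic signatures.
reported = {
    "3+4": (57.1, 69.4),
    "4+3": (68.1, 70.8),
}


def gain_points(before, after):
    """Absolute gain in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


gains = {grade: gain_points(before, after) for grade, (before, after) in reported.items()}

for grade, gain in gains.items():
    print(f"Gleason {grade}: +{gain} percentage points")
```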
Conclusions: Our machine learning model has the potential not only to automate PCa grading but also to greatly enhance early detection of PCa and aid clinical decision making. Given the accuracy of our current model, we are hopeful of producing an automated software workflow that will aid in identifying tumor areas, determining their severity, and informing treatment decisions.