The role of AI in cancer grading - Prostate Cancer Test Case
Cancer is a leading cause of death globally and one of the main challenges modern healthcare is facing. According to the World Health Organization, the number of new cancer cases is expected to grow by 70% over the next two decades. In order to improve survival rates and reduce the death toll, it is imperative to select the right therapy for each patient, and do so early on, so that the treatment is effective. Healthcare costs would also be greatly impacted by this, as more than $75 billion is spent annually on ineffective cancer treatments.
To date, pathology is the only medical specialty responsible for diagnosing cancer, and its role goes beyond a simple yes-or-no determination. Pathologists also classify and grade the cancer, assessing the severity and aggressiveness of a tumor. This information is then used in the cancer staging process for (1) providing the patient with a prognosis for risk of recurrence and chances of survival, and (2) selecting the most appropriate treatment protocol: chemotherapy, radiation, surgery, adjuvant therapy, etc.
There are very clear and well-defined cancer grading methodologies, backed by extensive scientific research and field experience. While these methods provide a degree of assurance, they are limited by the fact that they are designed to allow pathologists to group a range of complex pathological manifestations into a simplified system that contains a relatively small number of classes. This is not an inherent trait of cancer grading systems, but rather a reflection of the human mind’s limited ability to identify patterns in complex data and categorize them based on many parameters, or into many classes, or both.
Cancer grading can benefit immensely from artificial intelligence. Convolutional neural networks (CNNs) are well suited to identifying subtle, complex patterns and classifying them with high accuracy. CNN-based solutions can provide pathologists with more complex cancer grading suggestions while requiring them to verify only a limited number of key areas or parameters, without sacrificing accuracy or efficiency. On the contrary, AI boosts both. More complex grading can help achieve finer and more accurate correlations with available therapies and with patient outcomes, thus improving the overall level of care and reducing spending on ineffective treatments.
To demonstrate the tremendous potential of AI in cancer grading, we present prostate cancer grading as an example.
Prostate cancer facts and figures
Prostate cancer is the most common cancer and the third leading cause of cancer death among men in Europe and North America, accounting for 684,000 new cases and 140,000 deaths in 2018. Globally, a man is diagnosed with prostate cancer every 25 seconds and a man dies from prostate cancer every 89 seconds.
As with any other type of cancer, the survival rate depends on the cancer stage at the time of discovery and on the rate at which the cancer continues to develop. According to the American Cancer Society, when the cancer is still local or regional, and there is no sign it has spread beyond the prostate or nearby areas, the relative 5-year survival rate is nearly 100%. On the other hand, when the cancer is diagnosed at a ‘distant stage’, meaning it has metastasized to distant lymph nodes, bones or other organs, the relative 5-year survival rate drops dramatically to around 29%.
The practice of cancer grading - Gleason Score
The aggressiveness of prostate cancer, i.e. how rapidly it is expected to grow and spread, is expressed by its Gleason Score, which is based on the two most common Gleason patterns identified in the sample by a pathologist, who examines it under a microscope. The Gleason Score is a useful prognostic tool and is also a major consideration in deciding on therapy. It is considered to be one of the most powerful outcome predictors in prostate cancer and has been incorporated into the WHO classification of prostate cancer, the AJCC/UICC staging system and the NCCN guidelines. The higher the Gleason score is, the more aggressive the cancer is, with greater risk and higher mortality.
The Gleason scoring system was developed in the 1960s by Dr. Donald Gleason and his colleagues at the Minneapolis VA Hospitals. It defines five histological growth patterns for prostate adenocarcinoma, 1 being the most differentiated pattern and 5 being the least differentiated one. As many prostate adenocarcinomas harbor two or more Gleason patterns, the system assigns a primary and secondary score, the sum of which yields the final Gleason score.
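The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name `gleason_score` and its input checks are assumptions introduced here, not part of any clinical software.

```python
def gleason_score(primary: int, secondary: int) -> int:
    """Return the Gleason score as the sum of the primary and secondary patterns.

    Hypothetical helper for illustration only; each pattern must be 1 to 5.
    """
    for name, pattern in (("primary", primary), ("secondary", secondary)):
        if pattern not in range(1, 6):
            raise ValueError(f"{name} pattern must be between 1 and 5, got {pattern}")
    return primary + secondary


print(gleason_score(3, 3))  # 6
print(gleason_score(3, 4))  # 7
print(gleason_score(4, 3))  # 7
```

Note that 3+4 and 4+3 both sum to 7 even though the dominant pattern differs, which is why the two patterns are usually reported alongside the sum (for example "3+4").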
The Gleason scoring system remained unchanged for nearly half a century, until it was modified and refined in 2005, and again in 2014. The 2014 system, referred to as the Epstein Gleason score, has been accepted by the WHO and is considered superior to the 2005 system by many pathologists and urologists. Nevertheless, the 2005 system remains in widespread use.
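The 2014 system sorts Gleason scores into five grade groups. The article does not list them, so the mapping below is taken from the commonly published 2014 grade group definitions and should be read as an illustrative sketch. The function name `grade_group` is hypothetical, and rare combinations that sum to 7 without being 3+4 or 4+3 are deliberately left unsupported.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map a primary/secondary Gleason pattern pair to a 2014 grade group (1-5).

    Illustrative sketch only, not a clinical tool.
    """
    for pattern in (primary, secondary):
        if pattern not in range(1, 6):
            raise ValueError(f"pattern must be between 1 and 5, got {pattern}")
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # The 2014 system separates 3+4 from 4+3, which the sum alone hides.
        if (primary, secondary) == (3, 4):
            return 2
        if (primary, secondary) == (4, 3):
            return 3
        raise ValueError("this sketch only covers 3+4 and 4+3 for a score of 7")
    if score == 8:
        return 4
    return 5  # scores of 9 and 10


print(grade_group(3, 4))  # 2
print(grade_group(4, 3))  # 3
print(grade_group(4, 5))  # 5
```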
The role of AI in the evolution of prostate cancer grading
It is only a matter of time until the Epstein Gleason scoring system is adopted as the primary method for evaluating prostate adenocarcinoma. Nevertheless, even this system takes into account a couple of morphological patterns, at most, for each of its grade groups and is therefore limited by the human factor. Humans are limited in their capacity to process large amounts of complex data and classify it into many classes. This means that subtle distinctions that may be significant for cancer treatment might be overlooked, and that many potentially important correlations remain undiscovered. This is a classic problem for artificial intelligence, which can give pathology labs the ability to quickly and automatically recognize such subtle differences and classify them into a long tail of classes, based on multiple complex parameters. The result would be a finer and more accurate cancer grading system that better correlates with the ever-widening range of therapies. The implementation of AI in prostate cancer diagnostics is starting to show promising results, as the following two examples demonstrate.
Google Study: Improving prostate cancer grading using deep learning
Google recently reported a study on improving prostate cancer grading using deep learning. The researchers trained a deep learning system (DLS) on annotated prostatectomy samples and then evaluated its grading performance against a cohort of 29 US board-certified pathologists. The DLS achieved an overall accuracy of 70%, compared with an average accuracy of 61% for the pathologists who participated in the study. Even when compared with the top ten pathologists, the DLS performed better than eight of them.
Perhaps more importantly, the DLS was able to characterize tissue morphology that appeared to lie at the cusp of two Gleason patterns. Such borderline morphology is one reason pathologists disagree on Gleason grading, and the result suggests the possibility of creating finer-grained “precision grading” of prostate cancer. It also enables future research on the potential correlations between these intermediate patterns (e.g. Gleason pattern 3.3 or 3.7) and patient outcomes.
Finding such correlations may show that AI not only helps pathologists become more accurate, objective and efficient, but also extends our diagnostic capabilities beyond the scope of human capacity. Furthermore, we can envision a finer and more complex (and potentially more accurate) grading system than Google’s, one that takes into account morphological features beyond those directly related to malignancy (i.e. Gleason patterns), such as inflammation and atrophy, as well as entirely different features found in the patient’s electronic medical record.
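One way to picture an intermediate pattern such as 3.3 or 3.7 is as a probability-weighted average over the discrete Gleason patterns a classifier predicts for a tissue region. The sketch below rests on that assumption and is purely illustrative: the article does not describe how the DLS derives such values, and the function name and example probabilities are hypothetical.

```python
from typing import Mapping


def fractional_pattern(probabilities: Mapping[int, float]) -> float:
    """Probability-weighted average of discrete Gleason patterns.

    `probabilities` maps a pattern (e.g. 3 or 4) to the weight a hypothetical
    classifier assigns to it; weights are normalized before averaging.
    """
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(pattern * weight for pattern, weight in probabilities.items()) / total


# A region the classifier considers mostly pattern 3, with some pattern 4.
print(round(fractional_pattern({3: 0.7, 4: 0.3}), 1))  # 3.3
print(round(fractional_pattern({3: 0.3, 4: 0.7}), 1))  # 3.7
```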
Ibex: An AI-based quality system in a pathology lab to catch misdiagnosed cancers
Ibex Medical Analytics deployed its AI-based pathology cancer diagnosis system in a clinical lab in March 2018. The system was integrated into the daily routine of a large pathology institute to detect misdiagnosed prostate biopsies. Within the first week, the system had already led to the reversal of a misdiagnosis in a 55-year-old patient, whose original diagnosis was benign. The Ibex Second Read™ system flagged a discrepancy and marked the biopsy as highly likely to be malignant. The case was reviewed once more and the system’s assertion was confirmed using additional staining. Consequently, the original diagnosis was revised to adenocarcinoma with a Gleason score of 3+3. This had a direct impact on the treatment protocol chosen for the patient.
The methodologies used for annotating data and for training the deep learning algorithms underlying the Ibex Second Read system coincide with the aforementioned approach of fine-grained precision. Ibex has dubbed this method “Deep Annotation”, reflecting the 20 different morphological features its system is trained to identify in prostate tissue. Consequently, today, the system in Maccabi also alerts on cases that were diagnosed as prostate adenocarcinoma with a Gleason score of 3+3, but are suspected to contain tissue areas of Gleason 4 and Gleason 5 patterns, which, if confirmed in a particular case, would dramatically change the therapeutic approach for the respective patients. Additional types of alerts are planned for deployment shortly.
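The two kinds of alert described above can be pictured as simple discrepancy rules between the reported diagnosis and what a model suspects in the tissue. The sketch below is a hypothetical illustration only; the article does not describe how the Ibex system is implemented, and the class, field and function names are invented for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SecondRead:
    reported_diagnosis: str         # "benign" or a Gleason score such as "3+3"
    model_patterns: FrozenSet[int]  # Gleason patterns a model suspects in the tissue


def alert(case: SecondRead) -> Optional[str]:
    """Return an alert message when the model disagrees with the report, else None."""
    if case.reported_diagnosis == "benign" and case.model_patterns:
        return "suspected malignancy in a case reported as benign"
    if case.reported_diagnosis == "3+3" and case.model_patterns & {4, 5}:
        return "suspected Gleason 4 or 5 pattern in a case reported as 3+3"
    return None


print(alert(SecondRead("benign", frozenset({3}))))
print(alert(SecondRead("3+3", frozenset({3, 4}))))
print(alert(SecondRead("3+3", frozenset({3}))))
```

In both alert cases the final call stays with the pathologist, who reviews the flagged case again, as in the example above.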
The full potential of AI in cancer diagnostics is yet to be realized. One challenge in researching and implementing precision grading methods is that routine medical practice does not set aside a control group, i.e. untreated cancer patients, especially with higher-grade cancers. To overcome this, carefully planned, expensive clinical studies are needed. Another challenge for some cancer types is that biopsies are inherently error-prone, because they represent small samples of the affected organ, while important findings might lie in unsampled regions. New technologies, such as MRI-guided biopsies, promise to reduce error rates. The outlook is positive, as the remaining challenges are rapidly being met by driven physicians on one end and by determined companies and startups on the other, joining forces to accelerate innovation in cancer diagnostics.
We used prostate cancer as a test case to demonstrate the immense value AI can bring to cancer grading. We are starting to see actual results and are convinced that healthcare in the 21st century, and cancer diagnostics in particular, will see more accurate, rapid and objective diagnosis, better risk stratification and therapy selection, fewer unnecessary surgeries (e.g. prostatectomies) and chemotherapy treatments, and overall better patient outcomes, all driven by innovation enabled by AI.