Study Finds Deep Learning Can Distinguish Recalled-Benign Mammogram Images From Malignant and Negative Cases


Key Points

  • Current screening mammography for breast cancer has a high false recall rate, which often leads to unnecessary medical procedures (including breast biopsies), higher medical costs, and increased psychological stress for patients.
  • Deep learning convolutional neural network models can be used to recognize nuanced imaging features that distinguish recalled but benign mammography images, which may not be identifiable by human visual assessment, thereby reducing the false recall rate.
  • With the ability of the deep learning model to distinguish between negative and malignant images, artificial intelligence can also perform well in computer-aided diagnosis of breast cancer.

Although digital mammography is effective in detecting early-stage breast cancer and in reducing mortality, high recall rates after a screening mammogram often result in unnecessary medical procedures (including breast biopsies), higher medical costs, and increased psychological stress for patients.

A retrospective study investigated whether deep learning methods can distinguish recalled but benign mammography images from negative exams and those with malignancy. It found that deep learning convolutional neural network (CNN) methods can identify nuanced mammographic imaging features that distinguish recalled-benign images from malignant and negative cases. The findings may lead to a computerized clinical toolkit to aid radiologists in distinguishing these images and help reduce the false recall rate. The study by Aboutalib et al is published in Clinical Cancer Research.

Study Methodology

The researchers used two independent mammography datasets to develop and evaluate deep learning classifiers: the Full-Field Digital Mammography Dataset (FFDM) and a digitized film dataset, the Digital Dataset of Screening Mammography (DDSM). Together, the two datasets included 14,860 images from 3,715 patients (1,303 in FFDM and 2,412 in DDSM). The researchers then built two- and three-class CNN models to investigate six classification scenarios aimed at distinguishing negative, malignant, and recalled-benign mammogram images.

Study Results

Training and testing using only the FFDM dataset resulted in an area under the curve (AUC) ranging from 0.70 to 0.81. When only the DDSM dataset was used, AUC ranged from 0.77 to 0.96. When the FFDM and DDSM datasets were combined for training and testing, the AUC for distinguishing negative, malignant, and recalled-benign images ranged from 0.76 to 0.91. When pretrained on a large nonmedical dataset and DDSM, the models showed consistent improvements in AUC ranging from 0.02 to 0.05 (all P > .05), compared with pretraining only on the nonmedical dataset.
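For readers unfamiliar with the metric, AUC can be read as the probability that a classifier gives a randomly chosen positive case a higher score than a randomly chosen negative case, so 0.5 corresponds to chance and 1.0 to perfect separation. The short Python sketch below illustrates that definition for a two-class scenario. It is not the study's code: the function name `auc` and the labels and scores are hypothetical, chosen only for illustration.

```python
def auc(labels, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which the
    positive case receives the higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example (not study data): 1 = malignant, 0 = recalled-benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(auc(labels, scores), 2))
```

In this made-up example, eight of the nine malignant/recalled-benign pairs are ranked correctly, giving an AUC of about 0.89.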

“This study demonstrates that automatic deep learning CNN models can identify nuanced mammographic imaging features to distinguish recalled-benign images from malignant and negative cases, which may lead to a computerized clinical toolkit to help reduce false recalls,” concluded the study authors.

“We showed that there are imaging features unique to recalled-benign images that deep learning can identify and potentially help radiologists in making better decisions on whether a patient should be recalled or is more likely a false recall,” said Shandong Wu, PhD, Assistant Professor of Radiology, Biomedical Informatics, and Bioengineering at the University of Pittsburgh, and principal investigator of this study, in a statement. “Based on the consistent ability of our algorithm to discriminate all categories of mammography images, our findings indicate that there are indeed some distinguishing features/characteristics unique to images that are unnecessarily recalled. Our [artificial intelligence] models can augment radiologists in reading these images and ultimately benefit patients by helping reduce unnecessary recalls.”

Dr. Wu is the corresponding author of this study.

Funding for this study was provided by the National Institutes of Health, a Radiological Society of North America Research Scholar Grant, a University of Pittsburgh Physicians Academic Foundation Award, and the Pittsburgh Foundation.

J.H. Sumkin, DO, reported receiving commercial research grants from GE and Hologic. No potential conflicts of interest were disclosed by the other study authors.

The content in this post has not been reviewed by the American Society of Clinical Oncology, Inc. (ASCO®) and does not necessarily reflect the ideas and opinions of ASCO®.