Can AI Assessment of Screening Mammograms Offer Similar Accuracy to Human Readers?


A commercially available artificial intelligence (AI) algorithm may perform comparably to human readers at assessing screening mammograms, according to a recent study published by Chen et al in Radiology.

False-positive interpretations on screening mammograms can result in women without cancer undergoing unnecessary imaging and biopsy. To improve the sensitivity and specificity of screening mammograms, researchers have proposed that two readers interpret every mammogram. Although double reading has been shown to increase cancer detection rates by 6% to 15% and keep recall rates low, this strategy is labor intensive and difficult to achieve during reader shortages.

“There is a lot of pressure to deploy AI quickly to solve these problems, but we need to get it right to protect women’s health,” stressed lead study author Yan Chen, PhD, Professor of Digital Screening at the University of Nottingham.

Study Methods and Results

In the new study, the researchers compared the performance of 552 human readers with that of the AI algorithm on two test sets—each containing 60 challenging screening mammograms with abnormal, benign, and normal results—from the Personal Performance in Mammographic Screening (PERFORMS) quality assurance assessment. For each test mammogram, the human readers' scores and the AI algorithm's results were each compared against the ground truth. Among the human readers, 57% (n = 315) were board-certified radiologists, 37% (n = 206) were radiographers, and 6% (n = 31) were breast clinicians.

“It’s really important that human readers working in breast cancer screening demonstrate satisfactory performance. The same will be true for AI once it enters clinical practice,” Dr. Chen emphasized. “The 552 readers in our study represent 68% of readers in the [UK’s National Health Service Breast Screening Program], so this provides a robust performance comparison between human readers and AI [algorithms],” she added.

When each breast was treated separately, the researchers found that 67% (n = 161/240) of the screening mammograms were normal, 29% (n = 70) contained malignancies, and 4% (n = 9) were benign. Masses were the most common malignant mammographic feature, present in 64% (n = 45/70) of cases, followed by calcifications (13%, n = 9), asymmetries (11%, n = 8), and architectural distortions (11%, n = 8). The mean size of the malignant lesions was 15.5 mm.

No statistically significant differences in performance were observed between the AI algorithm and the human readers in detecting breast cancer across all 120 exams. Human readers demonstrated a mean sensitivity of 90% and a mean specificity of 76%; the AI algorithm performed comparably, with 91% sensitivity and 77% specificity.
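Sensitivity and specificity here carry their standard screening definitions: the proportion of cancers correctly flagged, and the proportion of cancer-free exams correctly cleared. A minimal sketch of the calculation, using hypothetical confusion-matrix counts chosen only to illustrate the reported AI figures (not the study's actual data):

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: proportion of malignant cases correctly recalled."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: proportion of cancer-free cases correctly cleared."""
    return tn / (tn + fp)

# Hypothetical counts for illustration only, scaled so the rates match
# the AI figures reported in the study (91% sensitivity, 77% specificity).
tp, fn = 91, 9    # malignant cases: flagged vs. missed
tn, fp = 77, 23   # cancer-free cases: cleared vs. falsely recalled

print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # sensitivity = 91%
print(f"specificity = {specificity(tn, fp):.0%}")  # specificity = 77%
```

The trade-off between the two rates is what double reading tries to balance: a false positive (lower specificity) means unnecessary recall and biopsy, while a false negative (lower sensitivity) means a missed cancer.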


“The results of this study provide strong supporting evidence that AI for breast cancer screening can perform as well as human readers,” Dr. Chen highlighted. “I think it is too early to say precisely how we will ultimately use AI in breast screenings. The large prospective clinical trials that are ongoing will tell us more. But no matter how we use AI, the ability to provide ongoing performance monitoring will be crucial to its success,” she underscored.

The researchers noted that it is crucial to recognize that AI algorithm performance can drift over time, and algorithms can be affected by changes in the operating environment. More research may be needed before AI algorithms can be used as second readers in clinical settings.

“It’s vital that imaging centers have a process in place to provide ongoing monitoring of AI once it becomes part of clinical practice. No other studies to date have compared the performance of such a large number of human readers on routine quality assurance test sets with AI—so this study may provide a model for assessing AI performance in a real-world setting,” Dr. Chen concluded.

Disclosure: For full disclosures of the study authors, visit

The content in this post has not been reviewed by the American Society of Clinical Oncology, Inc. (ASCO®) and does not necessarily reflect the ideas and opinions of ASCO®.