Case Study Examines Differences Between AI and Radiologist Perception in Breast Cancer Screening

Radiologists and artificial intelligence (AI) systems differ significantly in how they read breast cancer screenings, a team of researchers has found. The case study by Makino et al, which appears in the journal Nature Scientific Reports, reveals the potential value of using both human and AI methods in making medical diagnoses.

“While AI may offer benefits in health care, its decision-making is still poorly understood,” explained Taro Makino, a doctoral candidate in NYU’s Center for Data Science and the paper’s lead author. “Our findings take an important step in better comprehending how AI yields medical assessments and, with it, offer a way forward in enhancing cancer detection.”

The analysis centered on a specific AI tool: deep neural networks, which are layers of computing elements—“neurons”—simulated on a computer. A network of such neurons can be trained to “learn” by building many layers and configuring how calculations are performed based on data input, a process called “deep learning.” 
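
To make the idea of layered "neurons" concrete, the following is a minimal sketch in Python. It is not the model used in the study: the layer sizes, the random weights, and the function names (dense_layer, forward, random_layer) are all assumptions chosen for illustration, and the training step that adjusts the weights from data is left out.

```python
import random

def dense_layer(inputs, weights, biases):
    """One layer of simulated neurons: each neuron takes a weighted sum of
    the inputs, adds a bias, and applies a ReLU (negative values become 0)."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through each layer in turn; stacking layers like this
    is what makes the network 'deep'."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

def random_layer(n_in, n_out, rng):
    """Hypothetical helper: an untrained layer with random weights.
    In deep learning these weights would be adjusted using training data."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

# Illustrative two-layer network: 4 inputs -> 3 neurons -> 2 neurons.
rng = random.Random(0)
layers = [random_layer(4, 3, rng), random_layer(3, 2, rng)]
output = forward([0.2, 0.5, 0.1, 0.9], layers)
```

Networks used on mammograms are far larger and operate on image pixels, but the basic structure of passing data through successive layers is the same.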

In the Nature Scientific Reports article, the scientists compared breast cancer screenings read by radiologists with those analyzed by deep neural networks. The researchers, who also included Krzysztof Geras, PhD, Laura Heacock, MD, and Linda Moy, MD, faculty in NYU Grossman School of Medicine’s Department of Radiology, found that deep neural networks and radiologists diverged significantly in how they diagnose soft-tissue lesions.

“In these breast cancer screenings, AI systems consider tiny details in mammograms that are seen as irrelevant by radiologists,” explained Dr. Geras. “This divergence in readings must be understood and corrected before we can trust AI systems to help make life-critical medical decisions.”

More specifically, although radiologists primarily relied on brightness and shape, the deep neural networks used tiny details scattered across the images. These details were also concentrated outside of the regions deemed most important by radiologists.

By revealing such differences between human and machine perception in medical diagnosis, the researchers moved to close the gap between academic study and clinical practice.

“Establishing trust in deep neural networks for medical diagnosis centers on understanding whether and how their perception is different from that of humans,” said Dr. Moy. “With more insights into how they function, we can both better recognize the limits of deep neural networks and anticipate their failures.” 

“The major bottleneck in moving AI systems into the clinical workflow is in understanding their decision-making and making them more robust,” added Mr. Makino. “We see our research as advancing the precision of AI’s capabilities in making health-related assessments by illuminating, and then addressing, its current limitations.”

The study was supported by grants from the National Science Foundation and the National Institutes of Health.


The content in this post has not been reviewed by the American Society of Clinical Oncology, Inc. (ASCO®) and does not necessarily reflect the ideas and opinions of ASCO®.