Investigators have found that artificial intelligence (AI) language models such as OpenAI’s ChatGPT may accurately identify appropriate imaging tests for breast cancer screenings and breast pain, according to a recent study published by Rao et al in the Journal of the American College of Radiology. The new findings suggest that large language models have the potential to assist primary care providers in decision-making, patient evaluation, and ordering imaging tests for breast cancer screenings and breast pain.
Background
ChatGPT, a large language model trained on data from the Internet, is capable of answering questions in a human-like way. Since ChatGPT was introduced in November 2022, investigators across the world have been analyzing how such AI language models can be used in medical scenarios.
When primary care providers order specialized testing for patients who complain of breast pain, they may not know the best imaging test to choose—such as magnetic resonance imaging (MRI), ultrasound, mammography, or another modality. Radiologists generally follow the American College of Radiology’s Appropriateness Criteria to make these decisions. Although these evidence-backed guidelines are well known to specialists, they are often less familiar to nonspecialists tasked with selecting the most appropriate imaging test during a patient’s visit. This can confuse patients and lead to unnecessary or incorrect testing.
"In this scenario, ChatGPT’s abilities were impressive," explained senior study author Marc D. Succi, MD, Assistant Professor of Radiology at Harvard Medical School, Associate Chair of Innovation and Commercialization at Mass General Brigham Enterprise Radiology, and Founder and Executive Director of the Medical Engineered Solutions in Healthcare Incubator at Massachusetts General Hospital. "I see it acting like a bridge between the referring health-care professional and the expert radiologist—stepping in as a trained consultant to recommend the right imaging test at the point of care, without delay. This could reduce administrative time on both referring and consulting physicians in making these evidence-backed decisions, optimize workflow, reduce burnout, and reduce patient confusion and wait times,” he added.
Study Methods and Results
In the recent study, the investigators devised 21 patient scenarios involving either the need for breast cancer screening or a report of breast pain and asked ChatGPT-3.5 and the newer, more advanced ChatGPT-4 to help decide which imaging tests to use—with the goal of assessing the AI language models’ clinical decision-making capabilities. Each scenario was posed to both models in two ways: as an open-ended question and with a list of imaging options to choose from (see the sketch below).
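As a rough illustration of this two-prompt design, a query of this kind could be scripted against the OpenAI API as sketched below. The scenario text, prompt wording, and answer options here are hypothetical stand-ins, not the study’s actual prompts.

```python
# Minimal sketch: pose one clinical scenario to both models, once
# open-ended and once with a fixed list of imaging options.
# Scenario and option wording are illustrative only, not from the study.
# Requires the `openai` package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

SCENARIO = (
    "A 40-year-old woman of average risk with no symptoms asks about "
    "breast cancer screening. Which imaging test is most appropriate?"
)
OPTIONS = "Options: (a) mammography, (b) ultrasound, (c) MRI, (d) no imaging."

def ask(model: str, open_ended: bool) -> str:
    """Return the model's answer, with or without multiple-choice options."""
    prompt = SCENARIO if open_ended else f"{SCENARIO}\n{OPTIONS}"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for model in ("gpt-3.5-turbo", "gpt-4"):
    print(model, "| open-ended:", ask(model, open_ended=True))
    print(model, "| multiple-choice:", ask(model, open_ended=False))
```

In the study, the models’ answers were scored against the American College of Radiology’s Appropriateness Criteria; a harness like this would simply collect the responses for that comparison.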
They discovered that ChatGPT-4 outperformed ChatGPT-3.5, especially when given the available imaging options. For instance, when asked about breast cancer screenings and given multiple-choice imaging options, ChatGPT-3.5 answered an average of 88.9% of the prompts correctly, whereas ChatGPT-4 answered about 98.4% of them correctly.
"This study doesn’t compare ChatGPT to existing radiologists because the existing gold standard is actually a set of guidelines from the American College of Radiology, which is the comparison we performed,” Dr. Succi noted. “This is purely an additive study, so we are not arguing that AI is better than your [physician] at choosing an imaging test but can be an excellent adjunct to optimize a [physician’s] time on noninterpretive tasks."
Conclusions
The investigators proposed that ChatGPT could be integrated into medical decision-making at the point of care. When a primary care provider enters patient data into an electronic health record, the program could alert them to the best imaging options—recommending the most appropriate tests to order and helping set patients’ expectations before they go in for testing.
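A simplified sketch of that point-of-care idea appears below. The indication keys and suggested tests are illustrative placeholders, not the actual ACR Appropriateness Criteria; a production system would draw on the full guidelines or a validated model.

```python
# Minimal sketch: when an indication is entered on an EHR order screen,
# surface a suggested breast imaging test. The mappings below are
# illustrative placeholders, not the ACR Appropriateness Criteria.
SUGGESTED_IMAGING = {
    "average-risk screening, age >= 40": "mammography",
    "focal breast pain, age < 30": "ultrasound",
    "high-risk screening (e.g., BRCA carrier)": "mammography plus MRI",
}

def suggest_imaging(indication: str) -> str:
    """Return a suggested test for a known indication, or defer to radiology."""
    return SUGGESTED_IMAGING.get(
        indication,
        "no match; consult radiology or the ACR Appropriateness Criteria",
    )

print(suggest_imaging("focal breast pain, age < 30"))  # -> ultrasound
```

In practice, the lookup step is where a large language model could stand in—taking a free-text indication rather than a fixed key and returning a guideline-concordant recommendation.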
The investigators also highlighted that a more advanced medical AI language model could be created using data sets from hospitals and other research institutions to make it more specific to health-focused applications.
"We may be able to fine-tune ChatGPT with different patient and therapeutic data and knowledge sets to tailor it to specific patient populations," Dr. Succi emphasized. "At Mass General Brigham, we have specialized centers of excellence where we care for patients with some of the most complex and rare diseases. We can leverage our experience and lessons learned from caring for these patient cases to train a model to provide support for rare and complex diagnoses and then make that model available to centers around the world, especially centers that may treat these conditions less frequently,” he underscored.
The investigators concluded that before any AI language model is involved in medical decision-making, it would need to be extensively tested for bias and privacy concerns and approved for use in medical settings. New regulations around medical AI language models could also set parameters for what should be included in patient care interactions.
Disclosure: The research in this study was supported in part by a grant from the National Institute of General Medical Sciences. For full disclosures of the study authors, visit jacr.org.