Assessing and Improving Imaging Interpretation in Breast Cancer Screening


Diana Buist, PhD

Barbara Monsees, MD

Tracy Onega, PhD

Patricia A. Carney, PhD

The quality of mammography images has markedly improved over the past few decades. However, the quality of the interpretation of mammograms remains variable, a significant concern given that more than 38 million mammograms are performed annually in the United States.

So said Diana Buist, PhD, Senior Scientific Investigator, Group Health Breast Cancer Surveillance, as she introduced a workshop hosted by the National Cancer Policy Forum and the American Cancer Society.

What Could Be Improved

Congress authorized the Mammography Quality Standards Act (MQSA) in 1992, but image quality and interpretation remain problematic. Both depend on many factors and are difficult to measure. Inconsistency, however, is a constant. Therefore, in preparation for reauthorization of MQSA, Congress commissioned a study from the Institute of Medicine (IOM) in 2005 to determine what could be done to increase accuracy and whether current regulations should be modified. IOM also was asked to consider access to mammography services and to identify what would ensure safe and effective use of other screening and diagnostic tools.

The Institute made recommendations about medical audits, centers of excellence, continuing medical education, reader volume, double reading, computer-aided detection, state and federal regulations, inspections and enforcement, data analysis, workforce, and accreditation for nonmammography breast imaging methods.

Dr. Buist said that since publication of the 2005 IOM report, Improving Breast Imaging Quality Standards,1 there has been a substantial body of research on factors that influence interpretation, including minimum volume needed for high quality, identification of radiologists who perform below par, and whether live instructors or self-paced methods are better at improving performance.

Part of the push for improvement is the National Mammography Database, a registry established in 2009 that allows facilities and physicians to monitor and improve quality using standardized measures consistent with the Breast Imaging Reporting and Data System (BI-RADS). It currently has 275 registered sites, 162 of which contribute data from more than 9 million exams.

“This provides good representation across the country and across practice types and locations,” said Carl D’Orsi, MD, Director of Breast Imaging Research, Emory Healthcare. The National Mammography Database is automated, and data are sent to it directly. It now collects only mammography data but will expand to include ultrasound and magnetic resonance imaging (MRI) later this year.

Unlike the National Cancer Institute–funded Breast Cancer Surveillance Consortium, the National Mammography Database does not have information on missed cancers, limiting its ability to effectively evaluate performance and safety.

Medical Audits

Etta D. Pisano, MD, Dean Emerita and Distinguished University Professor, Medical University of South Carolina, noted that the required elements of an MQSA medical audit in 2005 included:

  • All mammograms interpreted as positive, or BI-RADS 4 or 5
  • Follow-up of all positive mammograms
  • All biopsy results
  • Correlation of pathology results with final assessment
  • An interpreting physician for each case
  • Annual analysis of results and sharing them with the interpreting physician and the entire facility

In addition, said Dr. D’Orsi, a complete audit should include the following measures:

  • Sensitivity: the percentage of cancers detected by mammography among all cancers found in women receiving screening mammograms
  • Specificity: the percentage of women without cancer whose screening mammograms were correctly interpreted as negative
  • Recall rate: the percentage of screens for which additional imaging was requested
  • Abnormal interpretation rate: the percentage of exams interpreted as positive
  • Accuracy: the percentage of all cases, cancer and noncancer, correctly identified
  • Positive predictive value type 1: the percentage of screening exams with a positive interpretation in which cancer was diagnosed within a year
  • Positive predictive value type 2: the percentage of positive exams with a biopsy recommendation in which cancer was diagnosed within a year
  • Positive predictive value type 3: the percentage of biopsies performed after a positive interpretation that yielded cancer within a year
  • Cancer detection rate: the number of cancers detected per thousand women screened
  • Percentage of minimal cancers: invasive cancers less than 1 cm, or ductal carcinoma in situ

When any of these parameters are unknown, surrogate markers may need to be used for the audit.
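To illustrate how these audit measures relate arithmetically, the following sketch computes them from a set of hypothetical counts. All numbers here are invented for the example; an actual MQSA audit would derive these counts from linked screening, follow-up, and pathology records.

```python
# Hypothetical audit counts (invented for illustration only).
screens = 10_000          # total screening exams
positives = 950           # exams with a positive interpretation (BI-RADS 0, 3, 4, or 5)
true_positives = 45       # positive exams with cancer diagnosed within a year
cancers = 52              # all cancers found among the screened women
biopsy_recommended = 200  # positive exams with a biopsy recommendation
biopsies = 180            # biopsies actually performed after a positive interpretation
biopsy_cancers = 45       # biopsies that yielded cancer within a year

# Negatives among women without cancer: all negative interpretations
# minus the missed cancers (false negatives).
true_negatives = (screens - positives) - (cancers - true_positives)

sensitivity = true_positives / cancers                   # cancers caught by mammography
specificity = true_negatives / (screens - cancers)       # non-cancer cases read as negative
recall_rate = positives / screens                        # abnormal interpretation rate
ppv1 = true_positives / positives                        # PPV of a positive screen
ppv2 = biopsy_cancers / biopsy_recommended               # PPV of a biopsy recommendation
ppv3 = biopsy_cancers / biopsies                         # PPV of biopsies performed
cancer_detection_rate = true_positives / screens * 1000  # cancers per 1,000 screens

print(f"Sensitivity: {sensitivity:.1%}")
print(f"Specificity: {specificity:.1%}")
print(f"Recall rate: {recall_rate:.1%}")
print(f"PPV1: {ppv1:.1%}  PPV2: {ppv2:.1%}  PPV3: {ppv3:.1%}")
print(f"Cancer detection rate: {cancer_detection_rate:.1f} per 1,000 screens")
```

Note that the sketch treats each screen as one woman and counts only cancers diagnosed within the audit year, which is why surrogate markers may be needed when follow-up data are incomplete.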

A positive screening mammogram, he said, has an assessment of BI-RADS 0, 3, 4, or 5, as does a positive ultrasound. “Since the number of images and parameters for either a screening or diagnostic MRI is the same, the definition of a positive screen and diagnostic exam is the same. However, if the screening exam includes additional images [eg, a 90-degree lateral on a screening mammogram or orthogonal images of a cyst on a screening ultrasound], this too is positive.”
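The positive-exam definition above can be sketched as a simple rule. This assumes a bare integer assessment field; real reporting systems record richer BI-RADS assessments (including 4A/4B/4C subcategories), so treat this as an illustration, not an implementation.

```python
# BI-RADS assessment categories treated as positive on a screening exam,
# per the definition above (0, 3, 4, or 5).
POSITIVE_SCREEN_CATEGORIES = {0, 3, 4, 5}

def is_positive_screen(birads_category: int, additional_imaging: bool = False) -> bool:
    """Return True if a screening exam counts as positive for audit purposes.

    An exam is positive if its BI-RADS assessment is 0, 3, 4, or 5, or if
    images beyond the standard screening views were taken (eg, a 90-degree
    lateral on a screening mammogram).
    """
    return birads_category in POSITIVE_SCREEN_CATEGORIES or additional_imaging

print(is_positive_screen(2))                            # False: benign, no extra views
print(is_positive_screen(0))                            # True: incomplete, needs workup
print(is_positive_screen(1, additional_imaging=True))   # True: extra views make it positive
```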

Dr. Pisano noted that some changes were made in the revised MQSA audit, but most of IOM’s suggestions were not implemented. Despite the 2005 IOM recommendations, the revised MQSA did not mandate collection of patient characteristics and tumor staging; establishment of a statistical center to analyze data, provide feedback to interpreting physicians, and report aggregate data to the public; or development of pay-for-performance incentives for participation in audits and meeting performance criteria, although some payers have implemented pay-for-performance mammography metrics.

Mammography Challenges

Barbara Monsees, MD, Emerita Chief, Breast Imaging Section, Washington University School of Medicine, said that it’s not easy to achieve high-quality mammography. She asked, “How do we ensure broad access? What do patients need to understand about new technologies? How does supplemental screening fit in? Finally, how do state laws mandate notification about breast density, and how does this change expectations and outcome tracking?”

Disparities in performance of and access to screening lead to disparities in outcomes, said Tracy Onega, PhD, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth. For example, 12.6% of white women, 6.4% of black women, and 39.6% of Native American women travel more than 30 minutes to the closest mammography center. In urban areas, only 0.5% of women travel more than a half-hour, but in rural areas, that figure is 27.9%.

The percentages of women over age 40 who have had a mammogram within the past 2 years are 75.4% for white women, 78.6% for black women, and 63.9% for Native American women. Among white women diagnosed with breast cancer, 7.6% have stage III or IV disease at diagnosis, compared with 11.2% of black women. Breast cancer mortality shows a similar disparity: 22.7% for white women and 30.8% for black women. Morbidity and mortality rates were not available for Native Americans.

“Geography does not seem to affect access to mammography, but it may limit other breast services,” said Dr. Onega. “Moreover, at facilities that serve vulnerable populations, screening mammography had the same sensitivity as other facilities, but specificity was significantly higher. In diagnostic mammograms, the false-positive rates were much higher at facilities that serve vulnerable women.”

MQSA audit requirements and the way BI-RADS addresses them will have an effect on outcome. For example, what tools should be used, what are appropriate audit measures, and how often should data be reviewed? What are reasonable goals for recall rates, detection rates, and tumor size and stage? Are there reasonable tradeoffs for sensitivity and specificity, and if so, what are they?

These challenges are complicated by the fact that breast imaging includes both screening and diagnostic mammography, ultrasound, MRI, image-guided needle biopsy, and other modalities. “Expectations are high,” said Dr. Monsees, “and medicolegal implications can be an issue. Variability in interpretation is a problem, but double reading is not feasible even though it is often used in other countries. More modalities and procedures can help with patient management, but they are time consuming and expensive.”

Overall, however, mammography has improved. Technologists have learned how to produce better images with good compression and positioning. Digital mammography means that technique is less of a factor because of wider recording latitude and elimination of film processors. Quality control is easier and more streamlined, and there are fewer lost exams.

More and more radiologists do only breast imaging, although some general radiologists interpret screening mammograms or perform diagnostic workups, including breast ultrasound. “Digital mammography makes centralized interpretation of screening mammograms feasible but at the same time less workable for diagnostic evaluation, where a radiologist should be present,” said Dr. Monsees.

Dr. Onega added that access to mammography is generally good, but the mammography workforce (radiologists and technologists) is not standardized, and quality varies with practices.

Dr. Monsees said that use of computer-aided detection during screening mammography among Medicare beneficiaries is a good news–bad news story, leading to an increased incidence of ductal carcinoma in situ, diagnosis of invasive breast cancer at earlier stages, and increased diagnostic testing among women without breast cancer.

Teleradiology, now 67% of the telemedicine market, also is inconsistent, said Dr. Onega. Whereas local access emphasizes machines and technicians, teleradiology separates interpretation from physical location. Mammography readers could increase volume, but it remains to be seen to what extent volume has a relationship with outcome.

She added that mammography misses about 20% of breast cancers, but digital breast tomosynthesis seems to increase detection and reduce recall.

Training, Experience, and Performance

Dr. Buist said that interpretation variability is due to patient factors, practice and facility characteristics, and radiologist training, years of experience, and volume. “Volume requirements differ greatly across countries, as do quality standards,” she said. “In the United States, we have demonstrated poorer performance for radiologists with low interpretive volume, leading to higher false-positive rates, lower cancer detection, and lower sensitivity.”

The most important finding in the United States is improved screening and diagnostic interpretive performance for radiologists who interpret a proportion of diagnostic examinations, and for those who interpret some of their own recalled screening exams.

Diana Miglioretti, PhD, Dean’s Professor in Biostatistics, Department of Public Health Sciences, University of California Davis, said that 18% of radiologists fall into the low-performance range in sensitivity, 48% in specificity, 49% in recall rate, and 38% in positive predictive value type 1 (ie, with abnormal findings at screening). She added that most radiologists are in the low range for at least one measure of competence, and many interpret few mammograms associated with a cancer diagnosis.

Patricia A. Carney, PhD, Professor of Family Medicine and of Public Health & Preventive Medicine, Oregon Health & Science University, concurred. “There is significant variability in the interpretive acumen of practicing radiologists: 75% to 95% for sensitivity and 83% to 98.5% for specificity.”

Dr. Buist suggested that consideration should be given to increasing minimum interpretation volumes, including diagnostic exams, which should be a proportion of total volume. She also thinks radiologists should be required to perform a minimum number of diagnostic workups resulting from their own recalls.

Radiology Technologists

Louise M. Henderson, PhD, Assistant Professor, Department of Radiology, University of North Carolina, Chapel Hill, said, “Mammograms are interpreted by radiologists but performed by technologists who are responsible for image quality.”

The American Registry of Radiologic Technologists (ARRT) tests, certifies, and registers the more than 250,000 technologists and awards the Registered Technologist designation. Even though ARRT provides continuing education and reregisters technologists every year, certification is voluntary and is not the same as state licensure.

“Technologists have a significant impact on mammography performance, specifically recall rate, sensitivity, specificity, [positive predictive value], and cancer detection rate,” said Dr. Henderson. They also can serve as double readers, although this is more common in Europe than in the United States. Where they do serve as such for screening mammograms, cancer detection rates increase without significantly increasing recall or false-positive rates. ■

Disclosure: Drs. Buist, D’Orsi, Pisano, Monsees, Onega, Miglioretti, Carney, and Henderson reported no potential conflicts of interest.


1. Institute of Medicine: Improving Breast Imaging Quality Standards. May 23, 2005. Accessed June 22, 2015.

