Large Language Models May Generate Concise, Coherent Pathology Summaries, Reducing Physician Burden



Large language models performed better than physicians at producing accurate and comprehensive oncology pathology report summaries, according to the results of a study published in JCO Clinical Cancer Informatics.

Six large language models were tested in the study, and most generated summaries with greater similarity to the original pathology reports than the physician-prepared summaries. LLM-generated summaries demonstrated better performance in objective metrics and greater completeness in subjective evaluations compared with physician summaries. 

“As cancer care becomes increasingly complex, the burden of synthesizing complex reports is growing rapidly,” stated senior study author Mohamed E. Abazeed, MD, PhD, Chair and Professor of Radiation Oncology at Northwestern University Feinberg School of Medicine. “What we’re seeing is that AI can help ensure critical pathological and genomic details are consistently captured—not as a replacement for physicians, but as a tool to augment clinical decision-making.”

Background and Study Methods 

Researchers assessed whether several large language models could offer an alternative to physician-prepared summaries that integrate complex pathology data from multiple sources. Preparing such summaries requires physicians to condense histopathologic, immunohistochemical, and molecular findings from multiple reports and institutions into a single document, usually under time constraints that increase the chance of introducing errors. If large language models could instead produce accurate report summaries, this might help reduce physician documentation burden and improve workflow efficiency.

“If AI can reliably synthesize these reports, clinicians can review key findings more efficiently, important genetic details are less likely to be overlooked and documentation becomes more standardized,” said corresponding study author P. Troy Teo, PhD, Instructor in the Department of Radiation Oncology at Northwestern University Feinberg School of Medicine. “This could help physicians focus more on patient care.”

The researchers gathered cases (n = 94) of patients who had undergone an initial thoracic consultation between January 2019 and July 2023. They extracted and anonymized their pathology reports and physician pathology summaries. 

These reports were then run through six open-source large language models to generate pathology report summaries: Llama 3.0, Llama 3.1, Llama 3.2, Mistral, Gemma, and DeepSeek-R1. 

The AI-generated report summaries were compared with physician summaries for correctness, completeness, and conciseness, as well as for similarity to the original pathology reports, which were used as the ground truth in the analysis. The summaries were evaluated on both objective and subjective measurements. 

Key Findings 

Pathology report summaries generated by the large language models achieved higher scores than the physician-prepared summaries across all objective evaluation measurements (P < .0001). By subjective evaluation, four of the models' summaries achieved higher ratings than physician summaries for completeness (P = .017 for DeepSeek, P < .0001 for Mistral, P < .0001 for Llama 3.1, and P < .0001 for Llama 3.2). All of these models showed correctness comparable to that of the physician summaries (P = 1.000).

The study authors concluded that open-source large language models can accurately condense complex pathologic information into report summaries while improving completeness compared with physician-prepared summaries. 

“Patients with complex cancers might benefit the most,” commented first study author Yirong Liu, MD, PhD, a Fifth-Year Resident in Radiation Oncology at McGaw Medical Center of Northwestern. “In cases where missing a key pathological finding or an actionable genetic marker could change treatment decisions, ensuring that information is consistently captured is critical.”

“Patients are living longer and undergoing repeated biopsies and genetic sequencing,” Dr. Liu added. “Their reports can span dozens of pages. Even a single missed detail can impact care, and this is where AI may provide meaningful support.”

DISCLOSURES: Dr. Teo received funding from the Canadian Institutes of Health Research and from Amazon Web Services’ Social Impact funding. For full disclosures of the other study authors, visit ascopubs.org.

The content in this post has not been reviewed by the American Society of Clinical Oncology, Inc. (ASCO®) and does not necessarily reflect the ideas and opinions of ASCO®.