
Study Shows Potential Missing Patient Information in SEER Database



A significant number of patients with cancer—particularly those with more advanced disease who are more likely to receive care at community hospitals, safety net hospitals, and rural medical centers—may have incomplete case information in the Surveillance, Epidemiology, and End Results (SEER) database, according to a study published by White et al in the Journal of the American College of Surgeons. This finding, the researchers noted, warrants careful consideration when interpreting studies that rely on SEER data.

These missing cases create “blind spots” in the data, said senior study author Schelomo Marmor, PhD, MPH, a Professor in the Department of Surgery, Division of Surgical Oncology, and Real-World Data Lead at the Center for Learning Health System Sciences at the University of Minnesota (UMN) in Minneapolis.

Dr. Marmor and McKenzie White, MD, a complex general surgical oncology fellow at Moffitt Cancer Center, who was formerly at UMN, studied four types of cancer in this analysis: breast, pancreatic, colon, and non–small cell lung cancer (NSCLC).

“Patients with missing data had meaningfully lower rates of survival,” Dr. Marmor said. “That was not a coincidence. These are not minor statistical loose ends. They are a high-risk, underserved population that effectively disappears from the scientific record every time a study excludes incomplete cases.”

Study Findings

The study evaluated how population-based studies using the SEER database handled missing patient data. The findings indicate that SEER studies that exclude patients with missing data may overestimate survival and leave out at-risk or underserved patient populations.

The study analyzed 328,030 patients with stage I to IV disease from one of the four cancers who were entered in SEER from 2018 to 2020. Most patients were treated at American College of Surgeons Commission on Cancer (CoC)-accredited cancer centers: 82% of those with breast cancer, 83% with pancreatic cancer, 75% with colon cancer, and 80% with NSCLC. SEER captures all cancer cases from both CoC- and non–CoC-accredited centers through 18 population-based registries across 22 geographic regions, covering about half of the United States population.

Patients treated at centers that were not CoC-accredited were roughly two to three times as likely to have missing data as those treated at CoC-accredited facilities: 23% vs 9% for breast cancer, 36% vs 14% for pancreatic cancer, 30% vs 13% for colon cancer, and 42% vs 13% for NSCLC.
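The “two to three times” figure can be checked directly from the reported percentages. The sketch below computes a crude ratio (non-CoC rate divided by CoC rate) for each cancer type. It is an unadjusted back-of-envelope calculation using only the numbers quoted above; the helper name `relative_risk` is hypothetical, and the study's own statistical models may have adjusted for other factors.

```python
# Percentages of patients with missing data, as quoted in the article:
# (non-CoC-accredited centers, CoC-accredited centers)
missing_pct = {
    "breast": (23, 9),
    "pancreatic": (36, 14),
    "colon": (30, 13),
    "NSCLC": (42, 13),
}


def relative_risk(exposed_pct: float, reference_pct: float) -> float:
    """Crude ratio of two proportions (no adjustment for age, stage, etc.)."""
    return exposed_pct / reference_pct


# Print the crude ratio for each cancer type, rounded to one decimal place.
for cancer, (non_coc, coc) in missing_pct.items():
    print(f"{cancer}: {relative_risk(non_coc, coc):.1f}x")
```

Run as written, this prints ratios of about 2.6x (breast), 2.6x (pancreatic), 2.3x (colon), and 3.2x (NSCLC), which is where the “roughly two to three times” range comes from; NSCLC sits slightly above three.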

“Think of it this way: if you set out to understand how cancer treatments perform across the entire country, but your data systematically leaves out the sickest patients, the oldest patients, and those from rural or underserved communities, then what you’re left with is a portrait of cancer care that looks far more optimistic than reality,” Dr. Marmor said. “The hardest cases weren’t randomly absent; they were quietly excluded, and the database never flagged them as missing. The database results look much rosier than reality, and that's because the most difficult cases were quietly left out.”

Patients with missing data had significantly lower 3-year overall survival: 63% vs 81% for breast cancer, 5% vs 12% for pancreatic cancer, 42% vs 61% for colon cancer, and 17% vs 27% for NSCLC. Overall, the proportion of missing data ranged from 12% for breast cancer to 19% for NSCLC.
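Why does dropping incomplete cases inflate survival? If a fraction p of patients has missing data, the survival rate across all patients is roughly a weighted average of the two groups, so a complete-case estimate overstates it by p × (complete-case survival − missing-data survival). The sketch below applies that simple mixture to the two cancers for which the article gives both the missing-data proportion and the survival rates (breast and NSCLC). It is an illustrative calculation under loudly stated assumptions, not a result reported by the study: it assumes the overall missing proportion applies to the same cohort as the survival comparison and ignores censoring and the survival-analysis methods the authors actually used. The function name `all_patient_survival` is hypothetical.

```python
def all_patient_survival(p_missing: float, surv_complete: float, surv_missing: float) -> float:
    """Crude pooled survival, treating the cohort as a simple two-group mixture.

    Assumption: survival is a plain weighted average of the two groups,
    ignoring censoring and follow-up differences.
    """
    return (1 - p_missing) * surv_complete + p_missing * surv_missing


# (proportion missing, 3-year OS with complete data, 3-year OS with missing data),
# taken from the percentages quoted in the article.
cohorts = {
    "breast": (0.12, 0.81, 0.63),
    "NSCLC": (0.19, 0.27, 0.17),
}

for cancer, (p, s_complete, s_missing) in cohorts.items():
    pooled = all_patient_survival(p, s_complete, s_missing)
    # The complete-case estimate exceeds the pooled one by p * (s_complete - s_missing).
    print(f"{cancer}: complete-case {s_complete * 100:.1f}% vs all patients {pooled * 100:.1f}%")
```

Under these assumptions, the complete-case figure overstates 3-year survival by about 2 percentage points in both cases (81.0% vs 78.8% for breast cancer; 27.0% vs 25.1% for NSCLC). The gap grows as either the share of missing cases or the survival difference between the groups increases.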

This is the first study to show that patients with missing data in SEER are treated mostly at centers that are not CoC-accredited, Dr. Marmor said.

“Our manuscript shows that patients with missing data are more likely to be older, from rural areas, or socioeconomically disadvantaged backgrounds—the very patients who already face barriers to preventive care, who are more often diagnosed at aggressive stages, and who have less access to high-quality cancer treatment,” Dr. Marmor said. “When their records are dropped from an analysis, we don’t just lose data points; we lose the clinical and human reality of cancer in America.”

Implications for Building AI Models

“Population-based registries like SEER are the foundation on which [artificial intelligence (AI)]-driven cancer research is built. These registries capture longitudinal real-world data and are really the backbone that enable these types of AI advances, but if we move deeper into an era of AI and machine learning–driven oncology research, the issue of missing data becomes even more consequential because AI does not fix the missing data by default,” Dr. Marmor said. “AI learns from us how we handle it, and if we're systematically excluding those patients, we’re teaching AI our own blind spots.”

The authors suggested that cancer researchers draw on multiple data sources, for example, both SEER and the National Cancer Database, a hospital-based registry that captures cases from CoC-accredited centers.

Strengths of the study include the large size of the dataset and its use of validated methods for age adjustment and proportional hazards modeling, Dr. Marmor said. A key limitation is that the study could not determine why specific data were missing.

“Was it inadequate documentation? Understaffed registries? Changes in coding systems? Or disease severity itself?” Dr. Marmor said. “That’s an important question for future research, because understanding why data goes missing is the first step to ensuring we stop losing these patients from the scientific conversation.”

This work was first presented at the Society of Surgical Oncology Annual Meeting in 2023, as an e-poster at the Minnesota Society for Clinical Oncology 2023 Spring Meeting, and as an oral presentation at the Minnesota Surgical Society (a Chapter of the ACS) 2023 Fall Meeting; it was then updated with an additional year of data in 2025.

DISCLOSURE: For full disclosures of the study authors, visit journals.lww.com/journalacs.

The content in this post has not been reviewed by the American Society of Clinical Oncology, Inc. (ASCO®) and does not necessarily reflect the ideas and opinions of ASCO®.