Artificial intelligence (AI) models pretrained on vast data sets may outperform standard baseline models in identifying nonmelanoma skin cancers from digital images of tissue samples, according to new findings presented by Song et al at the 2025 American Association for Cancer Research (AACR) Annual Meeting (Abstract 1141). The findings suggest that advanced, pretrained AI models could help expand the reach of machine learning–based cancer diagnosis to resource-limited settings.
Background
Skin lesions suspected of being nonmelanoma skin cancers are typically resected, thinly sliced, and mounted on a slide for evaluation by an expert pathologist.
“In resource-limited settings, however, the lack of expert pathologists limits the ability to provide timely and widespread review and diagnosis of [nonmelanoma skin cancers],” stressed lead study author Steven Song, BS, an MD/PhD candidate in the Medical Scientist Training Program at the Pritzker School of Medicine and the Department of Computer Science at the University of Chicago. “[AI] and machine learning have long promised to fill resource gaps, but the development and deployment of bespoke machine-learning models require significant resources that may not be available in many places—namely, computational experts, specialized computational hardware, and large amounts of curated data to train each model,” he added.
Study Methods and Results
In the study, researchers hypothesized that machine-learning models previously trained on vast amounts of data in resource-rich environments, known as foundation models, could be effective off-the-shelf tools to guide nonmelanoma skin cancer diagnosis. They evaluated the accuracy of three contemporary foundation models, PRISM, UNI, and Prov-GigaPath, in identifying nonmelanoma skin cancers from digital pathology images of suspected cancerous skin lesions. All three models follow the same pipeline: a high-resolution digital image of a tissue pathology slide is divided into small image tiles; the model extracts meaningful features from each tile; and those features are analyzed to compute the probability that the tissue contains nonmelanoma skin cancer.
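For readers who want a concrete picture of this pipeline, the short Python sketch below mirrors the tile-embed-classify flow described above. It is an illustration only: the `embed_tile` stub stands in for a frozen foundation-model encoder such as PRISM, UNI, or Prov-GigaPath, and the tile size, embedding width, mean pooling, and logistic-regression classifier are assumptions rather than details reported in the abstract.

```python
# Minimal sketch of the tile -> embed -> classify pipeline described above.
# embed_tile is a random stub standing in for a frozen pathology foundation
# model (PRISM, UNI, or Prov-GigaPath); tile size, embedding width, mean
# pooling, and the logistic-regression head are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

TILE = 256        # assumed tile edge length in pixels
EMBED_DIM = 1024  # assumed embedding width of the encoder

def tile_slide(slide: np.ndarray, size: int = TILE) -> list[np.ndarray]:
    """Split a high-resolution slide image into non-overlapping tiles."""
    h, w = slide.shape[:2]
    return [slide[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def embed_tile(tile: np.ndarray) -> np.ndarray:
    """Stand-in for the frozen foundation-model encoder."""
    rng = np.random.default_rng(abs(hash(tile.tobytes())) % 2**32)
    return rng.standard_normal(EMBED_DIM)

def slide_embedding(slide: np.ndarray) -> np.ndarray:
    """Mean-pool tile embeddings into one slide-level feature vector."""
    return np.mean([embed_tile(t) for t in tile_slide(slide)], axis=0)

# Lightweight classifier on top of frozen embeddings;
# label 1 = nonmelanoma skin cancer, 0 = normal tissue.
rng = np.random.default_rng(0)
slides = [rng.random((512, 512, 3)) for _ in range(20)]
labels = np.array([0, 1] * 10)
X = np.stack([slide_embedding(s) for s in slides])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("P(cancer) for first slide:", clf.predict_proba(X[:1])[0, 1])
```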
To determine the models’ accuracy, the researchers used 2,130 tissue slide images representing 553 biopsy samples from patients residing in Bangladesh who participated in the Bangladesh Vitamin E and Selenium Trial. High levels of exposure to arsenic through contaminated drinking water are known to increase the risk for nonmelanoma skin cancers in this patient population, providing a relevant real-world context for the study. Of these images, 706 showed normal tissue and 1,424 showed confirmed nonmelanoma skin cancers, comprising Bowen’s disease (n = 638), basal cell carcinoma (n = 575), and invasive squamous cell carcinoma (n = 211).
The researchers then compared the accuracy of the three foundation models with that of ResNet18, an established but older architecture for image recognition. Each of the three foundation models outperformed ResNet18, correctly distinguishing between nonmelanoma skin cancers and normal tissue in 92.5% (PRISM), 91.3% (UNI), and 90.8% (Prov-GigaPath) of cases, compared with 80.5% for ResNet18.
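As a point of reference, the accuracy being compared here is simply the fraction of slides classified correctly. A minimal sketch follows; the predictions in it are random placeholders, not the study’s outputs.

```python
# Accuracy as used in the comparison above: the fraction of slides whose
# predicted label matches the ground truth. Predictions below are random
# placeholders; the reported accuracies were 92.5% (PRISM), 91.3% (UNI),
# 90.8% (Prov-GigaPath), and 80.5% (ResNet18).
import numpy as np

def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=2130)      # 1 = cancer, 0 = normal
for name in ("PRISM", "UNI", "Prov-GigaPath", "ResNet18"):
    y_pred = rng.integers(0, 2, size=2130)  # placeholder predictions
    print(f"{name}: {accuracy(y_true, y_pred):.1%}")
```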
“ResNet architectures have been used as a starting point for training vision models for nearly a decade and serve as a meaningful baseline comparison for evaluating the performance gains of newer pretrained foundation models,” Mr. Song noted.
To make the foundation models more amenable to use in resource-limited settings, the researchers developed and tested simplified versions of each model that required less extensive analysis of pathology image data. They discovered that the simplified models still outperformed ResNet18, with accuracies of 88.2% (PRISM), 86.5% (UNI), and 85.5% (Prov-GigaPath), demonstrating robustness even with reduced complexity.
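The abstract does not specify how the models were simplified; one common way to reduce the amount of image analysis per slide is to embed only a subset of tiles. The sketch below illustrates that idea under that assumption, reusing `tile_slide` and `embed_tile` from the earlier sketch.

```python
# Hypothetical simplification only: the study does not report its exact
# approach. A common way to reduce per-slide compute is to embed a random
# subset of tiles rather than all of them (reuses tile_slide/embed_tile
# from the earlier sketch).
import numpy as np

def slide_embedding_subsampled(slide: np.ndarray, max_tiles: int = 64,
                               seed: int = 0) -> np.ndarray:
    """Mean-pool embeddings of at most max_tiles randomly chosen tiles."""
    tiles = tile_slide(slide)
    rng = np.random.default_rng(seed)
    if len(tiles) > max_tiles:
        idx = rng.choice(len(tiles), size=max_tiles, replace=False)
        tiles = [tiles[i] for i in idx]
    return np.mean([embed_tile(t) for t in tiles], axis=0)
```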
In addition, the researchers developed and applied an annotation framework designed to highlight cancerous regions on tissue slides identified by the foundation models. The framework did not require training on large data sets; instead, it leveraged example images of cancerous tissue from a small number of biopsies, comparing pathology image tiles against these examples to identify and annotate cancerous regions. The researchers noted that such annotations could help direct a user’s attention toward regions of interest on each slide.
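In code, such an exemplar-matching step might look like the sketch below: each tile embedding is compared against a handful of example cancer-tile embeddings, and close matches are flagged for review. The cosine-similarity rule and the 0.8 threshold are illustrative assumptions, not the authors’ exact method.

```python
# Sketch of an exemplar-matching annotation step: tiles whose embeddings
# closely resemble an example cancer-tile embedding are flagged for review.
# The cosine-similarity rule and 0.8 threshold are illustrative assumptions.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def flag_tiles(tile_embs: list[np.ndarray],
               exemplar_embs: list[np.ndarray],
               threshold: float = 0.8) -> list[int]:
    """Return indices of tiles whose best exemplar match clears the threshold."""
    return [i for i, t in enumerate(tile_embs)
            if max(cosine_sim(t, e) for e in exemplar_embs) >= threshold]

# Example with random stand-in embeddings; with these, few or no tiles will
# clear the threshold, whereas real encoder embeddings of visually similar
# tiles would.
rng = np.random.default_rng(1)
tiles = [rng.standard_normal(1024) for _ in range(100)]
exemplars = [rng.standard_normal(1024) for _ in range(5)]
print("flagged tiles:", flag_tiles(tiles, exemplars))
```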
Conclusions
The researchers hope their findings will allow machine-learning models to be used in settings with limited access to large data sets, as well as to the specialized equipment and expertise needed to develop new models.
“Overall, our results demonstrate that pretrained machine-learning models have the potential to aid diagnosis of [nonmelanoma skin cancers], which might be particularly beneficial in resource-limited settings,” emphasized Mr. Song. “Our study also provides insights that may advance the development and adaptation of foundation models for various clinical applications,” he highlighted.
Study limitations included the evaluation of the models in a single cohort of patients from Bangladesh, which may limit the generalizability of the findings to other populations, and the lack of examination of the practical details of deploying pretrained machine-learning models in resource-limited settings, even though the analyses were framed from the perspective of those settings.
“While our study suggests foundation models as resource-efficient tools for aiding [nonmelanoma skin cancer] diagnosis, we acknowledge that we are still far from having a direct impact on patient care and that further work is needed to address practical considerations such as the availability of digital pathology infrastructure, internet connectivity, integration into clinical workflows, and user training,” Mr. Song concluded.
Disclosure: The research in this study was supported by the National Institutes of Health. For full disclosures of the study authors, visit abstractsonline.com.