Document Type

Article

Publication Date

4-3-2026

Department

Department of Biological Sciences; Department of Biomedical Engineering

Abstract

Featured Application: This work applies to environmental health risk assessment and urban soil remediation planning. By demonstrating how machine learning can predict lead bioaccessibility in lead paint-contaminated soil, the study provides a scalable, cost-effective alternative to laboratory-based extraction methods. Such models could support rapid screening of contaminated sites, helping to prioritize high-risk areas for intervention and to allocate remediation resources more efficiently. The model is expected to advance data-driven decision-making for managing lead-contaminated soils and to help protect vulnerable urban populations from exposure.

Lead contamination in urban soils, primarily from deteriorating lead-based paint, poses a significant health risk in the United States. These soils often serve as major sources of exposure, making them critical targets for remediation efforts. To guide such strategies, preliminary risk assessments are necessary to evaluate lead bioaccessibility in the soil and to identify the key soil properties influencing lead speciation. In this study, a novel machine learning approach was co-developed with Claude Sonnet, an artificial intelligence assistant from Anthropic, to design a predictive model that overcomes the difficulties of conducting experimental bioaccessibility models. Data were compiled from published sources (n = 640), together with an internal analysis of soils sampled across three large cities in the United States (n = 30), which was used for validation. While our final model’s prediction accuracy was good (R² = 0.95), it initially did not perform as expected on our internal dataset, indicating a fundamental domain shift. Further analysis revealed complications with outliers, data availability, and data consistency that resulted in poor performance. When optimization was applied to the validation model, our final prediction accuracy improved (R² = 0.84). We conclude that data availability and consistency are essential in heavy-metal soil bioaccessibility studies for building a generalizable predictive model.

Publisher's Statement

Copyright: © 2026 by the authors. Licensee MDPI, Basel, Switzerland. Publisher’s version of record: https://doi.org/10.3390/app16073504

Publication Title

Applied Sciences (Switzerland)

Creative Commons License

Creative Commons Attribution 4.0 International License

Version

Publisher's PDF
