Using Machine Learning Algorithms to Clarify Relationships Between Soil Properties and Lead Stomach Bioaccessibility

Document Type

Article

Publication Date

4-1-2026

Abstract

Featured Application: This work applies to environmental health risk assessment and urban soil remediation planning. By demonstrating how machine learning can predict lead bioaccessibility in soil contaminated by lead paint, the study offers a scalable, cost-effective alternative to laboratory-based extraction methods. Such models could support rapid screening of contaminated sites, helping to prioritize high-risk areas for intervention and to allocate remediation resources more efficiently. The model is expected to advance data-driven decision-making for managing lead-contaminated soils and protect vulnerable urban populations from exposure.

Lead contamination in urban soils, primarily from deteriorating lead-based paint, poses a significant health risk in the United States. These soils are often major sources of exposure, making them critical targets for remediation. To guide remediation strategies, preliminary risk assessments are needed to evaluate lead bioaccessibility in the soil and to identify the key soil properties that influence lead speciation. In this study, a novel machine learning approach was co-developed with an artificial intelligence assistant, Claude Sonnet (Anthropic), to design a predictive model that avoids the difficulties of determining bioaccessibility experimentally. Data were compiled from published sources (n = 640) and from an internal analysis of soils sampled across three large cities in the United States (n = 30), which served as the validation dataset. While our final model's prediction accuracy was good (R² = 0.95), it initially did not perform as expected on our internal dataset, indicating a fundamental domain shift. Further analysis revealed complications with outliers, data availability, and data consistency that resulted in poor performance. When optimization was applied to the validation model, our final prediction accuracy improved (R² = 0.84). We conclude that data availability and consistency are essential in heavy-metal soil bioaccessibility studies for building a generalizable predictive model.

Publication Title

Applied Sciences (Switzerland)
