What drives risk similarity across construction projects? An explainable machine learning analysis using GPT-based embeddings

Document Type

Article

Publication Date

7-2026

Department

Department of Civil, Environmental, and Geospatial Engineering

Abstract

Risk registers contain rich experiential knowledge, yet existing approaches struggle to systematically compare risks across projects and translate past lessons into actionable insights. This study introduces an explainable, data-driven framework for quantifying cross-project risk similarity in transportation construction by integrating GPT-based text embeddings, ensemble learning, and explainable artificial intelligence (XAI). Using over 3500 risk items from 72 transportation projects, the framework measures semantic similarity between project risk profiles and models how project characteristics shape similarity patterns. Ensemble models achieve strong predictive performance (R²=0.85), while XAI analysis reveals that risk documentation practices, geographic context, and delivery method dominate similarity outcomes, outweighing project scale or project type. These findings demonstrate that transferable risk knowledge is primarily context-driven rather than size-driven. The proposed framework provides a robust foundation for future LLM- and graph-based risk prediction systems, enabling more transparent, scalable, and context-aware risk management in transportation infrastructure projects.
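
As an illustration of the similarity step described in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not state how risk-item embeddings are aggregated into a project risk profile or which similarity metric is used. Mean pooling and cosine similarity are assumptions made here, and the function names, project names, and toy 3-dimensional vectors are hypothetical stand-ins for real GPT-based text embeddings.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Average a project's risk-item embeddings into one profile vector.

    Mean pooling is an assumption; the abstract does not specify it.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def project_similarity(
    embeddings_by_project: Dict[str, List[Vector]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise similarity between project risk profiles.

    Keys are (project, project) pairs in alphabetical order.
    """
    profiles = {
        name: mean_vector(vectors)
        for name, vectors in embeddings_by_project.items()
    }
    names = sorted(profiles)
    return {
        (p, q): cosine_similarity(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Made-up toy vectors; real embeddings would have far more dimensions.
toy_embeddings = {
    "bridge_A": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "bridge_B": [[0.9, 0.1, 0.0]],
    "tunnel_C": [[0.0, 0.0, 1.0]],
}
similarities = project_similarity(toy_embeddings)
```

In a pipeline of the kind the abstract outlines, pairwise scores like these would serve as the target that an ensemble model predicts from project characteristics, with an XAI method then attributing the predictions to those characteristics. That downstream stage is omitted here because the abstract gives no detail on the specific models or features.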

Publication Title

Advanced Engineering Informatics

Recommended Citation

Erfani, A., & Naghdi, M. (2026). What drives risk similarity across construction projects? An explainable machine learning analysis using GPT-based embeddings. Advanced Engineering Informatics, 73. http://doi.org/10.1016/j.aei.2026.104543
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/2426

Michigan Tech Publications
