What drives risk similarity across construction projects? An explainable machine learning analysis using GPT-based embeddings
Document Type
Article
Publication Date
7-2026
Department
Department of Civil, Environmental, and Geospatial Engineering
Abstract
Risk registers contain rich experiential knowledge, yet existing approaches struggle to systematically compare risks across projects and translate past lessons into actionable insights. This study introduces an explainable, data-driven framework for quantifying cross-project risk similarity in transportation construction by integrating GPT-based text embeddings, ensemble learning, and explainable artificial intelligence (XAI). Using over 3500 risk items from 72 transportation projects, the framework measures semantic similarity between project risk profiles and models how project characteristics shape similarity patterns. Ensemble models achieve strong predictive performance (R² = 0.85), while XAI analysis reveals that risk documentation practices, geographic context, and delivery method dominate similarity outcomes, outweighing project scale or project type. These findings demonstrate that transferable risk knowledge is primarily context-driven rather than size-driven. The proposed framework provides a robust foundation for future LLM- and graph-based risk prediction systems, enabling more transparent, scalable, and context-aware risk management in transportation infrastructure projects.
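As a rough illustration of what "semantic similarity between project risk profiles" could look like in practice, the sketch below averages the embedding vectors of each project's risk items and compares the resulting profiles with cosine similarity. This is an assumption made for illustration only: the abstract does not state how the authors aggregate embeddings or which similarity measure they use. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for real GPT-based text embeddings.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors into one profile vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def project_similarity(risks_a, risks_b):
    """Compare two projects by the cosine similarity of their mean risk embeddings.

    Mean pooling is an assumed aggregation choice, not one confirmed by the source.
    """
    return cosine_similarity(mean_vector(risks_a), mean_vector(risks_b))


# Toy vectors standing in for text embeddings of individual risk items
# (hypothetical values; real embeddings have hundreds or thousands of dimensions).
project_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
project_b = [[1.0, 1.0, 0.0]]
project_c = [[0.0, 0.0, 1.0]]

sim_ab = project_similarity(project_a, project_b)  # profiles point the same way
sim_ac = project_similarity(project_a, project_c)  # profiles are orthogonal
print(sim_ab, sim_ac)
```

In a pipeline of the kind the abstract describes, pairwise scores like these would presumably serve as the target that an ensemble model predicts from project characteristics, with an XAI method then attributing the predictions to individual features.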
Publication Title
Advanced Engineering Informatics
Recommended Citation
Erfani, A., & Naghdi, M. (2026). What drives risk similarity across construction projects? An explainable machine learning analysis using GPT-based embeddings. Advanced Engineering Informatics, 73.
http://doi.org/10.1016/j.aei.2026.104543
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/2426