What drives risk similarity across construction projects? An explainable machine learning analysis using GPT-based embeddings

Document Type

Article

Publication Date

7-2026

Department

Department of Civil, Environmental, and Geospatial Engineering

Abstract

Risk registers contain rich experiential knowledge, yet existing approaches struggle to systematically compare risks across projects and translate past lessons into actionable insights. This study introduces an explainable, data-driven framework for quantifying cross-project risk similarity in transportation construction by integrating GPT-based text embeddings, ensemble learning, and explainable artificial intelligence (XAI). Using over 3,500 risk items from 72 transportation projects, the framework measures semantic similarity between project risk profiles and models how project characteristics shape similarity patterns. Ensemble models achieve strong predictive performance (R² = 0.85), while XAI analysis reveals that risk documentation practices, geographic context, and delivery method dominate similarity outcomes, outweighing project scale or project type. These findings demonstrate that transferable risk knowledge is primarily context-driven rather than size-driven. The proposed framework provides a robust foundation for future LLM- and graph-based risk prediction systems, enabling more transparent, scalable, and context-aware risk management in transportation infrastructure projects.
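
To make the idea of semantic similarity between project risk profiles concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each risk item has already been turned into an embedding vector, that a project's profile is the mean of its item embeddings, and that similarity is measured by cosine similarity; the abstract does not confirm any of these choices. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for real GPT-based embeddings.

```python
import numpy as np


def project_profile(item_embeddings):
    """Mean-pool risk-item embeddings into one unit-length project vector.

    Mean pooling is an assumed aggregation, chosen only for illustration.
    """
    profile = np.asarray(item_embeddings, dtype=float).mean(axis=0)
    return profile / np.linalg.norm(profile)


def profile_similarity(items_a, items_b):
    """Cosine similarity between two project risk profiles."""
    return float(np.dot(project_profile(items_a), project_profile(items_b)))


# Toy vectors standing in for embeddings of risk-register entries.
project_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
project_b = [[1.0, 1.0, 0.0]]
project_c = [[0.0, 0.0, 1.0]]

sim_ab = profile_similarity(project_a, project_b)  # profiles point the same way
sim_ac = profile_similarity(project_a, project_c)  # profiles are orthogonal
```

Under these assumptions, pairwise scores of this kind could serve as the target that an ensemble model predicts from project characteristics, with an XAI method then attributing the prediction to individual characteristics.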

Publication Title

Advanced Engineering Informatics
