Vision-Language Models for Highway Roadside Safety Management: A Comparative Study

Document Type

Article

Publication Date

1-1-2026

Abstract

Accurate and timely inspection of roadside infrastructure elements such as guardrails and rigid roadside objects is critical for proactive highway safety management. Although computer vision models offer a promising alternative to manual inspections, they typically require large amounts of annotated data and can struggle to generalize across diverse roadway environments. With the rapid rise of multimodal large language models (MLLMs), their potential to surpass traditional computer vision methods has gained significant attention, yet key questions remain about the extent of their performance advantages and the trade-offs involved. This study presents a comparative evaluation of vision-language models (VLMs), assessing their semantic understanding in detecting key roadside features and benchmarking their performance against a deep learning approach. The results show that reasoning-based VLMs, particularly GPT-4.1 variants, achieve up to 99% accuracy on our data set in zero- and few-shot settings, outperforming convolutional neural network (CNN) baselines and highlighting the potential of prompt-driven visual reasoning for infrastructure safety management. Although VLMs deliver strong performance without requiring labeled training data, their use raises cost and data privacy concerns that must be weighed in practice.
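To make the zero-shot setting described above concrete, the sketch below shows how a VLM can be queried to label a roadside image without any labeled training data. It is a minimal illustration using the OpenAI Python SDK, not the authors' published protocol: the model identifier, prompt wording, label set, and image path are all assumptions for the sake of the example.

```python
# Minimal zero-shot sketch, assuming the OpenAI Python SDK (pip install openai)
# and an OPENAI_API_KEY in the environment. The model name, prompt, and labels
# below are illustrative assumptions, not the study's exact configuration.
import base64
from openai import OpenAI

client = OpenAI()

def classify_roadside(image_path: str) -> str:
    """Ask a VLM whether a roadside image shows a guardrail, a rigid
    roadside object, or neither -- no task-specific training required."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4.1",  # assumed model identifier
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("You are inspecting highway roadside infrastructure. "
                          "Answer with exactly one label: guardrail, "
                          "rigid_object, or none.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=10,
    )
    return response.choices[0].message.content.strip()

# Example (hypothetical file name):
# print(classify_roadside("roadside_frame_001.jpg"))
```

A few-shot variant would simply prepend labeled example images to the message list; the appeal in either case is that the "training" effort reduces to prompt design rather than data annotation.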

Publication Title

Journal of Management in Engineering
