Pavement Condition Classification: Empirical Evidence from Machine Learning Models with Data Augmentation

Document Type

Article

Publication Date

5-22-2026

Department

Department of Civil, Environmental, and Geospatial Engineering

Abstract

Accurate prediction of pavement condition is critical for resource allocation in transportation asset management. This study develops and evaluates machine learning classification models to predict pavement roughness classes, using a nationwide dataset from the U.S. National Highway System that encompasses varied climates, traffic loads, and roadway types. Four algorithms were compared: Logistic Regression, Support Vector Machine, Random Forest, and eXtreme Gradient Boosting (XGBoost). XGBoost was selected as the baseline due to its balanced performance. Across the four models, macro-averaged F1-scores ranged from 0.87 to 0.89, indicating comparable baseline performance on imbalanced data. To address class imbalance, four data augmentation methods were tested: SMOTE, SMOTENC, CTGAN, and TabDDPM. The results indicate that augmentation did not improve performance. SMOTE and SMOTENC showed minor reductions in macro-averaged metrics, while CTGAN and TabDDPM yielded F1-scores comparable to the baseline. Confusion-matrix analysis shows that CTGAN slightly improved classification of the Good class, whereas TabDDPM improved classification of the Acceptable class. Synthetic data quality assessment confirmed that CTGAN and TabDDPM reproduced data distributions with reasonable fidelity but did not improve predictive outcomes. The findings suggest that classification models can reliably support pavement condition forecasting, while augmentation techniques offer limited advantages under the tested conditions.
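The abstract reports macro-averaged F1-scores because the roughness classes are imbalanced. As a minimal sketch of what that metric measures (not the authors' code), the Python below computes a per-class F1 and takes the unweighted mean. The function name `macro_f1` and the ten toy labels are assumptions made for illustration only; they are not data or results from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: a classifier that always predicts the majority class.
y_true = ["Good"] * 8 + ["Acceptable"] * 2
y_pred = ["Good"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, macro F1={score:.2f}")
```

In this toy case accuracy is high while the macro-averaged F1 is much lower, because the minority class receives an F1 of zero and counts equally in the average. That is the property that makes the macro average a reasonable yardstick when comparing augmentation methods on imbalanced classes.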

Publication Title

International Journal of Pavement Research and Technology
