Extracting full-field subpixel structural displacements from videos via deep learning


Department of Mechanical Engineering-Engineering Mechanics


Conventional displacement sensing techniques (e.g., laser, linear variable differential transformer) have been widely used in structural health monitoring over the past two decades. Though these techniques are capable of measuring displacement time histories with high accuracy, distinct shortcomings remain, such as point-to-point contact sensing, which limits their applicability in real-world problems. Video cameras have gained popularity in recent years due to advantages that include low cost, agility, high spatial sensing resolution, and non-contact measurement. Compared with target tracking approaches (e.g., digital image correlation, template matching, etc.), the phase-based method is powerful for detecting small subpixel motions without the use of paints or markers on the structure surface. Nevertheless, its complex computational procedure limits its real-time inference capacity. To address this fundamental issue, we develop a deep learning framework based on convolutional neural networks (CNNs) that enables real-time extraction of full-field subpixel structural displacements from videos. In particular, two new CNN architectures are designed and trained on a dataset generated by the phase-based motion extraction method from a single lab-recorded high-speed video of a dynamic structure. As displacement is only reliable in regions with sufficient texture contrast, the sparsity of the motion field induced by the texture mask is accounted for in both the network architecture design and the loss function definition. Results show that, under the supervision of full and sparse motion fields, the trained networks are capable of identifying the pixels with sufficient texture contrast as well as their subpixel motions.
The performance of the trained networks is tested on various videos of other structures to extract full-field motion (e.g., displacement time histories), indicating that the trained networks generalize to accurately extract full-field subpixel displacements for pixels with sufficient texture contrast.
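The abstract notes that the texture-mask-induced sparsity of the motion field is handled through the loss function definition. One way such a masked loss could look is sketched below; this is a minimal illustration, not the paper's actual formulation, and the function name, the endpoint-error form, and the array shapes are all assumptions introduced here for clarity:

```python
import numpy as np

def masked_motion_loss(pred_flow, true_flow, texture_mask):
    """Mean endpoint error over high-texture-contrast pixels only.

    pred_flow, true_flow: arrays of shape (H, W, 2), per-pixel (dx, dy)
        subpixel displacements (e.g., from the phase-based method).
    texture_mask: binary array of shape (H, W); 1 marks pixels with
        sufficient texture contrast, where the displacement label is
        reliable. Low-contrast pixels are excluded from the loss.
    """
    # Per-pixel endpoint error: Euclidean distance between flow vectors.
    epe = np.linalg.norm(pred_flow - true_flow, axis=-1)  # (H, W)
    n_valid = texture_mask.sum()
    if n_valid == 0:
        return 0.0  # no reliable pixels; nothing to penalize
    return float((epe * texture_mask).sum() / n_valid)
```

Restricting the average to masked pixels keeps unreliable low-contrast regions from diluting or corrupting the training signal, which is the same motivation the abstract gives for the sparse supervision.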

Publication Title

Journal of Sound and Vibration