CNN-LSTM deep learning architecture for computer vision-based modal frequency detection

Document Type


Publication Date



Department of Mechanical Engineering-Engineering Mechanics


Conventional modal analysis relies on physically attached wired or wireless sensors to measure the vibration of structures. This approach has several disadvantages: the sensors add weight to the structure, their low spatial resolution limits the precision of the analysis, and optical vibration sensors are expensive. In addition, sensor installation and calibration are time-consuming and labor-intensive. Non-contact, computer vision-based vibration measurement techniques can address these shortcomings. In this paper, we introduce a CNN-LSTM (Convolutional Neural Network, Long Short-Term Memory) deep learning approach that can serve as a backbone for computer vision-based vibration measurement techniques. The key idea is to treat each pixel of an image taken by an off-the-shelf camera, which encapsulates spatio-temporal information, as a sensor that captures the modal frequencies of a vibrating structure. The non-contact “pixel-sensor” does not alter the system’s dynamics, is relatively low-cost and agile, and provides measurements with very high spatial resolution. Our computer vision-based deep learning model takes a video of a vibrating structure as input and outputs the fundamental modal frequencies. We demonstrate, using reliable empirical results, that the “pixel-sensor” is more efficient, autonomous, and accurate. The robustness of the deep learning model was tested on specimens of various materials and dimensions, and the results show high levels of sensing accuracy.
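The “pixel-sensor” idea can be illustrated with a much simpler stand-in than the CNN-LSTM model described above. The sketch below is not the authors' method: it is a classical spectral baseline, assuming a single pixel whose intensity oscillates at the structure's vibration frequency, and it picks the largest non-DC bin of a discrete Fourier transform. The function name `dominant_frequency`, the synthetic signal, and the values of `fps`, `duration_s`, and `true_hz` are all hypothetical and chosen only for illustration.

```python
import cmath
import math


def dominant_frequency(samples, fps):
    """Return the frequency (Hz) of the largest non-DC DFT bin
    of one pixel's intensity time series."""
    n = len(samples)
    mean = sum(samples) / n
    # Remove the constant brightness offset so only the oscillation remains.
    centered = [s - mean for s in samples]

    best_bin, best_mag = 0, 0.0
    # Scan positive-frequency bins up to the Nyquist limit.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)

    # Convert the bin index to Hz: resolution is fps / n.
    return best_bin * fps / n


# Hypothetical recording: 2 s of video at 60 frames per second,
# with one pixel's intensity oscillating at an assumed 5 Hz.
fps = 60
duration_s = 2
true_hz = 5.0
pixel_series = [
    128 + 20 * math.sin(2 * math.pi * true_hz * t / fps)
    for t in range(fps * duration_s)
]

estimate = dominant_frequency(pixel_series, fps)
print(f"Estimated frequency: {estimate} Hz")
```

A baseline of this kind is limited by the frame rate (frequencies above `fps / 2` alias) and by the record length (resolution is `fps / n`), and it treats every pixel independently. A learned spatio-temporal model such as the one the abstract describes would instead operate on the video as a whole.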

Publication Title

Mechanical Systems and Signal Processing