Generative EO/IR multi-scale vision transformer for improved object detection
Document Type
Conference Proceeding
Publication Date
6-7-2024
Department
Michigan Tech Research Institute; Department of Computer Science
Abstract
For certain objects, panchromatic or 3-band (RGB) imagery may be insufficient for accurate object identification; additional bands within the infrared (IR) spectrum may therefore be needed to exploit unique spectral characteristics and improve object detection. Many existing generative modeling techniques are applied solely to visible wavelengths, so a need exists to fully explore the application of generative modeling to multispectral imagery (MSI), and specifically to the IR bands. Generative models used for data augmentation in object detection must have sufficient fidelity to avoid generating data that are out of distribution with respect to actual measured data, or that contain systematic bias or artifacts. This work demonstrates the utility of a conditionally generative, multi-scale vision transformer that learns spatial and spectral structures, and the interactions between them, in order to accurately synthesize near-infrared (NIR) and short-wave infrared (SWIR) data from RGB. This synthesis is performed over a diverse set of target objects observed over multiple seasons, at multiple look angles, and over varying terrains, with images sampled globally from multiple satellites. For both training and inference, the model is given no contextual information or metadata as input. The average precision (AP) of an off-the-shelf object detection model (YOLOv5) trained with the additional synthesized IR data improves by up to 48% over training with RGB alone, on a target class that is difficult for an analyst to identify. In conjunction with RGB data, using synthetic instead of true IR data for object detection yields higher AP values across all target classes.
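The abstract does not state how the synthesized IR bands are combined with RGB before detection. One common option is early fusion, where extra bands are concatenated to the RGB image as additional input channels. The sketch below illustrates only that assumption: stack_bands is a hypothetical helper (not from the paper), arrays are assumed to be channel-last, co-registered, and on a common scale, and one NIR plus one SWIR band is assumed for simplicity. A detector consuming the result would need a first layer that accepts five channels.

```python
import numpy as np


def stack_bands(rgb: np.ndarray, nir: np.ndarray, swir: np.ndarray) -> np.ndarray:
    """Hypothetical early-fusion helper: append NIR and SWIR to an RGB image.

    Assumes channel-last arrays: rgb is (H, W, 3); nir and swir are each
    (H, W) or (H, W, 1), already co-registered with rgb.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    h, w = rgb.shape[:2]
    bands = [rgb]
    for name, band in (("nir", nir), ("swir", swir)):
        if band.shape == (h, w):
            band = band[..., np.newaxis]  # promote (H, W) to (H, W, 1)
        if band.shape != (h, w, 1):
            raise ValueError(f"{name} must have shape (H, W) or (H, W, 1)")
        bands.append(band)
    # Result is (H, W, 5) with channel order R, G, B, NIR, SWIR.
    return np.concatenate(bands, axis=2)


# Usage with placeholder arrays standing in for real RGB and synthesized IR.
rgb = np.ones((4, 4, 3))
nir = np.full((4, 4), 2.0)
swir = np.full((4, 4, 1), 3.0)
fused = stack_bands(rgb, nir, swir)
print(fused.shape)
```

Early fusion keeps the detector architecture unchanged apart from its input width. Other schemes, such as separate RGB and IR branches merged later, are equally plausible, and the abstract does not indicate which was used.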
Publication Title
Proceedings of SPIE - The International Society for Optical Engineering
ISBN
9781510673885
Recommended Citation
Christian, J., Bright, M., Summers, J., Olson, A., & Havens, T. C. (2024). Generative EO/IR multi-scale vision transformer for improved object detection. Proceedings of SPIE - The International Society for Optical Engineering, 13035. http://doi.org/10.1117/12.3023596
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/954