Unraveling Patch Size Effects in Vision Transformers: Adversarial Robustness in Hyperspectral Image Classification

Document Type

Article

Publication Date

2-1-2026

Abstract

Highlights: This work investigates the effect of spatial patch size on the classification accuracy and adversarial robustness of Vision Transformer-based architectures for hyperspectral image analysis.

What are the main findings? Smaller patch sizes generally exhibit stronger adversarial robustness while maintaining comparable clean classification performance. Larger patch sizes tend to reduce robustness by increasing sensitivity to localized adversarial perturbations, with some dataset-dependent variations.

What are the implications of the main findings? Spatial patch size is an important design consideration when applying Vision Transformers to hyperspectral image classification tasks. The findings provide practical guidance for informed patch-size selection in robust, deployment-aware transformer-based hyperspectral image classification models.

Vision Transformers (ViTs) have demonstrated strong performance in hyperspectral image (HSI) classification; however, their robustness is highly sensitive to patch size. This study investigates the impact of spatial patch size on clean accuracy and adversarial robustness using a standard ViT and a Channel Attention Fusion variant (ViT-CAF). Patch sizes from 1 × 1 to 19 × 19 are evaluated across four benchmark datasets under FGSM, BIM, CW, PGD, and RFGSM attacks. Descriptive results show that smaller patches, particularly 1 × 1 and 3 × 3, generally yield higher adversarial accuracy, while larger patches amplify localized perturbations and degrade robustness. Parameter analysis indicates that patch-size-dependent variations arise mainly from the embedding layer, with the Transformer backbone remaining fixed, confirming that robustness differences are driven primarily by spatial context rather than model capacity. These findings reveal a trade-off between spatial granularity and adversarial resilience and provide guidance for patch size selection in ViT-based HSI applications.
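The parameter observation above can be illustrated with a small calculation. The sketch below is not the authors' implementation. It assumes, purely for illustration, that each p × p spatial patch with all spectral bands is flattened and passed through a single linear projection, so that only this embedding layer grows with patch size. The function name, the band count of 200, the embedding width of 64, and the restriction to odd patch sizes are all hypothetical choices, not values taken from the article.

```python
def patch_embedding_params(patch_size: int, num_bands: int, embed_dim: int) -> int:
    """Count weights plus biases of a linear projection over a flattened patch.

    Assumption (not from the article): the p x p x bands cube is flattened
    into one vector and mapped to embed_dim by one linear layer.
    """
    flattened_length = patch_size * patch_size * num_bands
    return flattened_length * embed_dim + embed_dim


# Hypothetical settings chosen only to make the trend visible.
NUM_BANDS = 200
EMBED_DIM = 64

# Odd patch sizes between 1 and 19 (an assumed grid; the article states only the range).
for p in range(1, 20, 2):
    count = patch_embedding_params(p, NUM_BANDS, EMBED_DIM)
    print(f"patch {p:>2} x {p:<2}: embedding parameters = {count}")
```

Under these assumptions the embedding parameter count scales with the square of the patch size, while any component that depends only on the embedding width is unaffected, which is consistent with the abstract's statement that the Transformer backbone remains fixed.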

Publication Title

Remote Sensing
