Michigan Tech Publications

Unraveling Patch Size Effects in Vision Transformers: Adversarial Robustness in Hyperspectral Image Classification

Document Type

Article

Publication Date

2-20-2026

Department

Department of Applied Computing

Abstract

Highlights: This work investigates the effect of spatial patch size on the classification accuracy and adversarial robustness of Vision Transformer-based architectures for hyperspectral image analysis.

What are the main findings?

Smaller patch sizes generally exhibit stronger adversarial robustness while maintaining comparable clean classification performance. Larger patch sizes tend to reduce robustness by increasing sensitivity to localized adversarial perturbations, with some dataset-dependent variations.

What are the implications of the main findings?

Spatial patch size is an important design consideration when applying Vision Transformers to hyperspectral image classification tasks. The findings provide practical guidance for informed patch-size selection in robust, deployment-aware transformer-based hyperspectral image classification models.

Vision Transformers (ViTs) have demonstrated strong performance in hyperspectral image (HSI) classification; however, their robustness is highly sensitive to patch size. This study investigates the impact of spatial patch size on clean accuracy and adversarial robustness using a standard ViT and a Channel Attention Fusion variant (ViT-CAF). Patch sizes from 1 × 1 to 19 × 19 are evaluated across four benchmark datasets under FGSM, BIM, CW, PGD, and RFGSM attacks. Descriptive results show that smaller patches, particularly 1 × 1 and 3 × 3, generally yield higher adversarial accuracy, while larger patches amplify localized perturbations and degrade robustness.

Parameter analysis indicates that patch-size-dependent variations arise mainly from the embedding layer, with the Transformer backbone remaining fixed, confirming that robustness differences are driven primarily by spatial context rather than model capacity. These findings reveal a trade-off between spatial granularity and adversarial resilience and provide guidance for patch size selection in ViT-based HSI applications.
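The parameter-analysis point can be illustrated with a minimal sketch. It assumes, purely for illustration, that each p × p spatial patch with all spectral bands is flattened and passed through a single linear projection; the abstract does not specify the embedding design, and the function name, band count, embedding width, and the restriction to odd patch sizes are hypothetical choices, not values from the paper. Under that assumption only the embedding layer's parameter count depends on patch size, since the Transformer backbone operates on fixed-width tokens.

```python
def patch_embedding_params(patch_size: int, num_bands: int, embed_dim: int) -> int:
    """Parameter count (weights plus biases) of a linear projection that maps
    a flattened patch_size x patch_size x num_bands patch to embed_dim.

    Assumption: a single linear patch embedding; the paper's actual
    embedding layer may differ.
    """
    input_dim = patch_size * patch_size * num_bands
    return input_dim * embed_dim + embed_dim


# Hypothetical values chosen only for illustration (not taken from the paper).
NUM_BANDS = 200
EMBED_DIM = 64

# Odd patch sizes from 1 x 1 to 19 x 19 (odd-only stepping is an assumption).
for p in range(1, 20, 2):
    count = patch_embedding_params(p, NUM_BANDS, EMBED_DIM)
    print(f"{p:>2} x {p:<2} patch: {count:,} embedding parameters")
```

In this sketch the embedding parameter count grows with the square of the patch side, while everything after the embedding is unchanged, which is one way to read the statement that patch-size-dependent parameter variations arise mainly from the embedding layer.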

Publisher's Statement

Publication Title

Remote Sensing

Recommended Citation

Chandrappa, S., Paheding, S., & Reyes-Angulo, A. A. (2026). Unraveling Patch Size Effects in Vision Transformers: Adversarial Robustness in Hyperspectral Image Classification. Remote Sensing, 18(4). http://doi.org/10.3390/rs18040656
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/2389

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Version

Publisher's PDF

Included in

Computer Sciences Commons
