The Unreasonable Ineptitude of Deep Image Classification Networks

Document Type

Conference Proceeding

Publication Date



Department of Cognitive and Learning Sciences


The success of deep image classification networks has been met with enthusiasm and investment from both the academic community and industry. We hypothesize that users will expect these systems to behave similarly to humans, and to succeed and fail in the ways humans do. To investigate this, we tested six popular image classifiers on imagery from ten tool categories, examining how 17 visual transforms impacted both human and AI classification. Results showed that (1) none of the visual transforms we examined produced substantial impairment for human recognition; (2) human errors were limited mostly to functional confusions; (3) almost all visual transforms impacted nearly every image classifier negatively and often catastrophically; (4) human expectations about the performance of AI classifiers map more closely onto human error than onto AI performance; and (5) models trained with an enriched training set involving examples of the transformed imagery achieved improved performance but were not inoculated against error.

Publication Title

Proceedings of the Human Factors and Ergonomics Society Annual Meeting