Michigan Tech Publications

Understanding Confusion: A Case Study of Training a Machine Model to Predict and Interpret Consensus From Volunteer Labels

Ramanakumar Sankar, University of California, Berkeley
Kameswara Mantha, University of Minnesota Twin Cities
Cooper Nesmith, University of Minnesota Twin Cities
Lucy Fortson
Shawn R. Brueshaber, Michigan Technological University
Candice Hansen-Koharcheck, Planetary Science Institute
Glenn Orton, California Institute of Technology

Document Type

Article

Publication Date

12-9-2024

Department

Department of Mechanical and Aerospace Engineering

Abstract

Citizen science has become a valuable and reliable method for interpreting and processing big datasets, and is vital in the era of ever-growing data volumes. However, there are inherent difficulties in generating labels from citizen scientists: differences between members of the crowd lead to variability in the results. Sometimes this variability is useful, as with serendipitous discoveries, which correspond to rare or unknown classes in the data, but it can also stem from ambiguity between classes. The primary issue is then to distinguish between the intrinsic variability in the dataset and the uncertainty in the citizen scientists’ responses, and to leverage that distinction to extract scientifically useful relationships. In this paper, we explore using a neural network to interpret volunteer confusion across the dataset, to increase the purity of the downstream analysis. We focus on the use of learned features from the network to disentangle feature similarity across the classes, and on the ability of the machine’s “attention” to identify features that lead to confusion. We use data from Jovian Vortex Hunter, a citizen science project to study vortices in Jupiter’s atmosphere, and find that the latent space from the model helps identify different sources of image-level features that lead to low volunteer consensus. Furthermore, the machine’s attention highlights features corresponding to specific classes. This provides meaningful image-level feature-class relationships, which are useful in our analysis for identifying vortex-specific features to better understand vortex evolution mechanisms. Finally, we discuss the applicability of this method to other citizen science projects.

Publication Title

Citizen Science: Theory and Practice

Recommended Citation

Sankar, R., Mantha, K., Nesmith, C., Fortson, L., Brueshaber, S., Hansen-Koharcheck, C., & Orton, G. (2024). Understanding Confusion: A Case Study of Training a Machine Model to Predict and Interpret Consensus From Volunteer Labels. Citizen Science: Theory and Practice, 9(1). http://doi.org/10.5334/cstp.731
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/1300

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Version

Publisher's PDF

Included in

Aerospace Engineering Commons, Mechanical Engineering Commons
