Jensen–Shannon divergence based novel loss functions for Bayesian neural networks
Document Type
Article
Publication Date
2-14-2025
Department
Department of Mechanical Engineering-Engineering Mechanics; Institute of Computing and Cybersystems; Department of Mathematical Sciences
Abstract
Bayesian neural networks (BNNs) are state-of-the-art machine learning methods that can naturally regularize and systematically quantify uncertainties using their stochastic parameters. Kullback–Leibler (KL) divergence-based variational inference used in BNNs suffers from unstable optimization and challenges in approximating light-tailed posteriors due to the unbounded nature of the KL divergence. To resolve these issues, we formulate a novel loss function for BNNs based on a new modification to the generalized Jensen–Shannon (JS) divergence, which is bounded. In addition, we propose a geometric JS divergence-based loss, which is computationally efficient since it can be evaluated analytically. We find that JS divergence-based variational inference is intractable and hence employ a constrained optimization framework to formulate these losses. Our theoretical analysis and empirical experiments on multiple regression and classification datasets suggest that the proposed losses perform better than the KL divergence-based loss, especially when the datasets are noisy or biased. Specifically, accuracy improves by approximately 5% on a noise-added CIFAR-10 dataset and by 8% on a regression dataset, and false-negative predictions on a biased histopathology dataset are reduced by about 13%. In addition, we quantify and compare the uncertainty metrics for the regression and classification tasks.
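To give a concrete sense of the analytically tractable geometric JS term mentioned in the abstract, below is a minimal sketch (not necessarily the paper's exact formulation) of a skew geometric Jensen–Shannon divergence between diagonal Gaussians, written as a weighted sum of closed-form KL terms against their normalized geometric mean; in a mean-field BNN, such a term could stand in for the usual KL(q||p) in the variational loss. The function names, the skew weight alpha, and the weighting convention are illustrative assumptions, not details taken from the paper.

# Sketch: skew geometric JS divergence between diagonal Gaussians (PyTorch).
# The normalized geometric mean p^(1-alpha) * q^alpha of two Gaussians is
# again Gaussian, so the divergence reduces to two analytic KL terms.
import torch

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    # Closed-form KL( N(mu_p, var_p) || N(mu_q, var_q) ) for diagonal Gaussians.
    return 0.5 * (torch.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0).sum()

def geometric_js_diag_gaussians(mu_p, var_p, mu_q, var_q, alpha=0.5):
    # Precision-weighted interpolation gives the geometric-mean Gaussian.
    prec_g = (1.0 - alpha) / var_p + alpha / var_q
    var_g = 1.0 / prec_g
    mu_g = var_g * ((1.0 - alpha) * mu_p / var_p + alpha * mu_q / var_q)
    # Weighted sum of KL terms against the geometric-mean Gaussian
    # (the weighting convention here is an assumption for illustration).
    return ((1.0 - alpha) * kl_diag_gaussians(mu_p, var_p, mu_g, var_g)
            + alpha * kl_diag_gaussians(mu_q, var_q, mu_g, var_g))

# Example: divergence between a variational posterior q and a standard-normal prior.
mu_q, var_q = torch.full((10,), 0.3), torch.full((10,), 0.5)
mu_prior, var_prior = torch.zeros(10), torch.ones(10)
print(geometric_js_diag_gaussians(mu_q, var_q, mu_prior, var_prior, alpha=0.5))

Because every step is closed-form, this term is as cheap to evaluate as the standard KL term, which is the computational advantage the abstract attributes to the geometric JS loss.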
Publication Title
Neurocomputing
Recommended Citation
Thiagarajan, P., & Ghosh, S. (2025). Jensen–Shannon divergence based novel loss functions for Bayesian neural networks. Neurocomputing, 618. http://doi.org/10.1016/j.neucom.2024.129115
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/1361