Document Type


Publication Date



College of Forest Resources and Environmental Science; Department of Mathematical Sciences


Gene expression data features high dimensionality, multicollinearity, and non-Gaussian distribution noise, posing hurdles for identification of true regulatory genes controlling a biological process or pathway. In this study, we integrated the Huber loss function and the Berhu penalty (HB) into partial least squares (PLS) framework to deal with the high dimension and multicollinearity property of gene expression data, and developed a new method called HB-PLS regression to model the relationships between regulatory genes and pathway genes. To solve the Huber-Berhu optimization problem, an accelerated proximal gradient descent algorithm with at least 10 times faster than the general convex optimization solver (CVX), was developed. Application of HB-PLS to recognize pathway regulators of lignin biosynthesis and photosynthesis in Arabidopsis thaliana led to the identification of many known positive pathway regulators that had previously been experimentally validated. As compared to sparse partial least squares (SPLS) regression, an efficient method for variable selection and dimension reduction in handling multicollinearity, HB-PLS has higher efficacy in identifying more positive known regulators, a much higher but slightly less sensitivity/(1-specificity) in ranking the true positive known regulators to the top of the output regulatory gene lists for the two aforementioned pathways. In addition, each method could identify some unique regulators that cannot be identified by the other methods. Our results showed that the overall performance of HB-PLS slightly exceeds that of SPLS but both methods are instrumental for identifying real pathway regulators from high-throughput gene expression data, suggesting that integration of statistics, machine leaning and convex optimization can result in a method with high efficacy and is worth further exploration.

Publisher's Statement

Copyright: © 2021 by the author(s). Exclusive Licensee Maximum Academic Press, Fayetteville, GA. This article is an open access article distributed under Creative Commons Attribution License (CC BY 4.0), visit https:// Publisher’s version of record:

Publication Title

Forestry Research

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher's PDF



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.