Document Type

Article

Publication Date

9-19-2023

Department

Department of Mathematical Sciences

Abstract

Variable selection and graphical modeling play essential roles in the analysis of highly correlated and high-dimensional (HCHD) data. Variable selection methods have been developed under both parametric and nonparametric model settings. However, variable selection for nonadditive, nonparametric regression with high-dimensional variables is challenging because of the difficulty of modeling unknown dependence structures among HCHD variables. Gaussian graphical models are a popular and useful tool for investigating conditional dependence between variables by estimating sparse precision matrices. For a given class of interest, the estimated precision matrices can be mapped onto networks for visualization. However, Gaussian graphical models are limited in that they are only applicable to discretized response variables and to the case when (Formula presented.), where (Formula presented.) is the number of variables and (Formula presented.) is the sample size. It is therefore necessary to develop a joint method for variable selection and graphical modeling. To the best of our knowledge, methods for simultaneously selecting variables and estimating networks among variables in semiparametric regression settings are quite limited. Hence, in this paper, we develop a joint semiparametric kernel network regression method to address this limitation and to connect variable selection with graphical modeling. Our approach is a unified and integrated method that can simultaneously identify important variables and build a network among those variables. We develop our approach under a semiparametric kernel machine regression framework, which allows for nonlinear or nonadditive associations and complicated interactions among the variables.
The advantages of our approach are that it can (1) simultaneously select variables and build a network among HCHD variables under a regression setting; (2) model unknown and complicated interactions among the variables and estimate the network among these variables; (3) allow for any form of semiparametric model, including nonadditive, nonparametric models; and (4) provide an interpretable network that considers important variables and a response variable. We demonstrate our approach using a simulation study and a real application to genetic pathway-based analysis.
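
As a rough illustration of the Gaussian graphical model idea mentioned in the abstract (a precision matrix read as a network), the following minimal sketch inverts a small covariance matrix, converts the result to partial correlations, and lists the variable pairs whose partial correlation is nonzero. It is not the authors' joint semiparametric kernel network regression method: the three-variable covariance matrix, the function names, and the threshold are all assumptions made for illustration, and no sparsity penalty is applied.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Choose the row with the largest absolute entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(precision):
    """Partial correlation of i and j given the rest: -w_ij / sqrt(w_ii * w_jj)."""
    n = len(precision)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def network_edges(precision, threshold=1e-8):
    """Return index pairs (i, j), i < j, whose partial correlation exceeds the threshold."""
    pcor = partial_correlations(precision)
    n = len(pcor)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(pcor[i][j]) > threshold
    ]


# Hypothetical covariance of a three-variable chain X0 - X1 - X2 (entries 0.5 ** |i - j|).
covariance = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]

precision = invert(covariance)
edges = network_edges(precision)
print(edges)
```

In this toy example X0 and X2 are marginally correlated (0.25), yet their precision entry is zero, so the resulting network links them only through X1. With real HCHD data the covariance would have to be estimated, and a sparse estimator would typically replace the plain inversion used here.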

Publisher's Statement

© 2023 The Authors. Publisher’s version of record: https://doi.org/10.1002/sim.9910

Publication Title

Statistics in Medicine

Version

Publisher's PDF

Included in

Mathematics Commons
