Date of Award

2025

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Statistics (PhD)

Administrative Home Department

Department of Mathematical Sciences

Advisor 1

Kui Zhang

Committee Member 1

Qiuying Sha

Committee Member 2

Hairong Wei

Committee Member 3

Xiao Zhang

Abstract

Transcriptome-wide association studies (TWAS) have emerged as a powerful strategy to bridge genome-wide association studies (GWAS) with gene regulatory mechanisms by integrating genotypic data with gene expression data. While early TWAS methods typically rely on linear models and single-tissue expression references, recent advances underscore the need for flexible, multi-tissue approaches that can capture heterogeneous regulatory architectures and tissue-specific expression patterns. This dissertation presents a three-part research project that advances multi-tissue TWAS along three complementary axes: methodology, statistical power, and modelling flexibility.

In Chapter One, we introduce TWAS-CTL, a two-stage cross-tissue learner that trains any user-chosen single-tissue imputers (STLs) and then fuses their predictions with an empirical utility weight function that down-weights poorly transferring tissues. Extensive simulations show that TWAS-CTL controls the type I error rate and achieves higher power than the unified test for molecular signatures (UTMOST), one of the leading methods in this field, while cutting computational time by more than half. In the analysis of a GWAS cohort, it recovers more trait-relevant genes than established benchmark methods such as PrediXcan (a foundational TWAS approach) and UTMOST.

In Chapter Two, the GWAS-boosted cross-tissue learner (G-Boost-CTL) extends this framework by re-weighting STLs with genotypic information extracted directly from the GWAS cohort (e.g., the cross-sample variability of the imputed expression), so that tissues carrying stronger association signals are automatically emphasized. The dual weighting scheme preserves appropriate type I error rates yet delivers marked power gains over penalized-linear and covariance-based tools across a wide spectrum of tissue-sharing scenarios. In the analysis of a real data set, G-Boost-CTL also outperforms existing multi-tissue TWAS approaches, uncovering more statistically significant and biologically plausible disease loci.

In Chapter Three, we explore replacing the linear imputers that dominate TWAS with two non-linear engines: gradient-boosted trees and deep learning. Using data from 49 tissues in the Genotype-Tissue Expression (GTEx) project, we show, through large-scale simulation and real-data analysis, that these learners maintain appropriate type I error rates while boosting discovery, with gradient-boosted trees and deep learning methods (e.g., deep neural networks) showing improved power and revealing additional complementary, tissue-specific signals.

Collectively, these studies demonstrate that (i) adaptive cross-tissue weighting, (ii) incorporation of GWAS-derived information, and (iii) non-linear machine learning and deep learning imputers each confer substantial and largely orthogonal benefits. Taken together, they outline a scalable, modular blueprint for advanced multi-tissue TWAS that more faithfully captures the complex, heterogeneous architecture of gene regulation and unlocks deeper insights into the molecular basis of complex human disease.

Available for download on Monday, August 03, 2026
