METHODS IN STATISTICS, MACHINE LEARNING, AND DEEP LEARNING FOR COMBINING MULTI-OMICS DATASET

Date of Award

2025

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Statistics (PhD)

Administrative Home Department

Department of Mathematical Sciences

Advisor 1

Kui Zhang

Committee Member 1

Qiuying Sha

Committee Member 2

Hairong Wei

Committee Member 3

Xiao Zhang

Abstract

Transcriptome-wide association studies (TWAS) have emerged as a powerful strategy for linking genome-wide association studies (GWAS) to gene regulatory mechanisms by integrating genotypic data with gene expression data. While early TWAS methods typically relied on linear models and single-tissue expression references, recent advances underscore the need for flexible, multi-tissue approaches that can capture heterogeneous regulatory architectures and tissue-specific expression patterns. This dissertation presents a three-part research program that advances multi-tissue TWAS along three complementary axes: methodology, statistical power, and modelling flexibility.

In Chapter One, we introduce TWAS-CTL, a two-stage cross-tissue learner that first trains any user-chosen single-tissue imputers (STLs) and then fuses their predictions using an empirical utility weight function that down-weights poorly transferring tissues. Extensive simulations show that TWAS-CTL controls type I error rates and achieves higher power than the unified test for molecular signatures (UTMOST), one of the leading methods in this field, while cutting computational time by more than half. In the analysis of a GWAS cohort, it recovers more trait-relevant genes than existing benchmarks such as PrediXcan (a foundational TWAS approach) and UTMOST.
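The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the utility weight function, so this example assumes, purely for illustration, that each tissue's utility is the held-out prediction R² of its imputer, clipped at zero and normalized; the downstream test is a plain linear regression with a large-sample normal p-value. The names `utility_weights`, `cross_tissue_predict`, and `association_test` are hypothetical, and this is not the TWAS-CTL implementation.

```python
import math

import numpy as np


def utility_weights(cv_r2):
    """Hypothetical utility weights: held-out R^2 per tissue, clipped at zero and
    normalized to sum to one, so poorly transferring tissues (R^2 <= 0) get no weight."""
    r2 = np.clip(np.asarray(cv_r2, dtype=float), 0.0, None)
    total = r2.sum()
    if total == 0.0:  # no informative tissue: fall back to equal weights
        return np.full(r2.size, 1.0 / r2.size)
    return r2 / total


def cross_tissue_predict(pred_by_tissue, weights):
    """Fuse an (n_samples, n_tissues) matrix of single-tissue predictions."""
    return np.asarray(pred_by_tissue, dtype=float) @ weights


def association_test(expr, y):
    """Simple linear regression of y on the fused expression.

    Returns (beta, z, p) with a two-sided normal-approximation p-value, which is
    reasonable only for large samples.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean()
    yc = np.asarray(y, dtype=float)
    yc = yc - yc.mean()
    beta = (x @ yc) / (x @ x)
    resid = yc - beta * x
    se = math.sqrt((resid @ resid) / (x.size - 2) / (x @ x))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, z, p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    true_expr = rng.normal(size=n)
    # Three simulated tissues whose imputers range from accurate to nearly useless.
    preds = np.column_stack([true_expr + rng.normal(scale=s, size=n) for s in (0.5, 1.0, 5.0)])
    y = 0.5 * true_expr + rng.normal(size=n)
    w = utility_weights([0.4, 0.2, -0.05])  # hypothetical held-out R^2 values
    print("weights:", w)
    print("beta, z, p:", association_test(cross_tissue_predict(preds, w), y))
```

Because the third tissue's assumed R² is negative, it receives zero weight and cannot dilute the fused predictor, which is the qualitative behaviour the abstract attributes to down-weighting poorly transferring tissues.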

In Chapter Two, the GWAS-boosted cross-tissue learner (G-Boost-CTL) extends this framework by re-weighting STLs with genotypic information extracted directly from the GWAS cohort (e.g., the cross-sample variability of the imputed expression), so that tissues carrying stronger association signals are automatically emphasized. This dual weighting scheme preserves appropriate type I error rates yet delivers marked power gains over linear-penalized and covariance-based tools across a wide spectrum of tissue-sharing scenarios. In a real-data analysis, G-Boost-CTL also outperforms existing multi-tissue TWAS approaches, uncovering more statistically significant and biologically plausible disease loci.
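A dual weight of this kind can be sketched under an assumed form that the abstract does not state: multiply each tissue's utility (again assumed to be held-out R² clipped at zero) by the cross-sample variance of its imputed expression in the GWAS cohort, then normalize. `dual_weights` is a hypothetical name, and G-Boost-CTL's actual weighting may combine these quantities differently.

```python
import numpy as np


def dual_weights(cv_r2, pred_by_tissue):
    """Hypothetical dual weights for an (n_samples, n_tissues) prediction matrix.

    utility:     held-out R^2 of each single-tissue imputer, clipped at zero
    variability: cross-sample variance of that tissue's imputed expression in
                 the GWAS cohort (the example quantity named in the abstract)
    The product is normalized to sum to one.
    """
    utility = np.clip(np.asarray(cv_r2, dtype=float), 0.0, None)
    variability = np.var(np.asarray(pred_by_tissue, dtype=float), axis=0)
    raw = utility * variability
    total = raw.sum()
    if total == 0.0:  # nothing informative: fall back to equal weights
        return np.full(raw.size, 1.0 / raw.size)
    return raw / total


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 300
    preds = np.column_stack([
        rng.normal(size=n),        # tissue 0: well-spread imputed expression
        0.1 * rng.normal(size=n),  # tissue 1: nearly constant across samples
        rng.normal(size=n),        # tissue 2: spread, but poor held-out R^2
    ])
    print(dual_weights([0.3, 0.3, -0.1], preds))
```

One consequence of this multiplicative form is that a tissue whose imputed expression barely varies across GWAS samples is effectively silenced even when its imputer looks accurate, while an inaccurate imputer is silenced regardless of its spread.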

In Chapter Three, we replace the linear imputers that dominate TWAS with two non-linear engines: gradient-boosted trees and deep learning. Using data on 49 tissues from the Genotype-Tissue Expression (GTEx) project, we show through large-scale simulations and real-data analysis that these learners maintain appropriate type I error rates while boosting discovery: gradient-boosted trees improve power, and deep learning methods (e.g., deep neural networks) reveal additional complementary, tissue-specific signals.
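Why a non-linear imputer can help is easiest to see with a deliberately small, dependency-free sketch: gradient boosting with depth-one trees (stumps) under squared-error loss, compared against an ordinary least-squares imputer on simulated genotype dosages. The heterozygote-only effect is a hypothetical example chosen because a model that is linear in additive dosage cannot represent it; the function names are illustrative, and nothing here reflects the dissertation's actual models, libraries, or tuning.

```python
import numpy as np


def fit_boosted_stumps(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with depth-1 trees (stumps) under squared-error loss.

    Each round fits a stump to the current residuals (the negative gradient of
    squared loss) and adds a shrunken copy of it to the ensemble.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    base = y.mean()
    pred = np.full(y.shape, base)
    stumps = []
    for _ in range(n_rounds):
        resid = y - pred
        best = None  # (sse, feature, threshold, left_value, right_value)
        for j in range(X.shape[1]):
            # Dropping the largest value keeps both sides of every split non-empty.
            for thr in np.unique(X[:, j])[:-1]:
                left = X[:, j] <= thr
                lv, rv = resid[left].mean(), resid[~left].mean()
                sse = ((resid[left] - lv) ** 2).sum() + ((resid[~left] - rv) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, thr, lv, rv)
        if best is None:  # every feature is constant; nothing left to split
            break
        _, j, thr, lv, rv = best
        stump = (j, thr, learning_rate * lv, learning_rate * rv)
        stumps.append(stump)
        pred += np.where(X[:, j] <= thr, stump[2], stump[3])
    return base, stumps


def predict_boosted_stumps(model, X):
    """Sum the base value and every shrunken stump's contribution."""
    base, stumps = model
    X = np.asarray(X, dtype=float)
    pred = np.full(X.shape[0], base)
    for j, thr, lv, rv in stumps:
        pred += np.where(X[:, j] <= thr, lv, rv)
    return pred


def fit_predict_linear(X, y):
    """Ordinary least squares with an intercept, a stand-in for a linear imputer."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ coef


def r_squared(y, pred):
    y = np.asarray(y, dtype=float)
    return 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 400
    G = rng.binomial(2, 0.5, size=(n, 5)).astype(float)  # hypothetical dosages 0/1/2
    # Hypothetical non-additive effect: only heterozygotes at SNP 0 shift expression.
    expr = 2.0 * (G[:, 0] == 1) + rng.normal(scale=0.3, size=n)
    model = fit_boosted_stumps(G, expr)
    print("boosted stumps R^2:", r_squared(expr, predict_boosted_stumps(model, G)))
    print("linear R^2:", r_squared(expr, fit_predict_linear(G, expr)))
```

Both R² values here are in-sample; a fair comparison of imputers would score them on held-out samples, as TWAS pipelines normally do.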

Collectively, these studies demonstrate that (i) adaptive cross-tissue weighting, (ii) incorporation of GWAS-derived information, and (iii) non-linear machine learning and deep learning imputers each confer substantial and largely orthogonal benefits. Together, they outline a scalable, modular blueprint for multi-tissue TWAS that more faithfully captures the complex, heterogeneous architecture of gene regulation and unlocks deeper insights into the molecular basis of complex human disease.

Recommended Citation

Billah, Md Mutasim, "METHODS IN STATISTICS, MACHINE LEARNING, AND DEEP LEARNING FOR COMBINING MULTI-OMICS DATASET", Open Access Dissertation, Michigan Technological University, 2025.

https://doi.org/10.37099/mtu.dc.etdr/1959

Download

Available for download on Monday, August 03, 2026

Included in

Applied Statistics Commons, Bioinformatics Commons, Biostatistics Commons, Genetics Commons, Genomics Commons, Statistical Methodology Commons, Statistical Models Commons

ORCID

0009-0003-2235-8388
