Date of Award


Document Type

Campus Access Dissertation

Degree Name

Doctor of Philosophy in Statistics (PhD)

Administrative Home Department

Department of Mathematical Sciences

Advisor 1

Qiuying Sha

Committee Member 1

Kui Zhang

Committee Member 2

Xiao Zhang

Committee Member 3

Weihua Zhou


This dissertation comprises three chapters, each briefly described below.

In Chapter 1, we propose a method called integrating External Controls into Association Test by Regression Calibration (iECAT-RC) for case-control association studies. It integrates external control samples into a case-control study to boost power by eliminating the systematic differences (batch effects) between studies, such as differences in sequencing platforms and genotype-calling procedures. Extensive simulation studies demonstrate that iECAT-RC effectively controls type I error rates and improves statistical power across various models. We apply iECAT-RC to UK Biobank data on M72 fibroblastic disorders, treating genotype calling as a batch effect. Our method identifies four SNPs associated with fibroblastic disorders, with higher sensitivity than two other methods, iECAT-Score and Internal, especially in unbalanced case-control scenarios.

In Chapter 2, we propose a new method called Meta-TOW-S, designed to conduct joint association tests between multiple correlated phenotypes and a set of variants, such as those within a gene. This method utilizes GWAS summary statistics from different cohorts. Our approach employs the set-based method that Tests for the effect of an Optimal Weighted combination of variants in a gene (TOW) and accounts for sample size differences across GWAS cohorts by employing the Cauchy combination method. Meta-TOW-S combines the advantages of set-based tests and multi-phenotype association tests, exhibiting computational efficiency and enabling analysis across multiple correlated phenotypes while accommodating overlapping samples from different GWAS cohorts.
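
As an illustration of the Cauchy combination method mentioned above, the following is a minimal sketch of the standard Cauchy combination test for merging p-values. It is not the Meta-TOW-S implementation: the function name `cauchy_combination`, the example p-values, and the weights are hypothetical, and the abstract does not state how the dissertation chooses weights across cohorts.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the standard Cauchy combination test.

    Assumption: weights are non-negative and are normalized to sum to 1.
    Equal weights are used when none are given.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total = sum(weights)
    normalized = [w / total for w in weights]

    # Map each p-value to a standard Cauchy variate and take the weighted sum.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(normalized, p_values)
    )

    # The standard Cauchy survival function gives the combined p-value.
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical usage: three p-values with illustrative (made-up) weights.
combined = cauchy_combination([0.01, 0.20, 0.65], weights=[3.0, 2.0, 1.0])
print(combined)
```

A practical attraction of this statistic is that its null distribution is well approximated by a standard Cauchy distribution even when the combined p-values are correlated, which suits settings with correlated phenotypes or overlapping samples.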

In Chapter 3, we propose NetPRS, an innovative approach that enhances Polygenic Risk Score (PRS) prediction by integrating network annotation information into a penalized regression model. This network annotation is derived from a genotype-phenotype bipartite network (GPN), where SNPs and traits are linked based on their association strengths from GWAS summary statistics. By leveraging this network annotation, NetPRS integrates information from related traits into the PRS prediction for the target trait.

Available for download on Monday, July 01, 2024