Date of Award

2026

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Computational Science and Engineering (PhD)

Administrative Home Department

College of Forest Resources and Environmental Science

Advisor 1

Hairong Wei

Committee Member 1

Kui Zhang

Committee Member 2

Qiuying Sha

Committee Member 3

Weihua Zhou

Abstract

This dissertation presents computational and AI-driven frameworks for identifying key regulatory genes and their downstream targets across plant and human biological systems. Three studies address distinct challenges in genomic regulation using advanced machine learning and bioinformatics approaches.

The first study introduces DyGAF (Dynamic Gene Attention Focus), a dual-attention transformer framework that identifies and ranks disease-relevant biomarker genes by simultaneously modeling independent molecular responses and interdependent regulatory network behavior. Two attention models provide complementary perspectives on gene importance and are fused through a novel combination metric. Applied to COVID-19 nasopharyngeal swab profiles, the attention-weighted representations achieved 94.23% classification accuracy, high sensitivity, and a cumulative mutual information of 17.13 nats across the selected feature set, outperforming conventional combination metrics and confirming that the learned biomarker weights capture biologically distinct transcriptional signatures. Pathway and functional enrichment analyses further validated the relevance of the ranked genes to COVID-19 pathogenesis, with DyGAF outperforming differential expression and Random Forest-based methods.
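To make the "cumulative mutual information in nats" figure concrete, the following is a minimal sketch of how such a quantity could be computed: the mutual information between each selected feature and the class label is estimated with natural logarithms (hence nats) and summed over the feature set. This is an illustration only, not the dissertation's implementation. The function names `mutual_information_nats` and `cumulative_mi` are hypothetical, and the sketch assumes expression values have already been discretized (for example into low/high bins), which the abstract does not specify.

```python
import math
from collections import Counter


def mutual_information_nats(feature, labels):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences.

    Assumption: `feature` holds already-discretized expression values
    (e.g. 0 = low, 1 = high); the abstract does not state the binning used.
    """
    n = len(feature)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) expressed with raw counts
        ratio = count * n / (px[x] * py[y])
        mi += p_xy * math.log(ratio)       # natural log gives nats
    return mi


def cumulative_mi(features, labels):
    """Sum of per-feature mutual information with the class label."""
    return sum(mutual_information_nats(f, labels) for f in features)


# Toy data (hypothetical): two genes, four samples, binary class labels.
labels = [0, 0, 1, 1]
gene_a = [0, 0, 1, 1]   # perfectly informative about the label
gene_b = [0, 1, 0, 1]   # independent of the label
total = cumulative_mi([gene_a, gene_b], labels)
```

In this toy case the informative gene contributes ln 2 nats and the independent gene contributes nothing, so the cumulative value equals ln 2. Summing per-feature values ignores redundancy between features, so a cumulative figure of this kind is best read as a relative score when comparing feature sets of equal size.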

The second study introduces SignalPath-Finder, an AI-driven framework designed to identify downstream target genes of signaling complexes from heterogeneous public RNA-seq datasets without requiring targeted perturbation experiments. The framework applies a pseudo-peak transformation that reorders transcriptomic samples using TOR complex anchor genes as references, converting heterogeneous expression profiles into aligned bell-shaped patterns. Genes sharing distributional and structural similarity with TOR anchors are grouped via unsupervised clustering, and a cluster-wise autoencoder-based representation learning module ranks candidate downstream genes by their contribution to the learned latent manifold. Applied to 628 Populus trichocarpa RNA-seq samples across three tissue types, SignalPath-Finder recovered known TOR downstream genes with significantly higher enrichment than conventional methods including Spearman correlation, GENIE3, and TIGRESS, and identified novel downstream gene candidates supported by literature evidence across all tissues.
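The idea of reordering samples so that an anchor gene traces a bell-shaped profile can be sketched as follows. This is a simplified illustration under stated assumptions, not SignalPath-Finder's actual pseudo-peak transformation: it uses a single anchor gene, places samples alternately on the rising and falling sides of the peak, and applies the same ordering to any other gene. The names `pseudo_peak_order` and `reorder`, and the toy values, are hypothetical.

```python
def pseudo_peak_order(anchor_values):
    """Return sample indices ordered so anchor expression rises, then falls.

    Assumption: a single anchor gene defines the ordering; the framework
    described in the abstract may combine several anchors or differ in
    other respects.
    """
    # Sample indices sorted by ascending anchor expression.
    ranked = sorted(range(len(anchor_values)), key=lambda i: anchor_values[i])
    rising = ranked[0::2]          # every other sample, ascending
    falling = ranked[1::2][::-1]   # remaining samples, descending
    return rising + falling


def reorder(values, order):
    """Apply a sample ordering to one gene's expression vector."""
    return [values[i] for i in order]


# Toy data (hypothetical): five samples in arbitrary order.
anchor = [5.0, 1.0, 3.0, 2.0, 4.0]
other_gene = [50.0, 10.0, 30.0, 20.0, 40.0]

order = pseudo_peak_order(anchor)
anchor_peak = reorder(anchor, order)      # [1.0, 3.0, 5.0, 4.0, 2.0]
other_peak = reorder(other_gene, order)   # [10.0, 30.0, 50.0, 40.0, 20.0]
```

Under this ordering, a gene that is co-regulated with the anchor also shows a peak at the same position, while an unrelated gene does not, which is what makes shape-based similarity and clustering meaningful after the reordering.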

The third study addresses transcription factor regulation of regeneration in Arabidopsis thaliana. Using CollaborativeNet and Triple-Gene Mutual Interaction analysis on 78 RNA-seq samples, nine regeneration-associated subnetworks were identified and refined to three, from which six candidate transcription factors, WOX9A, LEC2, PGA37, WIP5, PEI1, and AIL1, were prioritized for their roles in somatic embryogenesis and regeneration.

Together, these studies advance scalable, AI-powered genomic frameworks for biomarker discovery, signaling pathway analysis, and regulatory network inference across diverse biological systems.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
