Date of Award


Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Biochemistry and Molecular Biology (PhD)

Administrative Home Department

Department of Biological Sciences

Advisor 1

Stephen M. Techtmann

Committee Member 1

Paul D. Goetsch

Committee Member 2

Carsten Külheim

Committee Member 3

Ebenezer Tumban


Microbial ecosystems are complex, with hundreds of members interacting with each other and the environment. The intricate and hidden behaviors underlying these interactions make research questions challenging – but can be better understood through machine learning. However, most machine learning that is used in microbiome work is a black box form of investigation, where accurate predictions can be made, but the inner logic behind what is driving prediction is hidden behind nontransparent layers of complexity.

Accordingly, the goal of this dissertation is to provide an interpretable and in-depth machine learning approach to investigate microbial biogeography and to use micro-organisms as novel tools to detect geospatial location and object provenance (previous known origin). These contributions follow with a framework that allows extraction of interpretable metrics and actionable insights from microbiome-based machine learning models. The first part of this work provides an overview of machine learning in the context of microbial ecology, human microbiome studies and environmental monitoring – outlining common practice and shortcomings. The second part of this work demonstrates a field study to demonstrate how machine learning can be used to characterize patterns in microbial biogeography globally – using microbes from ports located around the world. The third part of this work studies the persistence and stability of natural microbial communities from the environment that have colonized objects (vessels) and stay attached as they travel through the water. Finally, the last part of this dissertation provides a robust framework for investigating the microbiome. This framework provides a reasonable understanding of the data being used in microbiome-based machine learning and allows researchers to better apprehend and interpret results.

Together, these extensive experiments assist an understanding of how to carry an in-silico design that characterizes candidate microbial biomarkers from real world settings to a rapid, field deployable diagnostic assay. The work presented here provides evidence for the use of microbial forensics as a toolkit to expand our basic understanding of microbial biogeography, microbial community stability and persistence in complex systems, and the ability of machine learning to be applied to downstream molecular detection platforms for rapid and accurate detection.