Identifying genetic markers for a range of phylogenetic utility–From species to family level

Bokyung Choi, The Australian National University
Michael D. Crisp, The Australian National University
Lyn G. Cook, The University of Queensland
Karen Meusemann, University of Freiburg
Robert D. Edwards, The University of Queensland
Alicia Toon, The University of Queensland
Carsten Külheim, Michigan Technological University

Article is deposited here in compliance with publisher policies. Publisher's version of record: https://doi.org/10.1371/journal.pone.0218995

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Abstract

Resolving the phylogenetic relationships of closely related species using a small set of loci is challenging because a limited sample of the genome may not capture sufficient information. Relying on few loci can also be problematic when conflict between gene trees arises from incomplete lineage sorting and/or ongoing hybridization, problems that are especially likely in recently diverged lineages. Here, we developed a method that uses limited genomic resources to identify many low-copy candidate loci from across the nuclear and chloroplast genomes, design probes for target capture, and sequence the captured loci. To validate our method, we present data from Eucalyptus and Melaleuca, two large and phylogenetically problematic genera within the family Myrtaceae. With one annotated genome, one transcriptome and two whole-genome shotgun sequences of one Eucalyptus and four Melaleuca species, respectively, we identified 212 loci representing 263 kbp for targeted sequence capture and sequencing. Of these, 209 were successfully tested from 47 samples across five related genera of Myrtaceae. On average, 57.6% of reads mapped back to the reference, with coverage of more than 20 reads per position across 83.5% of the data. The methods developed here should be applicable to a wide range of taxa across all kingdoms. The core methods are flexible, accommodating varying levels of genomic resource availability, and are useful from shallow to deep phylogenies.
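The two summary statistics quoted in the abstract (percentage of reads mapped to the reference, and the share of positions covered by more than 20 reads) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `percent_mapped` and `breadth_of_coverage`, and the toy read counts and depth values, are assumptions chosen only to show how such figures are typically computed from per-sample read counts and per-position depths.

```python
from typing import Sequence


def percent_mapped(mapped_reads: int, total_reads: int) -> float:
    """Percentage of sequenced reads that mapped to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 20) -> float:
    """Percentage of positions whose read depth is strictly greater than min_depth."""
    if len(depths) == 0:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d > min_depth)
    return 100.0 * covered / len(depths)


# Toy values (hypothetical, not taken from the study).
toy_depths = [25, 30, 5, 21, 20, 40, 0, 22]
mapped_pct = percent_mapped(mapped_reads=576, total_reads=1000)
breadth_pct = breadth_of_coverage(toy_depths, min_depth=20)
print(f"mapped: {mapped_pct:.1f}%  breadth (>20x): {breadth_pct:.1f}%")
```

In practice the per-position depths would come from a depth report produced by an alignment toolkit. The threshold comparison is strict here to match the abstract's wording of "more than 20 reads per position".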