Statistical Methods for Inferring Recombination in Bacterial Genomes

Publication Type: Thesis
Issue Date: 2022
Homologous recombination in bacterial genomes has wide-ranging effects on public health and is a significant factor in the prevalence of antimicrobial resistance. When recombination occurs in a bacterium, a segment of foreign DNA is introduced into its chromosome, and this evolutionary mechanism can give rise to antibiotic resistance. At the same time, reconstructing the evolutionary history of bacteria in the presence of recombination is notoriously difficult. Phylogenetic trees describe the evolutionary history of organisms and the relationships between them, which is essential for understanding and analyzing natural processes. They are essential tools in much fundamental and applied research, and recombination detection in bacterial genomes is one of their applications. Advanced phylogenetic inference methods, such as maximum likelihood and Bayesian inference, use probabilistic models that are computationally expensive, especially for large datasets. Detecting the boundaries of recombination events and reconstructing a global phylogenetic tree that reflects the underlying evolutionary pattern of biological sequences has never been a straightforward problem. Moreover, the rapidly growing number of bacterial whole-genome sequences poses an additional challenge for computational approaches that aim to reconstruct phylogenetic trees quickly and accurately in the presence of recombination. In this thesis, we introduce PhiloBacter, a maximum likelihood-based tool that detects recombination in bacterial genomes and accounts for it during phylogenetic reconstruction. Specifically, it estimates the probability that each site in an alignment is recombinant. We then present two approaches that incorporate these probabilities to infer the clonal history of the genomes. The first borrows ideas from sequencing error estimation, and the second uses mixtures of matrices to account for the uncertainty introduced by recombination.
We also present BaciSim, a new simulation tool for bacterial genomes that undergo recombination. Finally, we developed a software pipeline for the semi-automatic identification of recombination and reconstruction of a phylogenetic tree from an alignment of bacterial genomes. Using simulated datasets, we investigated the accuracy and reliability of our approach in detecting recombination events and in improving estimates of the clonal history of a collection of genomes that underwent recombination. We benchmarked our methods against two widely used methods, Gubbins and ClonalFrameML. Our simulations show that PhiloBacter tends to outperform both.
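The abstract does not spell out how a per-site recombination probability enters a "mixture of matrices", so the following is only a minimal illustrative sketch and not PhiloBacter's actual model. It assumes a Jukes-Cantor (JC69) substitution model, a hypothetical clonal branch length and a hypothetical longer branch length for a recombinant site, and it weights the two transition matrices by an assumed per-site probability `p_recomb`. The function names `jc69_matrix` and `mix_matrices` are invented for this example.

```python
import math

def jc69_matrix(t):
    """JC69 transition probabilities for a branch of length t
    (expected substitutions per site). Rows and columns are A, C, G, T."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay   # probability the nucleotide is unchanged
    diff = 0.25 - 0.25 * decay   # probability of each specific change
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def mix_matrices(p_recomb, clonal, recombinant):
    """Convex combination of two 4x4 matrices, weighted by the
    (assumed) probability that the site is recombinant."""
    if not 0.0 <= p_recomb <= 1.0:
        raise ValueError("p_recomb must be in [0, 1]")
    return [
        [(1.0 - p_recomb) * clonal[i][j] + p_recomb * recombinant[i][j]
         for j in range(4)]
        for i in range(4)
    ]

# Hypothetical values chosen only for illustration.
clonal = jc69_matrix(0.05)       # short branch on the clonal tree
recombinant = jc69_matrix(0.50)  # longer effective branch for imported DNA
mixed = mix_matrices(0.2, clonal, recombinant)
```

Because each input is a valid transition matrix and the weights sum to one, every row of `mixed` still sums to one, so it can stand in for a single transition matrix in a standard per-site likelihood calculation.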