TOKAC, NIHAN (2016) *Efficiency of Algorithms in Phylogenetics.* Doctoral thesis, Durham University.

| PDF - Accepted Version 3675Kb |

## Abstract

Phylogenetics is the study of evolutionary relationships between species. Phylogenetic

trees have long been the standard object used in evolutionary biology to illustrate how a

given set of species are related. There are some groups (including certain plant and fish

species) for which the ancestral history contains reticulation events, caused by processes that

include hybridization, lateral gene transfer, and recombination. For such groups of species, it

is appropriate to represent their ancestral history by phylogenetic networks: rooted acyclic

digraphs, where arcs represent lines of genetic inheritance and vertices of in-degree at least

two represent reticulation events. This thesis is concerned with the efficiency, accuracy, and

tractability of mathematical models for phylogenetic network methods.

Three important and related measures for summarizing the dissimilarity in phylogenetic

trees are the minimum number of hybridization events required to fit two phylogenetic trees

onto a single phylogenetic network (the hybridization number), the (rooted) subtree prune

and regraft distance (the rSPR distance) and the tree bisection and reconnection distance (the

TBR distance) between two phylogenetic trees. The respective problems of computing these

measures are known to be NP-hard, but also fixed-parameter tractable in their respective

natural parameters. This means that, while they are hard to compute in general, for cases

in which a parameter (here the hybridization number and rSPR/TBR distance, respectively)

is small, the problem can be solved efficiently even for large input trees. Here, we present

new analyses showing that the use of the “cluster reduction” rule – already defined for the

hybridization number and the rSPR distance and introduced here for the TBR distance – can

transform any O(f(p) · n)-time algorithm for any of these problems into an O(f(k) · n)-time

one, where n is the number of leaves of the phylogenetic trees, p is the natural parameter

and k is a much stronger (that is, smaller) parameter: the minimum level of a phylogenetic

network displaying both trees. These results appear in [9].

Traditional “distance based methods” reconstruct a phylogenetic tree from a matrix of pairwise

distances between taxa. A phylogenetic network is a generalization of a phylogenetic

tree that can describe evolutionary events such as reticulation and hybridization that are not

tree-like. Although evolution has been known to be more accurately modelled by a network

than a tree for some time, only recently have efforts been made to directly reconstruct a

phylogenetic network from sequence data, as opposed to reconstructing several trees first and then trying to combine them into a single coherent network. In this work, we present

a generalisation of the UPGMA algorithm for ultrametric tree reconstruction which can

accurately reconstruct ultrametric tree-child networks from the set of distinct distances

between each pair of taxa. This result will also appear in [15]. Moreover, we analyse the

safety radius of the NETWORKUPGMA algorithm and show that it has safety radius 1/2.

This means that if we can obtain accurate estimates of the set of distances between each pair

of taxa in an ultrametric tree-child network, then NETWORKUPGMA correctly reconstructs

the true network.

Item Type: | Thesis (Doctoral) |
---|---|

Award: | Doctor of Philosophy |

Keywords: | phylogenetic networks, hybridization number, rSPR distance, TBR distance, cluster reduction, UPGMA, ultrametric networks |

Faculty and Department: | Faculty of Science > Engineering and Computing Science, School of |

Thesis Date: | 2016 |

Copyright: | Copyright of this thesis is held by the author |

Deposited On: | 26 Sep 2016 11:18 |