TOKAC, NIHAN (2016) Efficiency of Algorithms in Phylogenetics. Doctoral thesis, Durham University.
| PDF - Accepted Version 3675Kb |
Abstract
Phylogenetics is the study of evolutionary relationships between species. Phylogenetic
trees have long been the standard object used in evolutionary biology to illustrate how a
given set of species are related. There are some groups (including certain plant and fish
species) for which the ancestral history contains reticulation events, caused by processes that
include hybridization, lateral gene transfer, and recombination. For such groups of species, it
is appropriate to represent their ancestral history by phylogenetic networks: rooted acyclic
digraphs, where arcs represent lines of genetic inheritance and vertices of in-degree at least
two represent reticulation events. This thesis is concerned with the efficiency, accuracy, and
tractability of mathematical models for phylogenetic network methods.
Three important and related measures for summarizing the dissimilarity in phylogenetic
trees are the minimum number of hybridization events required to fit two phylogenetic trees
onto a single phylogenetic network (the hybridization number), the (rooted) subtree prune
and regraft distance (the rSPR distance) and the tree bisection and reconnection distance (the
TBR distance) between two phylogenetic trees. The respective problems of computing these
measures are known to be NP-hard, but also fixed-parameter tractable in their respective
natural parameters. This means that, while they are hard to compute in general, for cases
in which a parameter (here the hybridization number and rSPR/TBR distance, respectively)
is small, the problem can be solved efficiently even for large input trees. Here, we present
new analyses showing that the use of the “cluster reduction” rule – already defined for the
hybridization number and the rSPR distance and introduced here for the TBR distance – can
transform any O(f(p) · n)-time algorithm for any of these problems into an O(f(k) · n)-time
one, where n is the number of leaves of the phylogenetic trees, p is the natural parameter
and k is a much stronger (that is, smaller) parameter: the minimum level of a phylogenetic
network displaying both trees. These results appear in [9].
Traditional “distance based methods” reconstruct a phylogenetic tree from a matrix of pairwise
distances between taxa. A phylogenetic network is a generalization of a phylogenetic
tree that can describe evolutionary events such as reticulation and hybridization that are not
tree-like. Although evolution has been known to be more accurately modelled by a network
than a tree for some time, only recently have efforts been made to directly reconstruct a
phylogenetic network from sequence data, as opposed to reconstructing several trees first and then trying to combine them into a single coherent network. In this work, we present
a generalisation of the UPGMA algorithm for ultrametric tree reconstruction which can
accurately reconstruct ultrametric tree-child networks from the set of distinct distances
between each pair of taxa. This result will also appear in [15]. Moreover, we analyse the
safety radius of the NETWORKUPGMA algorithm and show that it has safety radius 1/2.
This means that if we can obtain accurate estimates of the set of distances between each pair
of taxa in an ultrametric tree-child network, then NETWORKUPGMA correctly reconstructs
the true network.
Item Type: | Thesis (Doctoral) |
---|---|
Award: | Doctor of Philosophy |
Keywords: | phylogenetic networks, hybridization number, rSPR distance, TBR distance, cluster reduction, UPGMA, ultrametric networks |
Faculty and Department: | Faculty of Science > Engineering and Computing Science, School of (2008-2017) |
Thesis Date: | 2016 |
Copyright: | Copyright of this thesis is held by the author |
Deposited On: | 26 Sep 2016 11:18 |