Abstract
Time trees are evolutionary histories (also called phylogenies) that include times of evolutionary events. They arise in many applications, including cancer and virus evolution. Of particular interest are clock-like trees, where all leaves have equal distance to the root. These can be randomly sampled using the coalescent model, and a number of software packages for reconstructing trees from sequence data do so.
Most such inference methods use tree search algorithms, which require a tree space over which the inference is performed. These typically output a distribution of trees, which needs to be interpreted. Currently, most methods use consensus trees to summarise the output. Statistical methods such as mean trees and confidence regions would be preferable, however such methods are largely undeveloped for tree spaces.
It is essential for the development of such methods to explore the geometry
of tree spaces. Most tree spaces are based on tree rearrangement operations, which apply local changes to a tree to propose trees that are similar to a given tree. Popular tree rearrangements are Nearest Neighbour Interchange, Subtree Prune and Regraft, and Tree Bisection and Reconnection. For all three tree rearrangements, the problem of computing distances, which are defined as the minimum number of tree rearrangements needed to transform one tree into the other, is NP-hard, making tree inference and comparison
algorithms challenging to design in practice.
In this thesis, we introduce discrete coalescent trees, a discretisation of time trees that is motivated by the coalescent model. We then define tree rearrangement operations on discrete coalescent trees, which leads to a new tree space DCTm. We analyse this tree space, focussing on properties that are essential to establish statistical measures such as mean trees and confidence regions. Our results include a polynomial-time algorithm for computing shortest paths in DCTm, making this the first tree rearrangement based tree space in which distances can be computed efficiently. We also analyse geometrical properties of our tree space DCTm, and shortest paths within the space. As a special case of discrete coalescent trees we consider ranked trees. We also discuss unlabelled time trees and two different tree spaces that result from considering tree rearrangement operations on them, one of which can be interpreted as the unlabelled version of DCTm.