Bootstrap tests
Bootstrap tests [Felsenstein, 1985] is one of the most common ways
to evaluate the reliability of the topology of a phylogenetic tree. In
a bootstrap test, trees are evaluated using Efron's resampling
technique [Efron, 1982], which samples nucleotides from the original
set of sequences as follows:
Given an alignment of sequences
(rows) of length
(columns), we randomly choose
columns in the
alignment with replacement and use them to create a new alignment. The
new alignment has
rows and
columns just like the original
alignment but it may contain duplicate columns and some columns in the
original alignment may not be included in the new alignment. From this
new alignment we reconstruct the corresponding tree and compare it to
the original tree. For each subtree in the original tree we
search for the same subtree in the new tree and add a score of one to
the node at the root of the subtree if the subtree is present in the new
tree. This procedure is repeated a number of times (usually around 100
times). The result is a counter for each interior node of the original
tree, which indicate how likely it is to observe the exact same
subtree when the input sequences are sampled. A bootstrap value is
then computed for each interior node as the percentage of resampled
trees that contained the same subtree as that rooted at the node.
Bootstrap values can be seen as a measure of how reliably we can reconstruct a tree, given the sequence data available. If all trees reconstructed from resampled sequence data have very different topologies, then most bootstrap values will be low, which is a strong indication that the topology of the original tree cannot be trusted.