Science, Vol. 327, No. 5971, pp. 1376-1379, 2010
Toward Extracting All Phylogenetic Information from Matrices of Evolutionary Distances
The matrix of evolutionary distances is a model-based statistic, derived from molecular sequences, that summarizes the pairwise phylogenetic relationships among a collection of species. Phylogenetic tree reconstruction methods that rely on this matrix are relatively fast and are therefore widely used in molecular systematics. However, because they intrinsically rely on summary statistics, distance-matrix methods are generally assumed to be less accurate than likelihood-based approaches. In this paper, pairwise sequence comparisons are shown to be more powerful than previously hypothesized. A statistical analysis of certain distance-based techniques indicates that their data requirement for large evolutionary trees essentially matches the conjectured performance of maximum likelihood methods, challenging the idea that summary statistics lead to suboptimal analyses. On the basis of a connection between ancestral state reconstruction and distance averaging, the critical role played by the covariances of the distance matrix is identified.
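To make "model-based statistic derived from molecular sequences" concrete, the sketch below builds a distance matrix from aligned DNA sequences under the Jukes-Cantor (JC69) substitution model, a standard textbook choice. The abstract does not say which substitution model the paper assumes, so the model and the function names `p_distance`, `jukes_cantor`, and `distance_matrix` are illustrative assumptions, not the paper's method.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def jukes_cantor(p: float) -> float:
    """JC69 correction (assumed model): d = -3/4 * ln(1 - 4p/3).

    Returns infinity when p >= 3/4, where the estimate saturates and the
    distance can no longer be estimated from the mismatch proportion.
    """
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf
    return -0.75 * math.log(arg)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric matrix of JC69 distances, keyed by (name, name) pairs."""
    names = sorted(seqs)
    d = {(n, n): 0.0 for n in names}  # zero distance on the diagonal
    for u, v in combinations(names, 2):
        dist = jukes_cantor(p_distance(seqs[u], seqs[v]))
        d[(u, v)] = d[(v, u)] = dist  # fill both entries to keep symmetry
    return d


if __name__ == "__main__":
    # Toy alignment (hypothetical data, for illustration only).
    seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACGTTT"}
    for (u, v), dist in sorted(distance_matrix(seqs).items()):
        print(u, v, round(dist, 4))
```

The correction inflates the raw mismatch proportion to account for multiple substitutions at the same site, and it grows without bound as the proportion approaches 3/4, so distances between distantly related sequences are noisier estimates than those between close relatives.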