Accounting for ambiguity in ancestral sequence reconstruction
Abstract

The reconstruction of ancestral genetic sequences from the analysis of contemporaneous data is a powerful tool to improve our understanding of molecular evolution. Various statistical criteria defined in a phylogenetic framework can be used to infer nucleotide, amino acid or codon states at internal nodes of the tree, for every position along the sequence. These criteria generally select the single state that maximises (or minimises) a given objective function. Although perfectly sensible from a statistical perspective, that strategy fails to convey useful information about the level of uncertainty associated with the inference. The present study introduces a new criterion for ancestral nucleotide reconstruction that selects a single state whenever the signal conveyed by the data is strong, and a combination of multiple states otherwise. Simulations demonstrate the benefit of this approach, with a substantial increase in the accuracy of ancestral sequence reconstruction without significantly compromising the precision of the solutions returned.
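
To make the idea of returning either a single state or a combination of states concrete, the following Python sketch shows one simple rule of that general kind. It is not the criterion introduced in this study. It assumes that marginal posterior probabilities for the four nucleotides at an internal node are already available, and it keeps the smallest set of states whose cumulative probability reaches a threshold. The function name `ambiguous_call` and the default `threshold=0.9` are illustrative assumptions. Sets of several nucleotides are reported using the standard IUPAC ambiguity codes.

```python
# Illustrative sketch only: a cumulative-posterior rule, NOT the criterion
# proposed in the paper. Names and the threshold value are assumptions.

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def ambiguous_call(posteriors, threshold=0.9):
    """Return an IUPAC code for the smallest set of nucleotides whose
    cumulative posterior probability reaches `threshold`.

    `posteriors` maps each of "A", "C", "G", "T" to its marginal
    posterior probability at one node and one site.
    """
    # Rank states from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for state, prob in ranked:
        chosen.append(state)
        total += prob
        if total >= threshold:
            break
    return IUPAC[frozenset(chosen)]


# Strong signal: a single state is returned.
print(ambiguous_call({"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}))
# Weak signal: two states are needed to reach the threshold.
print(ambiguous_call({"A": 0.55, "C": 0.03, "G": 0.40, "T": 0.02}))
```

With a rule of this kind, a higher threshold returns larger sets of states: the true state is included more often, but each call is less specific. This is the trade-off between accuracy and precision mentioned in the abstract.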