An improved method for fitting gamma distribution to substitution rate variation among sites
Abstract
The gamma distribution has been used to model variation in substitution rate among sites. One simple method for estimating the shape parameter of the gamma distribution is to 1) reconstruct a phylogenetic tree and the ancestral states at internal nodes, 2) compare the two nodes at either end of each branch to count the number of “observed” substitutions at each site, and apply a correction for multiple hits to derive the estimated number of substitutions at each site, and 3) fit the site-specific substitution data to a gamma distribution to obtain the shape parameter α. This method is fast, but its accuracy depends heavily on the accuracy of the estimated site-specific numbers of substitutions. The existing method has three shortcomings. First, it uses the Poisson correction, which is inadequate for almost any nucleotide sequences. Second, it estimates the number of substitutions at each site independently, without making use of information from all sites. Third, the program implementing the method has never been made publicly available. I have implemented in the DAMBE software a new method based on the F84 substitution model, with simultaneous estimation that uses information from all sites in estimating the number of substitutions at each site. DAMBE is freely available at http://dambe.bio.uottawa.ca
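As a rough illustration of step 3 only, the sketch below estimates α from per-site substitution counts by the method of moments. It is not the DAMBE implementation and does not involve the F84 model. It assumes that the count at each site is Poisson-distributed with a gamma-distributed rate, so that the counts follow a negative binomial distribution with variance m + m²/α, where m is the mean count. The function name `estimate_gamma_shape` and the example counts are hypothetical.

```python
from statistics import mean, pvariance


def estimate_gamma_shape(site_counts):
    """Method-of-moments estimate of the gamma shape parameter alpha.

    Assumption (not taken from the paper): per-site counts are Poisson with
    gamma-distributed rates, i.e. negative binomial, so that
        variance = m + m**2 / alpha   =>   alpha = m**2 / (variance - m)
    """
    m = mean(site_counts)
    v = pvariance(site_counts)  # population variance of the per-site counts
    if v <= m:
        # No overdispersion relative to Poisson: the data are consistent
        # with equal rates across sites (alpha tends to infinity).
        return float("inf")
    return m * m / (v - m)


# Hypothetical estimated numbers of substitutions at ten sites
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
alpha = estimate_gamma_shape(counts)
print(f"estimated alpha = {alpha:.3f}")
```

A small α indicates strong rate heterogeneity among sites. Because the estimate is computed directly from the per-site counts, any bias in those counts, such as undercorrection for multiple hits, carries over into α.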