Mugal CF, Wolf JB, Kaj I
Mol. Biol. Evol. 31 (1) 212-231 [2014-01-00; online 2013-10-17]
The ratio of divergence at nonsynonymous and synonymous sites, dN/dS, is widely used in evolutionary genetic studies to investigate the extent to which selection modulates gene sequence evolution. Originally tailored to codon sequences of distantly related lineages, dN/dS represents the ratio of fixed nonsynonymous to synonymous differences. The impact of ancestral and lineage-specific polymorphisms on dN/dS, which we here show to be substantial for closely related lineages, is generally neglected in techniques for estimating dN/dS. To address this issue, we formulate a codon model that is firmly anchored in population genetic theory, derive analytical expressions for the dN/dS measure using a Poisson random field approximation in a Markovian framework, and validate the derivations by simulations. Simulations and analytical derivations agree well and demonstrate that dN/dS is biased by polymorphisms at short time scales, and that it can take substantial time for the expected value to settle at its long-time limit, where only fixed differences are considered. We further show that, in any attempt to estimate the dN/dS ratio from empirical data, the intrinsic fluctuations of a ratio of stochastic variables can, even under neutrality, yield extreme values of dN/dS at short time scales or in regions of low mutation rate. Taken together, our results have significant implications for the interpretation of dN/dS estimates, the McDonald-Kreitman test and other related statistics, in particular for closely related lineages.
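The point about fluctuations of a ratio of stochastic variables can be illustrated with a toy simulation. The sketch below is not the codon model or the Poisson random field derivation of the paper. It simply assumes that synonymous and nonsynonymous difference counts are independent Poisson variables with the same per-site expectation (neutrality, true dN/dS = 1). The function names, the site counts (150 synonymous, 450 nonsynonymous) and the two per-site expectations are hypothetical values chosen for illustration only.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small means used here; not intended for large lam.
    """
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def neutral_dnds_samples(expected_diffs_per_site,
                         n_syn_sites=150,      # hypothetical gene layout
                         n_nonsyn_sites=450,   # hypothetical gene layout
                         n_genes=2000,
                         seed=1):
    """Simulate naive dN/dS estimates for neutral genes (true ratio = 1).

    Returns the list of defined ratios and the number of genes for which
    the ratio is undefined because no synonymous difference was observed.
    """
    rng = random.Random(seed)
    ratios = []
    undefined = 0
    for _ in range(n_genes):
        d_s = poisson(expected_diffs_per_site * n_syn_sites, rng)
        d_n = poisson(expected_diffs_per_site * n_nonsyn_sites, rng)
        if d_s == 0:
            undefined += 1
            continue
        ratios.append((d_n / n_nonsyn_sites) / (d_s / n_syn_sites))
    return ratios, undefined


if __name__ == "__main__":
    # Few expected differences (short time scale or low mutation rate)
    # versus many expected differences (long time scale).
    for label, rate in [("short", 0.005), ("long", 0.2)]:
        ratios, undefined = neutral_dnds_samples(rate)
        print(label,
              "undefined:", undefined,
              "stdev:", round(statistics.stdev(ratios), 3),
              "max:", round(max(ratios), 3))
```

Under these assumptions, the estimates scatter much more widely around the neutral value when few differences are expected, and many genes yield no defined ratio at all. The sketch ignores polymorphism entirely, so it shows only the sampling-noise effect and not the polymorphism-induced bias described above.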