Dutoit L, Burri R, Nater A, Mugal CF, Ellegren H
Mol Ecol Resour 17 (4) 586-597 [2016-09-26; online 2016-09-26]
Properly estimating genetic diversity in populations of nonmodel species requires a basic understanding of how diversity is distributed across the genome and among individuals. To this end, we analysed whole-genome resequencing data from 20 collared flycatchers (genome size ≈1.1 Gb; 10.13 million single nucleotide polymorphisms detected). Genomewide nucleotide diversity was almost identical among individuals (mean = 0.00394, range = 0.00384-0.00401), but diversity levels varied extensively across the genome (95% confidence interval for 200-kb windows = 0.0013-0.0053). Diversity was related to selective constraint such that in comparison with intergenic DNA, diversity at fourfold degenerate sites was reduced to 85%, 3' UTRs to 82%, 5' UTRs to 70% and nondegenerate sites to 12%. There was a strong positive correlation between diversity and chromosome size, probably driven by a higher density of targets for selection on smaller chromosomes increasing the diversity-reducing effect of linked selection. Simulations exploring the ability of sequence data from a small number of genetic markers to capture the observed diversity clearly demonstrated that diversity estimation from finite sampling of such data is bound to be associated with large confidence intervals. Nevertheless, we show that precision in diversity estimation in large outbred population benefits from increasing the number of loci rather than the number of individuals. Simulations mimicking RAD sequencing showed that this approach gives accurate estimates of genomewide diversity. Based on the patterns of observed diversity and the performed simulations, we provide broad recommendations for how genetic diversity should be estimated in natural populations.
Bioinformatics Compute and Storage [Service]
NGI Uppsala (SNP&SEQ Technology Platform) [Service]
National Genomics Infrastructure [Service]