Genotype-free estimation of allele frequencies reduces bias and improves demographic inference from RADSeq data.

Warmuth VM, Ellegren H

Mol Ecol Resour 19 (3) 586-596 [2019-05-00; online 2019-04-17]

Restriction-site associated DNA sequencing (RADSeq) facilitates rapid generation of thousands of genetic markers at relatively low cost; however, several sources of error specific to RADSeq methods often lead to biased estimates of allele frequencies and thereby to erroneous population genetic inference. Estimating the distribution of sample allele frequencies without calling genotypes has been shown to improve population inference from whole genome sequencing data, but the ability of this approach to account for RADSeq-specific biases remains unexplored. Here we assess to what extent genotype-free methods of allele frequency estimation affect demographic inference from empirical RADSeq data. Using the well-studied pied flycatcher (Ficedula hypoleuca) as a study system, we compare allele frequency estimation and demographic inference from whole genome sequencing data with those from sample-matched RADSeq data, using both genotype-based and genotype-free methods. The demographic history of pied flycatchers as inferred from RADSeq data was highly congruent with that inferred from whole genome resequencing (WGS) data when allele frequencies were estimated directly from the read data. In contrast, when allele frequencies were derived from called genotypes, RADSeq-based estimates of most model parameters fell outside the 95% confidence interval of estimates derived from WGS data. Notably, more stringent filtering of the genotype calls tended to increase the discrepancy between parameter estimates from WGS and RADSeq data. The results from this study demonstrate the ability of genotype-free methods to improve allele frequency spectrum (AFS)-based demographic inference from empirical RADSeq data and highlight the need to account for uncertainty in next-generation sequencing (NGS) data regardless of sequencing method.

Bioinformatics Support for Computational Resources [Service]

NGI Uppsala (SNP&SEQ Technology Platform) [Service]

National Genomics Infrastructure [Service]

PubMed 30633448

DOI 10.1111/1755-0998.12990

Crossref 10.1111/1755-0998.12990
