Günther T, Goldberg A, Schraiber JG
G3 (Bethesda) 15 (10) - [2025-10-08; online 2025-07-30]
Population genomic analyses rely on an accurate and unbiased characterization of the genetic composition of the studied population. For short-read, high-throughput sequencing data, mapping sequencing reads to a linear reference genome can bias population genetic inference due to mismatches in reads carrying non-reference alleles. In this study, we investigate the impact of mapping bias on allele frequency estimates from pseudohaploid data and genotype likelihoods, 2 approaches commonly used in ultra-low to medium coverage sequencing. To mitigate mapping bias, we propose an empirical adjustment to genotype likelihoods. Using data from the 1000 Genomes Project, we find that our new method improves allele frequency estimation. To test a downstream application, we simulate ancient DNA data with realistic post-mortem damage to compare widely used methods for estimating ancestry proportions under different scenarios, including reference genome selection, population divergence, and sequencing depth. Our findings reveal that mapping bias can lead to differences in estimated admixture proportion of up to 4% depending on the reference population. However, the choice of method has a much stronger impact, with some methods showing differences of 10%. qpAdm appears to perform best at estimating simulated ancestry proportions, but it is sensitive to mapping bias and its applicability may vary across species due to its requirement for additional populations beyond the sources and target population. Our adjusted genotype likelihood approach largely mitigates the effect of mapping bias on genome-wide ancestry estimates from genotype likelihood-based tools. However, it cannot account for the bias introduced by the method itself or the noise in individual site allele frequency estimates due to low sequencing depth. Overall, our study provides valuable insights for obtaining more precise estimates of allele frequencies and ancestry proportions in empirical studies.
Bioinformatics Support for Computational Resources [Service]
PubMed 40737495
DOI 10.1093/g3journal/jkaf172
Crossref 10.1093/g3journal/jkaf172
pmc: PMC12506655
pii: 8219480