Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples.

Wang J, Skoog T, Einarsdottir E, Kaartokallio T, Laivuori H, Grauers A, Gerdhem P, Hytönen M, Lohi H, Kere J, Jiao H

Sci Rep 6 (-) 33256 [2016-09-16; online 2016-09-16]

High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooled sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants on the Illumina array had identical MAFs, and 26% differed by one allele, between the sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies.
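
The comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the MAFs were computed, so the read-fraction estimator, the function names (`pooled_maf`, `allele_count`, `pearson_r`) and all numbers below are assumptions for illustration only, not the authors' pipeline or data.

```python
from math import sqrt


def pooled_maf(alt_reads, total_reads):
    """Estimate a pool-level MAF as the fraction of reads carrying the
    alternate allele, folded so the result is the minor allele frequency.
    (Assumed estimator; the abstract does not specify one.)"""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    return min(freq, 1.0 - freq)


def allele_count(maf, n_individuals):
    """Convert a MAF to a whole number of minor alleles among the
    2 * n_individuals chromosomes of a diploid pool (rounding half up)."""
    return int(maf * 2 * n_individuals + 0.5)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented toy values: (alt reads, total reads) per variant in one pool,
    # and MAFs for the same variants from individual genotyping.
    pool_size = 50
    read_counts = [(4, 200), (9, 180), (2, 210), (15, 190)]
    genotyped_mafs = [0.02, 0.04, 0.01, 0.08]

    pooled_mafs = [pooled_maf(alt, total) for alt, total in read_counts]
    for pooled, genotyped in zip(pooled_mafs, genotyped_mafs):
        diff = abs(allele_count(pooled, pool_size)
                   - allele_count(genotyped, pool_size))
        print(f"pooled={pooled:.3f} genotyped={genotyped:.3f} "
              f"allele difference={diff}")
    print(f"r = {pearson_r(pooled_mafs, genotyped_mafs):.2f}")
```

Expressing the discrepancy as a whole-allele difference, in addition to the correlation, keeps the comparison meaningful for rare variants, where a single allele in the pool accounts for a large relative change in MAF.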

Bioinformatics Compute and Storage [Service]

NGI Stockholm (Genomics Applications) [Service]

NGI Stockholm (Genomics Production) [Service]

National Genomics Infrastructure [Service]

PubMed 27633116

DOI 10.1038/srep33256

Crossref 10.1038/srep33256

pii: srep33256
pmc: PMC5025741