Benchmarking long-read sequencing strategies for obtaining ASV-resolved rRNA operons from environmental microeukaryotes.

Overgaard CK, Jamy M, Radutoiu S, Burki F, Dueholm MKD

Mol Ecol Resour 24 (7) e13991 [2024-10-00; online 2024-07-09]

The use of short-read metabarcoding for classifying microeukaryotes is challenged by the lack of comprehensive 18S rRNA reference databases. While recent advances in high-throughput long-read sequencing provide the potential to greatly increase the phylogenetic coverage of these databases, the performance of different sequencing technologies and subsequent bioinformatics processing remains to be evaluated, primarily because of the absence of well-defined eukaryotic mock communities. To address this challenge, we created a eukaryotic rRNA operon clone library and turned it into a precisely defined synthetic eukaryotic mock community. This mock community was then used to evaluate the performance of three long-read sequencing strategies (PacBio circular consensus sequencing and two Nanopore approaches using unique molecular identifiers) and three tools for resolving amplicon sequence variants (ASVs) (USEARCH, VSEARCH, and DADA2). We investigated the sensitivity of the sequencing techniques based on the number of detected mock taxa, and the accuracy of the different ASV-calling tools with a specific focus on the presence of chimeras among the final rRNA operon ASVs. Based on our findings, we provide recommendations and best-practice protocols for cost-effectively obtaining essentially error-free rRNA operons at high throughput. An agricultural soil sample was used to demonstrate that the sequencing and bioinformatic results from the mock community also translate to highly diverse natural samples, which enabled us to identify previously undescribed microeukaryotic lineages.

NGI Long read [Service]

NGI Uppsala (Uppsala Genome Center) [Service]

National Genomics Infrastructure [Service]

PubMed 38979877

DOI 10.1111/1755-0998.13991

Crossref 10.1111/1755-0998.13991