BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies.

Höjer P, Frick T, Siga H, Pourbozorgi P, Aghelpasand H, Martin M, Ahmadian A

Nucleic Acids Res. [2023-11-06; online 2023-11-06]

Linked-read sequencing promises a one-method approach for genome-wide insights, including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies, including DBS, 10× Genomics, TELL-Seq and stLFR. Running BLR on DBS linked reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13,616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when switch errors were taken into account. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. Compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to the results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.

Bioinformatics Long-term Support WABI [Collaborative]

Bioinformatics Support for Computational Resources [Service]

Bioinformatics Support, Infrastructure and Training [Collaborative]

NGI Short read [Service]

NGI Stockholm (Genomics Production) [Service]

National Genomics Infrastructure [Service]

PubMed 37941142

DOI 10.1093/nar/gkad1010


pii: 7369811
