Detecting transposable elements in long read genomes using sTELLeR.

Bilgrav Saether K, Eisfeldt J

Bioinformatics - (-) - [2024-11-18; online 2024-11-18]

Repeat elements, such as transposable elements (TEs), are highly repetitive DNA sequences that make up around 50% of the genome. TEs such as Alu, SVA, HERV and L1 elements can cause disease by disrupting genes, causing frameshift mutations or altering splicing patterns. These elements are challenging to characterize using short-read genome sequencing (srGS), due to its read length and the repetitive nature of TEs. Long-read genome sequencing (lrGS) enables bridging of TEs, allowing increased resolution across repetitive DNA sequences. lrGS therefore presents an opportunity for improved TE detection and analysis, not only from a research perspective, but also for future clinical detection. When choosing an lrGS TE caller, parameters such as runtime, CPU hours, sensitivity, precision and compatibility with inclusion into pipelines are crucial for efficient detection. We therefore developed sTELLeR, (s) Transposable ELement in Long (e) Read, for accurate, fast and effective TE detection. In particular, sTELLeR exhibits higher precision and sensitivity for calling Alu elements than similar tools. The caller is 5-48x as fast and uses <2% of the CPU hours compared to competing callers. The caller is haplotype aware and outputs results in a VCF file, enabling compatibility with other variant callers and downstream analysis. sTELLeR is a Python-based tool and is available at https://github.com/kristinebilgrav/sTELLeR. Altogether, we show that sTELLeR is a fast, sensitive and precise caller for detection of TEs, and can easily be implemented into variant calling workflows. Supplementary data are available at Bioinformatics online.

Clinical Genomics [Service]

Clinical Genomics Stockholm [Service]

PubMed 39558574

DOI 10.1093/bioinformatics/btae686

Crossref 10.1093/bioinformatics/btae686

pii: 7903282