Wang J, Li B, Marques S, Steinmetz LM, Wei W, Pelechano V
Nucleic Acids Res. 48 (18) e104 [2020-10-09; online 2020-08-21]
Eukaryotic transcriptomes are complex, involving thousands of overlapping transcripts. The interleaved nature of the transcriptomes limits our ability to identify regulatory regions, and in some cases can lead to misinterpretation of gene expression. To improve the understanding of the overlapping transcriptomes, we have developed an optimized method, TIF-Seq2, able to sequence simultaneously the 5' and 3' ends of individual RNA molecules at single-nucleotide resolution. We investigated the transcriptome of a well characterized human cell line (K562) and identified thousands of unannotated transcript isoforms. By focusing on transcripts which are challenging to be investigated with RNA-Seq, we accurately defined boundaries of lowly expressed unannotated and read-through transcripts putatively encoding fusion genes. We validated our results by targeted long-read sequencing and standard RNA-Seq for chronic myeloid leukaemia patient samples. Taking the advantage of TIF-Seq2, we explored transcription regulation among overlapping units and investigated their crosstalk. We show that most overlapping upstream transcripts use poly(A) sites within the first 2 kb of the downstream transcription units. Our work shows that, by paring the 5' and 3' end of each RNA, TIF-Seq2 can improve the annotation of complex genomes, facilitate accurate assignment of promoters to genes and easily identify transcriptionally fused genes.
Bioinformatics Support for Computational Resources [Service]
PubMed 32816037
DOI 10.1093/nar/gkaa691
Crossref 10.1093/nar/gkaa691
pmc: PMC7544212
pii: 5894951