Klevebring D, Bjursell M, Emanuelsson O, Lundeberg J
PLoS ONE 5 (3) e9762 [2010-03-25; online 2010-03-25]
Several recent studies have indicated that transcription is pervasive in regions outside of protein coding genes and that short antisense transcripts can originate from the promoter and terminator regions of genes. Here we investigate transcription of fragments longer than 200 nucleotides, focusing on antisense transcription for known protein coding genes and intergenic transcription. We find that roughly 12% to 16% of all reads that originate from promoter and terminator regions, respectively, map antisense to the gene in question. Furthermore, we detect a high number of novel transcriptionally active regions (TARs) that are generally expressed at a lower level than protein coding genes. We find that the correlation between RNA-seq data and microarray data is dependent on the gene length, with longer genes showing a better correlation. We detect high antisense transcriptional activity from promoter, terminator and intron regions of protein-coding genes and identify a vast number of previously unidentified TARs, including putative novel EGFR transcripts. This shows that in-depth analysis of the transcriptome using RNA-seq is a valuable tool for understanding complex transcriptional events. Furthermore, the development of new algorithms for estimation of gene expression from RNA-seq data is necessary to minimize length bias.