The article 'The RNA Atlas expands the catalog of human non-coding RNAs', published today in Nature Biotechnology, is the result of more than five years of hard work to further unravel the complexity of the human transcriptome. Never before has such a comprehensive effort been undertaken to characterize all RNA molecules in human cells and tissues.
RNAs in all shapes and sizes
Our transcriptome is, by analogy with our genome, the sum of all RNA molecules that are transcribed from the DNA strands that make up our genome. However, there is no one-to-one relationship between the two. Firstly, each cell and tissue type has a unique transcriptome, with varying RNA production and composition, including tissue-specific RNAs. Secondly, not all RNAs are transcribed from typical protein-coding genes. Many of our RNA molecules are not used as a template to build proteins, but originate from what was once called junk DNA: long sequences of DNA with unknown functions.
These non-coding RNAs (ncRNAs) come in all kinds of shapes and sizes: short, long, and even circular RNAs. Many of them lack the tail of adenine nucleotides (the polyA tail) that is typical for protein-coding RNAs.
300 human cell and tissue types and three sequencing methods
"There have been other projects to catalogue our transcriptome but the RNA-Atlas project is unique because of the applied sequencing methods," says prof. Pieter Mestdagh from the Center for Medical Genetics at Ghent University. "Not only did we look at the transcriptome of as many as 300 human cell and tissue types, but most importantly, we did so with three complementary sequencing technologies, one aimed at small RNAs, one aimed at polyadenylated (polyA) RNAs and a technique called total RNA sequencing."
This last sequencing technology led to the discovery of thousands of novel non-coding RNA genes, including a novel class of non-polyadenylated single-exon genes and many new circular RNAs. By combining and comparing the results of the different sequencing methods, the researchers were able to define, for every measured RNA transcript, its abundance in the different cells and tissues, whether it has a polyA tail or not (it appears that for some genes this can differ from cell type to cell type), and whether it is linear or circular. Moreover, the consortium searched for and found important clues to the function of some of the ncRNAs. By looking at the abundance of different RNAs in different cell types, they found correlations that indicate regulatory functions, and could determine whether this regulation happens at the transcriptional level (by preventing or stimulating transcription of protein-coding genes) or post-transcriptionally (e.g. by breaking down RNAs).
An invaluable resource for biomedical science
All data, analyses and results (equivalent to a few libraries of information) are available for download and interrogation in the R2 web portal, enabling the community to use this resource as a tool for exploring non-coding RNA biology and function.
Prof. Pavel Sumazin of the Baylor College of Medicine: "By combining all data in one comprehensive catalogue, we have created a new valuable resource for biomedical scientists around the world studying disease processes. A better understanding of the complexity of the transcriptome is indeed essential to better understand disease processes and uncover novel genes that may serve as therapeutic targets or biomarkers. The age of RNA therapeutics is swiftly rising - we've all witnessed the impressive creation of RNA vaccines, and already the first medicines that target RNA are used in the clinic. I'm sure we'll see lots more of these therapies in the next years and decades."