Dynamics and Complexity of the Coding and Non-Coding Transcriptome
Piero Carninci, RIKEN Omics Science Center
Keynote Lecture, Morning Session 1 September (11th MGED Meeting, 1-4 September, 2008)
They've been mapping the expressed part of the genome, aka the transcriptome, which will help us understand the genome's output. RNAs retrieved via standard methods raise several issues: a single gene can produce many different transcripts and can have different promoters, and each promoter will have a different level of activity. They use Cap Analysis Gene Expression (CAGE). They are using the MAGE-TAB and SDRF formats to store their data.
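As a rough illustration of how CAGE-style data can turn into promoter activity levels, here is a minimal sketch that groups the 5' positions of tags into clusters and counts the tags in each. The gap-based clustering, the `max_gap` value and the coordinates are my own assumptions for illustration, not the method described in the talk, and the sketch ignores strand and chromosome.

```python
def cluster_tags(positions, max_gap=20):
    """Group tag 5' positions into clusters of nearby start sites.

    positions: 5' coordinates of tags on one strand of one chromosome
               (hypothetical input).
    max_gap:   assumed maximum distance between neighbouring tags that
               still belong to the same cluster.
    Returns a list of (cluster_start, cluster_end, tag_count) tuples.
    """
    clusters = []
    for pos in sorted(positions):
        # Extend the current cluster if this tag is close to its last tag.
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [(c[0], c[-1], len(c)) for c in clusters]


# Hypothetical tag positions: two candidate promoters with different activity.
example_tags = [100, 102, 105, 500, 503, 101]
example_clusters = cluster_tags(example_tags)
```

The tag count per cluster stands in for the promoter's activity level, and the cluster boundaries give an approximate TSS region.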
In the 1990s, people thought there were 70,000-100,000 protein-coding genes; today, we expect that there are only about 20,000. Instead, the complexity comes from elsewhere: post-translational modifications, many overlapping transcripts, multiple promoters, etc.
But what are the long non-coding RNAs (ncRNAs) doing? They are long stretches of sequence that are not conserved. However, their promoter sequences are often conserved. Perhaps the mechanisms of their action do not require long stretches of conservation in the gene. Most of the unknown RNA is polyA minus and nuclear. A large proportion of the long RNAs are cleaved, producing short RNAs that are often conserved. These derived short RNAs map to the 5' ends (PASRs) and 3' ends (TASRs) of genes. Therefore the whole transcript is not conserved because it doesn't need to be: only those bits that are cleaved and used later on need to be conserved. Essentially, this means that an individual locus gives rise to a large number of RNAs.
PolyA- CAGE is mostly nuclear and overlaps introns and TSSs, while cytoplasmic CAGE falls more on exons. 3' untranslated regions (UTRs) are also interesting: they start from a conserved promoter which has a conserved GGG section. There is also RNA that starts from the middle of a gene. It is more prevalent in the TATA-box, with sharp promoters. Mouse and human have similar starting sites. There are also antisense RNAs. Most transcriptional units (TUs; 72%) show antisense transcription. Are the sense-antisense RNAs co-expressed? Is there dynamic regulation? If you perturb the antisense RNA, the sense RNA will be overexpressed. It also seems that sense and antisense RNA aren't transcribed at the same time, and that they might take turns (this is my impression from the slides, rather than something he said exactly). Sometimes sense-antisense pairs work in the cytoplasm (with the production of natural siRNA). One example is the beta-secretase-1 antisense, which increases the sense RNA (feed-forward loop), and which is important in Alzheimer's.
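To make the idea of sense-antisense pairs concrete, here is a minimal sketch that finds transcripts overlapping each other on opposite strands. The `Transcript` record, the gene names and the coordinates are all hypothetical, and a simple overlap check is my assumption about what counts as a pair, not the definition used in the talk.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def antisense_pairs(transcripts):
    """Return name pairs of transcripts that overlap on opposite strands."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            same_chrom = a.chrom == b.chrom
            opposite = a.strand != b.strand
            overlapping = a.start < b.end and b.start < a.end
            if same_chrom and opposite and overlapping:
                pairs.append((a.name, b.name))
    return pairs


# Hypothetical transcripts: geneA has an antisense partner, geneB does not.
example = [
    Transcript("geneA", "chr1", 1000, 5000, "+"),
    Transcript("geneA-AS", "chr1", 4500, 6000, "-"),
    Transcript("geneB", "chr1", 8000, 9000, "+"),
    Transcript("geneC", "chr2", 1000, 5000, "-"),
]
example_pairs = antisense_pairs(example)
```

The pairwise loop is quadratic, which is fine for a toy example; a real transcriptome would need an interval index.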
You can even get RNA expression from repeats. Repeat elements can produce short RNAs, like natural siRNA. They have identified that 10-35% of the transcripts correspond to repeat elements. Surprisingly, they have dynamic, tissue-specific behaviour / patterns. There is overrepresentation of repeats in the nucleus among polyA- RNAs, and there is compartment specificity.
There is a lot of promoter plasticity. A switch to PyPu will increase transcription, while the reverse decreases it. They're having a look at preferentially-expressed promoters (PEPs). These are promoters that have >30 tags and are statistically significant. Looking at the distribution of PEPs in brain tissues, there are genes that have multiple-tissue-expressed PEPs. Different PEPs drive functional variability of the proteome: PEPs create more proteome diversity. They make use of THP-1 cells as a model cell. 46% of genes in THP-1 have alternative promoters. Of these, 18,245 are high-confidence promoters, and 1909 of these are newly discovered. CAGE identifies the active set of promoters, and more precisely defines the TSS position.
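The PEP criterion (more than 30 tags, plus statistical significance) could be sketched roughly as below. The talk did not say which statistical test was used, so the one-sided binomial test, the `alpha` cut-off, the choice to apply the 30-tag threshold to the promoter's total tag count, and all the numbers are my assumptions.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_pep(tags_in_tissue, tags_total, tissue_share, min_tags=30, alpha=0.01):
    """Rough check for a preferentially-expressed promoter (PEP).

    tags_in_tissue: tags for this promoter observed in the tissue of interest.
    tags_total:     tags for this promoter across all tissues.
    tissue_share:   fraction of all tags in the data set that come from
                    this tissue (the expectation under no preference).
    min_tags:       the >30 tag threshold from the notes.
    alpha:          assumed significance cut-off.
    """
    if tags_total <= min_tags:
        return False
    # Is the tissue over-represented relative to its share of all tags?
    return binomial_tail(tags_in_tissue, tags_total, tissue_share) < alpha
```

For example, under these assumptions `is_pep(80, 100, 0.2)` flags a promoter with 80 of its 100 tags in a tissue that contributes only 20% of all tags, while a promoter with 25 tags in total is rejected before any test is run.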
CAGE is not dependent on microarray design, and measures expression including that of ncRNAs. They have some bioinformatics tools freely available for the CAGE protocol, and have tried to simplify the protocol itself. Please contact him if you would like help in making your own CAGE library.
These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. 🙂