Barbara Wold, California Institute of Technology
Plenary Lecture, Morning Session 1 September (11th MGED Meeting, 1-4 September, 2008)
A direct way to characterize a transcriptome is to sequence a cDNA copy of the entire transcriptome and then calculate the density of reads mapping to any given locus in the genome. Ultra-high-throughput sequencing platforms have made it practical to do this genome-wide. They have done this for mouse mRNA and human tissues and cell lines at depths ranging from 20 to 100M reads per transcriptome. These RNASeq experiments detect RNA splice patterns, including alternative splicing events, by identifying sequence reads that cross known and theoretical splice junctions.
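As a rough illustration of the read-density idea, here is a minimal Python sketch. It assumes the reads have already been assigned to named loci, and it normalizes by locus length and by sequencing depth (reads per kilobase per million mapped reads). That normalization is a common convention, not something stated in these notes, and the function name and toy data are made up for illustration.

```python
from collections import Counter


def reads_per_kb_per_million(read_loci, locus_lengths_bp):
    """Return a length- and depth-normalized read density for each locus.

    read_loci        -- one locus name per mapped read (hypothetical input format)
    locus_lengths_bp -- dict of locus name -> length in base pairs
    """
    total_reads = len(read_loci)
    if total_reads == 0:
        raise ValueError("no mapped reads")
    counts = Counter(read_loci)
    density = {}
    for locus, length_bp in locus_lengths_bp.items():
        # count / (length in kb) / (total reads in millions)
        density[locus] = counts.get(locus, 0) * 1e3 * 1e6 / (length_bp * total_reads)
    return density


# Toy data: 10 mapped reads over three loci of different lengths.
read_loci = ["geneA"] * 6 + ["geneB"] * 4
locus_lengths_bp = {"geneA": 2000, "geneB": 1000, "geneC": 500}
density = reads_per_kb_per_million(read_loci, locus_lengths_bp)
```

In the toy data geneA has more reads than geneB but a lower density, because it is twice as long. That is the reason for dividing by length before comparing loci.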
The methods of the previous speaker tell you where specific parts are (starts, ends). The RNASeq technique is a broader, more brute-force way of looking at the transcriptome. However, you can still get really important data out of it. The main purpose of RNASeq is to quantify RNAs, in both relative and absolute terms. RNASeq is good at absolute numbers. It can also do transcript discovery and mapping, including revising gene models, splice isoforms, and RNA editing. Even in "boring" tissues like mouse liver or total mouse brain, you still come up with some robust newly-discovered transcripts using this technique that aren't quantitatively minor. There are limits to this technique, e.g. they're doing the work against known sets of genes (although they'd like to do the work de novo). They're happy to provide data to help with this. A final function is in genetics, specifically expressed SNPs and private mutations that wouldn't normally appear on SNP arrays.
Two features of the data: firstly, technical replicates correlate very nicely when compared, so biological replication can then really be about the biology. Secondly, the map of the RNA transcripts had a very nice linear shape on a log scale.
Should you look at RNA reads that can map equally well to multiple sites? They looked at 25mer reads in mammalian genomes. Let's see what happens to those that can map equally well to 2-10 sites, inclusive, as well as the unique reads. 80% of the genome could be mapped uniquely, with 6% mapping to between 2 and 10 sites, and 14% to more than 10. In the myoblast transcriptome, the fraction that maps uniquely is smaller (69%), and this is something that happens generally. This is because there's lots of gene paralogy, and you'll get reads that map to multiple places due to a (recent or old) duplication. So if you ignore these multi-read sequences, you risk missing important stuff entirely.
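The three bins above (unique, 2-10 sites, more than 10 sites) can be sketched in Python as follows. This assumes you already have, for each read, the number of sites it maps to equally well. The function names, bin labels and toy numbers are my own and are not the speaker's data or pipeline.

```python
from collections import Counter


def classify_read(n_sites, multi_max=10):
    """Bin a read by the number of sites it maps to equally well."""
    if n_sites < 1:
        raise ValueError("a mapped read must have at least one site")
    if n_sites == 1:
        return "unique"
    if n_sites <= multi_max:
        return "multi_2_to_10"  # the 2-10 bin, inclusive
    return "over_10"


def class_fractions(hit_counts):
    """Return the fraction of reads falling into each bin."""
    if not hit_counts:
        raise ValueError("no reads given")
    bins = Counter(classify_read(n) for n in hit_counts)
    total = len(hit_counts)
    return {name: bins.get(name, 0) / total
            for name in ("unique", "multi_2_to_10", "over_10")}


# Toy input: number of equally good mapping sites for ten reads.
hit_counts = [1, 1, 1, 1, 1, 1, 1, 3, 10, 11]
fractions = class_fractions(hit_counts)
```

Keeping the 2-10 bin separate, rather than discarding everything non-unique, is what lets you go back and ask how much signal the multireads carry.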
What are the kinds of genes that are multiread sensitive? Their example is an actin gene (EL4r1). If you just map unique reads, you miss everything that is in the exons, and the gene would therefore appear as if it was NOT expressed. RNASeq is really good at detecting alternative splicing. Really rare alternative splicing events may just be random events that are not intended, but which the system can tolerate; this should be taken into account.
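To make the junction-read idea concrete, here is a toy Python sketch of finding reads that cross splice junctions. It builds a small library of "known and theoretical" junctions by joining the end of each exon to the start of every later exon, then looks for reads that span a join. Exact string matching, the minimum-overhang rule, and all names and sequences are assumptions for illustration. Real aligners allow mismatches and work quite differently.

```python
def build_junctions(exons, flank):
    """Join the last `flank` bases of exon i to the first `flank` bases of exon j.

    Returns a dict mapping (i, j) -> (junction sequence, boundary position).
    """
    junctions = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            left = exons[i][-flank:]
            right = exons[j][:flank]
            junctions[(i, j)] = (left + right, len(left))
    return junctions


def junction_hits(read, junctions, min_overhang=4):
    """Return exon pairs whose junction the read crosses.

    The read must match exactly and cover at least `min_overhang`
    bases on each side of the exon-exon boundary.
    """
    hits = []
    for pair, (seq, boundary) in junctions.items():
        start = seq.find(read)
        while start != -1:
            end = start + len(read)
            if boundary - start >= min_overhang and end - boundary >= min_overhang:
                hits.append(pair)
                break
            start = seq.find(read, start + 1)
    return hits


# Toy transcript with three exons (made-up sequences).
exons = ["AAAACCCCGGGG", "TTTTACGTACGT", "GATTACAGATTA"]
junctions = build_junctions(exons, flank=8)

skipping_read = junction_hits("GGGGGATT", junctions)  # joins exon 0 to exon 2
short_overhang = junction_hits("CGGGGTTT", junctions)  # only 3 bases into exon 1
within_exon = junction_hits("CCCCGGGG", junctions)    # never crosses a boundary
```

A read that joins exon 0 directly to exon 2 is the kind of evidence that points to an exon-skipping isoform, while the overhang rule guards against calling a junction from just a few bases of overlap.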
They have discovered some candidate new genes: 161 in the brain, 95 in the muscle, and 77 in the liver, and some of these are overlaps between the three.
You need to include multireads to detect some true positives in ChIP: 5-10% of sites in the interactome are affected. Can you ID by ChIP essentially all sites predicted by FUNCTION assays? Yes, but strongly conditioned on good antibodies and good cells. Do you expect detectable function at every site with significant and reproducible in vivo occupancy? No: more data is needed, e.g. long-range cis-interactions in big genomes, 3-C signals in their data etc. Is there significant ChIP at all instances of a high consensus motif match? For MyoD and Myogenin, NO, as there are >1 million perfect motifs in the genome. Yes for big, well-specified motifs (NRSF), and the meaning of binding seen at some 1/2 of the sites is unclear.
(Chromatin Immunoprecipitation: ChIP.)
These are just my notes and are not guaranteed to be correct.
Please feel free to let me know about any errors, which are all my
fault and not the fault of the speaker. 🙂