HL17: Learning from Resequencing Data: What To Do When the $1000 Genome Arrives? (ISMB 2009)

Gregory Kryukov

If it were not for the great variability among individuals, medicine might as well be a science and not an art (Osler 1892) (did I get this right? :))

Currently the method of choice is GWAS. Today's talk is about another type of association study (AS), one that is based on sequencing and on rare rather than common SNPs, with a focus on missense mutations. Genomes of well-phenotyped individuals will be available soon. Sequencing will make every gene amenable to genetic analysis, and effectively all genes have rare coding variants. Will these new genomes revolutionize the search for genes underlying human phenotypes? It is a statistical challenge for a number of reasons, e.g. sequencing will uncover many low-frequency variants, and the power to detect association with rare variants is reduced. So, you combine non-synonymous variants in a single test. Using multiple-candidate-gene studies, you can test resequencing AS (RAS). The theory here is that most new missense mutations are functional, that most new missense mutations are only weakly deleterious, and that most functional missense mutations are likely to influence phenotype in the same direction.
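To make the "combine variants in a single test" idea concrete, here is a minimal sketch of a collapsing (burden-style) test on simulated data. It is not the speaker's method: the function names, the haploid 0/1 genotype coding, the allele frequency, and the assumption that every rare variant shifts the phenotype by the same amount in the same direction are all illustrative choices of mine.

```python
import math
import random
import statistics


def burden_counts(genotypes):
    """Collapse per-variant rare-allele indicators into one score per individual."""
    return [sum(row) for row in genotypes]


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two groups."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))


def simulate(n_individuals=2000, n_variants=20, allele_freq=0.002, shift=0.5, seed=1):
    """Simulate rare variants in one gene and a quantitative phenotype.

    Assumption (hypothetical): each rare allele adds `shift` standard
    deviations to the phenotype, always in the same direction.
    """
    rng = random.Random(seed)
    genotypes = [
        [1 if rng.random() < allele_freq else 0 for _ in range(n_variants)]
        for _ in range(n_individuals)
    ]
    burden = burden_counts(genotypes)
    phenotypes = [rng.gauss(0.0, 1.0) + shift * b for b in burden]
    return burden, phenotypes


burden, phenotypes = simulate()
carriers = [p for b, p in zip(burden, phenotypes) if b > 0]
non_carriers = [p for b, p in zip(burden, phenotypes) if b == 0]
print("carriers:", len(carriers), "t statistic:", welch_t(carriers, non_carriers))
```

Each individual variant here is far too rare to test on its own, but pooling all carriers of any rare variant in the gene gives a group large enough to compare against non-carriers.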

Can sequencing be used as a discovery tool? How many individuals need to be sequenced and effects of what strength can be detected via RAS?

Analyse existing population sequencing data -> develop a population genetics model (estimate the parameters of demographic history from non-coding variation data, and of natural selection from missense SNP data) -> simulate genotypes -> simulate phenotypes -> simulate a sequencing study. What is the demographic history model? Calculate lots of theoretical spectra and use a maximum likelihood method. 370 generations, which coincides nicely with the advent of agriculture in Europe. The experimental data and the best model agree very well. Next, use data on missense SNPs to add natural selection.

They do not assume pre-existing variation with phenotypic effect; they simply rely on the mutation rate. The extent of the shift of the distribution is an important parameter in the model. Are whole-genome mutation-excess AS feasible? They used ~20,000 genes. A much smaller sample size of 1000 individuals would have over 75% power to detect effect sizes of 2 standard deviations. If you consider pathways rather than genes, the same number of individuals would have over 60% power.
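Power figures like these come from repeating a simulated study many times and counting how often the test clears the significance threshold. The sketch below shows that general recipe only; it does not reproduce the speaker's population genetics model or numbers. The carrier frequency, the Bonferroni correction over 20,000 genes, the normal approximation for the p-value, and all function names are my own assumptions.

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided p-value for a z score under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_study(rng, n, carrier_freq, shift):
    """Simulate one study of n individuals and return the carrier-vs-rest p-value."""
    carriers, others = [], []
    for _ in range(n):
        if rng.random() < carrier_freq:
            carriers.append(rng.gauss(shift, 1.0))  # carriers are shifted by `shift` SDs
        else:
            others.append(rng.gauss(0.0, 1.0))
    if len(carriers) < 2 or len(others) < 2:
        return 1.0  # too few observations to estimate a variance
    se = math.sqrt(
        statistics.variance(carriers) / len(carriers)
        + statistics.variance(others) / len(others)
    )
    z = (statistics.mean(carriers) - statistics.mean(others)) / se
    return two_sided_p(z)


def estimate_power(n=1000, carrier_freq=0.02, shift=2.0,
                   n_genes=20000, alpha=0.05, replicates=200, seed=0):
    """Fraction of simulated studies that pass a Bonferroni-corrected threshold."""
    rng = random.Random(seed)
    threshold = alpha / n_genes  # assumed genome-wide correction, one test per gene
    hits = sum(
        one_study(rng, n, carrier_freq, shift) < threshold
        for _ in range(replicates)
    )
    return hits / replicates


print("estimated power:", estimate_power())
```

Varying `n`, `shift`, and `carrier_freq` shows how quickly power falls off as the sample shrinks or the effect weakens, which is the trade-off the talk is quantifying.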

With a sample of ~10000 individuals, rare-variant-based resequencing AS are feasible (with phenotype info from 100000). What is next? Integration of computational predictions of damaging polymorphisms and of information on individual quantitative phenotypes. They'd also like to investigate multistage design.


Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

