Exploring Genomic Medicine Using Translational Bioinformatics

Atul Butte
Plenary Talk, Morning Session, 3 September (11th MGED Meeting, 1-4 September, 2008)

What is translational bioinformatics? It addresses translating genome-era discoveries into medicine. He feels bioinformatics has helped cause the problem and can solve it 🙂 TBi itself is the development of analytic, storage, and interpretive methods. It is meant to optimize the transformation of increasingly voluminous genomic and biological data into diagnostics and therapeutics for the clinician. CTSA = Clinical and Translational Science Awards (from the NIH).

Disease integration to find SNPs. Sequence variants are associated with many conditions. Genome-wide association studies (GWAS) look for strong signals across patients. There are several problems with the GWAS approach: previous loci are reproduced inconsistently, inconsistent patient selection leads to inconsistent results, several significant loci are not associated with genes, and effect sizes are relatively weak. How do we find the "long tail of rare genes"? Control groups often differ between runs of a study, etc. These problems are difficult to separate out. Many genome-wide experiments have already been run on these disorders: microarrays, proteomics, RNAi. How can we integrate the results of these experiments in a purely data-driven way to efficiently find those genes likely to have SNPs? As a field, how do we find all gene variants for a complex disorder? As an investigator, how do you find a novel disease gene variant?

He used progeria as an example. Death occurs on average at age 15, 90% from atherosclerosis. For some reason, starting at age 2, patients start losing fat cells: both subcutaneous and body fat. There's a fat cell phenotype: can this aspect of the disorder help treat or research obesity? Further, it turns out worms have fat. Inactivating 305 genes decreased fat content, and inactivating 112 genes increased fat content. Of course, they don't have fat cells, but fat "granules". There are a total of 49 such studies.

So they started with a gold standard of 273 genes with variants known to be associated with obesity, and compared these 49 studies against them (17 microarray studies, 16 human genetic studies, 10 rat genetic studies, 5 mouse proteomic studies, 1 worm genome-wide KO study). How many of these genes can we get back using new methods? They use an ROC curve with the false-positive rate on the X axis and sensitivity on the Y axis. Microarrays have only 1 "good" curve (as far to the left and up as possible), and proteomics have all bad curves (== diagonal). For each gene, count the number of experiments in which it was implicated. This gives us the repository model, which is the only way to get a good ROC curve: the more positive experiments for a gene, the more likely it is to be a true positive. The average obesity-related experiment offers poor sensitivity! The repository model (made by the bioinformaticians) outperforms ALL of the other study types.
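
To make the counting idea concrete, here is a minimal Python sketch of a vote-counting "repository model" scored against a gold standard with an ROC curve. This is only an illustration, not the speaker's actual method or code: the gene names, the three toy experiments, and the helper names (`vote_counts`, `roc_points`, `auc`) are all made up for the example.

```python
from collections import Counter

def vote_counts(experiments):
    """Count, for each gene, how many experiments implicated it."""
    counts = Counter()
    for gene_list in experiments:
        counts.update(set(gene_list))  # a gene counts once per experiment
    return counts

def roc_points(counts, gold, universe):
    """Return (false-positive rate, sensitivity) pairs, one per vote threshold."""
    positives = gold & universe
    negatives = universe - gold
    points = [(0.0, 0.0)]
    max_votes = max(counts.values(), default=0)
    # Sweep the threshold from strict (many votes needed) to lenient (none needed).
    for threshold in range(max_votes, -1, -1):
        called = {g for g in universe if counts.get(g, 0) >= threshold}
        tpr = len(called & positives) / len(positives)
        fpr = len(called & negatives) / len(negatives)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical toy data: 8 genes, 3 of them in the gold standard.
universe = {f"G{i}" for i in range(1, 9)}
gold = {"G1", "G2", "G3"}
experiments = [
    ["G1", "G2", "G5"],
    ["G1", "G3", "G6"],
    ["G1", "G2", "G7"],
]

combined_auc = auc(roc_points(vote_counts(experiments), gold, universe))
single_auc = auc(roc_points(vote_counts(experiments[:1]), gold, universe))
```

On this toy data the combined counts give a larger area under the curve than the first experiment alone, which is the behaviour the talk describes, though real studies would of course be far noisier.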

Most GWAS do not even re-discover known associated genes for obesity. Combining experiments does better than single experiments, and therefore bioinformatics should be involved earlier!

The genome-phenome network. Johannsen in 1908 wrote phenotype = genes + environment. Instead of doing this one at a time, why not relate all genes to all aspects of the environment, and to all phenotypes? We might find novel roles for genes, drug targets, or diagnostic biomarkers. They used GEO for this (an average of 125 microarrays are submitted per day). The data has accepted standards; the context does not. Leverage this data using structured vocabularies, with the goal of automated assessment of context to enable integration across experiments. Where does this list of phenotypes and environmental factors come from? They used the Unified Medical Language System (UMLS). It "unifies" 130+ biomedical vocabularies, and has 1 million+ concepts and 41 million+ relations. AILUN: extracting GEO gene lists. You extract links, using UMLS, between the UMLS concepts and higher levels of gene expression. From this, you build a network.
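
As a rough illustration of the last step, the Python sketch below builds a small concept-to-gene network from annotated data sets. It assumes the hard part (mapping free-text GEO annotations to UMLS concepts, and deriving up-regulated gene lists) has already been done; the data set records, concept labels, gene names, and function names are all hypothetical and are not taken from the talk.

```python
from collections import defaultdict

def build_concept_gene_network(datasets):
    """Link each concept to each up-regulated gene, counting supporting data sets."""
    network = defaultdict(lambda: defaultdict(int))
    for ds in datasets:
        for concept in set(ds["concepts"]):
            for gene in set(ds["up_genes"]):
                network[concept][gene] += 1
    return network

def genes_for(network, concept, min_support=1):
    """Genes linked to a concept by at least `min_support` data sets, sorted by name."""
    genes = network.get(concept, {})
    return sorted(g for g, n in genes.items() if n >= min_support)

# Hypothetical records: concepts stand in for UMLS terms, genes are placeholders.
datasets = [
    {"id": "DS1", "concepts": ["obesity", "high-fat diet"], "up_genes": ["GENE_A", "GENE_B"]},
    {"id": "DS2", "concepts": ["obesity"], "up_genes": ["GENE_A", "GENE_C"]},
]

network = build_concept_gene_network(datasets)
```

Calling `genes_for(network, "obesity", min_support=2)` then keeps only genes supported by both toy data sets. Edge weights as simple counts are a design choice made for this sketch; a real network would presumably use a statistical measure of association.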

In this way, you can relate aspects of the environment and phenotype to genes. You can search for relevant data sets using these annotations. Parsing of the database entries is OK, but the annotations need to improve, through ontologies. 53% of GEO entries are linked to PubMed. 35% of GEO entries have PubMed links relatable to disease, which themselves have almost 300 disease MeSH terms. You can then use the PubMed links as a proxy for the dataset annotations.

Genetic nosology (Linnaeus, co-founder of systematic nosology). Nosology is the systematic classification of diseases. He did get it very wrong, but he had a good idea anyway. So, can we have genetic nosology, or the classification of diseases based on genomics? It could reshuffle thinking about diseases and drugs, and the disease "tree" itself. GNOMED: the gene-expression nosology of medicine. 228 diseases from 361 experiments from 8580 samples across 115 tissue types. Tissue type and control groups can be detected automatically.

"Rogue's gallery": a list of 100 genes that are significantly different in all human disease. Inflammatory genes show the greatest average change across all diseases. Neighboring diseases may share the same drugs. In GNOMED, muscular dystrophy is next to heart attack. This is pretty interesting: there are 40 FDA-approved drugs to treat heart attacks, but 0 for muscular dystrophy! So, this means they can computationally predict which drugs may have new uses.

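The "neighboring diseases may share drugs" idea can be sketched in a few lines of Python. The talk notes do not say how GNOMED measures similarity between diseases, so Pearson correlation between expression signatures is purely an assumption here, and the disease names, signature values, drug names, and function names are placeholders invented for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def nearest_disease(target, signatures):
    """Disease whose expression signature correlates best with the target's."""
    others = (d for d in signatures if d != target)
    return max(others, key=lambda d: pearson(signatures[target], signatures[d]))

def repositioning_candidates(target, signatures, approved_drugs):
    """Drugs approved for the nearest neighbor but not yet for the target."""
    neighbor = nearest_disease(target, signatures)
    known = set(approved_drugs.get(target, []))
    return neighbor, [d for d in approved_drugs.get(neighbor, []) if d not in known]

# Hypothetical signatures: one value per gene, same gene order for every disease.
signatures = {
    "disease_A": [1.0, 2.0, -1.0, 0.5],
    "disease_B": [0.9, 2.1, -0.8, 0.4],
    "disease_C": [-1.0, 0.2, 1.5, -0.3],
}
approved_drugs = {"disease_A": [], "disease_B": ["drug_1", "drug_2"]}

neighbor, candidates = repositioning_candidates("disease_A", signatures, approved_drugs)
```

Here `disease_A` has no approved drugs, its closest neighbor is `disease_B`, and so the drugs for `disease_B` come back as candidates worth a closer look. Any such prediction is only a hypothesis to be tested, not evidence of efficacy.
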
Genomic medicine (Guttmacher, Collins NEJM 2002). This is the application of knowledge of the human genome to medicine. Further, it must be a bidirectional relationship between the molecular study of disease and the pathophysiological study of disease. However, there is a difficulty in implementation: how do we get clinicians to provide phenotypes? Example: a hospital of 60,000 people studied over 7 years, modeling 8100+ phenotypes. We want this data! Hospitals could become the ultimate model organisms (see his article in Science this year on this topic).

Not all systems biology is molecular: SB can be applied to health and disease. We need investigators who can imagine basic questions to ask of these repositories of clinical and genomic measurements. The patients, samples, molecular, clinical, and epidemiological data are there to make an impact across medicine.


These are just my notes and are not guaranteed to be correct.
Please feel free to let me know about any errors, which are all my
fault and not the fault of the speaker. 🙂
