ISCB Overton Prize Lecture: Trey Ideker, University of California, San Diego
Introduction to Trey (before he starts his talk): Received his PhD working with Leroy Hood in 2001. The curse of systems biology: you will be a jack of all trades, rather than a master of one. On to the talk.
Also worked with Richard Karp. Big question: How does one automatically assemble pathways? Design new perturbations to maximize information gain (this is what he did for his PhD). Ideker et al.: Ann Rev Genomics Hum Genetics 2001 – his PhD work (Systems Biology: A new approach to decoding life).
Let’s think about all the public interaction data: protein-DNA interactions, PPIs, biochemical reactions. (Ideker et al Science 2001). The final figure of that Science manuscript, he feels, launched his career.
Querying biological networks for “Active Modules”, where you can paint the network with colors: for patient expression profile, protein states, any functional assay. This highlights the Interaction Database Dump, aka “Hairballs”, which aren’t good for a whole lot. (Ideker Bioinformatics 2002). In recent work with Chanda and Bandyopadhyay, he has projected siRNA phenotypes onto a network of human-human and human-HIV protein interactions, and looked at the network modules associated with infection (Konig et al. Cell 2008).
Next: Moving Network Biology into the Clinic: the working map. Importantly, this map doesn’t have to be complete, and there can be some tolerance for false positives (FP) and false negatives (FN). Their research wants to move from network assembly from genome-scale data to network-based study of disease. From this map, you could get: network-based diagnoses, functional separation of disease gene families, moving from GWAS to network-wide PAS (Pathway AS). Input is: network evolutionary comparison/cross-species alignment to identify conserved modules, projection of molecular profiles on protein networks to reveal active modules, integration of transcriptional interactions with causal or functional links, etc. These working maps are still essentially hairballs, even if they are represented as pretty pictures. But isn’t the cell really a hairball inside anyway? Maybe the secret isn’t figuring out this thing – maybe it’s to use this thing.
Extracting conserved pathways and complexes from cross-species network comparison (with Sharan and Karp): PathBLAST and NetworkBLAST for cross-comparison of networks. Start with two large hairballs; next realize that there is a third network implicit there of protein sequence homologies/orthologies between the two networks; given it is a many-to-many relationship between the networks, find the particular one that is the maximum alignment; give high scores to dense conserved complexes; then look for conserved interactions and find matched protein pairs (he does use sequence similarity for some things); the interaction scores come from logistic regression on number of observations, expression correlation, and clustering coefficient. They applied it to Plasmodium and Saccharomyces.
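To make the interaction-scoring step concrete, here is a minimal sketch of a logistic model over the three features mentioned in the talk. The coefficient values and the function name are hypothetical placeholders for illustration; the fitted values were not given in the talk, and this is not the PathBLAST/NetworkBLAST code.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted values).
WEIGHTS = {
    "intercept": -2.0,
    "n_observations": 0.8,
    "expr_corr": 1.5,
    "clustering": 1.0,
}


def interaction_score(n_observations, expr_corr, clustering):
    """Logistic estimate of the probability that an interaction is real."""
    z = (
        WEIGHTS["intercept"]
        + WEIGHTS["n_observations"] * n_observations
        + WEIGHTS["expr_corr"] * expr_corr
        + WEIGHTS["clustering"] * clustering
    )
    return 1.0 / (1.0 + math.exp(-z))


# An interaction seen once with weak support vs. one seen three times
# with correlated expression and a dense neighbourhood.
weak = interaction_score(1, 0.1, 0.0)
strong = interaction_score(3, 0.8, 0.5)
print(weak < strong)
```

With positive weights, more independent observations, higher expression correlation and a higher clustering coefficient all push the score towards 1.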
Also did work on human vs mouse TF-TF networks in brain (Tim Ravasi). You combine these quite readily, and id2, rb1 and cebpd are some examples. What follows is a very nice slide on the timeline of both biological sequence comparison and biological network comparison (Sharan & Ideker. Nat Biotech 2006). Trey thinks there are better things out there now than PathBLAST and NetworkBLAST.
Genetic interactions (non-physical) form a distinct type of network map (Tong et al. Science 2001). Here, there exists a genetic interaction between gene A and B if phenotype of mutant a is OK, mutant b is OK, and mutant ab is sick. How can you compare these to physical networks? Kelley and Ideker Nat Biotech 2005 worked on systematic identification of parallel pathway relations. Genetic interactions run between clusters of physical interactions, not within them.
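The "a is OK, b is OK, ab is sick" rule can be written down as a tiny predicate. This is only a sketch: the fitness scale (wild type = 1.0), the two thresholds and the function name are my assumptions, not anything stated in the talk.

```python
def is_genetic_interaction(fitness_a, fitness_b, fitness_ab, ok=0.8, sick=0.5):
    """True if both single mutants look healthy but the double mutant is sick.

    Fitness values are relative to wild type (1.0). The 'ok' and 'sick'
    thresholds are hypothetical cut-offs chosen for illustration.
    """
    singles_ok = fitness_a >= ok and fitness_b >= ok
    double_sick = fitness_ab <= sick
    return singles_ok and double_sick


# Both single mutants grow nearly normally, the double mutant barely grows.
print(is_genetic_interaction(0.95, 0.90, 0.20))
# One single mutant is already sick, so this does not count.
print(is_genetic_interaction(0.40, 0.90, 0.20))
```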
Functional maps of protein complexes (Bandyopadhyay et al. PLoS Comp Bio 2008). (Roguev, Science 2008) Genetic interaction maps are conserved between species (S cerevisiae, S pombe) (Thanks to Oliver for that article – I missed it on the slide).
Using ChIP-chip to assemble transcriptional networks underlying genotoxicity (Craig Mak and Chris Workman), and doing network comparison. Firstly, integrate cause-and-effect interactions with physical networks (Yeang, Mak et al. Genome Biology 2005). What if a lot of transcriptional binding is real but inconsequential to cellular function? They’d try to systematically functionally validate all the ChIP-chip data they generated. Workman, Mak et al. Science 2006. Recent extensions to this work: Mak et al. Genome Research 2009. Here, about 10% of TFs show an interesting spatial distribution on the genome. Characterize each gene by its distance to the closest telomere. Then characterize a TF by looking at the distribution of those distances across the genes it binds. There does seem to be condition-specific behaviour: probably it isn’t the TF moving from one part of the chromosome to another, but perhaps the genome is moving to and fro around them.
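As I understood it, the telomere-distance characterization boils down to something like the sketch below. The chromosome names, lengths and gene positions are made-up toy values, the function names are mine, and I am assuming that "each one" in the talk meant each gene bound by the TF.

```python
def telomere_distance(gene_pos, chrom_length):
    """Distance (in bp) from a gene position to the nearest chromosome end."""
    return min(gene_pos, chrom_length - gene_pos)


def tf_distance_profile(bound_genes, chrom_lengths):
    """Sorted telomere distances for all genes bound by one TF.

    bound_genes: list of (chromosome, position) tuples.
    chrom_lengths: dict mapping chromosome name to its length in bp.
    """
    return sorted(
        telomere_distance(pos, chrom_lengths[chrom]) for chrom, pos in bound_genes
    )


# Toy data only: two hypothetical chromosomes and three bound genes.
chrom_lengths = {"chrA": 1000, "chrB": 5000}
bound_genes = [("chrA", 100), ("chrA", 900), ("chrB", 2500)]
print(tf_distance_profile(bound_genes, chrom_lengths))
```

Comparing these per-TF distributions between conditions is then one way to look for the condition-specific behaviour mentioned in the talk.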
Network-based disease diagnosis. Much work is increasingly moving in this direction. Using protein networks to diagnose breast cancer metastasis. Breast cancers are very heterogeneous. Can we improve the work in terms of reproducibility and classification using further interaction information? If each patient has a mutation in a different gene, what do we do? What if these genes are sequential steps in a pathway, or are subunits in a common complex? Might you then be able to learn a rule for this? Nature Biotech 2009 Taylor et al.
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!
PT11: Domain-oriented edge-based alignment of protein interaction networks (ISMB 2009)
June 29, 2009
Xin Guo
Introducing… the DOMain-oriented Alignment of Interaction Networks (DOMAIN). Previous paradigms include the node-then-edge-alignment paradigm and direct-edge-alignment paradigm. In the latter, interactions are more likely to be conserved. Many studies have suggested that direct PPIs can be mediated by interactions of their domains.
Their method follows the direct-edge-alignment paradigm. In the method: try to find a set of alignable pairs of edges (APEs), and then try to add some edges between the APEs. Finally they try to find high-scoring alignments. In step 1 (finding APEs) there are two assumptions: the first is that two proteins interact if at least one pair of their constituent domains interact, and the second is that any two domain-domain interactions (DDIs) are independent of each other. An APE is a pair of cross-species PPIs sharing at least one pair of DDIs. DDIs in common are plausibly responsible for the PPIs. In scoring an APE, you estimate species-specific DDI probabilities, and then calculate a mean as their joint probability. For all common DDIs they use a noisy-or formulation to calculate the score. The APE graph is the aligned network graph, and is motivated by duplication-divergence models, which have two parts: link dynamics and gene duplication.
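My reading of the APE scoring step, as a rough sketch: average the two species-specific probabilities for each shared DDI, then combine the shared DDIs with a noisy-or. Whether the mean is arithmetic was not clear from the talk, so that is an assumption here, as are the function name and the example probabilities. This is not the DOMAIN implementation.

```python
def ape_score(common_ddis):
    """Noisy-or score for an alignable pair of edges (APE).

    common_ddis: list of (p_species1, p_species2) tuples, one per DDI
    shared by the two cross-species PPIs.
    """
    prob_no_ddi_explains = 1.0
    for p1, p2 in common_ddis:
        joint = (p1 + p2) / 2.0  # assumed arithmetic mean of the two estimates
        prob_no_ddi_explains *= 1.0 - joint
    # Noisy-or: the APE is supported if at least one shared DDI holds.
    return 1.0 - prob_no_ddi_explains


print(ape_score([(0.5, 0.5)]))
print(ape_score([(0.5, 0.5), (0.5, 0.5)]))
```

The noisy-or relies on the independence assumption above: each extra shared DDI can only raise the score.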
To evaluate their method, they used data from DIP for 3 different species (PPI networks), and Pfam-A domains (protein-to-domain mapping), and the backbone DIP network (a subset of DIP). Two other similar methods are NetworkBLAST and MaWISh. In all of their metrics except one, they came out best. NetworkBLAST was the second best.
DOMAIN is the first algorithm to introduce domains to the PPI network alignment problem, and the first attempt to align PPIs directly. It performs better than or similarly to the other methods. It can only be applied to a subset of PPIs; however, most functionally-annotated proteins are involved.
Jim Beynon and PRESTA, BBSRC Systems Biology Workshop
December 16, 2008
BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.
PRESTA stands for Plant Responses to Environmental STress in Arabidopsis. Even though the environment is changing rapidly, investment in plant research has declined. Abiotic and biotic stresses will function via core response networks embellished with stress-specific pathways. A fundamental component of these responses is transcriptional change. It seems that in many of the components in stress responses, hormones are key: also, everything seems to focus through key pathways. Two approaches are used: top-down modelling via network inference, or bottom-up modelling via already extant knowledge of key genes. This talk focused on the former.
They used high-resolution time-course microarrays which use 31,000 genome sequence tags (you need these to get the information to the modellers). Then, they use a range of different stress responses to reveal commonalities (developmental e.g. senescence, pathogens, and abiotic stress). One example: over 48 hours there were 24 time points taken with 4 biological and 3 technical replicates. Two-color arrays allow complex loop design. They've been using the MAANOVA program, and even altered it to make it more efficient. You basically end up with an F-test that tells you which genes have changed over time. How to select genes for network inference modelling? GO annotations, genes known to be involved in stress-related processes, transcription factors known to be involved, early response genes and prior knowledge.
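To give a feel for the kind of F-test involved, here is a plain one-way ANOVA F statistic for a single gene, treating each time point as a group of replicate measurements. This is a deliberate simplification and an assumption on my part: it ignores the array and dye effects of a two-color loop design, which is what MAANOVA's models are there to handle. The function name and toy numbers are mine.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic.

    groups: list of lists, one list of replicate expression values per
    time point. Assumes at least two groups and some within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation of time-point means around the overall mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of replicates around their own time-point mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy gene measured at three time points with three replicates each.
changing = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
flat = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(f_statistic(changing), f_statistic(flat))
```

A large F relative to the appropriate F distribution suggests the gene's expression changes over time; a gene whose time-point means are all equal gives F = 0.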
There goes the battery again! Grrr…. transcribed paper notes follow, which aren't generally as detailed in my case…
Variation of network models: 4 out of the 12 prospective genes were shown to have an altered pathogen growth phenotype. Knockouts in a hub gene showed either up- or down-regulation of senescence. They want to add validation to the network model, and have validated various genes via experimental work. Developed APPLE, which is the Analysis of Plant Promoter-Linked Elements. Discovered that if you overexpress HSF3, the plants are more tolerant to drought and show increased seed yield. HSF3 is part of the stress response but has a wide range of interactions, which is a good thing for building parameterized models. In the future, he wants to look at the genetic diversity in the crops, and try to express a more robust response to the environment.
These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker.