Categories
Meetings & Conferences

Keynote: New Challenges and Opportunities in Network Biology (ISMB 2009)

ISCB Overton Prize Lecture: Trey Ideker, University of California, San Diego

Introduction to Trey (before he starts his talk): Received his PhD working with Leroy Hood in 2001. The curse of systems biology: you will be a jack of all trades, rather than a master of one. On to the talk.

Also worked with Richard Karp. Big question: How does one automatically assemble pathways? Design new perturbations to maximize information gain (this is what he did for his PhD). Ideker et al.: Ann Rev Genomics Hum Genetics 2001 – his PhD work (Systems Biology: A new approach to decoding life).

Let’s think about all the public interaction data: protein-DNA interactions, PPIs, biochemical reactions (Ideker et al. Science 2001). The final figure of that Science manuscript, he feels, launched his career.

Querying biological networks for “Active Modules”, where you can paint the network with colors: for patient expression profiles, protein states, any functional assay. This highlights the Interaction Database Dump, aka “Hairballs”, which aren’t good for a whole lot (Ideker Bioinformatics 2002). In recent work with Chanda and Bandyopadhyay, he has projected siRNA phenotypes onto a network of human-human and human-HIV protein interactions, looking at the network modules associated with infection (Konig et al. Cell 2008).

Next: Moving Network Biology into the Clinic: the working map. Importantly, this map doesn’t have to be complete, and there can be some tolerance for false positives and false negatives. Their research aims to move from assembling networks from genome-scale data to network-based study of disease. From this map, you could get: network-based diagnoses, functional separation of disease gene families, and a move from GWAS to network-wide PAS (Pathway AS). The inputs are: network evolutionary comparison/cross-species alignment to identify conserved modules, projection of molecular profiles on protein networks to reveal active modules, integration of transcriptional interactions with causal or functional links, etc. These working maps are still essentially hairballs, even if they are represented as pretty pictures. But isn’t the cell really a hairball inside anyway? Maybe the secret isn’t figuring out this thing – maybe it’s to use this thing.

Extracting conserved pathways and complexes from cross-species network comparison (with Sharan and Karp): PathBLAST and NetworkBLAST for cross-comparison of networks. Start with two large hairballs; next, realize that there is a third network implicit there, made of protein sequence homologies/orthologies between the two networks; given that this is a many-to-many relationship between the networks, find the particular mapping that gives the maximum alignment; score dense conserved complexes highly; then look for conserved interactions and find matched protein pairs (he does use sequence similarity for some things); the interaction scores come from logistic regression on number of observations, expression correlation, and clustering coefficient. They applied it to Plasmodium and Saccharomyces.

Also did work on human vs mouse TF-TF networks in brain (Tim Ravasi). You combine these quite readily, and id2, rb1 and cebpd are some examples. What follows is a very nice slide on the timeline of both biological sequence comparison and biological network comparison (Sharan & Ideker. Nat Biotech 2006). Trey thinks there are better things out there now than PathBLAST and NetworkBLAST.

Genetic interactions (non-physical) form a distinct type of network map (Tong et al. Science 2001). Here, there exists a genetic interaction between gene A and B if phenotype of mutant a is OK, mutant b is OK, and mutant ab is sick. How can you compare these to physical networks? Kelley and Ideker Nat Biotech 2005 worked on systematic identification of parallel pathway relations. Genetic interactions run between clusters of physical interactions, not within them.
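The definition above (mutant a OK, mutant b OK, double mutant ab sick) can be written down as a small rule. This is only a sketch: the function name, the fitness scale and both thresholds are my own illustrative assumptions, not values from the talk.

```python
def is_synthetic_sick(fitness_a, fitness_b, fitness_ab, healthy=0.8, sick=0.5):
    """True when both single mutants look healthy but the double mutant is sick.

    Fitness is relative to wild type (1.0). The thresholds are hypothetical.
    """
    return fitness_a >= healthy and fitness_b >= healthy and fitness_ab <= sick


# Made-up measurements: (fitness of a, fitness of b, fitness of ab).
measurements = {
    ("geneA", "geneB"): (0.95, 0.90, 0.20),  # both singles fine, double sick
    ("geneA", "geneC"): (0.95, 0.92, 0.88),  # double mutant is fine
    ("geneD", "geneE"): (0.40, 0.90, 0.10),  # single mutant d is already sick
}

genetic_interactions = sorted(
    pair
    for pair, (fa, fb, fab) in measurements.items()
    if is_synthetic_sick(fa, fb, fab)
)
print(genetic_interactions)
```

Only the geneA/geneB pair passes, because the third pair fails the "single mutant is OK" condition even though its double mutant is very sick.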

Functional maps of protein complexes (Bandyopadhyay et al. PLoS Comp Bio 2008). (Roguev, Science 2008) Genetic interaction maps are conserved between species (S cerevisiae, S pombe) (Thanks to Oliver for that article – I missed it on the slide).

Using ChIP-chip to assemble transcriptional networks underlying genotoxicity (Craig Mak and Chris Workman), and doing network comparison. Firstly, integrate cause-and-effect interactions with physical networks (Yeang, Mak et al. Genome Biology 2005). What if a lot of transcriptional binding is real but inconsequential to cellular function? They tried to systematically and functionally validate all the ChIP-chip data they generated (Workman, Mak et al. Science 2006). Recent extensions to this work: Mak et al. Genome Research 2009. Here, about 10% of TFs show an interesting spatial distribution on the genome. Characterize each gene by its distance to the closest telomere. Then characterize a TF by looking at the distribution of those distances across its target genes. There does seem to be condition-specific behaviour: probably it isn’t the TF moving from one part of the chromosome to another, but perhaps the genome is moving to and fro around them.
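The telomere-distance idea is easy to sketch. The code below is my own minimal illustration, not the method from Mak et al.: the chromosome lengths, target positions and function names are all hypothetical.

```python
from statistics import median


def distance_to_nearest_telomere(gene_position, chromosome_length):
    """Distance (in bp) from a gene to the closer of the two chromosome ends."""
    return min(gene_position, chromosome_length - gene_position)


def telomere_distance_profile(target_genes, chromosome_lengths):
    """Sorted distances for all (chromosome, position) targets of one TF."""
    return sorted(
        distance_to_nearest_telomere(position, chromosome_lengths[chromosome])
        for chromosome, position in target_genes
    )


# Hypothetical chromosome lengths and TF target positions (bp).
chromosome_lengths = {"chrI": 230_000, "chrII": 813_000}
tf_targets = [("chrI", 12_000), ("chrI", 221_000), ("chrII", 400_000)]

profile = telomere_distance_profile(tf_targets, chromosome_lengths)
print(profile, median(profile))
```

Comparing such profiles between TFs, or between conditions for the same TF, is one simple way to look for the kind of spatial bias described in the talk.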

Network-based disease diagnosis. Much work is increasingly moving in this direction. Using protein networks to diagnose breast cancer metastasis. Breast cancers are very heterogeneous. Can we improve the work in terms of reproducibility and classification using further interaction information? If each patient has a mutation in a different gene, what do we do? What if these genes are sequential steps in a pathway, or are subunits in a common complex? Might you then be able to learn a rule for this? (Taylor et al. Nature Biotech 2009.)


Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Data Integration, Meetings & Conferences

PT11: Domain-oriented edge-based alignment of protein interaction networks (ISMB 2009)

Xin Guo

Introducing… the DOMain-oriented Alignment of Interaction Networks (DOMAIN). Previous paradigms include the node-then-edge-alignment paradigm and direct-edge-alignment paradigm. In the latter, interactions are more likely to be conserved. Many studies have suggested that direct PPIs can be mediated by interactions of their domains.

Their method follows the direct-edge-alignment paradigm. In the method: try to find a set of alignable pairs of edges (APEs), and then try to add some edges between the APEs. Finally they try to find high-scoring alignments. In step 1 (finding APEs) there are two assumptions: first, two proteins interact if at least one pair of their constituent domains interact, and second, two DDIs are independent of each other. An APE is a pair of cross-species PPIs sharing at least one pair of DDIs. DDIs in common are plausibly responsible for the PPIs. In scoring an APE, you estimate species-specific DDI probabilities, and then calculate a mean as their joint probability. Across all common DDIs they use a noisy-or formulation to calculate the score. The APE graph is the aligned network graph, and is motivated by duplication-divergence models, which have two parts: link dynamics and gene duplication.
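The scoring step can be sketched as follows. This is my reading of the description rather than the authors' implementation: I assume an arithmetic mean for the joint probability (the talk only said "a mean"), and the function names and probabilities are made up.

```python
from math import prod


def joint_ddi_probability(p_species_1, p_species_2):
    """Joint probability of one shared DDI; arithmetic mean is an assumption."""
    return (p_species_1 + p_species_2) / 2


def ape_score(shared_ddis):
    """Noisy-or over all DDIs shared by the two cross-species PPIs.

    The APE is supported if at least one shared DDI is real, so the score is
    1 minus the probability that every shared DDI fails (DDIs independent).
    """
    joint = [joint_ddi_probability(p1, p2) for p1, p2 in shared_ddis]
    return 1 - prod(1 - p for p in joint)


# Hypothetical species-specific probabilities for two shared DDIs.
shared = [(0.6, 0.4), (0.2, 0.2)]
score = ape_score(shared)
print(round(score, 3))
```

Note that the noisy-or relies on the second assumption in the talk (independence of DDIs): each additional shared DDI can only raise the score.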

To evaluate their method, they used data from DIP for 3 different species (PPI networks), and Pfam-A domains (protein-to-domain mapping), and the backbone DIP network (a subset of DIP). Two other similar methods are NetworkBLAST and MaWISh. In all of their metrics except one, they came out best. NetworkBLAST was the second best.

DOMAIN is the first algorithm to introduce domains to the PPI network alignment problem, and the first attempt to align PPIs directly. Its performance is better than or similar to that of the other methods. It can only be applied to a subset of PPIs, although that subset includes most functionally-annotated proteins.


Categories
Meetings & Conferences

Analyzing Genome-Scale Metabolic Networks (BioSysBio 2009)

D Fell
Oxford Brookes University

Steve Oliver mentioned that David Fell has invented a number of control analysis coefficients. More recently, however, he has been working on the structure of networks. You get from genes to metabolic reactions via proteins, protein complexes, enzymes (and therefore EC numbers). He'll concentrate on talking about how to navigate from the genotype to the phenotype. Where does the data come from when building genome-scale metabolic networks? BioCyc, KEGG, IntEnz, ExPASy ENZYME, or BRENDA. Alternatively, you can use an annotation tool such as RPS-BLAST with PRIAM signatures. In principle, this creates a list of the reactions encoded in the genome sequence.

You can represent a network as a matrix with the rows for the metabolites and the columns for the reactions (the changes in state). If a metabolic network is at a steady state, it satisfies the relationship N.v = 0, where N is the stoichiometry matrix and v is the vector of reaction rates. You cannot solve the equation for unique values of v, but you can find out some things about it – there is partial information there, e.g. whether or not reactions can have nonzero values for the reaction rate.
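To make the N.v = 0 point concrete, here is a small sketch of my own (not from the talk) that computes a basis for the null space of a toy stoichiometry matrix with exact fractions. A reaction whose entry is zero in every basis vector cannot carry flux at steady state. The toy network and all names are hypothetical, and reaction reversibility is ignored.

```python
from fractions import Fraction


def null_space(matrix):
    """Basis of {v : N v = 0}, found by exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here, so this column is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [x / pivot_value for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for row_index, p in enumerate(pivot_cols):
            v[p] = -rows[row_index][free]
        basis.append(v)
    return basis


# Toy network: R1: -> A, R2: A -> B, R3: B ->, R4: B -> C (C is never consumed).
reactions = ["R1", "R2", "R3", "R4"]
N = [
    [1, -1, 0, 0],   # A
    [0, 1, -1, -1],  # B
    [0, 0, 0, 1],    # C
]

basis = null_space(N)
inactive = [name for j, name in enumerate(reactions) if all(v[j] == 0 for v in basis)]
print(basis, inactive)
```

Here the null space is one-dimensional (R1 = R2 = R3), so the absolute rates stay undetermined, but R4 is definitely inactive because C would otherwise accumulate.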

In the analysis approach, it is assumed that: the reaction list is available and has been turned into a stoichiometry matrix; the external metabolites – nutrients, waste products, and biomass precursors for growing cells – have been identified; and a third that I, unfortunately, missed. Some quality checks are performed to ensure that the given reactions can actually exist at a steady state. There are problems if, for example, there are reactants with no source (orphan metabolites). The second quality check is to: prune dead reactions and orphan metabolites – or fix them; then check for unemployed enzymes; check that individual reactions are stoichiometrically consistent; check the stoichiometric consistency of the model. More information at Gevorgyan et al Bioinformatics 24, 2245-2251 (2008). He says it helps to recognize that reactions are statements about the composition of compounds, irrespective of whether or not you know the atomic composition.
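The pruning step can be illustrated with a rough sketch. This is a simplification of my own, not the procedure from Gevorgyan et al.: it treats every reaction as irreversible, calls an internal metabolite an orphan if it is only produced or only consumed, and repeats until nothing changes. The reaction and metabolite names are hypothetical.

```python
def prune_dead_reactions(reactions, external):
    """Iteratively drop reactions that touch an orphan internal metabolite.

    `reactions` maps a name to {metabolite: coefficient}; negative coefficients
    are substrates, positive are products. All reactions are assumed
    irreversible, which is a simplification.
    """
    live = dict(reactions)
    while True:
        produced, consumed = set(), set()
        for stoichiometry in live.values():
            for metabolite, coefficient in stoichiometry.items():
                if coefficient > 0:
                    produced.add(metabolite)
                elif coefficient < 0:
                    consumed.add(metabolite)
        # Orphans are produced-only or consumed-only, excluding externals.
        orphans = (produced ^ consumed) - set(external)
        dead = [name for name, s in live.items() if orphans.intersection(s)]
        if not dead:
            return live
        for name in dead:
            del live[name]


toy_reactions = {
    "uptake": {"glucose_ext": -1, "G6P": 1},
    "glycolysis": {"G6P": -1, "PYR": 1},
    "growth": {"PYR": -1, "biomass": 1},
    "side1": {"G6P": -1, "X": 1},
    "side2": {"X": -1, "Z": 1},  # Z is never consumed
}
core = prune_dead_reactions(toy_reactions, external={"glucose_ext", "biomass"})
print(sorted(core))
```

The loop matters: removing side2 (because of Z) turns X into an orphan, so side1 only becomes dead on the second pass.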

If you take the KEGG database (either full or subset of it), almost 7% of the reactions are unbalanced. Applications of structural analysis are numerous. He specifically mentions: null space for potentially active or definitely inactive reactions; elementary modes for finding all routes through a network; linear programming; damage analysis; enzyme subsets (functional modules); sets of minimal nutrients that would allow an organism to produce all of its biomass precursors. This is even if we cannot get information about all reaction rates.
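For a feel of what "unbalanced" means, here is a minimal element-balance check. Note that this is a simpler technique than the stoichiometric consistency check in the talk, which does not need atomic compositions at all; the function names and the tiny formula table are my own illustration.

```python
from collections import Counter


def element_totals(side, formulas):
    """Total atoms of each element on one side of a reaction."""
    totals = Counter()
    for compound, coefficient in side.items():
        for element, count in formulas[compound].items():
            totals[element] += coefficient * count
    return totals


def is_balanced(substrates, products, formulas):
    """True when both sides of the reaction contain the same atoms."""
    return element_totals(substrates, formulas) == element_totals(products, formulas)


formulas = {
    "H2": {"H": 2},
    "O2": {"O": 2},
    "H2O": {"H": 2, "O": 1},
}

balanced = is_balanced({"H2": 2, "O2": 1}, {"H2O": 2}, formulas)
unbalanced = is_balanced({"H2": 1, "O2": 1}, {"H2O": 1}, formulas)
print(balanced, unbalanced)
```

A check like this needs a formula for every compound, which is exactly what is often missing or generic in database entries, and part of why the composition-free approach is attractive.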

They're working on Arabidopsis metabolism. They have extracted 1646 metabolites and 1742 reactions from the AraCyc annotation. Then they removed problematic reactions, leaving 1281 metabolites and 1433 reactions. Then orphans and dead reactions are removed, making 611 / 878. This brings it to the size of the working core of the E. coli model. This core is able to account for the synthesis of the major biomass precursors. Minimal solutions accounting for the growth of heterotrophic culture cells on glucose contain fewer than 230 reactions. This number is quite similar to the minimal set of enzymes required in other organisms (for creating the biomass precursors).

To apply the model, there are three points. 1. They are carrying out a proteomics survey to determine the subset of enzymes expressed in the cells. 2. The model suggests that variable ATP demands can be met with little alteration of the minimal set of enzymes. 3. Flux changes in response to variable ATP requirements are confined to a relatively small sub-group of reactions. They plan to test this both theoretically and experimentally.

They've also been annotating the S. agalactiae genome. It's a Gram-positive bacterium that can be fatal in mothers/newborns in cows. PRIAM often gives multiple predictions for a single gene, so you have to prune out surplus reactions. The results lead to a number of reactions, but not all enzymes in this case are "employed". To optimize the metabolic reconstruction, they aimed to enable proline and lactose metabolism in the model. Solutions were found by a simulated annealing approach, which produced optimized models that synthesized proline and consumed lactose. The outcome for proline was that EC 1.5.1.12 was a missing enzyme. Adding it created 6 more reactions in the model.

They then looked at some transcript arrays that have been done on this bacterium, and found two leading candidates for the missing proline enzyme and one clear candidate for the following step, out of the six genes that might have been involved.

Tools are available to analyze genome-scale models, but there are shortcomings in the current knowledge of metabolism and its representation in databases. Functional assessment of predicted networks can complement bioinformatic approaches.

He also mentioned a Systems Biochemistry meeting at the University of York, March 22-24 2010. It will cover the systems analysis of metabolism, signalling and control from a systems perspective, and systems approaches to health and disease.

Personal Comments: He had a very nice breakdown of the types of unbalanced reactions in KEGG in a table in his slides. It was quite surprising and enlightening – I didn't realize any such reactions would get through into KEGG. Thanks! A very good invited talk: well paced, clearly explained.

Monday Session 2
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio


Categories
Meetings & Conferences

Jim Beynon and PRESTA, BBSRC Systems Biology Workshop

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

PRESTA stands for Plant Responses to Environmental STress in Arabidopsis. Even though the environment is changing rapidly, investment in plant research has declined. Abiotic and biotic stresses will function via core response networks embellished with stress-specific pathways. A fundamental component of these responses is transcriptional change. It seems that hormones are key to many of the components in stress responses; also, everything seems to focus through key pathways. Two approaches are used: top-down modelling via network inference, or bottom-up modelling via already extant knowledge of key genes. This talk focused on the former.

They used high-resolution time-course microarrays which use 31,000 genome sequence tags (you need these to get the information to the modellers). Then, they use a range of different stress responses to reveal commonalities (developmental e.g. senescence, pathogens, and abiotic stress). One example: over 48 hours there were 24 time points taken with 4 biological and 3 technical replicates. Two-color arrays allow complex loop design. They've been using the MAANOVA program, and even altered it to make it more efficient. You basically end up with an F-test that tells you which genes have changed over time. How to select genes for network inference modelling? GO annotations, genes known to be involved in stress-related processes, transcription factors known to be involved, early response genes and prior knowledge.

There goes the battery again! Grrr…. transcribed paper notes follow, which aren't generally as detailed in my case…

Variation of network models: 4 out of the 12 prospective genes were shown to have an altered pathogen growth phenotype. Knockouts in a hub gene showed both up- and down-regulation of senescence. They want to add validation to the network model, and have validated various genes via experimental work. They developed APPLE, the Analysis of Plant Promoter-Linked Elements. They discovered that if you overexpress HSF3, the plants are more tolerant to drought and show increased seed yield. HSF3 is part of the stress response but has a wide range of interactions, which is a good thing for building parameterized models. In the future, he wants to look at the genetic diversity in crops, and try to express a more robust response to the environment.

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. 🙂
