Analyzing Genome-Scale Metabolic Networks (BioSysBio 2009)
University of Oxford
Steve Oliver mentioned that David Fell has invented a number of control analysis coefficients. More recently, however, he has been working on the structure of networks. You get from genes to metabolic reactions via proteins, protein complexes, enzymes (and therefore EC numbers). He'll concentrate on talking about how to navigate from the genotype to the phenotype. Where does the data come from when building genome-scale metabolic networks? BioCyc, KEGG, IntEnz, ExPASy ENZYME, or BRENDA. Alternatively, you can use an annotation tool such as RPS-BLAST with PRIAM signatures. In principle, this creates a list of the reactions encoded in the genome sequence.
You can represent a network as a matrix with rows for the metabolites and columns for the changes in state (the reactions). If a metabolic network is at a steady state, it satisfies the relationship N·v = 0, where N is the stoichiometry matrix and v is the vector of reaction rates. You cannot solve the equation for unique values of v, but you can find out some things about it: there is partial information there, e.g. whether or not a reaction can have a nonzero rate.
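To make the steady-state condition concrete, here is a minimal sketch in Python. The toy network (internal metabolites A, B, C and reactions R1 to R4) and the helper `null_space` are hypothetical illustrations, not something shown in the talk. It computes a basis for the null space of N with exact arithmetic and flags reactions whose rate is zero in every basis vector, i.e. reactions that cannot carry flux at steady state.

```python
from fractions import Fraction

def null_space(matrix):
    """Return a basis for the null space of `matrix` (a list of rows), using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [x / pivot_value for x in rows[r]]
        # Eliminate column c from every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    # Each free (non-pivot) column yields one basis vector.
    basis = []
    for f in (c for c in range(n_cols) if c not in pivot_cols):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivot_cols):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis

# Hypothetical toy network (assumption, for illustration only):
#   R1: (external) -> A    R2: A -> B    R3: B -> (external)    R4: B -> C
reaction_names = ["R1", "R2", "R3", "R4"]
N = [
    [1, -1,  0,  0],  # A
    [0,  1, -1, -1],  # B
    [0,  0,  0,  1],  # C (produced by R4 but never consumed)
]

basis = null_space(N)
# A reaction that is zero in every null-space basis vector is inactive at steady state.
inactive = [name for j, name in enumerate(reaction_names)
            if all(vec[j] == 0 for vec in basis)]
print(basis, inactive)
```

In this toy case the null space is one-dimensional, so the rates are fixed only up to a scale factor, but R4 is still identified as unable to carry flux because C has no consumer.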
In the analysis approach, it is assumed that: a reaction list is available and has been turned into a stoichiometry matrix; the external metabolites (nutrients, waste products, and biomass precursors for growing cells) have been identified; and a third assumption that I, unfortunately, missed. Some quality checks are performed to ensure that the given reactions can actually exist at a steady state. There are problems if, for example, there are reactants with no source (orphan metabolites). The second quality check is to: prune dead reactions and orphan metabolites, or fix them; then check for unemployed enzymes; check that individual reactions are stoichiometrically consistent; and check the stoichiometric consistency of the model as a whole. More information is in Gevorgyan et al., Bioinformatics 24, 2245-2251 (2008). He says it helps to recognize that reactions are statements about the composition of compounds, irrespective of whether or not you know the atomic composition.
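The pruning of dead reactions and orphan metabolites can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the procedure from the talk or the cited paper: all reactions are treated as irreversible, and the reaction names, metabolite names, and the function `prune_dead_reactions` are hypothetical. An internal metabolite that is only ever produced or only ever consumed cannot be balanced at steady state, so every reaction touching it is removed, and the check repeats until nothing changes.

```python
def prune_dead_reactions(reactions, external):
    """Iteratively drop reactions that touch an internal metabolite lacking a source or a sink.

    `reactions` maps a reaction name to {metabolite: coefficient}, with negative
    coefficients for substrates and positive ones for products.
    Assumption: every reaction is irreversible.
    """
    live = dict(reactions)
    removed = []
    changed = True
    while changed:
        changed = False
        produced, consumed = set(), set()
        for stoich in live.values():
            for met, coeff in stoich.items():
                (produced if coeff > 0 else consumed).add(met)
        # Internal metabolites that are produced but never consumed, or vice versa.
        orphans = (produced ^ consumed) - external
        for name in sorted(live):
            if not orphans.isdisjoint(live[name]):
                removed.append(name)
                del live[name]
                changed = True
    return live, removed

# Hypothetical toy model (assumption, for illustration only).
external = {"glc_ext", "biomass"}
reactions = {
    "uptake": {"glc_ext": -1, "A": 1},
    "r1": {"A": -1, "B": 1},
    "r2": {"B": -1, "biomass": 1},
    "r3": {"B": -1, "C": 1},   # C is only consumed by r5
    "r4": {"D": -1, "A": 1},   # D has no source
    "r5": {"C": -1, "E": 1},   # E has no sink
}

live, removed = prune_dead_reactions(reactions, external)
print(sorted(live), removed)
```

Note that r3 only becomes dead after r5 has been removed, which is why the pruning has to be repeated rather than done in a single pass.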
If you take the KEGG database (either in full or a subset of it), almost 7% of the reactions are unbalanced. Applications of structural analysis are numerous. He specifically mentions: null space analysis for finding potentially active or definitely inactive reactions; elementary modes for finding all routes through a network; linear programming; damage analysis; enzyme subsets (functional modules); and sets of minimal nutrients that would allow an organism to produce all of its biomass precursors. All of this is possible even if we cannot get information about all reaction rates.
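As a small illustration of what an unbalanced reaction looks like, here is a sketch for the simpler case where atomic formulas are known. This is not the method from the talk, which (per the notes above) works even without atomic compositions. The function `element_imbalance` and the two example reactions are illustrative assumptions; the formulas are the neutral forms of the compounds.

```python
from collections import Counter

def element_imbalance(stoich, formulas):
    """Return {element: net count} for elements that do not balance (products minus substrates)."""
    total = Counter()
    for met, coeff in stoich.items():
        for element, count in formulas[met].items():
            total[element] += coeff * count
    return {element: n for element, n in total.items() if n != 0}

# Neutral-form formulas for a handful of compounds.
formulas = {
    "glucose":  {"C": 6, "H": 12, "O": 6},
    "ATP":      {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3},
    "ADP":      {"C": 10, "H": 15, "N": 5, "O": 10, "P": 2},
    "G6P":      {"C": 6, "H": 13, "O": 9, "P": 1},
    "pyruvate": {"C": 3, "H": 4, "O": 3},
}

# Hexokinase: glucose + ATP -> glucose 6-phosphate + ADP
hexokinase = {"glucose": -1, "ATP": -1, "G6P": 1, "ADP": 1}
# A deliberately incomplete lumped reaction: glucose -> 2 pyruvate (cofactors omitted)
lumped = {"glucose": -1, "pyruvate": 2}

balanced_result = element_imbalance(hexokinase, formulas)
unbalanced_result = element_imbalance(lumped, formulas)
print(balanced_result, unbalanced_result)
```

The lumped reaction loses four hydrogens because the redox cofactors were left out, which is the kind of omission such a check is meant to catch.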
They're working on Arabidopsis metabolism. They have extracted 1646 metabolites and 1742 reactions from the AraCyc annotation. Then they removed problematic reactions, leaving 1281 metabolites and 1433 reactions. Then orphans and dead reactions are removed, making 611 / 878. This brings it to the size of the working core of the E. coli model. This core is able to account for the synthesis of the major biomass precursors. Minimal solutions accounting for the growth of heterotrophic culture cells on glucose contain fewer than 230 reactions. This number is quite similar to the minimal set of enzymes required in other organisms (for creating the biomass precursors).
To apply the model, there are three points. 1. They are carrying out a proteomics survey to determine the subset of enzymes expressed in the cells. 2. The model suggests that variable ATP demands can be met with little alteration of the minimal set of enzymes. 3. Flux changes in response to variable ATP requirements are confined to a relatively small sub-group of reactions. They plan to test this both theoretically and experimentally.
They've also been annotating the S. agalactiae genome. It's a Gram-positive bacterium that can be fatal in mothers/newborns in cows. PRIAM often gives multiple predictions for a single gene, so you have to prune out surplus reactions. The results lead to a number of reactions, but not all enzymes in this case are "employed". To optimize the metabolic reconstruction, they aimed to enable proline and lactose metabolism in the model. Solutions were found by a simulated annealing approach, which produced optimized models that synthesized proline and consumed lactose. The outcome for proline was that 188.8.131.52 was a missing enzyme. Adding it created 6 more reactions in the model.
They then looked at some transcript arrays that have been done on this bacterium, and found two leading candidates for the missing proline enzyme and one clear candidate for the following step, out of the six genes that might have been involved.
Tools are available to analyze genome-scale models, but there are shortcomings in the current knowledge of metabolism and its representation in databases. Functional assessment of predicted networks can complement bioinformatic approaches.
He also mentioned a Systems Biochemistry meeting at the University of York, March 22-24 2010. It will cover the systems analysis of metabolism, signalling and control from a systems perspective, and systems approaches to health and disease.
Personal Comments: He had a very nice breakdown of the types of unbalanced reactions in KEGG in a table in his slides. It was quite surprising and enlightening – I didn't realize any such reactions would get through into KEGG. Thanks! A very good invited talk: well paced, clearly explained.
Monday Session 2
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!