
Is Baker’s Yeast Trapped in a Prisoner’s Dilemma upon Secretion of Extracellular Enzymes? (BioSysBio 2009)

S Schuster et al.
Technion – Israel Institute of Technology

Personal Comment: My apologies, but I had serious weirdness on my machine which necessitated a restart of my window manager. Therefore I missed the first half of the talk.

Prisoner's dilemma: if prisoner A reveals the plan of escape to the jailor while B does not, A is set free and gets £1000, and B is kept in prison for 10 years. Various other rules produce different results. For more information, see http://en.wikipedia.org/wiki/Prisoner%27s_dilemma. The payoff matrix shows that if both cooperate, both are set free. If A cooperates and B doesn't, A is punished for his goodwill. If both defect, then both end up in prison: this being the Nash equilibrium. For microorganisms this is relevant because there is just the issue of mutation, without blurring the picture with trust etc. 🙂
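As a toy illustration (the payoff values here are my own, not from the talk), a few lines of Python confirm that mutual defection is the only Nash equilibrium of a one-shot prisoner's dilemma:

```python
# Minimal prisoner's dilemma payoff matrix (illustrative values).
# Strategy 0 = cooperate, 1 = defect.
# payoff[(a, b)] = (payoff to A, payoff to B) when A plays a, B plays b.
payoff = {
    (0, 0): (3, 3),   # both cooperate: mutual reward
    (0, 1): (0, 5),   # A cooperates, B defects: A punished for goodwill
    (1, 0): (5, 0),
    (1, 1): (1, 1),   # both defect: the Nash equilibrium
}

def is_nash(a, b):
    """(a, b) is a Nash equilibrium if neither player gains by
    unilaterally switching strategy."""
    a_gains = payoff[(1 - a, b)][0] > payoff[(a, b)][0]
    b_gains = payoff[(a, 1 - b)][1] > payoff[(a, b)][1]
    return not (a_gains or b_gains)

print([s for s in payoff if is_nash(*s)])  # -> [(1, 1)]: mutual defection
```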

The snowdrift game: two drivers are stuck in a snowdrift. Only one of them needs to shovel for both to get out, so the second driver can benefit without helping. Information comparing the snowdrift game and the prisoner's dilemma is available (just a quick Google search, haven't properly read it) at http://www.physorg.com/news111145481.html.

John Maynard Smith extended these games to populations. In population dynamics, the only evolutionarily stable strategy (ESS) for the prisoner's dilemma is for the whole population to defect. Applying this to the glucose gradient around the yeast cell, cooperators get an advantage. The relative fitness of the defectors is dependent on cell number (density).

Under physiological conditions it is often a snowdrift game instead, supported by recent experimental data (Gore et al 2009). See also J. Biol. Phys. 34 (2008) 1-17. This is good news for biotechnology, in that exoenzyme-producing strains will not necessarily be outcompeted by non-secreting mutants.
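The difference between the two games is easy to see in a replicator-dynamics simulation (a minimal sketch with illustrative payoffs, not the speakers' model): under prisoner's-dilemma payoffs cooperators go extinct, while under snowdrift payoffs a mixed population is stable.

```python
# Replicator dynamics for cooperators vs defectors (payoff values are
# illustrative assumptions). Prisoner's dilemma: T > R > P > S, so
# defection takes over; snowdrift: T > R > S > P, so a stable mixture
# of cooperators and defectors survives.
def evolve(R, S, T, P, x=0.5, dt=0.01, steps=20_000):
    for _ in range(steps):
        f_coop = x * R + (1 - x) * S     # mean payoff to a cooperator
        f_defect = x * T + (1 - x) * P   # mean payoff to a defector
        x += dt * x * (1 - x) * (f_coop - f_defect)
    return x  # final frequency of cooperators

print(evolve(R=3, S=0, T=5, P=1))  # prisoner's dilemma -> ~0.0
print(evolve(R=3, S=1, T=5, P=0))  # snowdrift -> ~1/3: coexistence
```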

Monday Session 2
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!



Analyzing Genome-Scale Metabolic Networks (BioSysBio 2009)

D Fell
Oxford Brookes University

Steve Oliver mentioned that David Fell has invented a number of control analysis coefficients. More recently, however, he has been working on the structure of networks. You get from genes to metabolic reactions via proteins, protein complexes, enzymes (and therefore EC numbers). He'll concentrate on talking about how to navigate from the genotype to the phenotype. Where does the data come from when building genome-scale metabolic networks? BioCyc, KEGG, IntEnz, ExPASy Enzyme, or BRENDA. Alternatively, you can use an annotation tool such as RPS-BLAST with PRIAM signatures. In principle, this creates a list of the reactions encoded in the genome sequence.

You can represent a network as a matrix with rows for the metabolites and columns for the reactions. If a metabolic network is at a steady state, it satisfies the relationship N.v = 0, where N is the stoichiometry matrix and v the vector of reaction rates. You cannot solve the equation for unique values of v, but you can find out some things about it – there is partial information there, e.g. whether or not reactions can have nonzero values for the reaction rate.
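A minimal sketch of that idea (toy network, not from the talk): the null space of N spans every steady-state flux vector, and any reaction whose entry is zero in the whole null-space basis can never carry flux.

```python
# Steady-state analysis of N.v = 0: rows of N are metabolites,
# columns are reactions (toy example).
import numpy as np
from scipy.linalg import null_space

# Toy network:  -> A -> B -> , plus a dead-end branch B -> C (no sink).
#             v1    v2    v3                          v4
N = np.array([
    [ 1, -1,  0,  0],   # A
    [ 0,  1, -1, -1],   # B
    [ 0,  0,  0,  1],   # C: produced but never consumed (orphan)
])

K = null_space(N)   # columns span all steady-state flux vectors v
print(K.round(3))   # ~ (0.577, 0.577, 0.577, 0) up to scaling
# The v4 entry is zero in every steady-state solution: the reaction
# feeding the orphan metabolite C is definitely inactive.
```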

In the analysis approach, it is assumed that: a reaction list is available and has been turned into a stoichiometry matrix; the external metabolites – nutrients, waste products, and biomass precursors for growing cells – have been identified; and a third assumption that I, unfortunately, missed. Some quality checks are performed to ensure that the given reactions can actually exist at a steady state. There are problems if, for example, there are reactants with no source (orphan metabolites). The second quality check is to: prune dead reactions and orphan metabolites – or fix them; then check for unemployed enzymes; check that individual reactions are stoichiometrically consistent; and check the stoichiometric consistency of the model as a whole. More information in Gevorgyan et al., Bioinformatics 24, 2245-2251 (2008). He says it helps to recognize that reactions are statements about the composition of compounds, irrespective of whether or not you know the atomic composition.
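Where atomic compositions are known, the simplest version of such a check is elemental balancing. A sketch (the parser and example reaction are my own, not from the talk):

```python
# Check whether a reaction is elementally balanced, given molecular
# formulas for its participants.
import re
from collections import Counter

def elements(formula):
    """Parse a simple formula like 'C6H12O6' into element counts."""
    return Counter({el: int(n or 1)
                    for el, n in re.findall(r'([A-Z][a-z]?)(\d*)', formula)})

def is_balanced(substrates, products):
    """Each side is a list of (stoichiometric coefficient, formula)."""
    def total(side):
        tot = Counter()
        for coeff, formula in side:
            for el, n in elements(formula).items():
                tot[el] += coeff * n
        return tot
    return total(substrates) == total(products)

# Glucose + ATP -> glucose-6-phosphate + ADP:
print(is_balanced([(1, 'C6H12O6'), (1, 'C10H16N5O13P3')],
                  [(1, 'C6H13O9P'), (1, 'C10H15N5O10P2')]))  # True
```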

If you take the KEGG database (either the full database or a subset of it), almost 7% of the reactions are unbalanced. Applications of structural analysis are numerous. He specifically mentions: the null space, for finding potentially active or definitely inactive reactions; elementary modes, for finding all routes through a network; linear programming; damage analysis; enzyme subsets (functional modules); and sets of minimal nutrients that would allow an organism to produce all of its biomass precursors. All this is possible even if we cannot get information about all reaction rates.

They're working on Arabidopsis metabolism. They have extracted 1646 metabolites and 1742 reactions from the AraCyc annotation. They then removed problematic reactions, leaving 1281 metabolites and 1433 reactions. Then orphans and dead reactions are removed, leaving 611 metabolites and 878 reactions. This brings it down to the size of the working core of the E.coli model. This core is able to account for the synthesis of the major biomass precursors. Minimal solutions accounting for the growth of heterotrophic culture cells on glucose contain fewer than 230 reactions. This number is quite similar to the minimal set of enzymes required in other organisms (for creating the biomass precursors).

To apply the model, they're doing three things: 1. carrying out a proteomics survey to determine the subset of enzymes expressed in the cells; 2. following up the model's suggestion that variable ATP demands can be met with little alteration of the minimal set of enzymes; 3. testing the prediction that flux changes in response to variable ATP requirements are confined to a relatively small sub-group of reactions. They plan to test this theoretically and experimentally.

They've also been annotating the S.agalactiae genome. It's a gram-positive bacterium that can be fatal to mothers/newborns, and also causes disease in cows. PRIAM often gives multiple predictions for a single gene, so you have to prune out surplus reactions. The results lead to a number of reactions, but not all enzymes in this case are "employed". To optimize the metabolic reconstruction, they aimed to enable proline and lactose metabolism in the model. Solutions were found by a simulated annealing approach, which produced optimized models that synthesized proline and consumed lactose. For proline, the outcome was that EC 1.5.1.12 was a missing enzyme; adding it brought 6 more reactions into the model.
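The shape of that search is worth sketching (a skeleton only, under my own assumptions – the candidates and score arguments are placeholders, not the authors' code): anneal over subsets of candidate reactions, scoring each resulting model on whether it can synthesize the target metabolite, with a penalty on model size.

```python
# Simulated-annealing skeleton for metabolic gap-filling.
import math, random

def anneal(candidates, score, steps=10_000, T0=1.0):
    """candidates: reaction IDs suggested (e.g. by PRIAM) but not yet in
    the model; score(subset) -> lower is better."""
    state = set()
    best, best_score = set(), score(state)
    for i in range(steps):
        T = T0 * (1 - i / steps) + 1e-9                # linear cooling
        flipped = state ^ {random.choice(candidates)}  # toggle a reaction
        delta = score(flipped) - score(state)
        if delta < 0 or random.random() < math.exp(-delta / T):
            state = flipped                            # accept the move
        if score(state) < best_score:
            best, best_score = set(state), score(state)
    return best
```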

They then looked at some transcript arrays that have been done on this bacterium, and found two leading candidates for the missing proline enzyme and one clear candidate for the following step, out of the six genes that might have been involved.

Tools are available to analyze genome-scale models, but there are shortcomings in the current knowledge of metabolism and its representation in databases. Functional assessment of predicted networks can complement bioinformatic approaches.

He also mentioned a Systems Biochemistry meeting at the University of York, March 22-24 2010. It will cover the analysis of metabolism, signalling and control from a systems perspective, and systems approaches to health and disease.

Personal Comments: He had a very nice breakdown of the types of unbalanced reactions in KEGG in a table in his slides. It was quite surprising and enlightening – I didn't realize any such reactions would get through into KEGG. Thanks! A very good invited talk: well paced, clearly explained.

Monday Session 2
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!



Transcriptional Networks in Haematopoiesis (BioSysBio 2009)

D Miranda-Saavedra et al.
University of Cambridge

Haematopoiesis is blood development – one cell type differentiating into ~14 cell types. BloodExpress, a database of gene expression in mouse haematopoiesis, is mentioned. They study the TFs involved in HSC specification and their regulatory relationships: the vast majority are essential, and malfunctions are implicated in diseases like leukaemias. SCL ChIP-sequencing is used, as they'd like to know which genes are regulated by SCL. They've used HPC-7 cells in their experiments, which are a haematopoietic cell line, stem cell factor dependent, and multipotential.

The candidate SCL target genes were those next to the 228 high-confidence bound regions. They did some in vivo assays (transgenic analyses), looking at the resulting embryos after 2 weeks. He then showed a series of microscopy images of cross-sections under different conditions. They then performed some bioinformatics analyses to explore the regulatory nodes. In one case, they selected genes specifically shut down in the progenitors. They found a series of motifs, motif classes, and candidate factors for those motifs. This is motif discovery.
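In its simplest form, that motif-discovery step is a k-mer enrichment test: count short words in the bound regions, compare against a background set, and rank. A minimal sketch (my own illustration, not their pipeline):

```python
# Rank k-mers by enrichment in bound regions vs background sequences.
from collections import Counter

def kmer_counts(seqs, k):
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enriched_kmers(bound, background, k=6, top=10):
    fg, bg = kmer_counts(bound, k), kmer_counts(background, k)
    fg_n, bg_n = sum(fg.values()), sum(bg.values())
    ratio = {m: (fg[m] / fg_n) / ((bg[m] + 1) / bg_n)  # +1 pseudocount
             for m in fg}
    return sorted(ratio, key=ratio.get, reverse=True)[:top]

print(enriched_kmers(["AACAGGTGGC", "TTCAGGTGAA"],   # toy bound regions
                     ["ACGTACGTAC", "GGCCTTAAGG"]))  # toy background
```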

Haematopoiesis is a powerful stem cell system where transcriptional regulation is key in driving blood development. Using the BloodExpress database and ChIP-seq on SCL/TAL1 they have extended the known HSC regulatory network. They have a paper in "Blood" that is currently in press.

Monday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!



Quantitative Modeling of Transcription Initiation in Bacteria (BioSysBio 2009)

M Djordjevic
Arkansas State University and Arkansas Biosciences Institute

He starts with a nice introduction to RNA polymerase (RNAP). There are a number of stages of transcription by RNA polymerase. The first step in transcription initiation, which he is interested in, is the formation of the open complex. How is the open complex formed? Even after 20 years of research, this question still hasn’t been completely answered. Using their biophysical model, they want to identify some of the quantities related to transcription initiation that are optimized by the design of RNAP and the genomic sequence.

Recent findings may help us understand what happens: firstly, a bioinformatics study shows that the region of ~15bp immediately upstream of the transcription start site is prone to melting; secondly, single-molecule experiments show that the promoter region is melted in at least one step. Why is the entire ~15bp region prone to melting? It could be an artificial consequence of the fact that only the upstream -10 region is prone to melting, while the rest of the bubble is not – it has the same melting energy as random DNA elements. Therefore in the first step, only the -10 region would be melted, through thermal fluctuations facilitated by RNAP-ssDNA interactions. This first step has to be rate limiting (from the single-molecule experiments). The second step is where the bubble extends towards the transcription start site. There is very good agreement with experimental data. This is the first quantitative model of open complex formation. The results strongly support the qualitative hypothesis. The model allows the efficient analysis of the kinetic properties of DNA sequences on a whole-genome scale.
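To make the "prone to melting" idea concrete, here is a deliberately crude sketch: scan a sequence in 15-bp windows and score each window for melting propensity, using AT-richness as a stand-in for low duplex stability (the real model uses sequence-dependent melting energies, which I am not reproducing here).

```python
# Crude scan for melting-prone windows (AT content as a proxy for
# low duplex stability; illustrative only).
def melting_scores(seq, window=15):
    """Score every window; higher = more prone to melting."""
    return [(seq[i:i + window].count('A') + seq[i:i + window].count('T'))
            / window
            for i in range(len(seq) - window + 1)]

seq = "GCGCGCTATAATATTTTACGCGCG"  # toy sequence with a -10-like element
scores = melting_scores(seq)
print(max(range(len(scores)), key=scores.__getitem__))  # most melt-prone window
```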

Is RNAP kinetically trapped at many locations in the genome? That is, does it bind with high affinity but with a low rate of transcription initiation? Such promoters are called cryptic promoters. If not, how is the RNAP and the genomic sequence designed to prevent this? The existence of cryptic promoters has been mentioned as a major cause for false positives in both experimental and computational studies. There is no a priori reason for why binding affinity and the rate of transcription initiation should be related to each other.

They did an experiment with E.coli, which found that as they go to higher binding affinities, most (or all) of these strong binders correspond to functional promoters. The good correlation between binding affinity and the rate of transcription initiation is entirely due to the RNAP protein domains; it is not due to the genome sequence. However, is this good correlation due to some generic properties of DNA-binding domains? They substituted the specific binding domains with those of different DNA-binding proteins, and found that the interaction domains of RNAP are hardwired so as to ensure the evasion of cryptic promoters.

Are RNAP and/or the genomic sequence designed to maximize the rate of transcription from strong promoters? They calculated the difference between maximal transcription activity and average transcription activity for intergenic sequences. This led to the conclusion that the maximization of rates of transcription for strong promoters is entirely at the level of the protein-DNA binding domains, and not at the level of the DNA sequence.

They developed a quantitative model of open complex formation of RNAP, and used it to infer some of the design principles behind transcription initiation by bacterial RNAP.

Monday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!



Four-Stranded DNA: How G-Quadruplexes Control Transcription and Translation (BioSysBio 2009)

JL Huppert et al.
University of Cambridge

They do both computational and experimental work to try to understand these structures. The classical base pair arrangements are not the only structures you can have. You can arrange them in tetrads with a phosphate backbone and potassium ion in the center. This allows you to have a single strand that falls back on itself to form a loop. This 4-stranded DNA could be associated with the human telomeric repeat. Telomerase is responsible for elongating telomeres and keeps them going in things like stem cells, and is also active in 85% of cancers.

These can attach themselves to the promoter and cause altered transcription. A drug or other protein could shift the state of the DNA between having an accessible promoter or not. Many genes involved in cancer have G-quadruplexes in their promoters. He asks: can we predict structure from sequence? Can we get information about their stability, for example? Where are G-quadruplexes found? What do they do? What can we do to them? The Quadparser algorithm was developed, and it looks like there are 379,000 G-quadruplexes encoded in the human genome. This algorithm is not perfect – it doesn't tell us anything about stability, among other things. So, they've developed a non-linear Bayesian predictor with a Gaussian noise model. It uses a list of possible features, fits to them using a non-linear model, tolerates outliers and bounds, learns the relevance of inputs, and gives predictions with error bars. They tested with 256 datapoints, with a 70/30 split for learning/testing sets. It did better than linear regression and simpler Gaussian processes.
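The Quadparser folding rule – four tracts of three or more Gs separated by loops of 1-7 bases – is simple enough to express as a regular expression. A minimal rendering (my own sketch of the published rule, not the actual tool):

```python
# Find putative G-quadruplex motifs: G{3+} N{1-7} G{3+} N{1-7} G{3+}
# N{1-7} G{3+}. The lookahead lets overlapping matches be reported.
import re

G4 = re.compile(r'(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))')

def find_quadruplexes(seq):
    """Return (start, motif) on the given strand; scan the reverse
    complement as well to catch motifs on the other strand."""
    return [(m.start(), m.group(1)) for m in G4.finditer(seq.upper())]

print(find_quadruplexes("TTGGGAGGGTGGGAAGGGTT"))
# -> [(2, 'GGGAGGGTGGGAAGGG')]
```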

Over 40% of all known genes have a G-quadruplex motif within a 1kb promoter region, and these tend to be more stable than most. It's a really common regulatory element. Whether or not a gene has this type of element depends on the type of gene. Oncogenes are enriched: 69% have such motifs.

They looked at one of these interesting proteins, N-ras, which is a GTPase involved in cell signalling. They found that when you remove the quadruplex, you get 4x as much of the protein. Others have taken this further and found a correlation between the amount of repression and the stability of the quadruplex. The quadruplex can also act as a pause between two closely-spaced genes.

Quadruplexes are extremely well conserved. We can split quadruplexes into the loops and non-loop areas, and find that the variation is localized in the loops rather than the core, non-loop areas by examining SNPs. What is the evolutionary direction of the changes? Are quadruplexes arising or being removed? There are very few mutations that introduce new quadruplexes, and many that cause them to be lost. Where they do arise, they spread through the population.

See http://www.quadruplex.org

Personal Comments: It was quite interesting to hear new things about telomeres, as they're of much interest to those of us researching ageing at CISBAN. As the chatter in the biosysbio FF states, he's very clear with his examples of equations, machine learning types and graphs. He talks very fast, but has so much to fit in! Manages to make it clear as well as fast.

Monday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. Please let me know of any errors, and I'll fix them!



Bayesian Reverse Engineering of Biological Systems and Their Dynamics (BioSysBio 2009)

M P H Stumpf
CISBIC, Imperial College London

How can we learn about the structure of biological systems in a generic sense? He started with a very nice (naughty?) photo of dragonflies doing a cartwheel. You can determine which is the male and which is the female by working out the logistics of the picture. I'll leave you to figure out what that means. How do we get information about this? Literature mining, comparative approaches, and learning from experiments. You can extend these approaches to the molecular realm. He is talking about how they learn from experiments by analysing change in yeast (S.cerevisiae) networks. He used the flights to/from Australia to illustrate a network, reminding us that not all of these interactions occur at the same time, and that other connections are indirect (via other nodes in the network).

Dynamical features of biological systems: 1. change in network structure; 2. dynamical processes on networks. Both aspects of a system's dynamical behaviour can be learnt from suitable data. Are changes in expression patterns caused by qualitative or quantitative changes in the network? A Bayesian network (BN) has to be represented by a Directed Acyclic Graph (DAG), so you cannot have closed loops, and conventional (static) BNs cannot represent feedback loops. However, you can unravel these feedback loops over time and capture the dependency structure that way. Causality is introduced via time dependence.

Computation is fairly straightforward. For each gene we have to determine the number of changepoints when its regulatory inputs change. For each phase, the regulatory inputs have to be determined. They have two small examples to illustrate this.

The first example is the benomyl stress response in S.cerevisiae. For each cluster of gene expression profiles (WT + 4 TF deletion mutants at 5 time points), they figured out which TFs "determine" expression patterns using the tvDBN (time-varying DBN) approach. Changepoints and edges are placed when the Bayes factor suggests at least strong evidence for their existence. What does this mean for the networks? Previously we would have drawn links from the TFs to each of their targets; now the links are temporally located as well.

The second example used a much larger D.melanogaster developmental data set. They had about 2000 genes here, and they inferred 2500-3000 interactions. They focused on those interactions that are either lost or gained during the embryo-larva-pupa stages. There were a very large number of changepoints at the embryo-larva transition, and very few between the pupa and adult stages (which makes sense, as the pupa is basically an adult that is just growing). The changes at the embryo-larva transition mostly involve metabolism, as the embryo changes to become an eating machine.

Bayesian model selection uses the posterior probability of a model. When the likelihood is impossible or prohibitively expensive to evaluate, approximate Bayesian computation (ABC) can be used instead: rather than evaluating the likelihood, you compare observed and simulated data. In rejection ABC, you draw a parameter from the prior and simulate a data set with that parameter value; if the distance between the simulated and observed data is less than a threshold, you accept the parameter. You repeat this a large number of times, and the accepted parameters form an approximate posterior distribution. It's a "beautiful" way to approximate the posterior, but naive rejection is not practical, so one of his students is working on an ABC Sequential Monte Carlo (SMC) method, which interpolates between the prior and the posterior.
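Rejection ABC fits in a dozen lines. A toy sketch (my own example – inferring the rate of an exponential model from a sample mean, nothing to do with the talk's data):

```python
# Minimal ABC rejection sampler.
import random

def abc_rejection(observed, prior, simulate, distance, eps, n=20_000):
    """Draw theta ~ prior, simulate data, and keep theta whenever the
    simulated data lands within eps of the observation."""
    return [theta for theta in (prior() for _ in range(n))
            if distance(simulate(theta), observed) < eps]

obs = 2.0  # observed summary statistic: the sample mean
post = abc_rejection(
    observed=obs,
    prior=lambda: random.uniform(0.01, 5.0),   # flat prior on the rate
    simulate=lambda r: sum(random.expovariate(r) for _ in range(50)) / 50,
    distance=lambda a, b: abs(a - b),
    eps=0.1,
)
print(sum(post) / len(post))  # approximate posterior mean, ~1/obs = 0.5
```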

Then he showed a lovely video of a 2D shadow as a projection of an 8-dimensional object – a very interesting way to visualize the various parameters. It shows a clear separation between two clouds, which is only visible in certain projections. Therefore the structure of the posterior distribution is not nice in the classical sense, as you have two modes. But you have the ability to find the difference between "stiff" and "sloppy" parameters: 1/3-2/3 of their parameters could be called sloppy in this parlance. In systems biology, many parameters have "flat" and wide posterior distributions. Measurements of individual "sloppy" parameters are difficult, and published results may be meaningless. We really have to combine the three approaches mentioned at the beginning (learning from experiments, comparative approaches, and literature mining).

Can a biologist fix a radio? Lazebnik, Cancer Cell 2 (2002) 179-182. An interpretation of how a radio works, by a biologist 🙂 .

Personal Comments: Very interesting, lots of humor to keep us interested, and I'm *almost positive* (from looking at the font) that he's used LaTeX. Nice! Very nice network graphs – I wonder what graphics software he used?

Monday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. Please let me know of any errors, and I'll fix them!



Genomic and Genetic Approaches to the Systems Biology of the Eukaryotic Cell (BioSysBio 2009)

Steve Oliver
University of Cambridge
Keynote Talk 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio/

It is not unexpected that Steve Oliver chose to talk about yeast, which he calls the perfect eukaryotic model organism. You can look at it in a coarse-grained, top-down approach. This approach includes research using Metabolic Control Analysis (MCA), which he calls a "shortcut to modelling metabolism". The central device of MCA is the flux control coefficient, a measure of the degree of control an enzyme has over a pathway flux.
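For reference, the standard definition (textbook MCA notation, not from the slides): the flux control coefficient of an enzyme E_i on a pathway flux J is the fractional change in flux per fractional change in enzyme activity, and the coefficients over a pathway sum to one.

```latex
\[
  C^{J}_{E_i} \;=\; \frac{\partial J / J}{\partial E_i / E_i}
            \;=\; \frac{\partial \ln J}{\partial \ln E_i},
  \qquad
  \sum_i C^{J}_{E_i} \;=\; 1 \quad \text{(summation theorem)}
\]
```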

You can change the concentration of gene products and measure the impact on flux. With this in mind, they have in the past created a deletion mutant for each of the protein coding genes in yeast. After insertion of the disruption cassette, you can sporulate and then perform tetrad dissection: if it is an essential gene, then there will not be any growth. If there is growth, it is non-essential and you have a colony containing the knockout.

You can look at the changing proportion of different mutant types in a population. They looked at competition between heterozygous and hemizygous mutants, and looked at the differences in growth rate. They found that, in pairwise comparisons between the different conditions, if genes were haplosufficient in one condition they were sufficient in the others. The grape juice condition looked like nothing else: at the top of its haploinsufficiency list were genes to do with the transport of pyrimidines. Haploproficiency (HP) is much more context/condition-dependent. Virtually every gene coding for the 26S proteasome was in the list for N-limited haploproficiency.

Haploinsufficient genes were vastly overrepresented on chromosome III irrespective of the type of limitation (except for one). It's the chromosome that determines the sex; the sexes are "a" and "alpha". There are three types of mating: amphimixis (maximises the chances of heterozygosity), haplo-selfing, and automixis/intratetrad mating. Haploinsufficient (HI) genes are more similar to their pre-duplication ancestors than their "ohnologs" are; the haploinsufficient gene is probably the ancestral version of the pair. The K.lactis MAT chromosome is also enriched for HI orthologs.

You can also do experiments which change the flux and measure the products of gene action. This work was done in what his lab called "The Big Experiment", and makes use of a chemostat, from which we can get information about the transcriptome, metabolome, and proteome. These measure transcription changes that are wholly due to the change in flux. 493 genes were upregulated with increasing growth rate, and 398 genes were downregulated. The GO terms for the genes upregulated with growth rate include ribosome biogenesis and protein biosynthesis. Only 47 of the HI genes have their transcript levels under growth control, and only 26 of the HP genes.

Is this a universal law, or context-dependent? It was found to be entirely context-dependent. They decided to repeat the competition experiments in a completely different environment: a turbidostat rather than a chemostat. In this case, there was no nutrient limitation. Looking at the HI genes in a turbidostat, they have the very functions that are to do with growth rate (note: I missed part of that point).

While the laws determining which genes control growth rate may change according to the selective conditions… (sorry!). The discovery of HP in nutrient-unconstrained conditions suggests that yeast has sacrificed short-term gain in favor of long-term survival.

He has worked with Ross King and the robot scientist work going on there. They've also started on a logical cell model encoded in Prolog. This is essentially a directed graph, with metabolites as nodes and enzymes as arcs. If a path can be found from the cell inputs to all the cell outputs, then the cell can grow.
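That growth test is just reachability over the reaction graph. A minimal sketch of the idea in Python (the talk's model is in Prolog; this toy rendering and its metabolite names are mine):

```python
# "The cell grows iff every output is reachable from the inputs."
def can_grow(reactions, inputs, outputs):
    """reactions: list of (substrates, products) sets; a reaction can
    fire once all of its substrates are reachable."""
    reachable = set(inputs)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return set(outputs) <= reachable

toy = [({'glucose'}, {'g6p'}), ({'g6p'}, {'pyruvate'})]
print(can_grow(toy, inputs={'glucose'}, outputs={'pyruvate'}))  # True
```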

They've created an experimental cycle that can be navigated by the robot, which can do some work on hypothesis inference. They removed the labels on the graph, and then used Abductive Logic Programming (ALP) to infer those missing arcs/labels in the metabolic graph. An abduction example: rule: if a cell cannot synthesise tryptophan, then it cannot grow. Observation: the cell does not grow. Abduced hypothesis: the cell cannot synthesise tryptophan.

They tried a number of different experimental strategies (e.g. ALP versus naive versus random). They found that ALP was as good as graduate students who were presented with the same problem. They "closed the loop" wrt hypothesis formation, testing and validation. The original work was a proof-of-principle, as they got the robot to re-discover existing knowledge. They're now working on discovering new knowledge.

To do this, they expanded the background knowledge of the robot, improved the efficiency of hypothesis generation, and extended the original qualitative methodology to allow for quantitative measurements. The basic premise was that growth should occur iff there is a path from the growth medium to defined end-points. The robot would then try to fill in the gaps in the model, where there must be enzymes etc. to carry out specific steps. One strategy is to find yeast homologs of genes coding for proteins with appropriate EC numbers.

They have some new hardware to be their robot scientist, called Adam. It has the capacity to perform over 1000 experiments per day. You find that most of the genes that encode these enzymes are telomere-associated and have paralogs elsewhere in the genome. It's a very convoluted situation that was "unlikely to be solved by classical genetics." Keep an eye out for the upcoming Science paper called "The Automation of Science": it should appear in a couple of weeks.

Personal Comments: A great speaker and an interesting talk. Unfortunately wasn't able to capture absolutely everything he said…!

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. Please let me know of any errors, and I'll fix them!
