The second day of IB2006 was the longest of the three, and the only “full” day. From my point of view, the talks were also the most relevant and interesting to my own work. The second evening was the conference dinner, which was very sociable; the conversations continued straight through dinner and late into the night back at the conference hotel. But back to the day itself: there was a fantastic keynote by Pedro Mendes, and a number of other interesting talks. The highlights are presented below.
Top-down modeling of biochemical networks, a grand challenge of systems biology (Pedro Mendes)
Systems biology, in his view, is the study of a system through synthesis or analysis, using quantitative and/or high-throughput data. The origins of systems biology go back as early as the 1940s, with a large amount of work done in the 1960s–70s. It didn’t really take off during that period due to a lack of computing power and the lack of experimental “ability” to generate the large amounts of data required.
Pedro is interested in the top-down modeling approach because there is a large amount of data, with a lot of numbers, and people naturally want to make models from them. Many people think this isn’t the way to build models, but he believes otherwise. In bottom-up modeling you start with a small number of known reactions and variables, while in top-down modeling you start at a coarse-grained level, with loads of data, and try to work “backwards” (compared to traditional modeling procedures) to find the steps that would produce the HTP data you started with. In other words, it derives the elementary parts from studying the whole.
BASIS (Colin Gillespie)
Colin gave an interesting talk on the availability and usefulness of a web-based stochastic simulator. You can create, access and run SBML models via web services (and a web-page front end to those services). Their aim is to make their own models available to other researchers and also to provide a framework for others to build their own models. In general, their models can be envisaged as networks of individual biochemical mechanisms. Each mechanism is represented by a system of chemical equations, quantified by substrate and product concentrations and the associated reaction rates. The connected series of reactions is then simulated through time. Simulation may be stochastic or deterministic, depending on species concentrations. They have funding for another six years and are planning many additions to the tool.
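To make the stochastic case concrete: simulations of this kind are typically driven by Gillespie’s direct method, where the time to the next reaction event is drawn from an exponential distribution set by the current propensities. Here is a minimal sketch for a single irreversible channel A → B (my own illustration with hypothetical function names, not BASIS code):

```python
import random

def gillespie_decay(n_a, k, t_max, seed=42):
    """Direct-method Gillespie simulation of the single reaction A -> B.

    The sole reaction channel has propensity k * [A]; each event
    converts one molecule of A into one molecule of B.
    """
    rng = random.Random(seed)
    t, n_b = 0.0, 0
    trajectory = [(t, n_a, n_b)]
    while n_a > 0:
        propensity = k * n_a
        # Waiting time to the next event is exponentially distributed
        # with rate equal to the total propensity.
        t += rng.expovariate(propensity)
        if t > t_max:
            break
        n_a -= 1
        n_b += 1
        trajectory.append((t, n_a, n_b))
    return trajectory

traj = gillespie_decay(n_a=100, k=0.5, t_max=50.0)
```

A full simulator generalizes this to many coupled channels (choosing which reaction fires in proportion to its propensity), and switches to a deterministic ODE solver when molecule counts are large, which matches the stochastic-or-deterministic choice Colin described.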
Multi-model inference of network properties from incomplete data (Michael Stumpf)
Estimates of the false-positive rate in high-throughput interaction data range from 20% to 60%, and in connection with this he recalled a quote he had once read: that gene expression is as close to scientific fraud as is accepted by the scientific establishment. At least at the moment, there appears to be a trade-off between data quality and data quantity. In other words, you must take noise into account in any analytical work you do.
For most species, you only have interaction data for a subset of the proteome. Missing such data means you can get quite different networks (the currently known versus the “actual” network), which affects summary statistics, among much else. They discovered that, in general, inference from networks comprising less than 80% of the full graph should be treated with caution; above that threshold, the inference model they developed is very useful. Given a subnet, it is possible to predict some properties of the true network if you know the sampling process (independently of the process by which the network has grown).

Across data sets there seem to be huge differences between experimental labs in how each has mapped parts of the interactome. Nevertheless, performing this test on multiple subnets from different PPI experiments is a good way of estimating total interactome size. There are limitations, though: the approach ignores multiple splice variants and domain architecture, so organisms affected by these will not necessarily give results as good. By interrogating all these different models and averaging over them, useful estimates of total interactome size are possible, even from partial data, as long as the number of nodes is at least 1,000.
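A toy version of the idea (my own illustration, not the speaker’s actual model): under uniform node sampling, an edge of the full network appears in the induced subnet only if both of its endpoints were sampled, which happens with probability p², where p is the fraction of nodes sampled. Inverting that gives a rough estimate of the full network’s edge count from a subnet:

```python
import random

def estimate_total_edges(subnet_edges, sampled_nodes, total_nodes):
    """Estimate the full network's edge count from an induced subnet.

    Assumes nodes were sampled uniformly at random, so each edge
    survives with probability p^2 where p = sampled / total.
    """
    p = sampled_nodes / total_nodes
    return subnet_edges / (p * p)

# Build a random "true" network of 1000 nodes and 5000 edges,
# sample 600 nodes, and recover the total from the induced subnet.
rng = random.Random(0)
n, true_edges = 1000, set()
while len(true_edges) < 5000:
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        true_edges.add((min(u, v), max(u, v)))
sample = set(rng.sample(range(n), 600))
sub_edges = sum(1 for u, v in true_edges if u in sample and v in sample)
estimate = estimate_total_edges(sub_edges, 600, n)
```

The real work is in knowing the sampling process: biological sampling is far from uniform (bait selection, lab-specific coverage), which is exactly why averaging over multiple subnets and multiple models matters.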
Other interesting talks included Stuart Moodie’s discussion of the current state of affairs in standardizing systems biology graphical notation and visualization (SBGN; Kitano and others), Jan Baumbach’s work on performing knowledge “transfers” of transcriptional regulatory networks from a model species to three similar species important to human pathogen studies, Jan Kuentzer’s BN++, a biological information system written in both C++ and Java, an eye-opening overview of the current status of biocomputing at Singapore’s Biopolis by Gunaretnam Rajagopal, a lovely swooping demo of a targeted projection pursuit tool for gene expression visualization by Joe Faith, and a wonderfully presented (which in my mind equates to “easily understood”, thanks to her skill as a speaker) statistical talk on modeling microarray data and interpreting and communicating biological results by Yvonne Pittelkow. (Yes, a couple of those were from day one, but they still deserved a mention!)