Data Integration

Background: What does systems biology data look like? (Thesis 1.2)

What does systems biology data look like?

Some properties of a system are more than the sum of its parts; systems that exhibit such emergent properties are said to be irreducible (Note: Why Systems Matter, accessed December 2011). Though reductionist research methods can provide a great deal of detail about specific biological entities, a more holistic systems approach is required to understand emergent system properties [1]. Such a top-down approach creates a life cycle of systems biology research. Since the Hodgkin–Huxley model of the squid axon in 1952 [2], hypotheses have been tested both in the laboratory and through simulations of mathematical models. Data from the laboratory informs these models, which can in turn guide further experimentation and validate or invalidate hypotheses.

Systems biology focuses on the study of systems as a whole rather than on the examination of individual constituent parts. Data useful to systems biology tends to be large and heterogeneous both in dimensionality and in structure, with modern high-throughput techniques collecting vast amounts of relevant information [3]. It is standard practice to take data points from a sample not just once, but across spatial, temporal, geographical, organisational or even spectral ranges [4]. The wide variety of experiment types leads to a correspondingly large number of data representations, analysis methods and modelling strategies [5]. The reconciliation of disparate systems biology data, and the concomitant organisation and management of biological data sources into an exploitable “resourceome”, is of great importance to researchers requiring access to existing data [6].

With the maturation of research methods, interpretations of the systems biology life cycle have become correspondingly more complex. Kitano detailed a relatively simple systems biology life cycle in 2002, which is summarised in Figure 1. By 2006, Philippi and colleagues had incorporated a data integration step, as described in Figure 2. By 2009, semantics had become important enough to systems biology research that Antezana and colleagues had added formalisation of knowledge and reasoning to the cycle (see Figure 3).

Figure 1: The systems biology life cycle in 2002, based on Kitano [3, Fig.1]. “Dry”, in silico modelling and simulation experiments inform “wet” experiments, which in turn generate data used to create and further inform hypotheses.

Figure 2: The systems biology life cycle in 2006, based on Philippi and colleagues [7, Fig.1b]. Four years after the Kitano [3] life cycle was published, data integration methodologies, highlighted in yellow, were common enough to be added. Further, the entire cycle could be completed with either wet or dry experiments, or a combination of both.

Figure 3: The semantic systems biology life cycle in 2009, based on Antezana and colleagues [6, Fig.2]. The new methods of integration and the addition of a reasoning step are highlighted in yellow. The semantic phase is iterative, shown with an arrow back to the integration and formalism step. The continued importance of the original Kitano life cycle is described with an arrow bypassing the semantic phase. While the original figure by Antezana and colleagues did not explicitly include a reference to in silico research, the experiments described in the paper could have been either dry or wet.

Kitano’s life cycle does not mention databases or the integration of generated data with other data sources. Philippi and colleagues’ modified life cycle adds both, and also acknowledges that “dry” in silico experiments produce useful data independently of “wet” experiments. Historically, data integration in bioinformatics consisted of cross references between databases or links out via URLs (see Section 1.6 for more information). More complex linking became common as ontologies such as the Gene Ontology (GO) [8] made it possible to reference community-wide hierarchies of descriptive biological terms.

Very recently, with an increase in the use of Semantic Web technologies such as ontologies, semantic data integration has become an important tool in systems biology research [6] (see Section 1.6). Figure 3 shows how researchers' perception of the systems biology life cycle changed with the addition of semantic techniques. By 2009, semantics and ontologies were becoming a bigger part of systems biology research. As such, Antezana and colleagues added the formalisation of data to the integration step, allowing data to be viewed in a semantically uniform way. The semantic data then becomes accessible to computational methods, allowing reasoning and consistency checking of the data. Even so, the research described in this thesis is one of only a handful of projects focusing on semantic data integration in systems biology.
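
To make the idea of reasoning and consistency checking over formalised data a little more concrete, here is a toy sketch in Python. It is not how Antezana and colleagues (or any OWL reasoner) implement reasoning; the triples, the predicate names (`subClassOf`, `disjointWith`, `type`) and the helper functions are all hypothetical and chosen only for illustration. It shows two sources that disagree about one entity, and how a class hierarchy lets a program detect the conflict.

```python
from collections import defaultdict

# Hypothetical, hand-written (subject, predicate, object) triples.
# The first three lines play the role of a small ontology; the rest
# play the role of data integrated from two different sources.
TRIPLES = [
    ("Kinase", "subClassOf", "Enzyme"),
    ("Enzyme", "subClassOf", "Protein"),
    ("Protein", "disjointWith", "Metabolite"),
    ("hexokinase", "type", "Kinase"),      # assertion from source A
    ("hexokinase", "type", "Metabolite"),  # conflicting assertion from source B
    ("glucose", "type", "Metabolite"),
]


def superclasses(cls, triples):
    """Return cls together with all of its transitive superclasses."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "subClassOf":
            parents[s].add(o)
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(triples):
    """Map each individual to its asserted classes plus inferred superclasses."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "type":
            types[s] |= superclasses(o, triples)
    return dict(types)


def inconsistencies(triples):
    """List (individual, class_a, class_b) where the two classes are disjoint."""
    disjoint = [(s, o) for s, p, o in triples if p == "disjointWith"]
    problems = []
    for individual, classes in sorted(inferred_types(triples).items()):
        for a, b in disjoint:
            if a in classes and b in classes:
                problems.append((individual, a, b))
    return problems


if __name__ == "__main__":
    print(inferred_types(TRIPLES)["hexokinase"])
    print(inconsistencies(TRIPLES))
```

Neither source says that hexokinase is a protein; that only follows from the class hierarchy. The clash with the second source's assertion therefore only becomes visible once the data has been formalised.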

There are four main areas of study in systems biology research: (i) the structure (e.g. interactions and pathways) of a system; (ii) how a system behaves over time, or its dynamics; (iii) the method of controlling and modulating the system; and (iv) the design method, or the deliberate progress using well-defined design principles [3]. These four properties are strongly tied to the quantitative modelling aspect of systems biology, and illustrate the importance of such models. However, models are of limited use to either people or computers if they do not have structured biological annotations to provide context [9]. For instance, until SBML [10] models are annotated by the BioModels team, their elements often carry short-hand, biologically irrelevant names and descriptions in computationally inaccessible free text [11]. While attaching additional biological knowledge to quantitative models is not a requirement for their simulation, without such annotations, model sharing, interpretation of simulation results, integration and reuse become nearly impossible [9]. Therefore the addition of biologically relevant, computationally accessible metadata will not only enhance the semantics of a model but also provide a method of unambiguously identifying its elements.
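
As a rough illustration of what such annotation looks like in practice, the sketch below parses a small hand-written SBML-style fragment using only the Python standard library. The fragment, the species ids and names, and the `species_annotations` helper are assumptions made for this example and are not taken from any BioModels entry. The annotated species follows the usual MIRIAM-style pattern of an RDF block with a `bqbiol:is` qualifier, and the ChEBI URI is included purely as an illustrative identifier. The second species has only a short-hand name and no annotation.

```python
import xml.etree.ElementTree as ET

# Hypothetical SBML-style fragment: species s1 carries a MIRIAM-style RDF
# annotation, species s2 has only a short-hand, free-text name.
SBML_FRAGMENT = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfSpecies>
      <species metaid="meta_s1" id="s1" name="Glc" compartment="cell">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#meta_s1">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:17234"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
      <species metaid="meta_s2" id="s2" name="X2" compartment="cell"/>
    </listOfSpecies>
  </model>
</sbml>
"""

NS = {
    "sbml": "http://www.sbml.org/sbml/level2/version4",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}


def species_annotations(sbml_text):
    """Map each species id to the resource URIs found under bqbiol:is."""
    root = ET.fromstring(sbml_text)
    resource_attr = "{%s}resource" % NS["rdf"]
    result = {}
    for species in root.iter("{%s}species" % NS["sbml"]):
        items = species.iterfind(".//bqbiol:is/rdf:Bag/rdf:li", NS)
        result[species.get("id")] = [li.get(resource_attr) for li in items]
    return result


if __name__ == "__main__":
    for species_id, uris in species_annotations(SBML_FRAGMENT).items():
        print(species_id, uris if uris else "no annotation")
```

A program can resolve the first species to a shared identifier and so match it against other models or databases. The second species offers only the name "X2", which is the situation the paragraph above describes for unannotated models.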

The majority of systems biology research projects can ultimately be interpreted to produce interconnected data such as gene networks, protein networks and metabolic networks [12]. The level of granularity of these networks of information can vary from large-scale omics networks with thousands of nodes to precisely calibrated quantitative models of specific molecular interactions. The integration of networks and models presents a challenge to systems biology, increasing the importance of bioinformatics techniques to the life science community, a result in opposition to early predictions [13]. In Section 1.3, the description of biological systems is examined through the use of networks and models.


[1] Uwe Sauer, Matthias Heinemann, and Nicola Zamboni. Getting Closer to the Whole Picture. Science, 316(5824):550–551, April 2007.
[2] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4):500–544, August 1952.
[3] Hiroaki Kitano. Systems Biology: A Brief Overview. Science, 295(5560):1662–1664, March 2002.
[4] Jason R. Swedlow, Suzanna E. Lewis, and Ilya G. Goldberg. Modelling data across labs, genomes, space and time. Nature Cell Biology, 8(11):1190–1194, November 2006.
[5] Katrin Hübner, Sven Sahle, and Ursula Kummer. Applications and trends in systems biology in biochemistry. FEBS Journal, 278(16):2767–2857, August 2011.
[6] Erick Antezana, Martin Kuiper, and Vladimir Mironov. Biological knowledge management: the emerging role of the Semantic Web technologies. Briefings in Bioinformatics, 10(4):392–407, July 2009.
[7] Stephan Philippi and Jacob Köhler. Addressing the problems with life-science databases for traditional uses and systems biology. Nature Reviews Genetics, 7(6):482–488, June 2006.
[8] Michael Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. Michael Cherry, Allan P. Davis, Kara Dolinski, Selina S. Dwight, Janan T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna Lewis, John C. Matese, Joel E. Richardson, Martin Ringwald, Gerald M. Rubin, and Gavin Sherlock. Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29, May 2000.
[9] Nicolas Le Novère, Andrew Finney, Michael Hucka, Upinder S. Bhalla, Fabien Campagne, Julio Collado-Vides, Edmund J. Crampin, Matt Halstead, Edda Klipp, Pedro Mendes, Poul Nielsen, Herbert Sauro, Bruce Shapiro, Jacky L. Snoep, Hugh D. Spence, and Barry L. Wanner. Minimum information requested in the annotation of biochemical models (MIRIAM). Nature Biotechnology, 23(12):1509–1515, December 2005.
[10] M. Hucka, A. Finney, H. M. Sauro, H. Bolouri, J. C. Doyle, H. Kitano, and the rest of the SBML Forum: A. P. Arkin, B. J. Bornstein, D. Bray, A. Cornish-Bowden, A. A. Cuellar, S. Dronov, E. D. Gilles, M. Ginkel, V. Gor, I. I. Goryanin, W. J. Hedley, T. C. Hodgman, J. H. Hofmeyr, P. J. Hunter, N. S. Juty, J. L. Kasberger, A. Kremling, U. Kummer, N. Le Novère, L. M. Loew, D. Lucio, P. Mendes, E. Minch, E. D. Mjolsness, Y. Nakayama, M. R. Nelson, P. F. Nielsen, T. Sakurada, J. C. Schaff, B. E. Shapiro, T. S. Shimizu, H. D. Spence, J. Stelling, K. Takahashi, M. Tomita, J. Wagner, and J. Wang. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 19(4):524–531, March 2003.
[11] Chen Li, Marco Donizelli, Nicolas Rodriguez, Harish Dharuri, Lukas Endler, Vijayalakshmi Chelliah, Lu Li, Enuo He, Arnaud Henry, Melanie Stefan, Jacky Snoep, Michael Hucka, Nicolas Le Novère, and Camille Laibe. BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models. BMC Systems Biology, 4(1):92+, June 2010.
[12] James E. Ferrell. Q&A: systems biology. Journal of Biology, 8(1):2+, January 2009.
[13] Lincoln D. Stein. Bioinformatics: alive and kicking. Genome Biology, 9(12):114+, December 2008.

By Allyson Lister
