TT47: Semantic Data Integration for Systems Biology Research (ISMB 2009)

Chris Rawlings. Also speaking: Catherine Canevet and Paul Fisher

A BBSRC-funded research collaboration in Newcastle, Manchester, and Rothamsted presented ONDEX and Taverna. Demo: integration and augmentation of the yeast metabolome model (Nature Biotech, October 2008, 26(10)). In ONDEX, everything can be seen as a network. To help with this, ONDEX contains an ontology of concept classes, relation types, and additional properties. Their example is the integration of the yeast jamboree data. They have both specific (e.g. KEGG) and generic (e.g. tab-delimited) parsers to load data.
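To make the "everything is a network" idea concrete, here is a minimal sketch of a typed graph loaded by a generic tab-delimited parser. It is not the ONDEX API: the `Concept` and `Graph` classes, the five-column row layout, and the example rows are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A node in the network; concept_class stands in for an ontology term."""
    id: str
    concept_class: str


@dataclass
class Graph:
    """Hypothetical container: typed nodes plus typed, directed relations."""
    concepts: dict = field(default_factory=dict)  # id -> Concept
    relations: set = field(default_factory=set)   # (source, relation_type, target)

    def add_concept(self, cid, concept_class):
        # Keep the first definition if the same id is seen again.
        self.concepts.setdefault(cid, Concept(cid, concept_class))

    def add_relation(self, source, relation_type, target):
        self.relations.add((source, relation_type, target))


def parse_tab_delimited(text, graph):
    """Generic parser; assumes one relation per row in five tab-separated columns:
    source, source class, relation type, target, target class."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for source, source_class, relation_type, target, target_class in reader:
        graph.add_concept(source, source_class)
        graph.add_concept(target, target_class)
        graph.add_relation(source, relation_type, target)
    return graph


# Made-up rows for illustration only (not the jamboree data).
ROWS = (
    "enzyme_A\tEnzyme\tcatalyses\treaction_1\tReaction\n"
    "metabolite_X\tMetabolite\tconsumed_by\treaction_1\tReaction\n"
)

graph = parse_tab_delimited(ROWS, Graph())
```

A source-specific parser (e.g. for KEGG) would presumably differ only in how it maps the source's own fields onto the same concept classes and relation types.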

When ONDEX works with Taverna, you use the ONDEX web services and access ONDEX from Taverna instead of using the pipeline manager. This means you can use Taverna to pull data into ONDEX. So, first parse the jamboree data into ONDEX and remove currency metabolites (e.g. ATP, NAD). Add publications to the graph, from which domain experts can view and manually curate the data. Next, annotate the graph using network analysis results. Then switch to Taverna and identify the orphans discovered in ONDEX. Retrieve the enzymes relating to the orphans, assemble the PubMed query, and add the hits back to the ONDEX graph. Finally, have a look at the completed visualization. Use the ONDEX pipeline manager to upload data: it's all in a GUI, which is good.
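The filtering and query-building steps can be sketched in a few lines. This is only an illustration under stated assumptions, not the workflow shown in the talk: the toy reactions are made up, an "orphan" is assumed here to mean a metabolite that takes part in exactly one reaction (the talk did not define the term in these notes), and the query is just a boolean string that is never sent to PubMed.

```python
# Hypothetical toy data: reaction id -> (metabolites, enzymes).
REACTIONS = {
    "r1": ({"glucose", "ATP", "glucose-6-phosphate"}, {"hexokinase"}),
    "r2": ({"glucose-6-phosphate", "fructose-6-phosphate"},
           {"phosphoglucose isomerase"}),
    "r3": ({"fructose-6-phosphate", "ATP", "NAD", "fructose-1,6-bisphosphate"},
           {"phosphofructokinase"}),
}
CURRENCY = {"ATP", "NAD"}


def remove_currency(reactions, currency):
    """Drop currency metabolites so they do not link unrelated reactions."""
    return {rid: (mets - currency, enz) for rid, (mets, enz) in reactions.items()}


def find_orphans(reactions):
    """Assumed definition: a metabolite that appears in exactly one reaction.
    Returns a mapping of orphan metabolite -> the single reaction using it."""
    seen = {}
    for rid, (mets, _) in reactions.items():
        for met in mets:
            seen.setdefault(met, []).append(rid)
    return {met: rids[0] for met, rids in seen.items() if len(rids) == 1}


def pubmed_query(enzymes, organism="Saccharomyces cerevisiae"):
    """Assemble a boolean query string from enzyme names (not submitted anywhere)."""
    names = " OR ".join(f'"{name}"' for name in sorted(enzymes))
    return f'({names}) AND "{organism}"'


filtered = remove_currency(REACTIONS, CURRENCY)
orphans = find_orphans(filtered)
# Enzymes attached to the reactions that touch an orphan.
enzymes = set().union(*(filtered[rid][1] for rid in orphans.values()))
query = pubmed_query(enzymes)
```

In the real setup the hits returned for such a query would be added back to the graph as publication nodes; that step is omitted here because it depends on the ONDEX web services, whose interface is not described in these notes.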

Then followed a live demo.


Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Chris Rawlings on Ondex, BBSRC Systems Biology Workshop

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

Introduction (common knowledge): Systems biology projects need data integration. Syntactic and semantic integration must both be considered.

Ondex is a framework for integrating multi-omics data both syntactically and semantically. Everything in Ondex is presented as a network. It is a domain-independent approach to semantic data integration. Though it is currently a data warehouse, a more federated approach is planned. The project is strongly committed to an open-source and standards-based approach.

The work is split across sites. Rothamsted: software integration, releases, infrastructure, outreach. Text mining: NaCTeM (National Centre for Text Mining; Ananiadou). Workflows and distribution: University of Manchester (Stevens, Goble). Statistical methods and semantic reasoning: Newcastle University (Wipat, Wilkinson, Lord).

Integration of data sources is being worked on in a variety of ways, for example an SBML loader being developed at Newcastle University. They're getting Taverna-ready, and providing exports in png, jpg, bmp, and vector formats like eps and graphml. They are also allowing exports in Prolog (for CISBIC).

Text Mining work: providing enhanced search, access, and associations in the biological literature, analysis of yeast metabolic reactions (Manchester Uni), association of microorganisms and habitats (Newcastle Uni). Lots of outreach activities.

They've integrated data sources from TAIR and the Poplar resource. In Manchester, they've integrated yeast metabolic models, which is an instance of the general problem of comparing SBML models. In Newcastle, David Lydall is using Ondex to annotate and classify the mutant phenotypes for yeast.

