The more things change, the more they stay the same

…also known as Day 1 of the BBSRC Synthetic Biology Standards Workshop at Newcastle University, and musings arising from the day’s experiences.

In my relatively short career (approximately 12 years – wait, how long?) in bioinformatics, I have been involved to a greater or lesser degree in a number of standards efforts. It started in 1999 at the EBI, where I worked on the production of the protein sequence database that became UniProt. Now, I’m working with systems biology data and beginning to look into synthetic biology. I’ve been involved in the development (or maintenance) of a standard syntax for protein sequence data; standardized biological investigation semantics and syntax; standardized content for genomics and metagenomics information; and standardized systems biology modelling and simulation semantics.

(Bear with me – the reason for this wander through memory lane becomes apparent soon.)

How many standards have you worked on? How can there be multiple standards, and why do we insist on creating new ones? Doesn’t the definition of a standard mean that we would only need one? Not exactly. Take the field of systems biology as an example. Some people are interested in describing a mathematical model, but have no need to store either the details of how to simulate that model or the results of multiple simulation runs. These are logically separate activities, yet they fall within a single community (systems biology) and are broadly connected: a model is used in a simulation, which then produces results. So, when building standards, you end up with the same separation: one standard for modelling, another for describing a simulation, and a third for structuring the results of a simulation. All that information does not need to be stored in a single location all the time. The separation becomes even clearer when you move across fields.

But this isn’t completely clear cut. Some types of information overlap within standards of a single domain, and even among domains, and this is where it gets interesting. Not only do you need a single community talking to each other about standard ways of doing things, but you also need cross-community participation. Such efforts result in higher-level standards which many different communities can use. This is where work such as OBI and FuGE sits: with such standards, you can describe virtually any experiment. The interconnectedness of standards is a whole job (or jobs) in itself – just look at the BioSharing and MIBBI projects. And sometimes standards that seem (at least mostly) orthogonal do share a common ground. Just today, Oliver Ruebenacker posted some thoughts on the biopax-discuss mailing list suggesting that at least some of BioPAX and SBML share a common ground and might usefully be “COMBINE”d more formally (yes, I’d like to go to COMBINE; no, I don’t think I’ll be able to this year!). (Scroll down that thread for a response by Nicolas Le Novère as to why that isn’t necessarily correct.) So, orthogonality – the extent to which two or more standards overlap – is sometimes a hard thing to determine.

So, what have I learnt? As always, we must be practical. We should try to develop an elegant solution, but it really, really should be one which is easy to use and intuitive to understand. It’s hard to get to that point, especially as I think that point is (and should be) a moving target. From my perspective, group standards begin with islands of initial research in a field, which then gradually develop into a nascent community. As a field evolves, ‘just-enough’ strategies for storing and structuring data become ‘nowhere-near-enough’. Communication with your peers becomes more and more important, and it becomes imperative that standards are developed.

This may sound obvious, but the practicalities of creating a community standard mean that such work requires a large amount of effort and continued goodwill. Even with the best of intentions, with every participant working towards the same goal, it can take months – or years – of meetings, document revisions and conference calls to hash out a working standard. This isn’t necessarily a bad thing, though. All voices do need to be heard, and you cannot have a viable standard without input from the community you are creating that standard for. You can have the best structure or semantics in the world, but if it’s been developed without the input of others, you’ll find people strangely reluctant to use it.

Every time I take part in a new standards effort, I see others like me who have themselves been involved in the creation of standards before. It’s refreshing and encouraging. Hopefully the time it takes to create standards will drop as the scientific community as a whole gets more used to the idea. When I started, the only real standards in biological data (at least that I had heard of) were the structures defined by SWISS-PROT and EMBL/GenBank/DDBJ. By the time I left the EBI in 2006, I could have given you a list a foot long (GO, PSI, and many others), and that list continues to grow. Community engagement and cross-community discussions continue to be popular.

In this context, I can now add synthetic biology standards to my list of standards I’ve been involved in. And, as much as I’ve seen new communities and new standards, I’ve also seen a large overlap in the standardization efforts and an even greater willingness for lots of different researchers to work together, even taking into account the sometimes violent disagreements I’ve witnessed! The more things change, the more they stay the same…

My involvement is limited at this stage, but the BBSRC Synthetic Biology Standards Workshop I’m attending today and tomorrow is a good place to start with synthetic biology. I describe most of today’s talks in this post, and will continue with another blog post tomorrow. Enjoy!

For those with less time, here is a single sentence for each talk that most resounded with me:

  1. Mike Cooling: Emphasising the ‘re’ in reusable, and making it easier to build and understand large models from reusable components.
  2. Neil Wipat: For a standard to be useful, it must be computationally amenable as well as useful for humans.
  3. Herbert Sauro: Currently there is no formal ontology for synthetic biology, but one will need to be developed.

This meeting is organized by Jen Hallinan and Neil Wipat of Newcastle University. Its purpose is to set up key relationships in the synthetic biology community to aid the development of a standard for that community. Today, I listened to talks by Mike Cooling, Neil Wipat, and Herbert Sauro. I was – unfortunately – unable to be present for the last couple of talks, but will be around again for the second – and final – day of the workshop tomorrow.

Mike Cooling – Bioengineering Institute Auckland, New Zealand

Mike uses CellML (it’s made where he works, but that’s not the only reason…) in his work with systems and synthetic biology models. Among other things, it wraps MathML and partitions the maths, variables and units into reusable pieces. Although many of the parts seem domain specific, CellML itself is actually not domain specific. Further, unlike other modelling languages such as SBML, components in CellML are reusable and can be imported into other models. (Yes, a new package called comp in SBML Level 3 is being created to allow the importing of models into other models, but it isn’t mature – yet.)
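To make the import mechanism concrete, here is a minimal sketch (in Python, with lxml) that lists which components a CellML 1.1 model pulls in from other files. The file name model.cellml is a placeholder of my own, not one of the models from the talk.

```python
from lxml import etree

CELLML = "http://www.cellml.org/cellml/1.1#"
XLINK = "http://www.w3.org/1999/xlink"

# Parse a (hypothetical) CellML 1.1 model and list its imports:
# each <import> names a source document, and each <component> inside
# it maps a component in that document to a local name.
tree = etree.parse("model.cellml")
for imp in tree.iter("{%s}import" % CELLML):
    source = imp.get("{%s}href" % XLINK)
    for comp in imp.iter("{%s}component" % CELLML):
        print("%s imported from %s as %s"
              % (comp.get("component_ref"), source, comp.get("name")))
```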

How are models stored? There is the CellML repository, but what is out there for synthetic biology? The Registry of Standard Biological Parts was available, but only described physical parts. Therefore they created a Registry of Standard Virtual Parts (SVPs) to complement the original registry. This was developed as a group effort with a number of people including Neil Wipat and Goksel Misirli at Newcastle University.

They start with template mathematical structures (which are little parts of CellML), and then use the import functionality available as part of CellML to combine the templates into larger physical things/processes (‘SVPs’) and ultimately to combine things into system models.

They extended the CellML Repository to hold the resulting larger multi-file models, which included adding a method of distributed version control and allowing the sharing of models between projects through embedded workspaces.

What can these pieces be used for? Some of this work included the creation of a CellML model of the biology represented in Levskaya et al. 2005, with all of the pieces of the model deposited in the CellML repository. Another example is a model he’s working on about shear stress and multi-scale modelling for aneurysms.

Modules are being used and are growing in number, which is great, but he wants to concentrate more at the moment on the ‘re’ of the reusable goal, and make it easier to build and understand large models from reusable components. Some of the integrated services he’d like to have: search and retrieval, (semi-automated) visualization, semantically-meaningful metadata and annotations, and semi-automated composition.

All of the work above converges on the importance of metadata. Version 1.0 of the CellML Metadata Framework saw little uptake. With version 2.0 they have developed a core specification which is very simple, and then provide many additional satellite specifications. For example, there is a biological information satellite, where you use the biomodels.net qualifiers as relationships between your data and MIRIAM URNs. The main challenge is to find a database that is at the right level of abstraction (e.g. canonical forms of your concept of interest).
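The same qualifier-plus-URN pattern is used on the SBML side, so as an illustration here is a minimal sketch using the libSBML Python bindings. The species id and the UniProt accession are examples of my own, not from the talk.

```python
import libsbml

# Build a tiny model containing one species (hypothetical ids throughout).
doc = libsbml.SBMLDocument(3, 1)
model = doc.createModel()
species = model.createSpecies()
species.setId("LacI")
species.setMetaId("meta_LacI")  # a metaid is required before annotating

# Say what the species *is*, biologically: a bqbiol:is qualifier
# pointing at a MIRIAM URN (here, the E. coli LacI repressor).
cv = libsbml.CVTerm()
cv.setQualifierType(libsbml.BIOLOGICAL_QUALIFIER)
cv.setBiologicalQualifierType(libsbml.BQB_IS)
cv.addResource("urn:miriam:uniprot:P03023")
species.addCVTerm(cv)

print(libsbml.writeSBMLToString(doc))
```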

Neil Wipat – Newcastle University

Please note Neil Wipat is my PhD supervisor.

Speaking about data standards, tool interoperability, data integration and synthetic biology, a.k.a “Why we need standards”. They would like to promote interoperability and data exchange between their own tools (important!) as well as other tools. They’d also like to facilitate data integration to inform the design of biological systems both from a manual designer’s perspective and from the POV of what is necessary for computational tool use. They’d also like to enable the iterative exchange of data and experimental protocols in the synthetic biology life cycle.

A description of some of the tools developed in Neil’s group (and elsewhere) exemplify the differences in data structures present within synthetic biology. BacilloBricks was created to help get, filter and understand the information from the MIT registry of standard parts. They also created the Repository of Standard Virtual Biological Parts. This SVP repository was then extended with parts from Bacillus and was extended to make use of SBML as well as CellML. This project is called BacilloBricks Virtual. All of these tools use different formats.

It’s great having a database of SVPs, but you need a way of accessing and utilizing that database. Hallinan and Wipat have started a collaboration with the people at Microsoft Research who created GEC, a programming language and simulator for the genetic engineering of living cells. A summer student has created a GEC compiler for SVPs from BacilloBricks Virtual. Goksel has also created the MoSeC system, where you can automatically go from a model to a graph to an EMBL file.

They also have BacillusRegNet, which is an information repository about transcription factors for Bacillus spp. It is also a source of orthogonal transcription factors for use in B. subtilis and Geobacillus. Again, it is very important to allow these tools to communicate efficiently.

The data warehouse they’re using is ONDEX. They feed information from the ONDEX data store to the biological parts database. ONDEX was created for systems biology to combine large experimental datasets. ONDEX views everything as a network, and is therefore a graph-based data warehouse. ONDEX has a “mini-ontology” to describe the nodes and edges within it, which makes querying the data (and understanding how the data is structured) much easier. However, it doesn’t include any information about the synthetic biology side of things. Ultimately, they’d like an integrated knowledgebase using ONDEX to provide information about biological virtual parts. Therefore they need a rich data model for synthetic biology data integration (perhaps including an RDF triplestore).
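As a rough illustration of the graph-plus-mini-ontology idea (a toy example of my own, not the ONDEX data model itself), typed nodes and edges make queries almost self-describing:

```python
import networkx as nx

# Toy warehouse: every node and edge carries a type from a small
# controlled vocabulary, loosely in the spirit of ONDEX's mini-ontology.
g = nx.MultiDiGraph()
g.add_node("sigA", concept_class="TranscriptionFactor")  # hypothetical ids
g.add_node("yfp", concept_class="Gene")
g.add_edge("sigA", "yfp", relation="regulates")

# Query: which transcription factors regulate which genes?
hits = [
    (tf, gene)
    for tf, gene, data in g.edges(data=True)
    if data["relation"] == "regulates"
    and g.nodes[tf]["concept_class"] == "TranscriptionFactor"
    and g.nodes[gene]["concept_class"] == "Gene"
]
print(hits)  # [('sigA', 'yfp')]
```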

Interoperability, Design and Automation: why we need standards.

  1. There needs to be interoperability and data exchange among these tools, as well as between these tools and other external tools.
  2. Standards for data integration aid the design of synthetic systems. The format must be both computationally amenable and useful for humans.
  3. Automation of the design and characterization of synthetic systems also requires standards.

The requirements of synthetic biology research labs such as Neil Wipat’s make it clear that standards are needed.

KEYNOTE: Herbert Sauro – University of Washington, US

Herbert Sauro described the developing community within synthetic biology, the work on standards that has already begun, and the Synthetic Biology Open Language (SBOL).

He asks us to remember that Synthetic Biology is not biology – it’s engineering! Beware of sending synthetic biology grant proposals to a biology panel! It is a workflow of design-build-test. He’s mainly interested in the bit between building and testing, where verification and debugging happen.

What’s so important about standards? Standardization is critical in engineering, where it increases productivity and lowers costs. In order to identify the requirements, you must describe a need. There is one immediate need: store everything required to reconstruct an experiment within a paper (for more on this see the Nature Biotech paper by Peccoud et al. 2011: Essential information for synthetic DNA sequences). Currently, it’s almost impossible to reconstruct a synthetic biology experiment from a paper.

There are many areas requiring standards to support the synthetic biology workflow: assembly, design, distributed repositories, laboratory parts management, and simulation/analysis. From a practical POV, the standards effort needs to allow researchers to electronically exchange designs with round tripping, and much more.

The standardization effort for synthetic biology began with a grant from Microsoft in 2008 and the first meeting was in Seattle. The first draft proposal was called PoBoL but was renamed to SBOL. It is a largely unfunded project. In this way, it is very similar to other standardization projects such as OBI.

DARPA mandated two weeks ago that all projects funded under Living Foundries must use SBOL.

SBOL is involved in the specification, design and build part of the synthetic biology life cycle (but not in the analysis stage). There are a lot of tools and information resources in the community where communication is desperately needed.

SBOL has three parts: SBOL Semantic, SBOL Visual, and SBOL Script. SBOL Semantic is the one that’s going to be doing all of the exchange between people and tools. SBOL Visual is a controlled vocabulary of symbols for sequence features.

Have you been able to learn anything from SBML/SBGN, as you have a foot in both worlds? SBGN doesn’t address any of the genetic side, and is pretty complicated. You ideally want a very minimalistic design. SBOL Semantic is written in UML and is relatively small, though it has taken three years to get to this point. But you need host context above and beyond what’s modelled in SBOL Semantic. Without it, you cannot recreate the experiment.

Feature types such as operator sites, promoter sites, terminators, restriction sites etc. can go into the Sequence Ontology (SO). The SO people are quite happy to add these things into their ontology.

SBOLr is a web front end for a knowledgebase of standard biological parts that they used for testing (not publicly accessible yet). TinkerCell is a drag-and-drop CAD tool for design and simulation. There is a lot of semantic information underneath to determine what is and isn’t possible, though there is no formal ontology. However, you can semantically annotate all parts within TinkerCell, allowing the plugins to interpret a given design. A TinkerCell model can be composed of sub-models, which makes it easy to swap in new bits of models to see what happens.

WikiDust is a TinkerCell plugin written in Python which searches SBPkb for design components, and ultimately uploads them to a wiki. LibSBOLj is a library for developers to help them connect software to SBOL.

The physical and host context must be modelled to make all of this useful. By using semantic web standards, SBOL becomes extensible.

Currently there is no formal ontology for synthetic biology but one will need to be developed.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Keynote: New Challenges and Opportunities in Network Biology (ISMB 2009)

ISCB Overton Prize Lecture: Trey Ideker, University of California, San Diego

Introduction to Trey (before he starts his talk): Received his PhD working with Leroy Hood in 2001. The curse of systems biology: you will be a jack of all trades, rather than a master of one. On to the talk.

Also worked with Richard Karp. Big question: How does one automatically assemble pathways? Design new perturbations to maximize information gain (this is what he did for his PhD). Ideker et al.: Ann Rev Genomics Hum Genetics 2001 – his PhD work (Systems Biology: A new approach to decoding life).

Let’s think about all the public interaction data: protein-DNA interactions, PPIs, biochemical reactions (Ideker et al. Science 2001). The final figure of that Science manuscript, he feels, launched his career.

Querying biological networks for “Active Modules”, where you can paint the network with colors: patient expression profiles, protein states, any functional assay. This highlights the problem of the Interaction Database Dump, aka “Hairballs”, which aren’t good for a whole lot (Ideker Bioinformatics 2002). In recent work with Chanda and Bandyopadhyay, he’s worked on projecting siRNA phenotypes onto a network of human-human and human-HIV protein interactions, looking at the network modules associated with infection (König et al. Cell 2008).

Next: Moving Network Biology into the Clinic – the working map. Importantly, this map doesn’t have to be complete, and there can be some tolerance for false positives and false negatives. Their research wants to move from network assembly from genome-scale data to network-based study of disease. From this map, you could get: network-based diagnoses, functional separation of disease gene families, and a move from GWAS to network-wide PAS (Pathway AS). Input is: network evolutionary comparison/cross-species alignment to identify conserved modules, projection of molecular profiles onto protein networks to reveal active modules, integration of transcriptional interactions with causal or functional links, etc. These working maps are still essentially hairballs, even if they are represented as pretty pictures. But isn’t the cell really a hairball inside anyway? Maybe the secret isn’t figuring out this thing – maybe it’s to use this thing.

Extracting conserved pathways and complexes from cross-species network comparison (with Sharan and Karp): PathBLAST and NetworkBLAST for cross-comparison of networks. Start with two large hairballs; next, realize that there is a third network implicit there, of protein sequence homologies/orthologies between the two networks; given the many-to-many relationships between the networks, find the particular one that is the maximum alignment; score dense conserved complexes highly; then look for conserved interactions and find matched protein pairs (he does use sequence similarity for some things). The interaction scores come from logistic regression on the number of observations, expression correlation, and clustering coefficient. They applied it to Plasmodium and Saccharomyces.
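A toy sketch of just the ‘conserved interaction’ idea (nothing like the real PathBLAST/NetworkBLAST scoring, and all identifiers below are made up): an interaction is conserved if its two endpoints map, via homology, to an interacting pair in the other network.

```python
import networkx as nx

# Two tiny PPI networks and a many-to-many homology mapping between them.
net1 = nx.Graph([("A1", "B1"), ("B1", "C1")])
net2 = nx.Graph([("A2", "B2"), ("C2", "D2")])
homologs = {("A1", "A2"), ("B1", "B2"), ("C1", "C2")}

# An edge is conserved when its endpoints are homologous to the
# endpoints of an edge in the other network (in either orientation).
conserved = [
    ((u1, v1), (u2, v2))
    for u1, v1 in net1.edges()
    for u2, v2 in net2.edges()
    if {(u1, u2), (v1, v2)} <= homologs or {(u1, v2), (v1, u2)} <= homologs
]
print(conserved)  # [(('A1', 'B1'), ('A2', 'B2'))]
```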

He also did work on human vs mouse TF-TF networks in the brain (Tim Ravasi). You can combine these quite readily; id2, rb1 and cebpd are some examples. What follows is a very nice slide on the timeline of both biological sequence comparison and biological network comparison (Sharan & Ideker, Nat Biotech 2006). Trey thinks there are better things out there now than PathBLAST and NetworkBLAST.

Genetic interactions (non-physical) form a distinct type of network map (Tong et al. Science 2001). Here, there exists a genetic interaction between genes A and B if the phenotype of mutant a is OK, mutant b is OK, and the double mutant ab is sick. How can you compare these to physical networks? Kelley and Ideker (Nat Biotech 2005) worked on the systematic identification of parallel pathway relations. Genetic interactions run between clusters of physical interactions, not within them.
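That rule is simple enough to state in a few lines of code; this is my own toy rendering of it, with an arbitrary fitness threshold rather than anything from the talk:

```python
def genetic_interaction(fitness, a, b, sick=0.5):
    """A and B interact genetically if each single mutant is healthy
    but the double mutant is sick. 'fitness' maps frozensets of
    deleted genes to growth scores between 0 and 1."""
    return (fitness[frozenset([a])] >= sick
            and fitness[frozenset([b])] >= sick
            and fitness[frozenset([a, b])] < sick)

# Hypothetical measurements:
fitness = {frozenset(["A"]): 0.9,
           frozenset(["B"]): 0.95,
           frozenset(["A", "B"]): 0.1}
print(genetic_interaction(fitness, "A", "B"))  # True
```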

Functional maps of protein complexes (Bandyopadhyay et al. PLoS Comp Bio 2008). Genetic interaction maps are conserved between species (S. cerevisiae, S. pombe) (Roguev et al. Science 2008 – thanks to Oliver for that article, which I missed on the slide).

Using ChIP-chip to assemble transcriptional networks underlying genotoxicity (Craig Mak and Chris Workman), and doing network comparison. First, integrate cause-and-effect interactions with physical networks (Yeang, Mak et al. Genome Biology 2005). What if a lot of transcriptional binding is real but inconsequential to cellular function? They set out to systematically and functionally validate all the ChIP-chip data they generated (Workman, Mak et al. Science 2006). Recent extensions to this work: Mak et al. Genome Research 2009. Here, about 10% of TFs show an interesting spatial distribution on the genome. Characterize a gene by the distance to its closest telomere; then characterize a TF by looking at the distribution of those distances across its targets. There does seem to be condition-specific behaviour: probably it isn’t the TFs moving from one part of the chromosome to another, but perhaps the genome moving to and fro around them.
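As a sketch of that characterization (my own reading of the idea, with made-up coordinates): each target gene gets a distance to its nearest chromosome end, and the TF is summarized by the distribution of those distances.

```python
def telomere_distance(gene_pos, chrom_len):
    """Distance (in bp) from a gene to the nearest chromosome end,
    treating the gene as a single point coordinate."""
    return min(gene_pos, chrom_len - gene_pos)

# Hypothetical targets of one TF: (position, chromosome length) pairs.
targets = {"gene1": (5_000, 230_000), "gene2": (110_000, 230_000)}
distances = sorted(telomere_distance(p, l) for p, l in targets.values())
print(distances)  # [5000, 110000] -- the TF's 'spatial profile'
```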

Network-based disease diagnosis: much work is increasingly moving in this direction, for instance using protein networks to diagnose breast cancer metastasis. Breast cancers are very heterogeneous. Can we improve the work in terms of reproducibility and classification using further interaction information? If each patient has a mutation in a different gene, what do we do? What if these genes are sequential steps in a pathway, or are subunits of a common complex? Might you then be able to learn a rule for this? (Taylor et al. Nature Biotech 2009)

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Annotation of SBML Models Through Rule-Based Semantic Integration (ISMB Bio-Ont SIG 2009)

Allyson Lister et al.

I didn’t take any notes on this talk, as it was my own talk and I was giving it. However, I can link you out to the paper on Nature Precedings and the Bio-Ontologies programme on the ISMB website. Let me know if you have questions!

You can download the slides for this presentation from SlideShare.

FriendFeed Discussion: http://ff.im/4xtmz

BioModels Workshop 2009: Day 2

Today was great fun – lots of presentations and lots of lively discussions, of which we were all a part, but which Nicolas Le Novère ("shown" left, courtesy of Falko Krause 🙂 ) also enjoyed.

Here are the notes!

CellML: Catherine Lloyd

Most of the talk aligned with the talk Catherine gave at BioSysBio 2009 this past week. Some parts were new, however. For instance, she seemed to spend a little more time on versioning. A version is an update of a model entry – usually with a traceable model history. A variant is a slightly different model from the same reference: it could be the same model adapted for a different cell type, or variants of a model may be created to reproduce the different figures from a publication.

libAnnotationSBML: Neil Swainston

Automatic Linking of MIRIAM Annotation to a model using web services. He was involved with the creation of the SBML metabolic yeast network, which had MIRIAM annotations. And now that this qualitative information has been published, they're doing some experiments to get quantitative data. They developed a simple CellDesigner plugin as proof-of-concept to allow the linking of a model to their quantitative data repository (not finished yet).

MIRIAM annotations are a form of tagging the model. However, they want to do more: use the annotations to "reason" over the model. By "reason", they mean doing more than just seeing whether the model is annotated: seeing whether the model is annotated well. Do the reactions balance? Such a question cannot be answered by libSBML alone, but ChEBI can help. As a human, you would go to the ChEBI entry, get the formula, and compare that to your reaction. Can this be done automatically?

libAnnotationSBML connects to ChEBI, KEGG, UniProt and MIRIAM, presenting this information through a single convenience class. It includes an "SBML Reaction Balance Analyser". They don't do any automatic corrections, but they can identify where something doesn't match ChEBI. They would like to do this automatically in the near future, to suggest corrections to existing models (incorrect annotations, missing reactants/products, stoichiometry), and to intelligently generate models.
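To make the balance check concrete, here is a toy sketch of the idea (not libAnnotationSBML's implementation): parse the formulas on each side, as ChEBI would report them, into atom counts and compare.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a flat formula string such as 'C6H12O6'
    (no nested parentheses; enough for this illustration)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def is_balanced(reactants, products):
    """Compare the summed atom counts on each side of a reaction."""
    def total(formulas):
        t = Counter()
        for f in formulas:
            t += parse_formula(f)
        return t
    return total(reactants) == total(products)

# Hypothetical, deliberately incomplete reaction:
# glucose -> 2 pyruvate, with the hydrogens unaccounted for.
print(is_balanced(["C6H12O6"], ["C3H4O3", "C3H4O3"]))  # False
```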

Future plans: support more web services; write it in C++; or perhaps ask the MIRIAM people to provide a web service method that retrieves the URL for the WSDL as well as the human-readable URL. However, connections to web services tend to be inconsistent, and therefore you can't always get the information you want.

semanticSBML: Falko Krause

You can find more information here: http://sysbio.molgen.mpg.de/semanticsbml/. There is a standalone GUI capable of offline annotation, as well as a web interface.

This is in fact a much more interesting application than is suggested by the notes – mainly I was preoccupied with making sure my talk was ready to go, as it was almost my turn. I highly recommend that you have a look at the link above and have a play with this software.

Saint

I didn't speak directly about Saint, as I will be speaking about MFO instead this afternoon. However, as model annotation was being talked about today, I thought it might be useful to put up some information about Saint. The presentation and video will be up on the IET website (but aren't yet). In the meantime, here's a rundown of the purpose of Saint.

The creation of accurate quantitative Systems Biology Markup Language (SBML) models is a time-intensive manual process. Modellers need to know and understand both the systems they are modelling and the intricacies of SBML. However, the amount of relevant data for even a relatively small and well-scoped model is overwhelming. Saint, an automated SBML annotation integration environment, aims to aid the modeller and reduce development time by providing extra information about any given SBML model in an easy-to-use interface. Saint accepts SBML-formatted files and integrates information from multiple databases automatically. Any new information that the user agrees with is then automatically added to the SBML model.

The initial functionality of Saint allows the annotation of already-extant species and suggests additional interactions. The user uploads their SBML model, and the portions of the model recognized by Saint are then displayed using a tabular structure. The user can then remove any items they are not interested in annotating. For instance, some terms such as "sink" are modelling artefacts and do not correspond to genes or proteins. Therefore, the user would normally wish to delete this from the search space to prevent any possible matches with actual biological species of a similar name. Once the user is satisfied with the list of items to be annotated, the model is submitted using the "Annotate Listed Items" button at the bottom of the table. A summary of the annotation returned by Saint is then added to the main table. The user can then remove any new annotation that is unsuitable for their model. At any stage, the user may click on the "Annotated Model" tab in Saint, which adds all new annotation to the original model and presents the new SBML model for viewing and download.

While there are a number of tools available for manipulating and validating SBML (e.g. libSBML), simulating SBML models (e.g. BASIS and the SBML Toolbox), analysing simulations (e.g. COPASI), and running modelling workflows (e.g. Taverna), Saint is the first to provide basic automatic annotation of SBML models in an easy-to-use GUI. The purpose of Saint is to aid the researcher in the difficult task of information discovery by seamlessly querying multiple databases and providing the results of that query within the SBML model itself. By providing a modelling interface to existing data integration resources, modellers are able to add valuable information to models quickly and simply.

Saint already generates reactions and associated new species and species references. This reaction creation is being extended to generate skeleton models based around a species or pathway of interest.

SBO: Nick Juty

The SourceForge website has a tracker as well as access to the whole project. You can browse the whole tree from http://www.ebi.ac.uk/sbo. A search retrieves a series of tables, including obsolete terms so that you can tell what used to be there. The main curation work happens via a web interface that talks directly to the database (this is just for curation). Lots of web services are available.

From SBML to SBGN through SBO: Alice Villeger

Semantic annotations act as a bridge between standards. She showed a very nice modification of the SBGN reference card in which she colored sections by their SBO branch, which highlighted areas where different branches were used for the same type of notation (and which are therefore candidates for modification within SBO). She showed that the SBML information needed is in the species reference, which can be addressed by changing the current SBGN specs. Further, there are some SBO terms that have no direct SBML equivalent (e.g. 'or', 'and'). She gave a number of other examples, too.

It also seems that the compartment in SBGN and the SBML specification don't match. This is because the SBML compartment is not intended to be the same as the SBGN compartment (a functional versus a physical compartment).

Her analysis of the alignment of SBGN and SBO showed up a number of inconsistencies. This was really useful. There should be some machine-readable expression of SBML x SBO and SBGN x SBO. Further, there aren't many models annotated with SBO yet. And, if they are, they are not always sufficiently precise. One solution could be a MIRIAM to SBO converter program.

http://arcadiapathways.sourceforge.net

http://biomodels.net/meetings/2009/index.html

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!


BioModels Workshop 2009: Day 1

BioModels Database Introduction: Nicolas Le Novère

For the moment, it is a repository of quantitative models only. Being in the database carries no implicit statement of biochemical accuracy, but models must be of biological interest and must have been described in the peer-reviewed scientific literature. In terms of curation: model syntax and semantics are checked; models are simulated to check correspondence to the reference publication; and model components are annotated to improve identification and retrieval. Models are accepted in various formats, and exported in other formats too.

The models come from individuals, existing model repositories, journals, and direct curation from literature by BioModels curators. Within the individuals category, submitters are members of the SBML community and authors. More than 200 journals *advise* deposition, including all PLoS, BMC, and Nature Mol Sys Bio.

BioModels Database Technical Aspects: Chen Li

The infrastructure of the database includes a set of Tomcat application server clusters, with MySQL databases sitting behind them. There is also a mirror site at Caltech. All models in the BioModels Database have to pass through the BioModels pipeline: syntax check, consistency check, then divergence into either the curated or non-curated branch. When a model is submitted, the database parses it and fetches MIRIAM annotations: information from GO, UniProt, ChEBI and the taxonomy database is fetched and added to the model. Exports are available in lots of formats: most of the SBML levels, CellML, XPP-Aut, VCell, SciLab and BioPAX. For BioPAX and VCell they use a Java converter developed in-house; for CellML, SciLab and XPP they use an XSLT; and to build the PDF they use SBML2LaTeX. There are also SVG, GIF and various other visualizations available, as well as a link to the JWS Online simulator.

They also have a Model of the Month, available via the web site or via an RSS feed. They use AJAX for parts of their web interface: to view a model tree created from the GO hierarchy, an internal-only annotation tool, sub-model generation and more. There is also a nice display of the mathematical equations. They have a set of publicly accessible web services, and the source code and database schema are available from SourceForge.

BioModels stores frozen models: the models exactly as they were when the publications were submitted. They need to correspond exactly to what was published. However, if the authors modify a model and publish a new paper, the new version can then go into the database. If the models don't run, they don't reproduce the published results and therefore aren't MIRIAM compliant, so they remain in the non-curated section of the database.

SBML Converters: Nicolas Rodriguez

They have converters for: SciLab, XPP, CellML 1.0, BioPAX Level 2, Dot/SVG, VCell and PDF. For BioPAX, the original conversion lost a lot of granularity (physical entity -> species, for example). Now, by making use of the MIRIAM annotation, a more precise characterization can be made (e.g. a UniProt annotation implies a protein in BioPAX, which is more specific than a physical entity). For CellML, a new conversion from SBML to CellML is being developed by Andrew Miller, but it is still in the early stages; they're waiting for CellML 1.2 plus CellML metadata to make the conversion better. The current SVG and GIF exports are not satisfactory, and they're looking for collaboration with other groups or efforts.
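The annotation-driven refinement can be pictured as a lookup from MIRIAM namespace to BioPAX class. This tiny sketch is my own illustration of the logic, not the converter's actual code:

```python
# Map a MIRIAM data-type namespace to a (simplified) BioPAX class.
MIRIAM_TO_BIOPAX = {
    "uniprot": "protein",        # a UniProt id implies a protein
    "chebi": "smallMolecule",    # a ChEBI id implies a small molecule
}

def biopax_class(urn):
    """E.g. 'urn:miriam:uniprot:P03023' -> 'protein'; anything
    unrecognized falls back to the generic physical entity."""
    namespace = urn.split(":")[2]
    return MIRIAM_TO_BIOPAX.get(namespace, "physicalEntity")

print(biopax_class("urn:miriam:uniprot:P03023"))  # protein
```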

Model Curation and Annotation: Lukas Endler

Within the curated branch, models are checked for MIRIAM compliance, a curation figure is added, model elements are manually annotated, and they get a BioModels ID. In the non-curated branch, models are only slightly edited by curators, and only publication details and creation details are added. For MIRIAM compliance specifically within BioModels (more restrictive than plain MIRIAM compliance), the models must: be correctly encoded in a standard format (valid SBML); contain a link to a peer-reviewed publication and the creators' contact details; be able to reproduce the results given in the reference publication; and reflect the structure of the processes and formulas described there.

Models in the non-curated branch are valid SBML but not MIRIAM compliant: they cannot reproduce the published results, differ in structure from the publication, or are not kinetic models. A MIRIAM-compliant model may also end up in this branch if it contains kinetics the curators do not yet know how to curate (e.g. Boolean models), or if some parts are not encoded in SBML (e.g. spatial information). Another reason a MIRIAM-compliant model might sit here is a significant backlog due to insufficient time and workforce, in which case it will be moved into the curated branch as soon as possible.

The curation guidelines are that they should: read the publication; go through the SBML model and compare all the elements (where possible they create reactions out of differential equations, add names to unnamed reactions, rules and events); change names and IDs to correspond to the article; try to reproduce one or two key results of the reference publication and create a curation result (e.g. a figure or table); add notes; move the model to the curated branch for annotation and publication.

http://biomodels.net/meetings/2008/index.html (Yes, it is the 2009 meeting, even though the URL says "2008").

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!


SBML Hackathon 2009: Finished

The SBML Hackathon was a really interesting experience for me. I haven't had much time to collect my thoughts, as we've gone straight on to the next phase: the BioModels Workshop or, for some, the trip home.

This was my first hackathon, and I found the environment conducive to work and the discussions very interesting. You can follow what's being said, and has been said, about SBML via the #sbml hashtag on Twitter, too. There were breakouts, discussions, informal talks, posters, competitions and of course the hacking.

It was a really efficient way of finding out the large amount of interesting research and software development happening in the SBML community. I also met a lot of people who previously have only been names on emails. Further, I think many of us have found the beginnings of interesting collaborations, too.

Despite the hail and the rain today, I think the BioModels workshop will be just as interesting, though the format is slightly different. Here's to the next 2.5 days!


SBML Hackathon Day 2

Things changing with SBML Level 3

A complete list is available at http://sbml.org/Community/Wiki/SBML_Level_3_Core/Workplan

These are just the ones I found the most interesting as we went through the whole list.

+ Move species type and compartment type outside of the core. These were used for annotation reasons, but the same thing could be done with species and compartments directly, using their annotation/RDF sections. If the reason to use them was to group things together for annotation, why just for species and compartments? Why not for all things? In that case, a generic mechanism would be a good thing. Further, the original reason for them was as the first step towards generalized reactions (e.g. automatically generating reactions when all matched species are present in the compartment). If reactions are ever generalized, then something that works in a similar way will be reintroduced as an extension. In summary, what these constructs do will be handled by the new Annotation package that will be part of Level 3.
+ Remove default values on optional attributes and make the necessary adjustments.
+ Introduce SIdRef/UnitSIdRef types. These types will match SId/UnitSId, and will allow differentiation between attributes that define an id and attributes that refer to one. This is a really good idea, and will help with the XPath-based referencing method used in the L3 hierarchical modelling extension.
+ Update the units section
+ Update the reactions section. This improves how stoichiometry is dealt with: it will explain reaction extent, add sections for stoichiometry and conversion factors, and remove stoichiometryMath. You cannot show a distinction between targets for optimization and those which aren't; however, this isn't a problem that is strictly for SBML, as "parameter" in SBML means something different.
+ Remove the parts of the spec that belong in a Best Practices document
+ Remove the parts of the explanation of kinetics for multicompartment models

http://sbml.org/Events/Hackathons/The_7th_SBML_Hackathon

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!
