The NormSys registry for modeling standards in systems and synthetic biology

COMBINE 2016

Martin Golebiewski

http://normsys.h-its.org. NormSys covers the COMBINE standards, with plans to extend it to further modelling standards. Each standard has multiple versions/levels, and figuring out which standard you need to use can be tricky. NormSys provides a summary of each standard, as well as a matrix summarizing the biological applications that are relevant in this community. Each standard has a detailed listing of what it does and does not support with respect to annotation, maths, units, multiscale models, and more.

There are also links to each standard’s specification and webpage, as well as to publications and model repository records. NormSys also has information on how a given standard may be transformed into other standards, and on related software. Additional matrices describe which formats are available as input and output for format transformation.

NormSys is an information resource for community standards in systems biology. It provides a comparison of their main characteristics and features, and classifies them by fields of application (with examples). Information on transformation between standards is available, as well as bundled links to corresponding web resources, direct links to validation tools, and faceted browsing for searching with different criteria. The initial focus is on commonly used standards (e.g. the COMBINE standards) and related efforts.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

SED-ML support in JWS Online

COMBINE 2016

Martin Peters

In an ideal world, you should be able to reproduce any simulation published in a paper. For some models this already happens: there is a website that links the paper to the model in JWS Online and to the data in the FAIRDOMHub. You can then tweak the parameters of a published model and see how the results change. To support this, there is a SED-ML database as part of JWS Online. Once you’ve made your modifications, you can re-submit the model back to JWS.

You can also export the COMBINE archive you’ve created in the course of this work and take it away to do more simulations locally. Currently, only time-course simulation is supported (to keep the computational time as low as possible), Excel spreadsheets are used instead of NuML, and only 2D plots are supported. However, they have achieved their goal of being able to go from a paper to the actual simulation of a model from that paper in one click.
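Since a COMBINE archive (.omex) is essentially a ZIP file containing a manifest.xml that describes its entries, an exported archive can be inspected locally with nothing beyond the Python standard library. The sketch below is illustrative only (the file name is hypothetical), not JWS Online’s own code:

```python
import zipfile
import xml.etree.ElementTree as ET

# "published_model.omex" is a hypothetical file name for an exported archive.
with zipfile.ZipFile("published_model.omex") as omex:
    manifest = omex.read("manifest.xml")

# The manifest lists every entry with its location and declared format
# (e.g. the SBML model, the SED-ML description, and the Excel data files).
ns = {"omex": "http://identifiers.org/combine.specifications/omex-manifest"}
for content in ET.fromstring(manifest).findall("omex:content", ns):
    print(content.get("location"), "->", content.get("format"))
```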

The ZBIT Systems Biology Software and Web Service Collection

COMBINE 2016

Andreas Draeger

In systems biology, people want to perform dynamic simulations, steady-state analyses and other tasks. SBML is the format to use for the model, but you also need a data structure for use in the software, which is why they developed JSBML.
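JSBML itself is a Java library, so the following is only a rough illustration, in Python and using the python-libsbml bindings, of the kind of in-memory object model such a library provides (the file name is hypothetical):

```python
import libsbml  # python-libsbml; JSBML offers an equivalent object model in Java

doc = libsbml.readSBMLFromFile("model.xml")  # hypothetical file name
if doc.getNumErrors(libsbml.LIBSBML_SEV_ERROR) > 0:
    raise RuntimeError(doc.getError(0).getMessage())

model = doc.getModel()
print(model.getId(), "-", model.getNumSpecies(), "species,",
      model.getNumReactions(), "reactions")

# Walk the parsed data structure rather than the raw XML.
for i in range(model.getNumReactions()):
    reaction = model.getReaction(i)
    reactants = [reaction.getReactant(j).getSpecies()
                 for j in range(reaction.getNumReactants())]
    products = [reaction.getProduct(j).getSpecies()
                for j in range(reaction.getNumProducts())]
    print(reaction.getId(), ":", " + ".join(reactants), "->", " + ".join(products))
```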

People build models from KEGG, textbooks and more. They try to rebuild KEGG diagrams in CellDesigner, which is very time consuming. Is there a better way to do this? Indeed, there are even difficulties with this manual method, as some reaction participants listed in the record (e.g. ATP) are not visible in the associated diagram, which can cause issues for novices. Therefore they developed KEGGtranslator to convert KEGG pathways into various file formats. Another way to add a data source to your model is through BioPAX2SBML. Additionally, they’ve created ModelPolisher, which can augment models with information from the BiGG database and is available as a command-line tool and as a web service. For dynamic simulation, they have a tool called SBMLsqueezer, which generates kinetic equations automatically and also reads information from SABIO-RK.

This system was applied to all networks in KEGG. They use SBMLsimulator to run the simulations. They’ve developed a documentation system called SBML2LaTeX which helps people document their models.

COMBINE 2016 Day 3: SigNetSim, A web-based framework for designing kinetic models of molecular signaling networks

COMBINE 2016

Vincent Noel

He was asked to develop a web tool which would be easy for biologists and students to use, but which could run a parallel simulated annealing algorithm and perform model reduction. He used Python to write the core library and the web interface, with some parts of the library in C. The software reads in an SBML model and builds a symbolic math model from it; it is compatible with SBML up to L3V1. Integration is performed using generated C code, which can be executed in parallel. To integrate systems of ODEs or DAEs, the software uses the Sundials library; for model fitting, it uses simulated annealing. It also has some Jupyter compatibility, mainly so that the symbolic math model can be worked with directly.
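This is not SigNetSim’s own code, but a minimal Python sketch of the same pattern: build a symbolic model, compile it to a numerical function, integrate the ODEs, and fit parameters with simulated annealing (here with SciPy’s solve_ivp and dual_annealing rather than Sundials and generated C code; the toy model and all parameter values are made up).

```python
import numpy as np
import sympy as sp
from scipy.integrate import solve_ivp
from scipy.optimize import dual_annealing

# Toy two-species signalling cascade: A -> B -> degradation.
A, B, k1, k2 = sp.symbols("A B k1 k2")
rhs = [-k1 * A, k1 * A - k2 * B]                  # dA/dt, dB/dt
f = sp.lambdify(((A, B), k1, k2), rhs, "numpy")   # symbolic -> numerical

def simulate(params, t_eval):
    sol = solve_ivp(lambda t, y: f(y, *params),
                    (t_eval[0], t_eval[-1]), [1.0, 0.0], t_eval=t_eval)
    return sol.y

# Synthetic "experimental" data generated with known parameters plus noise.
t_eval = np.linspace(0.0, 10.0, 50)
data = simulate((0.8, 0.3), t_eval) + np.random.normal(0.0, 0.01, (2, 50))

# Least-squares cost, minimised by simulated annealing.
cost = lambda p: np.sum((simulate(p, t_eval) - data) ** 2)
result = dual_annealing(cost, bounds=[(0.01, 5.0), (0.01, 5.0)])
print("fitted parameters:", result.x)
```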

SigNetSim’s web interface uses the Django framework with a Bootstrap front end. There is also a simple database backend for storing experimental data for these models. The library and web interface will be on GitHub, and the paper should be submitted in the next few months. http://cetics.butantan.gov.br/signetsim

COMBINE 2016 Day 3: Modelling ageing to enhance a healthy lifespan

COMBINE 2016

Daryl Shanley

Age is a major risk factor for chronic disease, and chronic diseases are the major cause of death and disability in the world, estimated at around 70% (WHO 2005). Molecular damage is the underlying factor in all of these (DNA damage and cancer, dementia and more), and ageing results from its accumulation. There is an irreversible accumulation of macromolecular damage: even though we have ameliorating systems such as the antioxidant systems, some damage escapes repair and builds up. Levels of oxidised protein, mutational frequency in nuclear DNA and mutational frequency in mitochondrial DNA all increase exponentially with age. This underlying damage gives rise to cellular senescence. Cells which go into a permanent state of cell cycle arrest are called senescent, and they secrete a number of chemicals into the surrounding environment. The number of these cells increases with age. If you remove these senescent cells (e.g. from mice) there is a definite survival enhancement, though we don’t really understand why. So, although overall there aren’t many of them, they do seem to have quite an impact.

The good news is that there is plasticity in ageing. For instance, caloric restriction in mice allows them to live longer (almost double). In part this is because mice overeat if they’re allowed to feed freely, but the undernourished mice aren’t healthy either – they’re infertile, for example (it’s not a “natural state”). Mutations that bring longer life are in genes associated with nutrition – they signal to the organism that there is less food available, and this signal somehow reduces molecular damage. However, it’s hard to test this in humans…

If we build models of known mechanisms, we can explore interventions, and with known interventions we can explore mechanisms. With a lot of background information, we can use the models to optimise synergy/antagonism, dose and timing. Ageing is caused by multiple mechanisms, and most damage increases exponentially, which implies positive feedback – can the cycle be slowed or broken?

After existing knowledge and data have been used to create a calibrated model, sensitivity analysis is performed and the model is validated. Only once all that has been done can you start using the model to make the predictions you’d like to see. It’s a long journey for a single model! They’ve created a set of Python modules for COPASI called PyCoTools, which allows you to compare models by generating alternative models based on a starting model.

They are using a systems approach to model the development of the senescent phenotype, with a view to finding interventions that prevent progression and reverse the phenotype. They had already been working on the processes involved, with earlier models of insulin signalling, stress response, DNA damage, mitochondrial dynamics and reactive oxygen species (ROS).

Bringing all of these models together into an integrative dynamic model of cellular senescence is just the first task; they also needed to create an independent in vitro data set for estimating the integrated model’s parameters. This data was then used to fit their model. They had to infer what was going on inside the mitochondria, by inferring the internal states for ‘new’ and ‘old’ mitochondria. The model was then used to test interventions for improving mitochondrial function and the senescent phenotype, especially combinations that would be difficult to perform in the lab.

If you reduce ROS in the model, it has an impact on the entire network. The results can be used to inform later experimental designs. There was in vitro confirmation of increased mitochondrial membrane potential during ROS inhibition. The model matched initially, but at later time points it diverged from the lab results. When you go back and look at the cells, you find that there is very little movement within the senescent cells, which hampers autophagy. This is why autophagy/mitophagy was predicted by the model but wasn’t being seen in the lab: it is a quality of the senescent cell that blocks the removal of dysfunctional mitochondria from the cell. Mitochondrial dynamics are reduced over time, driven by an inability to remove the network of dysfunctional mitochondria.

COMBINE 2016 Day 3: From Grassroots community standards to ISO Standards

COMBINE 2016

Martin Golebiewski

You need standards at every stage of the systems biology life cycle, and these standards need to work together and be interoperable. From modelling to simulation, to experimental data and back again – there are standards for each step. There are a large number of community standards for the life sciences, in many different subdomains (he references biosharing.org here).

The presence of many standards for different domains creates quite a lot of overlap, which can cause issues. Even within a single domain, it is normal to see different standards for different purposes, e.g. for the model description, the simulation of the model, the results of the simulation, and so on. The way in which the synbio and sysbio standards interrelate is complex.

In COMBINE, there are the official standards, the associated standardization efforts, and the related standardization efforts. The tasks in COMBINE for the board and the whole community are to: organize concerted meetings (COMBINE and HARMONY), training events for the application of the standards, coordinate standards development, develop common procedures and tools (such as the COMBINE archive) and provide a recognized voice.

A similar approach, but with a broader focus, is the European CHARME network, which was created to harmonize standardization strategies to increase the efficiency and competitiveness of European life-science research. It is funded as a networking action for five years from March 2016. See http://www.cost-charme.eu. There are five working groups within CHARME; WG2 covers innovation transfer, to get more involvement with industry.

NormSys is intended to bring together standards developers, research initiatives, publishers, industry, journals, funders, and standardization bodies. How should standards be published and distributed? How do we convince communities to apply standards, and how do we certify the implementation of standards? There is a nice matrix of the standards they are dealing with at http://normsys.h-its.org/biological/application/matrix.

NormSys is meant to be a bridge builder between research communities, industry and standardization bodies. There are actually a very large number of standardization bodies worldwide. ISO is the world’s largest developer of voluntary international standards. Anything that comes from ISO has to come out of a consensus of 164 national standards bodies, therefore finding such a consensus within ISO can be tricky. Most of the experts involved in the ISO standards are doing it voluntarily, or through dedicated non-ISO projects which fund it.

Within ISO, there are technical committees. These TCs might have further subgroups or working groups. There can also be national groups which have mirror committees, and then delegates from these committees are sent to the international committee meetings. The timeline for the full 6 stages of standard development with ISO can be around 36 months. However, this doesn’t include any of the preliminary work that needs to happen before the official stages begin.

There are three main ISO document types: IS (International Standard), TS (Technical Specification) and TR (Technical Report). Most relevant for us here is ISO TC 276 for Biotechnology. Its scope is standardization in the field of biotechnology processes, including: terms and definitions, biobanks and bioresources, analytical methods, bioprocessing, data processing including annotation, analysis, validation, comparability and integration, and finally metrology.

There are five working groups for this TC: terminology, biobanks, analytical methods, bioprocessing, and finally data processing and integration (WG5). ISO/IEC JTC 1/SC 29 covers the coding of audio, picture, multimedia and hypermedia information (this includes genome compression). ISO TC 276 WG5 was established in April 2015 and has 60 experts from 13 countries. He says the next meeting is in Dublin, and there is still scope for people to join and help in this effort.

They’ve been working on standards for data collection, structuring and handling during the deposition, preservation and distribution of microbes, and a recommended minimal-information data set for data publication. One of the most important tasks of WG5 is the standardization of genome compression, which was identified as a need by the MPEG consortium.

The biggest deal for COMBINE is the focus on developing an ISO standard for applying and connecting community modelling standards. “Downstream data processing and integration workflows – minimal requirements for downstream data processing and integration workflows for interfacing and linking heterogeneous data, models and corresponding metadata.”

COMBINE 2016 Day 2: SBOL Breakout – Host Context

Design and reality in SynBio – host context and provenance: Neil Wipat, Bryan Bartley

In synthetic biology, you are performing engineering in biology, and it is a combination of wet lab and in silico work. Until now, SBOL has been primarily concerned with the design stage of the process, but SBOL should be able to travel around the entire engineering life cycle, capturing data as it goes. Every data set that is generated throughout the life cycle should be able to be captured within the SBOL structure.

Take as an example a build of a system that has been done as described in the original design, e.g. with the original strain of E. coli. Even with the same design, you’ll get different experiments in different labs, even with the best of intentions – and therefore different experimental data. An SBOL design can be built by many labs and in many ways, in different host contexts. At the moment, SBOL doesn’t capture the differences among these host contexts.

Host context requires information about all of the details of the design – who/what/when/where/why/how – which is why provenance and host context are relevant together. As Bryan mentioned in his talk earlier, characterising a cell during “steady state” can often be subjective and difficult. Measurements of the output of a genetic circuit depend strongly on how well adapted your cells are to the environmental conditions. Further, human error must be taken into account, and it can be necessary to backtrack and check your assumptions. Some components that you’re using may have incomplete QC data.

There was a discussion of the difference between context and provenance: it was decided that the context was like the annotation on the nodes of a graph, and the provenance was how the edges between them were being walked. That is, provenance is how you got to a particular node, and context is about how you would re-create the conditions at that node.

The minimal information for the host context would involve representing the host as a type of ModuleDefinition. The host-specific annotations would be:

  • StrainId: reference
  • VendorId: reference
  • TaxonId: reference
  • Genome: reference
  • Genotype: Gnomic string

Gnomic is a machine-readable way of representing genotypes (http://github.com/biosustain/gnomic). It was then suggested that we should directly RDFize all of the information contained within Gnomic rather than using a new format that would have to be learnt and parsed, or, alternatively, use proper ontological terms and reference them with URIs.
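As a rough illustration of what “RDFizing” the host-context annotations listed above could look like, here is a small rdflib sketch. The namespace, predicate names, identifiers and genotype string are all hypothetical placeholders, not an agreed SBOL vocabulary:

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Hypothetical namespace and URIs, purely for illustration.
HOST = Namespace("http://example.org/sbol/hostcontext#")
host = URIRef("http://example.org/design/ecoli_host")  # the host ModuleDefinition

g = Graph()
g.bind("host", HOST)

g.add((host, HOST.strainId, URIRef("http://example.org/strains/MG1655")))
g.add((host, HOST.taxonId, URIRef("http://identifiers.org/taxonomy/511145")))
g.add((host, HOST.genome, URIRef("http://identifiers.org/refseq/NC_000913.3")))
# Placeholder genotype; a real record would carry a valid Gnomic string.
g.add((host, HOST.genotype, Literal("-lacZ")))

print(g.serialize(format="turtle"))
```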

PROV-O, the provenance ontology, defines three core classes: Entity, Activity and Agent. An agent runs an activity to generate one entity from another. Is there an ontology for the activity? We could use something like OBI, but note that each activity instance is tied to a particular timestamp, and therefore an activity is only done once.
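A minimal rdflib sketch of these three core classes and the PROV-O properties that link them (the example URIs and the date are invented; only the prov: terms are real PROV-O vocabulary):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")  # hypothetical namespace for the example

g = Graph()
g.bind("prov", PROV)

# An Agent carries out an Activity that uses one Entity to generate another.
g.add((EX.plasmid_design, RDF.type, PROV.Entity))
g.add((EX.assembled_plasmid, RDF.type, PROV.Entity))
g.add((EX.assembly_run_42, RDF.type, PROV.Activity))
g.add((EX.lab_engineer, RDF.type, PROV.Agent))

g.add((EX.assembly_run_42, PROV.used, EX.plasmid_design))
g.add((EX.assembly_run_42, PROV.wasAssociatedWith, EX.lab_engineer))
g.add((EX.assembled_plasmid, PROV.wasGeneratedBy, EX.assembly_run_42))

# Each Activity instance is tied to a specific point in time.
g.add((EX.assembly_run_42, PROV.endedAtTime,
       Literal("2016-06-01T12:00:00", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```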

There is a contrasting opinion that the important thing is that an activity can be reused, and therefore there should be a class/definition for each activity which gets instantiated at particular times.

The proposal suggests that all sbol2:Identified types could potentially be annotated with provenance information. As such, the following additional classes would be added: prov:Derivation, prov:Activity, prov:Agent, prov:Association, prov:Usage. (Though I definitely saw a prov:role in one of the examples.)