
The ZBIT Systems Biology Software and Web Service Collection

COMBINE 2016

Andreas Draeger

In systems biology, people want to perform dynamic simulations, steady-state analyses and other analyses. SBML is the standard format for representing and exchanging the model, but software also needs an in-memory data structure to work with it, and that is why they developed JSBML.
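As an illustration of what such a programmatic SBML data structure looks like, here is a minimal sketch using the python-libsbml bindings rather than JSBML itself, assuming a local file "model.xml" exists:

```python
# Sketch: load an SBML file into a programmatic data structure with
# python-libsbml (an illustration; the talk concerned the Java library JSBML).
import libsbml

doc = libsbml.readSBMLFromFile("model.xml")  # assumed local file
if doc.getNumErrors() > 0:
    doc.printErrors()

model = doc.getModel()
print(model.getId(), model.getNumSpecies(), model.getNumReactions())

# Walk the species and reactions that a simulator or analysis tool would use.
for species in model.getListOfSpecies():
    print(species.getId(), species.getInitialConcentration())
for reaction in model.getListOfReactions():
    print(reaction.getId(), reaction.getKineticLaw() is not None)
```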

People build models from KEGG, textbooks and other sources. Rebuilding KEGG diagrams by hand in CellDesigner is very time consuming, so is there a better way? The manual approach also has pitfalls: some reaction participants listed in the KEGG record (e.g. the addition of ATP) are not visible in the associated diagram, which can trip up novices. They therefore developed KEGGtranslator to convert KEGG pathways into various file formats. Another way to bring a data source into your model is BioPAX2SBML. They have also created ModelPolisher, available as a command-line tool and as a web service, which augments models with information from the BiGG database. For dynamic simulation they have SBMLsqueezer, which generates kinetic equations automatically and can also read information from SABIO-RK.

This system was applied to all networks in KEGG. They use SBMLsimulator to run the simulations. They have also developed SBML2LaTeX, a documentation system that helps people document their models.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 3: SigNetSim, A web-based framework for designing kinetic models of molecular signaling networks

COMBINE 2016

 

Vincent Noel

He was asked to develop a web tool that would be easy for biologists and students to use, but which could also run a parallel simulated annealing algorithm and perform model reduction. He wrote the core library and the web interface in Python, with some parts of the library in C. The software reads an SBML model and builds a symbolic math model from it; it is compatible with SBML up to L3V1. Integration is performed by generated C code, which can be executed in parallel, and the Sundials library is used to integrate systems of ODEs or DAEs. Model fitting is done by simulated annealing. There is also some Jupyter compatibility, mainly so that the symbolic math model can be worked with directly.
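To make the fitting step concrete, here is a rough sketch of simulated-annealing parameter estimation for a toy ODE model, using SciPy in place of SigNetSim's own C/Sundials machinery; the model, data and bounds are invented for illustration:

```python
# Sketch: fit one rate constant of a toy ODE model to noisy data by
# simulated annealing (not SigNetSim's actual API).
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import dual_annealing

t_obs = np.linspace(0, 10, 20)
k_true = 0.7
y_obs = np.exp(-k_true * t_obs) + np.random.normal(0, 0.02, t_obs.size)

def simulate(k):
    # dy/dt = -k * y, y(0) = 1  (a stand-in for a signalling ODE system)
    sol = solve_ivp(lambda t, y: -k * y, (0, 10), [1.0], t_eval=t_obs)
    return sol.y[0]

def cost(params):
    # Sum-of-squares distance between simulation and experimental data.
    return np.sum((simulate(params[0]) - y_obs) ** 2)

result = dual_annealing(cost, bounds=[(0.01, 5.0)])
print("estimated k:", result.x[0])
```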

SigNetSim's web interface uses the Django framework with a Bootstrap front end. There is also a simple database backend for storing experimental data for these models. The library and web interface will be on GitHub, and the paper should be submitted in the next few months. http://cetics.butantan.gov.br/signetsim
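As a purely hypothetical sketch of what such a backend might look like (these class and field names are my own guesses, not SigNetSim's schema), a Django data model for experimental observations in an app's models.py could be:

```python
# Hypothetical Django models for storing experimental observations next to
# an SBML model; this is illustrative only, not SigNetSim's actual schema.
from django.db import models

class SbmlModel(models.Model):
    name = models.CharField(max_length=200)
    sbml_file = models.FileField(upload_to="models/")

class Observation(models.Model):
    model = models.ForeignKey(SbmlModel, on_delete=models.CASCADE)
    species_id = models.CharField(max_length=100)  # SBML species identifier
    time = models.FloatField()
    value = models.FloatField()
    stderr = models.FloatField(null=True, blank=True)
```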

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 3: Modelling ageing to enhance a healthy lifespan

COMBINE 2016

Daryl Shanley

Age is a major risk factor for chronic disease, and chronic diseases are the major cause of death and disability in the world, estimated at around 70% (WHO 2005). Molecular damage is the underlying factor in all of these (DNA damage in cancer, dementia and more). Ageing results from the accumulation of molecular damage: there is an irreversible accumulation of macromolecular damage because, even though we have ameliorating systems such as the antioxidant systems, some damage escapes repair and builds up. Levels of oxidised protein, the mutational frequency in nuclear DNA and the mutational frequency in mitochondrial DNA all increase exponentially with age. This underlying damage gives rise to cellular senescence. Cells that go into a permanent state of cell-cycle arrest are called senescent, and they secrete a number of chemicals into the surrounding environment; the number of these cells increases with age. If you remove senescent cells (e.g. from mice) there is a definite survival enhancement, though we don't really understand why. So, although overall there aren't many of them, they do seem to have quite an impact.

The good news is that there is plasticity in ageing. For instance, caloric restriction in mice allows them to live longer (almost twice as long). In part this is because mice overeat if allowed to feed freely, but the undernourished mice aren't healthy either; they are infertile, for example (it isn't a "natural state"). Mutations that confer longer life are in genes associated with nutrition: they signal to the organism that less food is available, and this signal somehow reduces molecular damage. However, it's hard to test this in humans…

If we build models of known mechanisms, we can explore interventions, and with known interventions we can explore mechanisms. With enough background information, we can use the models to optimise synergy/antagonism, dose and timing. Ageing is caused by multiple mechanisms, and most damage increases exponentially. Can the cycle be slowed or broken? The exponential increase implies positive feedback.
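To make the positive-feedback point concrete, here is a toy illustration of my own (not a model from the talk) in which damage promotes its own accumulation, producing the exponential rise described above:

```python
# Toy positive-feedback model: damage stimulates further damage and is
# removed by a saturable repair process, so it rises roughly exponentially.
import numpy as np
from scipy.integrate import solve_ivp

def damage_ode(t, d, k_feedback, k_repair):
    return k_feedback * d - k_repair * d / (1.0 + d)

sol = solve_ivp(damage_ode, (0, 80), [0.01], args=(0.1, 0.05),
                t_eval=np.linspace(0, 80, 9))
print(np.round(sol.y[0], 3))  # damage accumulates faster and faster with age
```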

After existing knowledge and data have been used to create a calibrated model, sensitivity analysis is performed and the model is validated. Only once all that has been done can you start using the model to make the predictions you'd like to see. It's a long journey for a single model! They've created a set of Python modules for COPASI called PyCoTools, which allows you to compare models by generating alternative models from a starting model.
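As a generic illustration of the sensitivity-analysis step (this is not PyCoTools or COPASI, just a sketch of the idea), one can perturb each parameter of a calibrated toy model and measure the effect on an output of interest:

```python
# Local sensitivity analysis by finite differences on a toy model:
# dP/dt = k_syn - k_deg * P, output of interest is P at t = 20.
import numpy as np
from scipy.integrate import solve_ivp

params = {"k_syn": 1.0, "k_deg": 0.2}

def output(p):
    sol = solve_ivp(lambda t, y: p["k_syn"] - p["k_deg"] * y,
                    (0, 20), [0.0], t_eval=[20.0])
    return sol.y[0, -1]

base = output(params)
for name, value in params.items():
    perturbed = dict(params, **{name: value * 1.01})  # +1% perturbation
    sensitivity = (output(perturbed) - base) / (0.01 * base)
    print(name, "scaled sensitivity:", round(sensitivity, 2))
```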

They are using a systems approach to model the development of the senescent phenotype, with a view to finding interventions that prevent progression and reverse the phenotype. They had already been working on the processes involved, with earlier models of insulin signalling, stress response, DNA damage, mitochondrial dynamics and reactive oxygen species (ROS).

Bringing all of these models together into an integrative dynamic model of cellular senescence was just the first task; they also needed to create an independent in vitro data set for estimating the integrated model's parameters. This data was then used to fit their model. They had to infer what was going on inside the mitochondria, inferring internal states for 'new' and 'old' mitochondria. The model was then used to test interventions for improving mitochondrial function and the senescent phenotype, especially combinations that would be difficult to perform in the lab.

If you reduce ROS in the model, it has an impact on the entire network, and the results can be used to inform later experimental designs. In vitro work then confirmed the increased mitochondrial membrane potential during ROS inhibition. The model matched initially, but at later time points it diverged from the lab results. Going back to look at the cells, they found very little movement among the senescent cells, which hampers autophagy. This is why autophagy/mitophagy was predicted by the model but wasn't being seen in the lab: it is a quality of the senescent cell itself that blocks this removal process. Mitochondrial dynamics are reduced over time, driven by an inability to remove the network of dysfunctional mitochondria.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 3: From Grassroots community standards to ISO Standards

COMBINE 2016

Martin Golebiewski

You need standards at every stage of the systems biology life cycle, and these standards need to work together and be interoperable. From modelling to simulation to experimental data and back again, there are standards for each step. There are a large number of community standards for the life sciences, in many different subdomains (he referenced biosharing.org here).

The presence of many standards for different domains creates quite a lot of overlap, which can cause issues. Even within a single domain it is normal to see different standards for different purposes, e.g. for the model description, for the simulation of the model, for the results of the simulation, and so on. The way in which the synbio and sysbio standards interrelate is complex.

In COMBINE there are the official standards, the associated standardization efforts, and the related standardization efforts. The tasks in COMBINE, for the board and the whole community, are to organize concerted meetings (COMBINE and HARMONY) and training events for the application of the standards, coordinate standards development, develop common procedures and tools (such as the COMBINE archive), and provide a recognized voice.

A similar approach, but with a broader focus, is the European CHARME network, which has been created to harmonize standardization strategies to increase the efficiency and competitiveness of European life-science research. It funds networking actions for five years from March 2016; see http://www.cost-charme.eu. There are five working groups within CHARME; WG2 covers innovation transfer, aiming for more involvement with industry.

NormSys is intended to bring together standards developers, research initiatives, publishers, industry, journals, funders, and standardization bodies. How should standards be published and distributed? How do we convince communities to apply standards, and how do we certify the implementation of standards? There is a nice matrix of the standards they are dealing with at http://normsys.h-its.org/biological/application/matrix.

NormSys is meant to be a bridge builder between research communities, industry and standardization bodies. There are actually a very large number of standardization bodies worldwide. ISO is the world’s largest developer of voluntary international standards. Anything that comes from ISO has to come out of a consensus of 164 national standards bodies, therefore finding such a consensus within ISO can be tricky. Most of the experts involved in the ISO standards are doing it voluntarily, or through dedicated non-ISO projects which fund it.

Within ISO, there are technical committees. These TCs might have further subgroups or working groups. There can also be national groups which have mirror committees, and then delegates from these committees are sent to the international committee meetings. The timeline for the full 6 stages of standard development with ISO can be around 36 months. However, this doesn’t include any of the preliminary work that needs to happen before the official stages begin.

There are three main ISO document types: IS (International Standard), TS (Technical Specification) and TR (Technical Report). Most relevant for us here is ISO TC 276, Biotechnology. Its scope is standardization in the field of biotechnology processes, including: terms and definitions; biobanks and bioresources; analytical methods; bioprocessing; data processing including annotation, analysis, validation, comparability and integration; and finally metrology.

There are five working groups (WGs) for this TC: terminology, biobanks, analytical methods, bioprocessing, and finally data processing and integration (WG5). ISO/IEC JTC 1/SC 29 covers the coding of audio, picture, multimedia and hypermedia information (this includes genome compression). ISO TC 276 WG5 was established in April 2015 and has 60 experts from 13 countries. He says the next meeting is in Dublin, and there is still scope for people to join and help in this effort.

They've been working on standards for data collection, structuring and handling during the deposition, preservation and distribution of microbes, and on a recommended minimum-information (MI) data set for data publication. One of the most important tasks of WG5 is the standardization of genome compression, a need identified by the MPEG consortium.

The biggest deal for COMBINE is the focus on developing an ISO standard for applying and connecting community modelling standards. “Downstream data processing and integration workflows – minimal requirements for downstream data processing and integration workflows for interfacing and linking heterogeneous data, models and corresponding metadata.”

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 2: SBOL Breakout – Host Context

Design and reality in SynBio: host context and provenance – Neil Wipat, Bryan Bartley

In synthetic biology you are doing engineering in biology, combining wet-lab and in silico work. Until now, SBOL has been concerned primarily with the design stage of the process, but SBOL should be able to travel around the entire engineering life cycle, capturing data as it goes. Every data set generated throughout the life cycle should be capturable within the SBOL structure.

Take as an example a build of a system carried out as described in the original design, e.g. with the original strain of E. coli. Even with the same design and the best of intentions, different labs will run different experiments and therefore produce different experimental data. An SBOL design can be built by many labs, in many ways and in different host contexts. At the moment SBOL doesn't capture the differences among these host contexts.

Host context requires information about all the details of the design (who/what/when/where/why/how), which is why provenance and host context belong together. As Bryan mentioned in his earlier talk, characterising a cell during "steady state" can often be subjective and difficult. Measurements of the output of a genetic circuit depend strongly on how well adapted your cells are to the environmental conditions. Further, human error must be taken into account, and it can be necessary to backtrack and check your assumptions. Some components you're using may have incomplete QC data.

There was a discussion of the difference between context and provenance: it was decided that context is like the annotation on the nodes of a graph, while provenance is how the edges between them are walked. That is, provenance is how you got to a particular node, and context is how you would re-create the conditions at that node.

The minimal information for the host context would be represented by placing the host as a type of ModuleDefinition. The host-specific annotations would be:

  • StrainId: reference
  • VendorId: reference
  • TaxonId: reference
  • Genome: reference
  • Genotype: Gnomic string

Gnomic is a machine-readable way of representing genotypes (http://github.com/biosustain/gnomic). It was then suggested that we should RDFize all of the information contained within Gnomic directly, rather than use a new format that would have to be learnt and parsed; alternatively, use proper ontological terms and reference them with URIs.

PROV-O, the provenance ontology, defines three core classes: Entity, Activity and Agent. An agent runs an activity to generate one entity from another. Is there an ontology for the activity? Something like OBI could be used, but note that each activity instance is tied to a particular timestamp, and therefore an activity is only done once.

There is a contrasting opinion that the important thing is that an activity can be reused, and therefore there should be a class/definition for each activity which gets instantiated at particular times.

The proposal suggests that all sbol2:Identified types could potentially be annotated with provenance information. As such, the following additional classes should be added: prov:Derivation, prov:Activity, prov:Agent, prov:Association, prov:Usage. (Though I definitely saw a prov:role in one of the examples.)
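A rough sketch of what such provenance annotation could look like in RDF, using rdflib and real PROV-O terms but with placeholder URIs of my own for the design entities (this is illustrative, not the normative SBOL proposal):

```python
# Sketch: attach PROV-O provenance to a built construct with rdflib.
# The example.org URIs are placeholders, not real SBOL identifiers.
from rdflib import Graph, Namespace, RDF

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/design/")

g = Graph()
g.bind("prov", PROV)

# The built construct (an Entity) was generated by an assembly Activity that
# used the original design and was associated with a lab (the Agent).
g.add((EX.construct_v2, RDF.type, PROV.Entity))
g.add((EX.assembly_run_1, RDF.type, PROV.Activity))
g.add((EX.lab_A, RDF.type, PROV.Agent))
g.add((EX.construct_v2, PROV.wasGeneratedBy, EX.assembly_run_1))
g.add((EX.assembly_run_1, PROV.used, EX.design_v1))
g.add((EX.assembly_run_1, PROV.wasAssociatedWith, EX.lab_A))

print(g.serialize(format="turtle"))
```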


COMBINE 2016 Day 2: Version and Variant Control for Synthetic Biology

COMBINE 2016

Bryan Bartley

Synthetic biology, as with many engineering projects, quickly gets complex and could benefit from versioning systems. SBOL currently supports versioning of designs, but not of constructs. Further, versioning for synthetic biology needs to track provenance and contextual information. But how do we approach versioning in biological systems? In biology, branching tends to be how it's done (constructs are built in parallel): feature branches are much more the rule in biology than successive commits.

Variant Control is based on phylogenetic analysis of DNA sequences (scoring matrix -> multiple sequence alignment -> pairwise distance matrix -> phylogenetic tree). In Variant Control, the composition of genetic circuits is encoded as sequences. You can then perform an MSA on these sequences of circuits, giving a parts-based phylogenetic analysis, and from this you get a tree of variants.

Next, semantic annotations are added to score the alignments: going up the hierarchy to reach a common SO term incurs a penalty score. Variant Control thus clusters similar designs by both sequence and functional similarity (e.g. repressors together).
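A much simplified illustration of the idea (not the actual Variant Control implementation, and with invented part names): treat each circuit variant as a sequence of part identifiers, compute pairwise mismatch distances, and build a tree by hierarchical clustering:

```python
# Simplified parts-based "phylogeny": mismatch distance between circuit
# variants, then an average-linkage (UPGMA-like) tree.
from itertools import combinations
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

variants = {
    "v1": ["pTet", "RBS1", "lacI", "term1"],
    "v2": ["pTet", "RBS2", "lacI", "term1"],
    "v3": ["pLac", "RBS1", "tetR", "term1"],
}

names = list(variants)
dist = np.zeros((len(names), len(names)))
for (i, a), (j, b) in combinations(enumerate(names), 2):
    d = sum(x != y for x, y in zip(variants[a], variants[b]))
    dist[i, j] = dist[j, i] = d

tree = linkage(squareform(dist), method="average")
print(dendrogram(tree, labels=names, no_plot=True)["ivl"])  # leaf order
```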

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 2: How to Remember and Revisit Many Genetic Design Variants Automatically

COMBINE 2016

Nicholas Roehner

In other words, a version control system for variations on genetic design.

Even with a library of just 16 parts, a 4-gene cluster can be encoded in over 684,000 variants. Clearly, GenBank files are not appropriate here. Their solution is Knox, where the genetic design space takes only about 200 kB rather than gigabytes. A "genetic design space" is a graph format in which each edge is labelled with a *set* of parts, from which you can create paths. Design spaces can be concatenated and merged in a variety of different ways via graph operations in Knox.
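A minimal illustration of my own of the design-space idea (the part names are invented): each position is labelled with a set of interchangeable parts, the space itself stays tiny, and every path through it, here simply every combination along a chain, is one concrete variant:

```python
# Toy design space: a chain of positions, each labelled with a set of parts.
# The space is small to store but expands to every concrete design on demand.
from itertools import product, islice

design_space = [
    {"pTet", "pLac", "pBAD"},        # promoter choices
    {"RBS1", "RBS2"},                # RBS choices
    {"gfp", "rfp", "lacI", "tetR"},  # CDS choices
    {"term1", "term2"},              # terminator choices
]

n_variants = 1
for parts in design_space:
    n_variants *= len(parts)
print("variants encoded:", n_variants)  # 3 * 2 * 4 * 2 = 48 in this toy case

for variant in islice(product(*design_space), 3):  # expand a few on demand
    print(variant)
```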

If you build up a series of these operations, you can create very large design spaces; a single design space encodes all of the various paths. These design spaces can be stored and versioned, much as with git. Combining design spaces in Knox also merges their version histories. You can also branch a design space, giving you two different versions to work with, and reversion is also supported.

There is a RESTful API connecting the web application and the graph database. Finch and Eugene are two products that use Knox. Finch can encode variable-length designs because it uses regular expressions, which makes designs more machine-comparable and mergeable. This can make things harder for humans, though, which is where Eugene is beneficial: it is a more human-readable and writeable language, though it is less expressive than Finch and has a fixed design length.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 2: A new thermodynamics-based approach to aid the design of natural-product biosynthetic pathways

COMBINE 2016

Hiroyuki Kuwahara

The design of biosynthetic systems involves a large search space, so a computational tool that predicts productive pathways is essential to aid that design. There are a number of pre-existing approaches, including flux-analysis-based methods (where the host is often limited to E. coli), reaction-count-based methods, and thermodynamic-favourability-based methods (which cannot capture the effects of competing reactions, and whose ranking doesn't depend on the host's metabolic system). They wanted, given a starting material, a target product and a host organism, to find promising biosynthetic routes by allowing the introduction of foreign metabolic enzymes into the host.

They have a host-dependent weighting scheme, and the ranking of pathways based on it can be widely different from that of the thermodynamic-favourability approach. They first compute a weight for each edge in the network, such that two edges can have different weights even if their energy values are identical. In this way, the model can include additional downstream steps that lower otherwise high-scoring reactions if their routes lead to undesirable consequences.
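Schematically (this is my own sketch, not the authors' tool), the search can be phrased as a weighted shortest-path problem, with edge weights combining thermodynamic favourability and a host-dependent penalty:

```python
# Toy pathway ranking: candidate reactions as weighted edges, routes ranked
# by total weight from starting compound to target. Weights are illustrative.
import networkx as nx

g = nx.DiGraph()
edges = [
    ("A", "B", 1.0), ("B", "target", 4.0),   # route 1: total weight 5.0
    ("A", "C", 2.5), ("C", "target", 1.0),   # route 2: total weight 3.5
]
for substrate, product, weight in edges:
    g.add_edge(substrate, product, weight=weight)

path = nx.shortest_path(g, "A", "target", weight="weight")
print("best route:", " -> ".join(path))      # A -> C -> target
```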

They have also developed SBOLme.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 2: Data Integration and Mining for Synthetic Biology Design

COMBINE 2016

Goksel Misirli

How can we use ontologies to facilitate synthetic biology? Engineering biological systems is challenging, and integrating the data about them is even more so. Information may be spread across different databases, in different formats and with different semantics, and it should be integrated to inform and constrain biological design. Hence on to Gruber, and his definition of an ontology as a "specification of a conceptualization". Ontologies are useful for capturing different relationships between biological parts and for facilitating data mining; they are already used widely in bioinformatics, including GO, SO, SBO, SBOL, etc.

They have created the Synthetic Biology Ontology (SyBiOnt), available at http://w3id.org/synbio/ont. The SyBiOnt knowledgebase includes information about sequences, annotations, metabolic pathways, gene regulatory networks, protein-protein interactions and gene expression. Once the KB was built, it was examined via a set of competency questions. For example, which parts can be used as inducible promoters? When the corresponding query was run, 51 promoters in the KB were classified as inducible.
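An illustrative competency query of this kind might look as follows (a sketch of my own; the class URI is a placeholder, not an actual SyBiOnt term, and the KB file is assumed to exist locally):

```python
# Sketch: run a SPARQL competency query against a local RDF knowledge base
# to find all parts typed as inducible promoters.
from rdflib import Graph

kb = Graph()
kb.parse("sybiont_kb.ttl", format="turtle")   # assumed local KB dump

query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/sybiont#>

SELECT ?part WHERE {
    ?part rdf:type ex:InduciblePromoter .
}
"""
for row in kb.query(query):
    print(row.part)
```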

They also performed automatic identification of biological parts, classified as activator sites, repressor sites, inducible promoters, repressible promoters, SigA promoters, SigB promoters, constitutive promoters, repressor-encoding CDSs, activator-encoding CDSs, response-regulator-encoding CDSs and more.

There were many other competency questions that could be, and were, asked.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 2: Creating a SynBio Software Commons

COMBINE 2016

 

Curtis Madsen & Nicholas Roehner

Nona was created to address an issue with the academic software development cycle: software is built, developed and published, and then gets lost as people move around in academia. Academics can work with Nona to get feedback and to develop a community that can help with maintaining their software.

How do you participate? Go to http://nonasoftware.org and browse the currently available software. Software is broken down into specification, design, data management and integration types. You can transfer your software to Nona and have them host it, or you can host the software yourself and they will provide a link to both the homepage and the GitHub (or similar) repository.

When you're ready to submit software to Nona, you start by choosing a license (to work with Nona, you must have an Open Source license). Then you provide a link to the GitHub repo (or simply give a tarball to Nona, who will put it on GitHub). Nona will provide promotional materials, FAQs, forums, etc. for your software.

In February 2017 there will be a 2 1/2 day hackathon (Nona Works) where they bring together biologists and computer scientists.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!