COMBINE 2016 Day 2: Version and Variant Control for Synthetic Biology

COMBINE 2016

Bryan Bartley

Synthetic biology, as with many projects, gets complex quickly and could be improved through the use of versioning systems. SBOL currently supports versioning of designs, but not constructs. Further, versioning for synthetic biology needs to track provenance and contextual information. But how do we approach versioning in biological systems? In biology, branching (constructing variants in parallel) tends to be how it's done: feature branches are much more the rule than successive commits.

Variant Control is based on phylogenetic analysis of DNA sequences (scoring matrix -> multiple sequence alignment -> pairwise distance matrix -> phylogenetic tree). In Variant Control, the composition of a genetic circuit is encoded as a sequence of parts. You can then run a multiple sequence alignment (MSA) on these circuit sequences, performing a parts-based phylogenetic analysis. From this, you get a tree of variants.
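To make the pipeline a little more concrete, here is a minimal sketch of a parts-based analysis. It is my own illustration and not the Variant Control implementation: it swaps the multiple sequence alignment for a plain pairwise edit distance over part names, and builds the tree by repeatedly merging the two closest clusters (average linkage). The design names and part names are made up.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance over part tokens (not nucleotides)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

# Hypothetical circuit variants, each encoded as a sequence of parts.
designs = {
    "v1": ["pTet", "RBS1", "lacI", "T1"],
    "v2": ["pTet", "RBS2", "lacI", "T1"],
    "v3": ["pLac", "RBS1", "gfp", "T1"],
}

def leaves(tree):
    """Flatten a nested-tuple tree into its design names."""
    return [tree] if isinstance(tree, str) else [l for t in tree for l in leaves(t)]

def cluster_distance(s, t):
    """Average pairwise distance between the designs in two clusters."""
    pairs = [(a, b) for a in leaves(s) for b in leaves(t)]
    return sum(edit_distance(designs[a], designs[b]) for a, b in pairs) / len(pairs)

def build_tree(names):
    """Greedy agglomerative clustering; returns a nested tuple."""
    clusters = list(names)
    while len(clusters) > 1:
        s, t = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters = [c for c in clusters if c not in (s, t)] + [(s, t)]
    return clusters[0]

tree = build_tree(designs)
print(tree)
```

Here v1 and v2 differ in a single part, so they are grouped first, and v3 joins them at the root.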

Next, semantic annotations are added to score the alignments: going up the Sequence Ontology (SO) hierarchy to reach a term common to both parts incurs a penalty. Variant Control therefore clusters similar designs by both sequence and functional similarity (e.g. repressors together).
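A rough sketch of how such a hierarchy-based penalty could work is below. The term names and the tiny hierarchy are hypothetical stand-ins rather than real SO identifiers, and counting one penalty point per step up the hierarchy is my assumption, not necessarily the scoring used in the talk.

```python
# Hypothetical child -> parent hierarchy (stand-in for part of SO).
PARENT = {
    "repressor_cds": "cds",
    "activator_cds": "cds",
    "cds": "feature",
    "promoter": "regulatory_region",
    "regulatory_region": "feature",
}

def lineage(term):
    """The term followed by all of its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def penalty(a, b):
    """Number of steps both terms must climb to reach a common term."""
    up_a, up_b = lineage(a), lineage(b)
    common = next(t for t in up_a if t in up_b)
    return up_a.index(common) + up_b.index(common)

print(penalty("repressor_cds", "repressor_cds"))  # identical terms
print(penalty("repressor_cds", "activator_cds"))  # siblings under cds
print(penalty("repressor_cds", "promoter"))       # only share the root
```

Functionally similar parts meet low in the hierarchy and are penalised less, which is what pulls, say, two repressors together in the resulting tree.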

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

COMBINE 2016 Day 2: How to Remember and Revisit Many Genetic Design Variants Automatically

Nicholas Roehner

In other words, a version control system for variations on genetic design.

Even with a library of just 16 parts, a 4-gene cluster can be encoded as over 684,000 variants. Clearly, GenBank files are not appropriate here. Their solution is Knox, where the genetic design space takes up only about 200k, rather than gigabytes. This “genetic design space” is a format where each edge is labelled with a *set* of parts, from which you can create paths. Design spaces can be concatenated via graph operations in Knox, or merged in a variety of different ways.
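The idea of a graph whose edges carry sets of parts can be sketched in a few lines. This is only my illustration of the concept: the `DesignSpace` class, the `join` function and the part names are hypothetical and are not Knox's data model or API. It assumes the graph is acyclic.

```python
class DesignSpace:
    """Directed acyclic graph; each edge is labelled with a set of parts."""

    def __init__(self, edges, start, accept):
        # edges: {node: [(next_node, set_of_parts), ...]}
        self.edges, self.start, self.accept = edges, start, accept

    def designs(self, node=None):
        """Yield every design (tuple of parts) on a path from start to accept."""
        node = self.start if node is None else node
        if node == self.accept:
            yield ()
        for nxt, parts in self.edges.get(node, []):
            for rest in self.designs(nxt):
                for part in sorted(parts):
                    yield (part,) + rest

def join(a, b):
    """Concatenate two spaces by fusing a's accept node with b's start node."""
    ra = lambda n: ("a", n)
    rb = lambda n: ("a", a.accept) if n == b.start else ("b", n)
    edges = {}
    for space, rename in ((a, ra), (b, rb)):
        for n, out in space.edges.items():
            edges.setdefault(rename(n), []).extend((rename(m), p) for m, p in out)
    return DesignSpace(edges, ra(a.start), rb(b.accept))

promoters = DesignSpace({0: [(1, {"pTet", "pLac"})]}, 0, 1)
reporters = DesignSpace({0: [(1, {"gfp", "rfp", "yfp"})]}, 0, 1)
joined = join(promoters, reporters)
print(len(list(joined.designs())))  # 2 promoters x 3 reporters
```

The joined graph stores two edges and five part names, yet encodes six designs; the gap between stored size and encoded variants grows multiplicatively with every further edge.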

If you build up a series of these operations, you can create Very Large Things. A single design space would encode all of the various paths. These design spaces can be stored and versioned, as is done with git. Combining design spaces in Knox also merges their version histories. You can also branch a design space, giving you two different versions to work with. Reversion is also supported.

There is a RESTful API connecting the web application to the graph database. Finch and Eugene are two products which use Knox. Finch uses regular expressions, so it can encode variable-length designs, which makes them more machine-comparable and mergeable. This can make things harder for humans though, which is where Eugene is beneficial: it is a more human-readable and writeable language, though it is less expressive than Finch and has a fixed design length.
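To show why regular expressions suit variable-length designs, here is a small sketch using Python's standard `re` module. It is not Finch syntax: the one-letter role codes and the operon pattern are assumptions made purely for illustration.

```python
import re

# Hypothetical one-letter codes for part roles.
ROLE_CODE = {"promoter": "p", "rbs": "r", "cds": "c", "terminator": "t"}

# A promoter, one or more RBS-CDS pairs, then a terminator.
OPERON = re.compile(r"p(rc)+t")

def matches(design_roles):
    """True if the sequence of part roles fits the operon pattern."""
    encoded = "".join(ROLE_CODE[role] for role in design_roles)
    return OPERON.fullmatch(encoded) is not None

one_gene = ["promoter", "rbs", "cds", "terminator"]
two_genes = ["promoter", "rbs", "cds", "rbs", "cds", "terminator"]
no_genes = ["promoter", "terminator"]
print(matches(one_gene), matches(two_genes), matches(no_genes))
```

A single pattern covers designs with any number of genes, which a fixed-length rule set cannot express as directly.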

COMBINE 2016 Day 2: A new thermodynamics-based approach to aid the design of natural-product biosynthetic pathways

Hiroyuki Kuwahara

The design of biosynthetic systems involves a large search space, so a computational tool that predicts productive pathways is essential to aid that design. There are a number of pre-existing approaches, including flux-analysis-based (host often limited to E. coli), reaction-count-based, and thermodynamic-favorability-based (where the effects of competing reactions cannot be captured, and the ranking doesn’t depend on the host’s metabolic system). Given a starting material, a target product and a host organism, they wanted to be able to find promising biosynthetic routes by allowing the introduction of foreign metabolic enzymes into the host.

They have a host-dependent weighting scheme, and the pathway ranking it produces can be widely different from that of the thermodynamic favorability approach. They first compute the weight for each edge, such that two edges can have different weights even if their energy values are identical. In this way, the model can include further steps that lower otherwise high-scoring reactions if their routes lead to undesirable consequences.
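The sketch below shows one way two reactions with identical energy values could end up with different weights. The formula is my assumption for illustration, not necessarily the speaker's: each reaction's Boltzmann factor is normalised by the factors of the host reactions competing for the same substrate, and a route is scored by the product of its edge weights. All energy values are made up.

```python
import math

RT = 2.479  # kJ/mol at roughly 298 K

def edge_weight(dg, competing_dgs):
    """Share of flux this reaction wins against competing host reactions.

    dg and competing_dgs are Gibbs energy changes in kJ/mol (assumed inputs).
    """
    own = math.exp(-dg / RT)
    return own / (own + sum(math.exp(-c / RT) for c in competing_dgs))

def route_score(steps):
    """steps: list of (dg, competing_dgs) tuples; higher is better."""
    return math.prod(edge_weight(dg, comp) for dg, comp in steps)

# Two routes with the same energies; route_b's first substrate is also
# consumed by an equally favourable native reaction in the host.
route_a = [(-10.0, []), (-5.0, [])]
route_b = [(-10.0, [-10.0]), (-5.0, [])]
print(route_score(route_a), route_score(route_b))
```

A purely thermodynamic ranking would tie these two routes, whereas a host-aware weight ranks route_a above route_b.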

They have also developed SBOLme.

COMBINE 2016 Day 2: Data Integration and Mining for Synthetic Biology Design

Goksel Misirli

How can we use ontologies to facilitate synthetic biology? Engineering biological systems is challenging, and integrating the data about them is even more so. Information may be spread across different databases, in different formats and with different semantics. This information should be integrated to inform and constrain biological design. This brings us to Gruber, and his definition of an ontology as a “specification of a conceptualization”. Ontologies are useful for capturing different relationships between biological parts and for facilitating data mining. They are already widely used in bioinformatics, including GO, SO, SBO, SBOL etc.

They have created the Synthetic Biology Ontology (SyBiOnt), available at http://w3id.org/synbio/ont. The SyBiOnt knowledgebase (KB) includes information about sequences, annotations, metabolic pathways, gene regulatory networks, protein-protein interactions, and gene expression. Once the KB was built, it was examined via a set of competency questions. For example, which parts can be used as inducible promoters? When an appropriate query was run, 51 promoters were classified as inducible within the KB.
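A competency question is essentially a query over the KB. The toy example below shows the shape of the "inducible promoters" question using plain Python sets of triples. The part names and the predicates (`type`, `inducedBy`) are hypothetical and do not come from SyBiOnt, whose real KB would be queried with proper ontology tooling.

```python
# Toy knowledgebase as (subject, predicate, object) triples.
TRIPLES = {
    ("pA", "type", "Promoter"),
    ("pA", "inducedBy", "IPTG"),
    ("pB", "type", "Promoter"),
    ("cdsX", "type", "CDS"),
}

def inducible_promoters(triples):
    """Parts typed as promoters that have at least one inducer recorded."""
    promoters = {s for s, p, o in triples if p == "type" and o == "Promoter"}
    induced = {s for s, p, o in triples if p == "inducedBy"}
    return sorted(promoters & induced)

print(inducible_promoters(TRIPLES))
```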

They also performed an automatic identification of biological parts, classifying them as activator sites, repressor sites, inducible promoters, repressible promoters, SigA promoters, SigB promoters, constitutive promoters, repressor-encoding CDSs, activator-encoding CDSs, response-regulator-encoding CDSs and more.

There were many other competency questions that could be, and were, asked.

COMBINE 2016 Day 2: Creating a SynBio Software Commons

Curtis Madsen & Nicholas Roehner

Nona was created to address an issue with the academic software development cycle: software is built, developed and published, and then gets lost as people move around in academia. However, academics can work with Nona to get feedback and develop a community which can help with the maintenance of their software.

How do you participate? Go to http://nonasoftware.org and browse the currently-available software. Software is broken down into specification, design, data management and integration types. You can transfer the software to Nona and have them host it, or you can host the software yourself and they will provide a link to both the homepage and the GitHub (or similar) repository.

When you’re ready to submit software to Nona, you start by choosing a license (to work with Nona, you must have an Open Source license). Then you provide a link to the GitHub repo (or simply give a tarball to Nona, who will put it on GitHub). Nona will provide promotional materials, FAQs, forums etc. for your software.

In February 2017 there will be a 2 1/2 day hackathon (Nona Works) where they bring together biologists and computer scientists.

COMBINE 2016 Day 2: cy3sbml

Matthias Koenig

Cytoscape is an open-source platform for visualizing networks. cy3sbml visualizes SBML information within the network context. It aims to visualize computational models and simulations, and to integrate seamlessly with computational modeling workflows and frameworks. cy3sbml is not a model builder, simulator or analysis tool. Accepted formats are SBML, OMEX, ResearchObjects, ODF and Cytoscape session files. You can import via file or URL, and it has batch support and dedicated web services for BioModels.

In the networks, nodes correspond mainly to SBase objects and edges to the links between them. It works with models anywhere from small to genome-scale in size. There are multiple views (full, kinetic and base networks). Annotations are supported and are used to retrieve information about the respective SBML object for each node. The information can also be exported as RDF. There is also validation, which produces a tabular validation report.

The data can be mapped to networks via node and edge attributes (e.g. via sid or metaid) and can be imported via CSV. Programmatic access is through the REST API provided by cyREST. It integrates with other applications, e.g. cy3sabiork for pulling kinetic information from SABIO-RK, and cyfluxviz for visualizations of FluxDistributions.
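The mapping step is essentially a join between a table and the network's nodes on a shared identifier. Here is a minimal stand-alone sketch of that idea in plain Python. It does not use Cytoscape or cy3sbml, and the node list, the `flux` column and the values are made up for illustration.

```python
import csv
import io

# Hypothetical CSV data keyed by SBML sid.
CSV_TEXT = "sid,flux\nglc,1.5\npyr,0.7\n"

# Hypothetical network nodes, each carrying its sid as an attribute.
nodes = [{"sid": "glc"}, {"sid": "pyr"}, {"sid": "atp"}]

def map_values(nodes, csv_text, key="sid", column="flux"):
    """Attach the CSV column to every node whose key matches a row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = {row[key]: float(row[column]) for row in rows}
    # Nodes without a matching row get None for the new attribute.
    return [{**node, column: values.get(node[key])} for node in nodes]

mapped = map_values(nodes, CSV_TEXT)
for node in mapped:
    print(node)
```

Once the values sit on the nodes as attributes, they can drive visual properties such as colour or size.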

He then showed us a live demo.

COMBINE 2016 Day 2: pathwayDesigner

Herbert Sauro

When pathwayDesigner was first written, there was no libSBML, and his original parser remained inside his code for a long time, until this past summer to be specific! There is also a direct link to libRoadRunner, as he would like to make this into a realtime simulator at some point. There are now some new node styles, and he has been doing some work on plugins. There are about 8 or 9 plugins to date, including parameter scanning, an MCA (sensitivity) plugin appropriated from COPASI, sliders and an arrow designer. There is also an AutoLayout library in C++ which uses the same algorithm as the original layout method in SBW. It also has Python bindings. Included with it is a test plugin which generates random networks so you can play with the layout options.

There is also a feature where you can generate splines, and an Antimony plugin which allows you to load in a network as text and have it displayed in pathwayDesigner. There is even a Mac version available now in alpha. Within the next year, he’d like to finish the Mac version, the Python plugin, the layout functionality, support for alias nodes and perhaps the render extension. Longer term, he’d like to focus on that realtime simulator.
