Categories
Meetings & Conferences

COMBINE 2016 Day 2: SBOL Breakout – Host Context

Design and reality in SynBio: host context and provenance – Neil Wipat, Bryan Bartley

In synthetic biology, you are doing engineering in biology, combining wet-lab and in silico work. Until now, SBOL has been primarily concerned with the design stage of the process, but SBOL should be able to travel around the entire engineering life cycle, capturing data as it goes. Every data set generated throughout the life cycle should be capturable within the SBOL structure.

Take as an example a system built exactly as described in the original design, e.g. with the original strain of E. coli. Even with the same design and the best of intentions, different labs will run different experiments and therefore produce different experimental data. An SBOL design can be built by many labs, in many ways, and in different host contexts. At the moment, SBOL doesn’t capture the differences among these host contexts.

Host context requires information about all of the details of the design – who/what/when/where/why/how – which is why provenance and host context belong together. As Bryan mentioned in his talk earlier, characterising a cell during “steady state” can often be subjective and difficult. Measurements of the output of a genetic circuit depend strongly on how well adapted your cells are to the environmental conditions. Further, human error must be taken into account, and it can be necessary to backtrack and check your assumptions. Some components that you’re using may have incomplete QC data.

There was a discussion of the difference between context and provenance: it was decided that the context was like the annotation on the nodes of a graph, and the provenance was how the edges between them were being walked. That is, provenance is how you got to a particular node, and context is about how you would re-create the conditions at that node.

The minimal information for the host context would place the host as a type of ModuleDefinition. The host-specific annotations would be:

  • StrainId: reference
  • VendorId: reference
  • TaxonId: reference
  • Genome: reference
  • Genotype: Gnomic string

Gnomic is a machine-readable way of representing genotypes (http://github.com/biosustain/gnomic). It was then suggested that we should RDFize all of the information contained within Gnomic directly, rather than adopting a new format that would have to be learnt and parsed. Alternatively, use proper ontological terms and reference them with URIs.
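To make the proposed annotation concrete, here is a minimal sketch of what a host-context record with those five fields might look like in code. The class, field values, and URIs are all my own hypothetical illustration, not something shown in the session:

```python
from dataclasses import dataclass

@dataclass
class HostContext:
    """Sketch of the proposed host-specific annotation on a ModuleDefinition."""
    strain_id: str   # reference to the strain
    vendor_id: str   # reference to the strain's vendor
    taxon_id: str    # reference to a taxonomy entry
    genome: str      # reference to the genome sequence
    genotype: str    # Gnomic string describing the genotype

# Hypothetical example: an E. coli host with a single deletion in Gnomic notation.
host = HostContext(
    strain_id="https://example.org/strains/MG1655",
    vendor_id="https://example.org/vendors/acme-bio",
    taxon_id="https://identifiers.org/taxonomy/511145",
    genome="https://example.org/genomes/U00096",
    genotype="-lacZ",
)
print(host.genotype)
```

Under the RDFization suggestion, each of these string fields would become a URI-valued property rather than an opaque string to be parsed.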

PROV-O, the provenance ontology, defines three core classes: Entity, Activity and Agent. An agent runs an activity to generate one entity from another. Is there an ontology for the activity? You could use something like OBI, but note that each activity instance is tied to a particular timestamp, and therefore an activity is only done once.

There is a contrasting opinion that the important thing is that an activity can be reused, and therefore there should be a class/definition for each activity which gets instantiated at particular times.
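The two opinions can be reconciled by separating a reusable activity *definition* from the timestamped *instances* of it. This is only a sketch of that idea; the class names and example data are hypothetical, not part of PROV-O or the proposal:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ActivityDefinition:
    """A reusable protocol, e.g. 'PCR amplification' (hypothetical class)."""
    name: str

@dataclass
class ActivityInstance:
    """One concrete run of a definition, tied to a time and an agent."""
    definition: ActivityDefinition
    agent: str
    started_at: datetime
    used: List[str] = field(default_factory=list)       # input entities
    generated: List[str] = field(default_factory=list)  # output entities

pcr = ActivityDefinition("PCR amplification")
run1 = ActivityInstance(pcr, "alice", datetime(2016, 9, 20, 10, 0),
                        ["template-1"], ["amplicon-1"])
run2 = ActivityInstance(pcr, "bob", datetime(2016, 9, 21, 14, 30),
                        ["template-1"], ["amplicon-2"])

# The definition is reused; each timestamped instance happens exactly once.
assert run1.definition is run2.definition
```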

The proposal suggests that all sbol2:Identified types be potentially annotated with provenance information. As such, the following additional classes should be added: prov:Derivation, prov:Activity, prov:Agent, prov:Association, prov:Usage. (Though I definitely saw a prov:role in one of the examples.)


COMBINE 2016 Day 2: Version and Variant Control for Synthetic Biology

COMBINE 2016

Bryan Bartley

Synthetic biology, as with many projects, gets complex quickly and could be improved through the use of versioning systems. SBOL currently supports versioning of designs, but not of constructs. Further, versioning for synthetic biology needs to track provenance and contextual information. But how do we approach versioning in biological systems? In biology, branching tends to be how it’s done (constructing in parallel): feature branches are much more the rule than successive commits.

Variant Control is based on phylogenetic analysis of DNA sequences (scoring matrix -> multiple sequence alignment -> pairwise distance matrix -> phylogenetic tree). In Variant Control, the composition of genetic circuits is encoded as sequences. You can then do an MSA on these sequences of circuits, performing a parts-based phylogenetic analysis. From this, you get a tree of variants.

Next, semantic annotations are added to score the alignments: going up the hierarchy to reach a common SO term incurs a penalty score. Variant Control thus clusters similar designs by both sequence and functional similarity (e.g. repressors together).
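To illustrate the parts-based step of that pipeline, here is a toy version: circuits are encoded as sequences of part identifiers and compared pairwise. The circuits and the naive mismatch distance are my own stand-ins for the scoring-matrix/MSA machinery Variant Control actually uses:

```python
from itertools import combinations

# Hypothetical circuits, each encoded as an ordered sequence of part names.
circuits = {
    "design-A": ["pTet", "rbs1", "lacI", "term1"],
    "design-B": ["pTet", "rbs2", "lacI", "term1"],
    "design-C": ["pBAD", "rbs1", "araC", "term2"],
}

def part_distance(a, b):
    """Positional mismatch count between two equal-length part sequences."""
    return sum(x != y for x, y in zip(a, b))

# Pairwise distance matrix over the circuit library, from which a tree
# of variants could then be built.
distances = {
    (i, j): part_distance(circuits[i], circuits[j])
    for i, j in combinations(sorted(circuits), 2)
}
print(distances)
```

In the real method a semantic penalty would also be added when two mismatched parts only share a distant common SO ancestor, so functionally similar parts score as closer than unrelated ones.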

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


COMBINE 2016 Day 2: How to Remember and Revisit Many Genetic Design Variants Automatically


Nicholas Roehner

In other words, a version control system for variations on genetic design.

Even with a library of just 16 parts, a 4-gene cluster can be encoded in over 684,000 variants. Clearly, GenBank files are not appropriate here. Their solution is Knox, where the genetic design space takes only about 200k, rather than gigabytes. This “genetic design space” is a format in which each edge is labelled with a *set* of parts, from which you can create paths. Design spaces can be concatenated via graph operations using Knox, and merged in a variety of different ways.
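A tiny sketch of why the set-labelled-edge encoding is so compact: each position in the design carries a set of interchangeable parts, and every path through the space is one concrete variant. The part names below are hypothetical, and this is a simplified linear space rather than Knox’s full graph model:

```python
from itertools import product

# Each slot is an edge labelled with a *set* of interchangeable parts.
design_space = [
    {"pTet", "pBAD"},          # promoters
    {"rbs1", "rbs2"},          # ribosome binding sites
    {"gfp", "rfp", "lacZ"},    # coding sequences
    {"term1"},                 # terminators
]

# The number of variants is the product of the label-set sizes...
n_variants = 1
for parts in design_space:
    n_variants *= len(parts)

# ...while each variant is only materialised as a path when needed.
variants = list(product(*[sorted(p) for p in design_space]))
print(n_variants)
```

The space itself stores only the label sets (8 parts here), not the 12 flat variant sequences, which is why a space covering hundreds of thousands of variants can stay so much smaller than the equivalent pile of GenBank files.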

If you build up a series of these operations, you can then create Very Large Things: a single design space encodes all of the various paths. These design spaces can be stored and versioned, as is done with git. Combining design spaces in Knox also merges version histories. You can also branch a design space, giving you two different versions to work with. Reversion is also supported.

There is a RESTful API to allow connection between the web application and the graph database. Finch and Eugene are two products which use Knox. Finch can encode variable-length designs, as it uses regular expressions. This makes designs more machine-comparable and mergeable, but can make them harder for humans to work with; this is where Eugene is beneficial, as it is a more human-readable and writeable language, though it is less expressive than Finch and has a fixed design length.
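The variable-length point can be illustrated with an ordinary regular expression over part tokens. The token names and pattern below are my own hypothetical example, not Finch’s actual syntax:

```python
import re

# "A promoter, then one or more RBS+CDS pairs, then a terminator."
# A fixed-length template cannot express the "one or more" part.
pattern = re.compile(r"^prom(;rbs;cds)+;term$")

assert pattern.match("prom;rbs;cds;term")              # one gene
assert pattern.match("prom;rbs;cds;rbs;cds;term")      # two genes: variable length
assert not pattern.match("prom;term")                  # no gene at all: rejected
```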



COMBINE 2016 Day 2: A new thermodynamics-based approach to aid the design of natural-product biosynthetic pathways


Hiroyuki Kuwahara

The design of biosynthetic systems involves a large search space, so a computational tool that predicts productive pathways is essential to aid that design. There are a number of pre-existing approaches, including those based on flux-balance analysis (where the host is often limited to E. coli), on reaction counts, and on thermodynamic favorability (where the effects of competing reactions cannot be captured, and the ranking doesn’t depend on the host’s metabolic system). They wanted, given a starting material, a target product, and a host organism, to find promising biosynthetic routes, allowing the introduction of foreign metabolic enzymes into the host.

They have a host-dependent weighting scheme, under which the ranking of pathways can be widely different from the thermodynamic-favorability approach. They first compute a weight for each edge in the network, such that two edges can have different weights even if their energy values are identical. In this way, the model can include additional further steps that may lower otherwise high-scoring reactions if their routes lead to undesirable consequences.
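The general idea of host-dependent edge weights changing the best-ranked route can be sketched with a shortest-path search over a toy reaction graph. The network, compounds, weights, and the use of Dijkstra here are all my own illustration, not the authors’ actual algorithm:

```python
import heapq

def best_route(edges, start, goal):
    """Dijkstra over weighted reaction edges; returns (cost, path)."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in edges.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

# Same topology S -> {A, B} -> P, but the two hosts weight the S->A
# reaction differently (e.g. because of competing native reactions).
host1 = {"S": [("A", 1.0), ("B", 2.0)], "A": [("P", 1.0)], "B": [("P", 1.0)]}
host2 = {"S": [("A", 5.0), ("B", 2.0)], "A": [("P", 1.0)], "B": [("P", 1.0)]}

print(best_route(host1, "S", "P"))  # route via A
print(best_route(host2, "S", "P"))  # route via B
```

The point of the example is only that identical reaction energies can still yield different rankings once the weights depend on the host’s metabolic context.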

They have also developed SBOLme.



COMBINE 2016 Day 2: Data Integration and Mining for Synthetic Biology Design


Goksel Misirli

How can we use ontologies to facilitate synthetic biology? Engineering biological systems is challenging, and integrating the data about them is even more so. Information may be spread across different databases, different formats and different semantics. This information should be integrated to inform and constrain biological design. This brings us to Gruber, and his “specification of a conceptualization” definition of an ontology. Ontologies are useful for capturing the different relationships between biological parts and for facilitating data mining. They are already used widely in bioinformatics, including GO, SO, SBO, SBOL and others.

They have created the Synthetic Biology Ontology (SyBiOnt), available at http://w3id.org/synbio/ont. The SyBiOnt knowledgebase includes information about sequences, annotations, metabolic pathways, gene regulatory networks, protein–protein interactions and gene expression. Once the KB was built, they examined it via a set of competency questions. For example: which parts can be used as inducible promoters? When an appropriate query was run, 51 promoters within the KB were classified as inducible.
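A competency question of this kind amounts to a type query over the knowledgebase. Here is a toy version over hand-written triples; the part names and class URIs are hypothetical, and the real KB would be queried with SPARQL rather than Python:

```python
# A tiny stand-in for the knowledgebase: (subject, predicate, object) triples.
triples = [
    ("pBAD",   "rdf:type", "sybio:InduciblePromoter"),
    ("pTet",   "rdf:type", "sybio:InduciblePromoter"),
    ("pConst", "rdf:type", "sybio:ConstitutivePromoter"),
    ("lacI",   "rdf:type", "sybio:RepressorCDS"),
]

def ask(triples, predicate, obj):
    """All subjects carrying the given predicate/object pair."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)

# Competency question: which parts can be used as inducible promoters?
inducible = ask(triples, "rdf:type", "sybio:InduciblePromoter")
print(inducible)
```

In practice the answer set comes from reasoning over the ontology’s class hierarchy, not just from explicitly asserted types as in this sketch.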

They also performed automatic identification of biological parts, classified according to activator sites, repressor sites, inducible promoters, repressible promoters, SigA promoters, SigB promoters, constitutive promoters, repressor-encoding CDSs, activator-encoding CDSs, response-regulator-encoding CDSs and more.

There were many other competency questions that could be, and were, asked.



COMBINE 2016 Day 2: Creating a SynBio Software Commons


Curtis Madsen & Nicholas Roehner

Nona was created to address an issue with academic software in the software development cycle: software is built, developed and published, and then gets lost as people move around in academia. By working with Nona, academics can get feedback and develop a community which can help with the maintenance of their software.

How do you participate? Go to http://nonasoftware.org and browse the currently-available software. Software is broken down into specification, design, data management and integration types. You can transfer the software to Nona and have them host it, or you can host the software yourself and they will provide a link to both the homepage and the GitHub (or similar) repository.

When you’re ready to submit software to Nona, you start by choosing a license (to work with Nona, you must have an open-source license). Then you provide a link to the GitHub repo (or simply give a tarball to Nona, who will put it on GitHub). Nona will provide promotional materials, FAQs, forums etc. for your software.

In February 2017 there will be a 2 1/2 day hackathon (Nona Works) where they bring together biologists and computer scientists.



COMBINE 2016 Day 2: cy3sbml


Matthias Koenig

Cytoscape is an open-source platform for visualizing networks. cy3sbml visualizes SBML information within the network context: it aims to visualize computational models and simulations, integrating seamlessly with computational modeling workflows and frameworks. cy3sbml is not a model builder, simulator or analysis tool. Accepted formats are SBML, OMEX, ResearchObjects, ODF and Cytoscape session files. You can import via file or URL; it has batch support and dedicated web services for BioModels.

In the networks, nodes correspond mainly to SBase objects and edges to the links between them. It works with models anywhere from small to genome-scale in size. There are multiple views (full, kinetic and base networks). Annotations are supported, retrieving information about the respective SBML object for each node. The information can also be exported as RDF. There is also validation, which produces a tabular validation report.

Data can be mapped to networks via node and edge attributes (e.g. via sid or metaid) and can be imported via CSV. Programmatic access is through the REST API via cyREST. It integrates with other applications, e.g. cy3sabiork for pulling kinetic information from SABIO-RK, and cyfluxviz for visualizing flux distributions.

He then showed us a live demo.



COMBINE 2016 Day 2: pathwayDesigner


Herbert Sauro

When pathwayDesigner was first written there was no libSBML, and his original parser remained inside his code for a long time – until this past summer, to be specific! There is also a direct link to libRoadRunner, as he would like to make this into a realtime simulator at some point. There are now some new node styles, and he has been doing some work on plugins. There are about 8 or 9 plugins to date, including parameter scanning, an MCA (sensitivity) plugin appropriated from COPASI, sliders, and an arrow designer. There is also an AutoLayout library in C++ which uses the same algorithm as the original layout method in SBW; it also has Python bindings. Included with it is a test plugin which generates random networks so you can play with the layout options.

There is also a feature for generating splines, and an Antimony plugin which allows you to load in a network as text and have it displayed in pathwayDesigner. There is even a Mac version in alpha available now. Within the next year, he’d like to finish the Mac version, the Python plugin, the layout functionality, support for alias nodes and perhaps the render extension. Longer term, he’d like to focus on that realtime simulator.



COMBINE 2016 Day 2: CellML, PMR and Osmium

David Nickerson, University of Auckland

Apologies to David: my train was late and I missed the beginning of this talk. PMR2 facilitates model exchange directly between modellers. PMR2 allows the use of workspaces, which users can create themselves and which can store any kind of data; it uses git for version control. Embedded workspaces allow modularity and reuse, and relative references facilitate sharing and archiving – including combined archives.

Sitting on top of the idea of workspaces are exposures: presentation views of a workspace. Exposures have plugins for various types of data and index metadata, allowing lovely rendering of the content as well as the creation of custom plugins.

Most recently, they’re moving to Osmium (http://osmium.readthedocs.io). At its core is the refactored PMR2 stack, now called RepoDono; here they are removing the distinction between workspaces and exposures. This includes calmjs as well.


Categories
Meetings & Conferences Semantics and Ontologies

UKON 2016: Identifying Basic Level Entities in a Data Graph

These are my notes for the Marwan Al-Tawil, Vania Dimitrova, Dhaval Thakker, Brandon Bennett talk at the UK Ontology Network Meeting on 14 April, 2016.


What makes some entities more important, and better able to provide good paths? He observed in his study that central entities with many subclasses are good potential anchors, that recognition is a key enabler for knowledge expansion, and that connections to recognised entities encourage the discovery of new ones. How can we develop automatic ways to identify knowledge anchors? Basic-level category objects (commonly-used objects from daily life) carry the most information, possess the highest category cue validity and are therefore the most differentiated from one another.

They have two approaches: distinctiveness (identifying the most differentiated entities, whose cues link to their own members and not to other entities) and homogeneity. The distinctiveness metrics were adopted from formal concept analysis and applied to the ontology; the homogeneity metrics were created with set-based similarity metrics.
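To give a flavour of a set-based homogeneity metric, here is a sketch that averages pairwise Jaccard similarity over the cue sets of a candidate anchor’s subcategories. The cue sets and the exact choice of Jaccard are my own illustration, not necessarily the metrics the authors used:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two cue sets."""
    return len(a & b) / len(a | b)

def homogeneity(cue_sets):
    """Mean pairwise Jaccard similarity of a category's subclass cue sets."""
    pairs = list(combinations(cue_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical candidate anchors with cue sets for their subclasses.
bird = [{"wings", "beak", "flies"}, {"wings", "beak", "swims"}]
misc = [{"handle", "metal"}, {"wheel", "engine"}]

print(homogeneity(bird))  # higher: subclasses share most cues
print(homogeneity(misc))  # lower: subclasses share no cues
```

A high-homogeneity category whose cues are also distinctive (pointing only at its own members) would then be a good knowledge-anchor candidate.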

Experiment and evaluation: images of all taxonomical entities linked via subClassOf were presented in 10 different surveys, and benchmarking sets were used to determine accuracy and frequency. Questions were of two types: accurately naming a category entity (the parent) when a leaf entity is shown, and accurately naming an entity with its exact name, child or parent.

When analysing the data, they found that precision values were poor. Inspecting the false positives, they noticed two causes: picking entities with a low number of subclasses, and returning false-positive entities which had long label names.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!