What should you think about when you think about standards?
The creation of a new standard is very exciting (yes, really). You can easily get caught up in the fun of the moment, and just start creating requirements and minimal checklists and formats and ontologies…. But what should you be thinking about when you start down this road? Today, the second and final day of the BBSRC Synthetic Biology Standards Workshop, was about discussing what parts of a synthetic biology standard are unique to that standard, and what can be drawn from other sources. And, ultimately, it was about reminding ourselves not to reinvent the wheel and not to require more information than the community was willing to provide.
Matthew Pocock had a great introduction into this topic when he summarized what he thinks about when he thinks about standards. Make sure you don’t miss my notes on his presentation further down this post.
(If you’re interested, have a look at yesterday’s blog post on the first day of this workshop: The more things change, the more they stay the same.)
Half a day was a perfect amount of time to get the ball rolling, but we could have talked all day and into the next. Other workshops are planned for the coming months, and it will be very interesting to see what happens as things progress, both in person and via remote discussions.
Once again, for the time constrained among us, here are my favorite sentences from the presentations and discussions of the day:
- Dick Kitney: Synthetic biology is already important in industry, and if you want to work with major industrial companies, you need to get acceptance for your standards, making the existing standard (DICOM) very relevant to what we do here.
- Matthew Pocock: Divide your nascent standard into a continuum of uniqueness, from the components of your standard which are completely unique to your field, through to those which are important but have overlap with a few other related fields , and finally to the components which are integral to the standard but which are also almost completely generic.
- Discussion 1: Modelling for the purposes of design is very different from modelling for the purposes of analysis and explanation of existing biology.
- Discussion 2: I learnt that, just as in every other field I’ve been involved in, there are terms in synthetic biology so overloaded with meaning (for example, “part”) it is better to use a new word when you want to add those concepts to an ontology or controlled vocabulary.
Dick Kitney – Imperial College London: “Systematic Design and Standards in Synthetic Biology”
Dick Kitney discussed how SynBIS, a synthetic biology web-based information system with an integrated BioCAD and modelling suite, was developed and how it is currently used. There are three parts to the CAD in SynBIS: DNA assembly, characterization, and chassis (data for SynBIS). They are using automation in the lab as much as possible. With BioCAD, you can use a parallel strategy for both computer modelling and the synthetic biology itself.
With SynBIS, you can get inputs from other systems as well as part descriptions, models and model data from internal sources. SynBIS has 4 layers: an Interface/HTML layer, a communication layer, an application layer and and a database layer.
Information can be structured into four types: the biological “continuum” (or the squishy stuff), modalities (experimental types, standards relating to such), (sorry – missed this one), and ontologies. SynBIS incorporates the DICOM standard for their biological information. DICOM can be used and modified to store/send parts and associated metadata, related images, and related/collected data. They are interested in DICOM because of the industrialization of synthetic biology. Most major industries and companies already use the DICOM standard. If you want to work with major industrial companies, you need to get acceptance for your standards, making DICOM very important. The large number of users of DICOM are a result of large amounts of effort going into the creation of this modular, modality-friendly standard.
Images are getting more and more important for synthetic biology. If you rely on GFP fluorescence, for example, then you need high levels of accuracy in order to replicate results. DICOM helps you do this. It isn’t just a file format, and includes transfer protocols etc. Each image in DICOM has its own metadata.
What are the downsides of DICOM? DICOM is very complex, and most academics might not have the resources to make use of it (it has a huge 3,000-page document). In actuality, however, it is a lot easier to use then you might think. There are libraries, viewers and standard packages that hide most of the complexity. What is the most popular use of DICOM right now? MRCT, ultrasound, light microscopy, lab data, and many other modalities. In a hospital, most machines’ outputs are compliant with DICOM.
As SBOL develops and expands, they plan to incorporate it into SynBIS.
Issues relating to the standard – Run by Matthew Pocock
The rest of the workshop was structured discussion on the practical aspects of building this standard. Matthew Pocock corralled us all and made sure we remained useful, and also provided the discussion points.
To start, Matt provided some background. What does he ponder when he thinks about standards? Adoption of the standard for one, and who your adopters might be. Such people would be both/either providers of data and/or consumers of data. Also, both machines and humans will interact with the standard. The standard should be easy-to-implement, with a low buy-in.
You need to think about copyright and licensing issues: who owns it, maintains it. Are people allowed to change it for their own or public use? Your standard needs to have a clearly-defined scope: you don’t want it to force you to think about what you’re not interested in. To do this, you should have a list of competency questions.
You want the standard to be orthogonal with other standards and compose into it any other related standards you wish to use but which don’t belong in your new standard. You should have a minimal level of compliance in order for your data to be accepted.
Finally, above all, users of your standard would like it to be lightweight and agile.
What are the technical areas that standards often cover? You should have domain-specific models of what you’re interested in (terminologies, ontologies, UML): essentially, what your data looks like. You also need to have a method of data persistence and protocols, e.g. how you write it down (format, XML, etc.). You also need to think about transport of the data, or how you move it about (SOAP, REST, etc.). Access has to be thought about as well, or how you query for some of the data (SQL, DAS, custom API, etc.).
Within synthetic biology, there is a continuum from incredibly generic, useful standards through to things that are absolutely unique to our (synthetic biology) use case, and then in between is stuff that’s really important, but which might be shared with some other areas such as systems biology. For example, LIMS, and generic metadata are completely generic and can be taken care of by things like Dublin Core. DNA sequence and features are important to synthetic biology, but are not unique to it. Synthetic biology’s peculiar constraints include things like a chassis. You could say that host is synonymous with chassis, but in fact they are completely different roles. Chassis is a term used to describe something very specific in synthetic biology.
Some fields relevant to synthetic biology: microscopy, all the ‘omics, genetic and metabolic engineering, bioinformatics.
Consider the unique ↔ generic continuum: where do activities in the synthetic biology lifecycle lie on the diagram? What standards already exist for these? What standards are missing?
The notes that follow are a merge of the results from the two groups, but it may be an imperfect merge and as a consequence, there may be some overlap.
UNIQUE (to synthetic biology)
- design (the composition of behaviour (rather than of DNA, for example)).
- modelling a novel design is different than modelling for systems biology, which seeks to discover information about existing pathways and interactions
- quantification for design
- Desired behaviour: higher-level design, intention. I am of the opinion that other fields also have an intention when performing an experiment, which may or may not be realized during the course of an experiment. I may be wrong in this, however. And I don’t mean an expected outcome – that is something different again.
- Device (reusable) / parts / components
- Multi-component, multiple-stage assembly
- assembly and machine-automated characterization, experiments and protocols (some of this might be covered in more generic standards such as OBI)
- Scale and scaling of design
- engineering approaches
- computational accessibility
- positional information
- metabolic load (burden)
- evolutionary stability
- modelling (from systems biology): some aspects of both types of modelling are common.
- you use modelling tools in different ways when you are starting from a synbio viewpoint
- SBML, CellML, BioPAX
- module/motifs/components – reusable models
- Biological interfaces (rips, pops)
- parts catalogues
- interactions between parts (and hosts)
- sequence information
- robustness to various conditions
- scaling of production
- Experimental (Data, Protocols)
- OBI + FuGE
- sequence and feature metadata
- SO, GO
- success/performance metrics (comparison with specs)
- manufacturing/production cost
From the components of a synthetic biology standard identified in discusison 1, choose two and answer:
- what data must be captured by the standard?
- What existing standards should it leverage?
- Where do the boundaries lie?
Parts and Devices
What data must be captured by the standard? Part/device definition/nomenclature, sequence data, type (enumerated list), relationships between parts (enumerated list / ontology), part aggregation (ordering and composition of nested parts), incompatibilities/contraindications (including range of hosts where the chassis is viable), part buffers and interfaces/Input/Output (as a sub-type of part), provenance, curation level. Any improvements (include what changes were made, and why they were made (e.g. mcherry with the linkers removed)); versioning information (version number, release notes, feature list, and known issues); equivalent parts which are customized for other chassis (codon optimization and usage, chassis-agnostic part); Provenance information including authorship, originating lab, and the date/age of the part (much covered by the SBOL-seq standard); the derivation of the part from other parts or other biological sequence databases, and a human- and machine-readable description of the derivation.
What existing standards? SBOL, DICOM, SO, EMBL, MIBBI
Boundaries: Device efficiency (only works in the biological contexts it’s been described in), chassis and its environment, related parts could be organized into part ‘families’ (perhaps use GO for some of this), also might be able to attach other quantitative information that could be common across some parts.
We need to state the type of the device, and we would need a new specification for each type of device, e.g. a promoter is not a GFP. We need to know some measurement information such as statistics, experimental conditions required to record, lab, protocols. Another important value is whether or not you’re using a reference part or device. The context information would include the chassis, in vitro/in vivo, conditions, half-life, and interactions with other devices/hosts.
Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!