What should you think about when you think about standards?

The creation of a new standard is very exciting (yes, really). You can easily get caught up in the fun of the moment, and just start creating requirements and minimal checklists and formats and ontologies…. But what should you be thinking about when you start down this road? Today, the second and final day of the BBSRC Synthetic Biology Standards Workshop, was about discussing what parts of a synthetic biology standard are unique to that standard, and what can be drawn from other sources. And, ultimately, it was about reminding ourselves not to reinvent the wheel and not to require more information than the community was willing to provide.

Matthew Pocock gave a great introduction to this topic when he summarized what he thinks about when he thinks about standards. Make sure you don’t miss my notes on his presentation further down this post.

(If you’re interested, have a look at yesterday’s blog post on the first day of this workshop: The more things change, the more they stay the same.)

Half a day was a perfect amount of time to get the ball rolling, but we could have talked all day and into the next. Other workshops are planned for the coming months, and it will be very interesting to see what happens as things progress, both in person and via remote discussions.

Once again, for the time constrained among us, here are my favorite sentences from the presentations and discussions of the day:

  1. Dick Kitney: Synthetic biology is already important in industry, and if you want to work with major industrial companies, you need to get acceptance for your standards, making the existing standard (DICOM) very relevant to what we do here.
  2. Matthew Pocock: Divide your nascent standard into a continuum of uniqueness, from the components of your standard which are completely unique to your field, through to those which are important but have overlap with a few other related fields, and finally to the components which are integral to the standard but which are also almost completely generic.
  3. Discussion 1: Modelling for the purposes of design is very different from modelling for the purposes of analysis and explanation of existing biology.
  4. Discussion 2: I learnt that, just as in every other field I’ve been involved in, there are terms in synthetic biology so overloaded with meaning (for example, “part”) that it is better to use a new word when you want to add those concepts to an ontology or controlled vocabulary.

Dick Kitney – Imperial College London: “Systematic Design and Standards in Synthetic Biology”

Dick Kitney discussed how SynBIS, a synthetic biology web-based information system with an integrated BioCAD and modelling suite, was developed and how it is currently used. There are three parts to the CAD in SynBIS: DNA assembly, characterization, and chassis (data for SynBIS). They are using automation in the lab as much as possible. With BioCAD, you can use a parallel strategy for both computer modelling and the synthetic biology itself.

With SynBIS, you can get inputs from other systems as well as part descriptions, models and model data from internal sources. SynBIS has 4 layers: an interface/HTML layer, a communication layer, an application layer and a database layer.

Information can be structured into four types: the biological “continuum” (or the squishy stuff), modalities (experimental types and the standards relating to them), (sorry – missed this one), and ontologies. SynBIS incorporates the DICOM standard for its biological information. DICOM can be used and modified to store/send parts and associated metadata, related images, and related/collected data. They are interested in DICOM because of the industrialization of synthetic biology: most major industries and companies already use the DICOM standard, and if you want to work with major industrial companies, you need to get acceptance for your standards, making DICOM very important. The large number of DICOM users is the result of the large amount of effort that went into creating this modular, modality-friendly standard.

Images are getting more and more important for synthetic biology. If you rely on GFP fluorescence, for example, then you need high levels of accuracy in order to replicate results. DICOM helps you do this. It isn’t just a file format: it also includes transfer protocols and more. Each image in DICOM has its own metadata.

What are the downsides of DICOM? DICOM is very complex, and most academics might not have the resources to make use of it (its documentation runs to some 3,000 pages). In actuality, however, it is a lot easier to use than you might think: there are libraries, viewers and standard packages that hide most of the complexity. What is the most popular use of DICOM right now? MR and CT, ultrasound, light microscopy, lab data, and many other modalities. In a hospital, most machines’ outputs are compliant with DICOM.
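To give a feel for how much of that complexity the libraries hide, here is a minimal sketch using the open-source pydicom library (assuming it is installed); the file name is a hypothetical placeholder, and which tags are actually present depends on the modality.

```python
# A minimal sketch of reading DICOM metadata with the open-source pydicom library.
# The file name is a hypothetical placeholder; the tags present depend on the modality.
import pydicom

ds = pydicom.dcmread("gfp_timecourse_0001.dcm")

# Every DICOM object carries its own metadata alongside the pixel data.
print(ds.get("Modality", "unknown"))      # imaging modality
print(ds.get("StudyDate", "unknown"))     # acquisition date, if recorded
print(ds.get("PixelSpacing", "unknown"))  # physical pixel size, if recorded

pixels = ds.pixel_array                   # the image itself, as a NumPy array
print(pixels.shape)
```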

As SBOL develops and expands, they plan to incorporate it into SynBIS.

Issues relating to the standard – Run by Matthew Pocock

The rest of the workshop was a structured discussion on the practical aspects of building this standard. Matthew Pocock corralled us all and made sure we remained useful, and also provided the discussion points.

To start, Matt provided some background. What does he ponder when he thinks about standards? Adoption of the standard, for one, and who your adopters might be. Such adopters may be providers of data, consumers of data, or both; and both machines and humans will interact with the standard. The standard should be easy to implement, with a low cost of buy-in.

You need to think about copyright and licensing issues: who owns the standard, and who maintains it? Are people allowed to change it for their own or public use? Your standard needs to have a clearly-defined scope: you don’t want it to force you to think about what you’re not interested in. To do this, you should have a list of competency questions.

You want the standard to be orthogonal to other standards, and to compose in any related standards you wish to use but which don’t belong in your new standard. You should also define a minimal level of compliance that data must meet in order to be accepted.

Finally, above all, users of your standard would like it to be lightweight and agile.

What are the technical areas that standards often cover? You should have domain-specific models of what you’re interested in (terminologies, ontologies, UML): essentially, what your data looks like. You also need a method of data persistence and protocols, i.e. how you write it down (format, XML, etc.). You need to think about transport of the data, or how you move it about (SOAP, REST, etc.). Finally, access has to be thought about: how you query for some of the data (SQL, DAS, custom API, etc.).
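As a purely illustrative sketch of that format/transport/access split, the snippet below writes a toy part record down as XML and fetches a record back over a REST-style URL; the element names and the URL are made up for this example and are not taken from any existing standard.

```python
# Illustrative only: a toy "part" record written down as XML (the format) and
# fetched back over HTTP (the transport/access). Element names and the URL are
# made up for this sketch, not taken from any existing standard.
import xml.etree.ElementTree as ET
import urllib.request

# Format: how you write it down.
part = ET.Element("part", id="example_promoter")
ET.SubElement(part, "type").text = "promoter"
ET.SubElement(part, "sequence").text = "TTGACAAGCTTATAAT"
print(ET.tostring(part, encoding="unicode"))

# Transport and access: how you move it about and query for it (a REST-style GET).
url = "https://registry.example.org/parts/example_promoter.xml"
with urllib.request.urlopen(url) as response:  # assumes such a service exists
    fetched = ET.fromstring(response.read())
print(fetched.findtext("type"))
```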

Within synthetic biology, there is a continuum from incredibly generic, useful standards through to things that are absolutely unique to our (synthetic biology) use case; in between is stuff that’s really important, but which might be shared with some other areas such as systems biology. For example, LIMS and generic metadata are completely generic and can be taken care of by things like Dublin Core. DNA sequence and features are important to synthetic biology, but are not unique to it. Synthetic biology’s peculiar constraints include things like a chassis. You could say that host is synonymous with chassis, but in fact they describe completely different roles: chassis is a term used for something very specific in synthetic biology.

Some fields relevant to synthetic biology: microscopy, all the ‘omics, genetic and metabolic engineering, bioinformatics.

Discussion 1

Consider the unique ↔ generic continuum: where do activities in the synthetic biology lifecycle lie on the diagram? What standards already exist for these? What standards are missing?

The notes that follow merge the results from the two groups; the merge may be imperfect, so there may be some overlap.

UNIQUE (to synthetic biology)

  • design (the composition of behaviour (rather than of DNA, for example)).
    • modelling a novel design is different from modelling for systems biology, which seeks to discover information about existing pathways and interactions
    • quantification for design
  • Desired behaviour: higher-level design, intention. I am of the opinion that other fields also have an intention when performing an experiment, which may or may not be realized during the course of an experiment. I may be wrong in this, however. And I don’t mean an expected outcome – that is something different again.
  • Device (reusable) / parts / components
  • Multi-component, multiple-stage assembly
    • biobricks
    • assembly and machine-automated characterization, experiments and protocols (some of this might be covered in more generic standards such as OBI)
  • Scale and scaling of design
  • engineering approaches
  • characterization
  • computational accessibility
  • positional information
  • metabolic load (burden)
  • evolutionary stability

IMPORTANT

  • modelling (from systems biology): some aspects of both types of modelling are common.
    • you use modelling tools in different ways when you are starting from a synbio viewpoint
    • SBML, CellML, BioPAX
  • module/motifs/components – reusable models
  • Biological interfaces (RiPS, PoPS)
  • parts catalogues
  • interactions between parts (and hosts)
  • sequence information
  • robustness to various conditions
  • scaling of production

GENERIC

  • Experimental (Data, Protocols)
    • OBI + FuGE
  • sequence and feature metadata
    • SO, GO
  • LIMS
  • success/performance metrics (comparison with specs)
  • manufacturing/production cost

Discussion 2

From the components of a synthetic biology standard identified in discussion 1, choose two and answer:

  • what data must be captured by the standard?
  • What existing standards should it leverage?
  • Where do the boundaries lie?

Parts and Devices

What data must be captured by the standard?

  • Part/device definition and nomenclature, sequence data, and type (from an enumerated list).
  • Relationships between parts (enumerated list / ontology) and part aggregation (the ordering and composition of nested parts).
  • Incompatibilities/contraindications, including the range of hosts in which the chassis is viable.
  • Part buffers and interfaces/inputs/outputs (as a sub-type of part).
  • Provenance and curation level, including authorship, originating lab, and the date/age of the part (much of this is covered by the SBOL-seq standard).
  • Any improvements, including what changes were made and why they were made (e.g. mCherry with the linkers removed).
  • Versioning information: version number, release notes, feature list, and known issues.
  • Equivalent parts which are customized for other chassis (codon optimization and usage, chassis-agnostic part).
  • The derivation of the part from other parts or other biological sequence databases, with a human- and machine-readable description of the derivation.
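To make the shape of that information concrete, here is a rough sketch of how such a part record might look as a data structure; the field names are my own shorthand for the items above, not an existing schema.

```python
# Rough sketch of a part/device record; field names are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PartRecord:
    name: str                                                     # part/device definition/nomenclature
    part_type: str                                                # from an enumerated list, e.g. "promoter"
    sequence: str                                                 # sequence data
    relationships: List[str] = field(default_factory=list)       # links to other parts (ontology terms)
    sub_parts: List["PartRecord"] = field(default_factory=list)  # ordering/composition of nested parts
    incompatibilities: List[str] = field(default_factory=list)   # contraindications, non-viable chassis
    provenance: Optional[str] = None                              # authorship, originating lab, date
    curation_level: Optional[str] = None
    version: Optional[str] = None                                 # version number, release notes, known issues
    derived_from: List[str] = field(default_factory=list)        # derivation from other parts/databases
```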

What existing standards? SBOL, DICOM, SO, EMBL, MIBBI

Boundaries: device efficiency (a device only works in the biological contexts in which it has been described), and the chassis and its environment. Related parts could be organized into part ‘families’ (perhaps using GO for some of this), and it might also be possible to attach other quantitative information that is common across some parts.

Characterization

We need to state the type of the device, and we would need a new specification for each type of device; a promoter is not a GFP, for example. We need to know measurement information such as statistics, the experimental conditions required for recording, the lab, and the protocols used. Another important value is whether or not you’re using a reference part or device. The context information would include the chassis, in vitro/in vivo, conditions, half-life, and interactions with other devices/hosts.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


The more things change, the more they stay the same

…also known as Day 1 of the BBSRC Synthetic Biology Standards Workshop at Newcastle University, and musings arising from the day’s experiences.

In my relatively short career (approximately 12 years – wait, how long?) in bioinformatics, I have been involved to a greater or lesser degree in a number of standards efforts. It started in 1999 at the EBI, where I worked on the production of the protein sequence database UniProt. Now, I’m working with systems biology data and beginning to look into synthetic biology. I’ve been involved in the development (or maintenance) of a standard syntax for protein sequence data; standardized biological investigation semantics and syntax; standardized content for genomics and metagenomics information; and standardized systems biology modelling and simulation semantics.

(Bear with me – the reason for this wander through memory lane becomes apparent soon.)

How many standards have you worked on? How can there be multiple standards, and why do we insist on creating new ones? Doesn’t the definition of a standard mean that we would only need one? Not exactly. Take the field of systems biology as an example. Some people are interested in describing a mathematical model, but have no need for storing either the details of how to simulate that model or the results of multiple simulation runs. These are logically separate activities, yet they fall within a single community (systems biology) and are broadly connected. A model is used in a simulation, which then produces results. So, when building a standard, you end up with the same separation: have one standard for the modelling, another for describing a simulation, and a third for structuring the results of a simulation. All that information does not need to be stored in a single location all the time. The separation becomes even more clear when you move across fields.

But this isn’t completely clear cut. Some types of information overlap within standards of a single domain and even among domains, and this is where it gets interesting. Not only do you need a single community talking to each other about standard ways of doing things, but you also need cross-community participation. Such efforts result in even more high-level standards which many different communities can utilize. This is where work such as OBI and FuGE sit: with such standards, you can describe virtually any experiment. The interconnectedness of standards is a whole job (or jobs) in itself – just look at the BioSharing and MIBBI projects. And sometimes standards that seem (at least mostly) orthogonal do share a common ground. Just today, Oliver Ruebenacker posted some thoughts on the biopax-discuss mailing list where he suggests that at least some of BioPAX and SBML share a common ground and might be usefully “COMBINE“d more formally (yes, I’d like to go to COMBINE; no, I don’t think I’ll be able to this year!). (Scroll down that thread for a response by Nicolas Le Novère as to why that isn’t necessarily correct.) So, orthogonality, or the extent to which two or more standards overlap, is sometimes a hard thing to determine.

So, what have I learnt? As always, we must be practical. We should try to develop an elegant solution, but it really, really should be one which is easy to use and intuitive to understand. It’s hard to get to that point, especially as I think that point is (and should be) a moving target. From my perspective, group standards begin with islands of initial research in a field, which then gradually develop into a nascent community. As a field evolves, ‘just-enough’ strategies for storing and structuring data become ‘nowhere-near-enough’. Communication with your peers becomes more and more important, and it becomes imperative that standards are developed.

This may sound obvious, but the practicalities of creating a community standard mean that such work requires a large amount of effort and continued goodwill. Even with the best of intentions, with every participant working towards the same goal, it can take months – or years – of meetings, document revisions and conference calls to hash out a working standard. This isn’t necessarily a bad thing, though. All voices do need to be heard, and you cannot have a viable standard without input from the community you are creating that standard for. You can have the best structure or semantics in the world, but if it’s been developed without the input of others, you’ll find people strangely reluctant to use it.

Every time I take part in a new standard, I see others like me who have themselves been involved in the creation of standards. It’s refreshing and encouraging. Hopefully the time it takes to create standards will drop as the science community as a whole gets more used to the idea. When I started, the only real standards in biological data (at least that I had heard of) were the structures defined by SWISS-PROT and EMBL/GenBank/DDBJ. By the time I left the EBI in 2006, I could have given you a list a foot long (GO, PSI, and many others), and that list continues to grow. Community engagement and cross-community discussions continue to be popular.

In this context, I can now add synthetic biology standards to my list of standards I’ve been involved in. And, as much as I’ve seen new communities and new standards, I’ve also seen a large overlap in the standardization efforts and an even greater willingness for lots of different researchers to work together, even taking into account the sometimes violent disagreements I’ve witnessed! The more things change, the more they stay the same…

My involvement is limited at this stage, but the BBSRC Synthetic Biology Standards Workshop I’m attending today and tomorrow is a good place to start with synthetic biology. I describe most of today’s talks in this post, and will continue with another blog post tomorrow. Enjoy!

For those with less time, here is a single sentence for each talk that most resounded with me:

  1. Mike Cooling: Emphasise the ‘re’ in reusable, and make it easier to build and understand large models from reusable components.
  2. Neil Wipat: For a standard to be useful, it must be computationally amenable as well as useful for humans.
  3. Herbert Sauro: Currently there is no formal ontology for synthetic biology, but one will need to be developed.

This meeting is organized by Jen Hallinan and Neil Wipat of Newcastle University. Its purpose is to set up key relationships in the synthetic biology community to aid the development of a standard for that community. Today, I listened to talks by Mike Cooling, Neil Wipat, and Herbert Sauro. I was – unfortunately – unable to be present for the last couple of talks, but will be around again for the second – and final – day of the workshop tomorrow.

Mike Cooling – Bioengineering Institute Auckland, New Zealand

Mike uses CellML (it’s made where he works, but that’s not the only reason…) in his work with systems and synthetic biology models. Among other things, it wraps MathML and partitions the maths, variables and units into reusable pieces. Although many of the parts seem domain specific, CellML itself is actually not domain specific. Further, unlike other modelling languages such as SBML, components in CellML are reusable and can be imported into other models. (Yes, a new package called comp in SBML Level 3 is being created to allow the importing of models into other models, but it isn’t mature – yet.)
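For readers who haven’t seen it, this is roughly what the CellML import mechanism looks like; the file and component names below are hypothetical, and only the import/component structure with its xlink:href and component_ref attributes comes from the CellML 1.1 specification.

```python
# A sketch of the CellML 1.1 import mechanism. The file name and component names
# are hypothetical; the <import>/<component> structure, the xlink:href attribute
# and component_ref come from the CellML 1.1 specification.
import xml.etree.ElementTree as ET

cellml_import = """
<import xmlns="http://www.cellml.org/cellml/1.1#"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:href="promoter_template.cellml">
  <!-- Pull a reusable component out of the template model and give it a local name. -->
  <component component_ref="promoter_unit" name="my_promoter"/>
</import>
"""

element = ET.fromstring(cellml_import)  # check that the fragment is well-formed
print(element.tag, len(element))
```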

How are models stored? There is the CellML repository, but what is out there for synthetic biology? The Registry of Standard Biological Parts was available, but only described physical parts. Therefore they created a Registry of Standard Virtual Parts (SVPs) to complement the original registry. This was developed as a group effort with a number of people including Neil Wipat and Goksel Misirli at Newcastle University.

They start with template mathematical structures (which are little parts of CellML), and then use the import functionality available as part of CellML to combine the templates into larger physical things/processes (‘SVPs’) and ultimately to combine things into system models.

They extended the CellML repository to hold the resulting larger multi-file models, which included adding a method of distributed version control and allowing the sharing of models between projects through embedded workspaces.

What can these pieces be used for? Some of this work included the creation of a CellML model of the biology represented in Levskaya et al. 2005, with all of the pieces of the model deposited in the CellML repository. Another example is a model he’s working on about shear stress and multi-scale modelling for aneurysms.

Modules are being used and are growing in number, which is great, but he wants to concentrate more at the moment on the ‘re’ of the reusable goal, and make it easier to build and understand large models from reusable components. Some of the integrated services he’d like to have: search and retrieval, (semi-automated) visualization, semantically-meaningful metadata and annotations, and semi-automated composition.

All of this work converges on the importance of metadata. Not many people used the CellML Metadata Framework 1.0. With version 2.0 they have developed a core specification which is very simple, and then provide many additional satellite specifications. For example, there is a biological information satellite, where you use the BioModels qualifiers as relationships between your data and MIRIAM URNs. The main challenge is to find a database that is at the right level of abstraction (e.g. canonical forms of your concept of interest).
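As a minimal sketch of what that satellite amounts to in practice (assuming the rdflib library is available, and using placeholder identifiers), an annotation is just an RDF triple linking a model element to a MIRIAM URN via a BioModels qualifier:

```python
# Sketch only: annotating a model element with a BioModels qualifier pointing at a
# MIRIAM URN. The component URI and the UniProt accession are placeholders.
from rdflib import Graph, Namespace, URIRef

BQBIOL = Namespace("http://biomodels.net/biology-qualifiers/")

g = Graph()
component = URIRef("http://example.org/model#my_protein_variable")
g.add((component, BQBIOL["is"], URIRef("urn:miriam:uniprot:P12345")))

print(g.serialize(format="turtle"))
```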

Neil Wipat – Newcastle University

Please note Neil Wipat is my PhD supervisor.

Speaking about data standards, tool interoperability, data integration and synthetic biology, a.k.a “Why we need standards”. They would like to promote interoperability and data exchange between their own tools (important!) as well as other tools. They’d also like to facilitate data integration to inform the design of biological systems both from a manual designer’s perspective and from the POV of what is necessary for computational tool use. They’d also like to enable the iterative exchange of data and experimental protocols in the synthetic biology life cycle.

A description of some of the tools developed in Neil’s group (and elsewhere) exemplifies the differences in data structures present within synthetic biology. BacilloBricks was created to help get, filter and understand the information from the MIT registry of standard parts. They also created the Repository of Standard Virtual Biological Parts. This SVP repository was then extended with parts from Bacillus and extended to make use of SBML as well as CellML; this project is called BacilloBricks Virtual. All of these tools use different formats.

It’s great having a database of SVPs, but you need a way of accessing and utilizing it. Hallinan and Wipat have started a collaboration with the people at Microsoft Research who created GEC, a programming language and simulator for the genetic engineering of living cells. A summer student’s project produced a GEC compiler for SVPs from BacilloBricks Virtual. Goksel has also created the MoSeC system, with which you can automatically go from a model to a graph to an EMBL file.

They also have BacillusRegNet, which is an information repository about transcription factors for Bacillus spp. It is also a source of orthogonal transcription factors for use in B. subtilis and Geobacillus. Again, it is very important to allow these tools to communicate efficiently.

The data warehouse they’re using is ONDEX. They feed information from the ONDEX data store to the biological parts database. ONDEX was created for systems biology to combine large experimental datasets. ONDEX views everything as a network, and is therefore a graph-based data warehouse. ONDEX has a “mini-ontology” to describe the nodes and edges within it, which makes querying the data (and understanding how the data is structured) much easier. However, it doesn’t include any information about the synthetic biology side of things. Ultimately, they’d like an integrated knowledgebase using ONDEX to provide information about biological virtual parts. Therefore they need a rich data model for synthetic biology data integration (perhaps including an RDF triplestore).
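This is not ONDEX itself, but a minimal sketch (using networkx, with made-up node and edge names) of the graph-based warehouse idea, where a mini-ontology of node and edge types makes querying straightforward:

```python
# Not ONDEX: a minimal networkx sketch of a graph-based warehouse with a
# "mini-ontology" expressed as node and edge types. All names are illustrative.
import networkx as nx

g = nx.MultiDiGraph()

# Nodes are typed according to a small controlled vocabulary.
g.add_node("PxylA", concept_class="Promoter")
g.add_node("xylR", concept_class="TranscriptionFactor")
g.add_node("gfp_device", concept_class="Part")

# Edges (relations) are typed too, which makes querying straightforward.
g.add_edge("xylR", "PxylA", relation="regulates")
g.add_edge("PxylA", "gfp_device", relation="component_of")

# A query: find everything annotated as regulating a promoter.
for source, target, data in g.edges(data=True):
    if data["relation"] == "regulates" and g.nodes[target]["concept_class"] == "Promoter":
        print(source, "regulates", target)
```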

Interoperability, Design and Automation: why we need standards.

Requirement 1: there needs to be interoperability and data exchange among these tools, as well as between these tools and other external tools. Requirement 2: standards for data integration aid the design of synthetic systems; the format must be both computationally amenable and useful for humans. Requirement 3: automation of the design and characterization of synthetic systems also requires standards.

The requirements of synthetic biology research labs such as Neil Wipat’s make it clear that standards are needed.

KEYNOTE: Herbert Sauro – University of Washington, US

Herbert Sauro described the developing community within synthetic biology, the work on standards that has already begun, and the Synthetic Biology Open Language (SBOL).

He asks us to remember that Synthetic Biology is not biology – it’s engineering! Beware of sending synthetic biology grant proposals to a biology panel! It is a workflow of design-build-test. He’s mainly interested in the bit between building and testing, where verification and debugging happens.

What’s so important about standards? Standardization is critical in engineering, where it increases productivity and lowers costs. In order to identify the requirements you must describe a need, and there is one immediate need: store everything you need to reconstruct an experiment within a paper (for more on this see the Nature Biotech paper by Peccoud et al. 2011: Essential information for synthetic DNA sequences). Currently, it’s almost impossible to reconstruct a synthetic biology experiment from a paper.

There are many areas requiring standards to support the synthetic biology workflow: assembly, design, distributed repositories, laboratory parts management, and simulation/analysis. From a practical POV, the standards effort needs to allow researchers to electronically exchange designs with round tripping, and much more.

The standardization effort for synthetic biology began with a grant from Microsoft in 2008 and the first meeting was in Seattle. The first draft proposal was called PoBoL but was renamed to SBOL. It is a largely unfunded project. In this way, it is very similar to other standardization projects such as OBI.

DARPA mandated 2 weeks ago that all projects funded from Living Foundries must use SBOL.

SBOL is involved in the specification, design and build part of the synthetic biology life cycle (but not in the analysis stage). There are a lot of tools and information resources in the community where communication is desperately needed.

SBOL has three parts: SBOL Semantic, SBOL Visual, and SBOL Script. SBOL Semantic is the one that’s going to be doing all of the exchange between people and tools. SBOL Visual is a controlled vocabulary and set of symbols for sequence features.

Have you been able to learn anything from SBML/SBGN, as you have a foot in both worlds? SBGN doesn’t address any of the genetic side, and is pretty complicated. You ideally want a very minimalistic design. SBOL semantic is written in UML and is relatively small, though has taken three years to get to this point. But you need host context above and beyond what’s modelled in SBOL Semantic. Without it, you cannot recreate the experiment.

Feature types such as operator sites, promoter sites, terminators, restriction sites etc can go into the sequence ontology (SO). The SO people are quite happy to add these things into their ontology.

SBOLr is a web front end for a knowledgebase of standard biological parts that they used for testing (not publicly accessible yet). TinkerCell is a drag-and-drop CAD tool for design and simulation. There is a lot of semantic information underneath to determine what is and isn’t possible, though there is no formal ontology. However, you can semantically annotate all parts within TinkerCell, allowing the plugins to interpret a given design. A TinkerCell model can be composed of sub-models, which makes it easy to swap in new bits of models to see what happens.

WikiDust is a TinkerCell plugin written in Python which searches SBPkb for design components, and ultimately uploads them to a wiki. LibSBOLj is a library for developers to help them connect software to SBOL.

The physical and host context must be modelled to make all of this useful. By using semantic web standards, SBOL becomes extensible.

Currently there is no formal ontology for synthetic biology but one will need to be developed.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Special Session 4: Adam Arkin on Synthetic Biology (ISMB 2009)

Running the Net: Finding and Employing the OPerating Principles of Cellular Systems
Adam Arkin
Part of the Advances and Challenges in Computational Biology, hosted by PLoS Computational Biology

The need for scientific standards and cooperation: synthetic biology is very much data driven. We’ve been doing genetic engineering since the dawn of agriculture (teosinte, cows, etc.), and with dogs, which started around 10,000 years ago and led to the extremely different breeds we have today. That such differences would cause survival effects in the “wild” doesn’t bother many people. Next is the classic example of the cane toad, which destroyed environmental diversity.

Synthetic biology is dedicated to making the engineering of new complex functions in cells vastly more transparent, and that openness is a really important part. It is trying to find solutions to problems in health, energy, environment, and security.

How can we reduce the time and improve the reliability of biosynthesis? Engineering is all about well-characterized, standard parts and devices. You need standards in parts, protocols, repositories, registries, publications, data and metadata. This helps a lot when you have groups that need to perform coordinated science: Linux is an example of this working. But is design scalable? While applications will always have application-specific parts, there are sets of functions common or probable in all applications.

You can have structures that regulate most parts of gene expression. In talking about the probability of elongation, they use an antisense-RNA-mediated transcription attenuator, which has a recognition motif, a possible terminator, and a coding sequence. Through a series of steps, if the antisense RNA is absent then you get transcription (and the opposite is true too): this is a NOT gate. For transcriptional attenuators, it is possible to design orthogonal mutant lock-key mechanisms. You can obtain orthogonal pairs by rational design, but there is a certain attenuation loss. They can’t explain everything about the functioning of these devices, and want to improve communication in this respect. If you put two attenuators on the same transcript, it behaves about as you expect: a NOT-OR (NOR) gate.
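A toy truth table makes that gate behaviour concrete: a single attenuator acts as a NOT gate on its antisense input, and two attenuators in series on one transcript behave as a NOR gate. This is a minimal sketch of the boolean bookkeeping only, not a model of the actual RNA kinetics.

```python
# Toy boolean view of the attenuator logic described above (no kinetics).
def attenuator(antisense_present: bool) -> bool:
    """Single attenuator: transcription proceeds only if the antisense RNA is absent (NOT gate)."""
    return not antisense_present

def two_attenuators_in_series(antisense_a: bool, antisense_b: bool) -> bool:
    """Two attenuators on one transcript: either antisense RNA blocks transcription (NOR gate)."""
    return attenuator(antisense_a) and attenuator(antisense_b)

for a in (False, True):
    for b in (False, True):
        print(f"A={a!s:5} B={b!s:5} -> transcription={two_attenuators_in_series(a, b)}")
```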

Bacteria can be engineered as pathogens to target particular human tissue (e.g. tumors). To do that, you have to build many different modules, each with its own computational and culture unit tests. These different modules/models can be re-used, e.g. in the iGEM competition. The problem is that the complexity of the engineering problem is greatly increased beyond that found in chemical production / bioreactors.

Absolute requirements: openness, transparency, standards, team-science approaches.

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Keynote: Towards Scalable Synthetic Biology and Engineering Beyond the Bioreactor (BioSysBio 2009)

Adam Arkin
UC Berkeley

People have been doing "Old School" synbio for a long time, of course: take corn (which came from Teosinte), dogs. But is selective breeding actually equivalent, in some sense, to "old school" synthetic biology? He argues that they are like synbio because they are human-designed. He further argues that the main difference is that in synbio, you know what you're doing. Non-synthetic biology: artifical introduction of cane toads in Australia, which is a gigantic mess. His point is that the biggest threat to biodiversity and human health is general things that already exist.

So the point of synbio is that it could make things more transparent, efficient, reliable, predictable and safe. How can we reduce the time and improve the reliability of biosynthesis? Standardized parts, CAD, methods for quickly assembling parts, etc. But is design scalable? Applications will always have application-specific parts, but there are sets of functions common or probable in all applications.

Transcriptional Logics. Why RNA transcripts? There are lots of different shapes, it avoids promoter limitations (physical homogeneity), and many are governed by Watson-Crick base pairing (and therefore designable). You can put multiple attenuators in series. You can also put different antisenses together to make different logic gates.

Protein Logics: increasing flux through a biosynthetic pathway. Different enzymes have different activities and turnovers, and substrate is lost through runoff to other pathways. Solution: build a scaffold to localize the enzymes and substrates (an import from eukaryotes). He then spent some time describing recombinases and invertase dynamics.

Evolved systems are complex and subtle. Synbio organisms need to deal with the same uncertainty and competition as existing organisms. He spent some time talking about treating cancer with bacteria. Why do bacteria grow preferentially in tumors? Better nutrient concentrations, reduced immune surveillance, differential growth rates, and differential clearance rates. In humans, the bacteria that have been tried are pathogens, which make you sick, and you need LOADS of them in the body. There is one that's used for bladder cancer, and it has an 85% success rate.

Wednesday Session 3
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!


Building a New Biology (BioSysBio 2009)

Drew Endy
Stanford University, and BioBricks Foundation

Overview: a puzzle related to SB that informs some of his engineering work, then a ramble through the science of genetics, and finally a debrief on the BioBrick public agreements.

Part 1. If SB is going to scale, we really need to think about the underlying "physics engine"; you could do worse than to look at Gillespie's work on well-mixed systems. This underlies much of the stochastic behaviour that underpins SB, such as the differentiation of stem cells. A lot of work is based on this idea. Another good system is phage lambda: a phage infects a cell, leading to one of two outcomes: lysogeny plus dormancy, or lysis of the cell. If you infect 100 cells with exactly one phage each, you get a distribution of behaviour. How is the physics working here? How does an individual cell decide which fate is in store? About 10 years ago, A. Arkin took this molecular biology and mapped it to a physics model. From this model it became clear how this variability arises. Can you predetermine what cell fate will occur before lambda infects the cell? Endy looked into this. They collected different types of cells, both tiny and large (the latter about to divide, the former just after division), and then scored each cell for the different fates. In the tiny cells, lysogeny is favored 4 to 1, whereas in big cells, lysis is favored 4 to 1. In the end, this is a deterministic model. There might be some discrete transition where certain parts of the cell cycle favor certain fates. They found, however, that there was a continuous distribution of lysis/lysogeny. Further examination found that there was a third, mixed fate, in which the cell divides before it decides what to do, and the daughter cells then each decide.

They have looked at this process in time, and how it works at the single-cell level. N is a protein made almost immediately upon infection – its activity is not strongly coordinated with cell fate. CII *is* strongly associated, however. The Q protein was also studied. In a small bacterium, 100 molecules of repressor are more constrained in the physical sense, so you need 400 molecules of Cro to balance; in a bigger bacterium there is more space and only 100 Cro are needed. However, this theory may not work, as the things involved may take too long to be built.

Part 2. How much DNA is there on earth? Well, it must be finite. He's not sure about these numbers: 1E10 tons of bacteria (5% DNA)… roughly 5E35 bp on the planet. How long would it take us to sequence it? A conservative estimate – and a little out of date – is about 5E23 months – one mole of months! If current trends hold, a typical R01 (grant) in 2090 could sequence all DNA on earth in the first month of the project. 🙂
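The arithmetic behind those figures is rough but easy to reproduce. The sketch below uses my own assumed conversion factors (grams per tonne, base pairs per gram of DNA, a dated global sequencing rate), so treat it as order-of-magnitude bookkeeping rather than the speaker's calculation.

```python
# Order-of-magnitude check of the "DNA on earth" numbers; the conversion factors
# are my assumptions, not figures from the talk.
bacteria_mass_g = 1e10 * 1e6          # 1e10 tonnes of bacteria, ~1e6 g per tonne
dna_fraction = 0.05                   # ~5% of that mass is DNA
bp_per_gram = 9e20                    # one bp weighs ~650 Da, so ~9e20 bp per gram

total_bp = bacteria_mass_g * dna_fraction * bp_per_gram
print(f"total DNA on the planet ~ {total_bp:.1e} bp")   # ~5e35 bp, as quoted

# How long to sequence it? Assume ~1e12 bp per month as a (dated) global rate.
bp_per_month = 1e12
months = total_bp / bp_per_month
print(f"~{months:.1e} months, i.e. roughly a mole of months")
```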

If there is a finite amount of DNA on the planet, could we finish the science of genetics or SB? If true, could we then finish early? Is genetics bounded? Well, if these three things hold true, perhaps yes: genomes have finite lengths; fixation rates of mutants in populations are finite; atrophy rates of functional genetic elements are > 0.

Is the underlying math equal to perturbation design? Take the bacteriophage T7 (he references a 1969 paper about it from Virology): in that paper, 19 genes had been identified by isolating mutants, with about 10 more expected. By 1989 the sequence came out, and there were actually ~50 genes. So mutagenesis and screening only found some of the genes, and about 40% of the elements didn't have a function assigned.

Could a biologist fix a radio? Endy's question is: could an engineer fix an evolved radio (see Koza et al.)?

Part 3. Who owns BioFAB? What legal things do we need to do for BioBricks? Patents are slow and expensive, copyright is cheap but does not apply, and various other approaches have other problems. Therefore they have drafted the BioBrick Public Agreements document, and he showed an actual early draft. They're trying to create a commons of free parts: an Open Technology Platform for BioBricks.

Personal Comments: Best statement from Endy: "Really intelligent design would have documentation." (Not sure if it is his statement, or attributed to someone else).

Wednesday Session 3
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!


Programming RNA Devices to Control Cellular Information Processing (BioSysBio 2009)

C Smolke
Caltech

This talk is more focused on synbio. There are many natural chemicals and materials with useful properties, and it would be great to be able to do things with them. Examples are taxol from the Pacific yew, codeine and morphine from opium poppies, butanol from Clostridium, spider silk, abalone shell, and rubber from the rubber tree. It is much more efficient to have these useful chemicals produced inside a bacterium rather than extracted from their natural source. These microbial factories are a useful application area for synbio. Similarly, intelligent therapeutics is another application area for synbio: two biomarkers together would (via other steps) produce a programmed output. You could link these programs to biosensors, perform metabolic reprogramming, perform programmed growth and more. The ultimate goal is to be able to engineer systems, and these systems generally need to interface with their environment.

Synbio *also* has circuitry, sensors and actuators, just like more traditional forms of engineering. Foundational technologies (synthesis) -> engineering frameworks (standardization and composition) -> engineered biological systems (environment, health and medicine). An information processing control (IPC) molecule would have three functions, as mentioned earlier: a sensor, computation (process information from the sensor and regulate the activity of the actuator), and an actuator. There is a variety of inputs for the sensor (small molecules, proteins, RNA, DNA, metal ions, temperature, pH, etc.). The actuator could link to various mechanisms like transcription, translation, degradation, splicing, enzyme activity, complex formation, etc. Key engineering properties to think about are scalability, portability, utility, composability, and reliability.

What type of substrate should we build these IPC systems on? What about RNA synthetic biology? You'd go from RNA parts -> RNA devices -> engineered systems. Experimental frameworks provide general rules for assembling the parts into higher-order devices. Then you organize devices into systems, which use in silico design frameworks for programming quantitative device performance. Why RNA? The biology of functional RNAs is one reason: noncoding regulatory RNA pathways are very useful. You can also have RNA sensor elements (aptamers), which bind a wide range of ligands with high specificity and affinity. Thirdly, RNA is a very programmable molecule.

They've developed a number of modular frameworks for assembling RNA devices, and she then gave a good explanation of one of them. In this explanation, she mentions that the transmitter can be modified to achieve desired gate function. The remaining nodes (or points of integration) can be used to assemble devices that exhibit desired information processing operations. A sensor + transmitter + actuator = device. The transmitter component for a buffer gate works via competitive binding between two strands. As the input increases in the cell a particular conformation is favored and gene expression is turned on. An inverter gate is the exact opposite. They wanted to make sure these sorts of frameworks are modular. They can do this by using a different receptor for the sensor to make it responsive to a different molecule.

You can also build higher-order information processing devices using these simpler modular devices. For instance, you might want to separate a gradient of an input signal into discrete parts. Another example would be the processing of multiple inputs, or cooperativity of the inputs.

The first architecture they proposed (SI 1): signal integration within the 3' UTR – multiple devices in series. They can build AND and NOR gates, as well as bandpass signal filters and others. In the output signal filter device, devices result in shifts in basal expression levels and output swing. Independent function is supported by matches to predicted values – the two devices linked in tandem are acting independently.

SI 2: a different type of architecture where signal integration is being performed at a single ribozyme core through both stems. You can make a NAND gate by coupling two inverter gates.

SI 3: Two sensor transmitter components are coupled onto a single ribozyme stem. This allows them to work in series. You can perform signal gain (cooperativity) as well as some gate types. With cooperativity, input A will modulate the second component which allows a second input A to bind to the second component.

Modularity of the actuator domain: using an shRNA switch – this exhibits similar properties to the ribozyme device.

How do we take these components and put them into real applications? One application is immune system therapies, where RNA-based systems offer the potential for tight, programmable regulation over target protein levels. She had a really nice example of how she used a series of ribozymes to tune T-cell proliferation with RNA signal filters. After you get the right response, you need to create stable cell lines. She showed this working in mice.

Personal Comments: A very clear, very interesting talk on her work. Thanks very much!

Wednesday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!


CSBE Symposium Day 2: From Systematic to Synthetic Biology

September 5, 2007

The last day of the symposium was also very good. My notes aren’t as long this time, which may or may not be a good thing, depending on your point of view.

Today was a half-day, and the last two talks covered material not really in my discipline. This meant that I didn't follow them as well as the others, and therefore my notes aren't much use. However, I have included the names of the authors and titles of the talks as indicators of what was discussed.

Jussi Taipale, University of Helsinki

“Systems Biology of Cancer”

How do growth factors and oncogenes regulate cell proliferation? Questions include:
+ Multicellularity: how is cell cycle regulation integrated with signals and transcriptional networks controlling differentiation?
+ organ-specific growth control
+ specificity of oncogenes to particular tissues/tumor types

Many oncogenes regulate the same processes. Cancer is a highly multigenic disease, yet there are only a few phenotypes to cancer; the main ones are unrestricted growth, invasion of other organs, and metastasis – roughly 350 genes controlling essentially 3 phenotypes. They use computational (prediction of targets of oncogenic TFs) and experimental (expression profiling of cancers with known mutations) methods to identify transcriptional targets of oncogenic signalling pathways. They needed to determine the affinity of all single-base mismatch oligos for all three GLI TFs. Very often the highest-affinity site is known, but not the lower-affinity sites.

Regulatory SNPs (rSNPs): they placed all known SNPs into the human genome and aligned against mouse to discover the impact of SNPs on binding sites and regulatory regions. rSNPs are thought to explain much of the individual variation in the human population, and thus are likely to contribute to predisposition to diseases such as cancer. They applied EEL to the prediction of regulatory SNPs; initial analysis against HapMap data looks promising, but other data sets need to be analysed to confirm the results.

Also, they look at transcriptional circuits regulating TFs. For screening, they initially started with flow cytometry analysis of Drosophila S2 cells, as these have cell cycles similar to human cells. They found that DNA-content phenotypes are detectable with flow cytometry. They also did genome-wide pooling to analyze functional redundancy: the closest homologues for all Drosophila proteins were identified using BLASTP. It doesn't look like there's much redundancy.

+ Systems biology of the metazoan cell cycle
They have identified approximately 600 genes which affect the cell cycle in S2 cells, with an 80% hit rate of known strong effectors based on analysis of 19 different protein complexes and pathways. Approximately 650 genes have been cloned into Gateway vectors for the analysis of overexpression phenotypes, enzyme-substrate relationships (half-life etc.), PPIs (TAP-tag, fragment2hybrid), and subcellular localizations. They also did an analysis of the transcriptional network. The transcriptional analysis includes: identification of the target genes of all TFs affecting the cell cycle (whole-genome profiling after RNAi of all TFs affecting cell cycle or cell size, and determination of the binding specificities of the TFs followed by EEL analysis in Drosophila species); identification of pathways affecting the activities of the TFs (whole-genome profiling of all strong hits, and clustering); and identification of signalling inputs to the cell cycle machinery and of unstable proteins that are transcriptionally regulated.

Mark Bradley, University of Edinburgh

“High-throughput chemical biology”

+ Encoded Libraries
A way to interrogate 10,000 molecules on a DNA microarray: 10,000 peptide compounds and 10,000 tags, attached to each other via a linker. The tags allow us to identify the compound each is attached to, and make it possible to deliver the compound to a specific location on a 2D DNA microarray. The peptide is attached to a linker, which is attached to a tag, which is attached to PNA, which can attach to the DNA on the microarray. It is better to have a PNA/DNA duplex than a DNA/DNA duplex.
The peptides all contain a quencher and a fluorescein donor. When a protease comes along it will cleave the peptide and liberate the quencher and give us fluorescence.
They have a 10,000-member FRET-based library. They then treat with protease (3d) and put the products onto a 2D microarray – a transformation of 10,000 solution assays into a 2D microarray. These are high-density, clean arrays made with a custom OGT DNA microarray. Every PNA has a preferential "home" to go to in the array. There are 22,500 oligos on the array, allowing for replicates plus 2,500 controls. The DNA is printed in random locations by OGT (Agilent), and BlueGnome software is used for the analysis. All binding duplicates are compared.
They display the data using 40 cube plots, with 1,000 peptides per cube and one position defined: x, y and z are three different amino acids, with the 4th amino acid fixed.
Peptide Arrays and Cell Binding: Have also started using this method to identify ligands for cells.

+ Cellular Chips and Polymer Manipulation
A polymer coating provides specificity for white blood cells when removing them from whole blood using filters (Sepacell). They have a program to identify new bio-compatible polymers for topics like prevention of binding. One approach they like is ink-jet printing: they want to do the same thing as a printer, but rather than 3 colours they want to print polymers or monomers.

+ Microwell Array Technology: single-cell loading and transfection
You can get 4000 wells on a microscope slide. If you seed with about 10000 cells per mL, you get >85% of wells with one cell per well. You can then propagate within the wells.

+ Future Directions
encoded proteomes for arraying all proteins; peptide arrays via inkjet printing; and more.

Jamal Tazi, CNRS, Montpellier

“Small molecule screens for splicing inhibitors”

Paul Ko Ferrigno, Leeds Institute of Molecular Medicine

“Label-free protein microarrays for systems biology”
