Categories
Meetings & Conferences Standards

What should you think about when you think about standards?

The creation of a new standard is very exciting (yes, really). You can easily get caught up in the fun of the moment, and just start creating requirements and minimal checklists and formats and ontologies…. But what should you be thinking about when you start down this road? Today, the second and final day of the BBSRC Synthetic Biology Standards Workshop, was about discussing what parts of a synthetic biology standard are unique to that standard, and what can be drawn from other sources. And, ultimately, it was about reminding ourselves not to reinvent the wheel and not to require more information than the community was willing to provide.

Matthew Pocock gave a great introduction to this topic when he summarized what he thinks about when he thinks about standards. Make sure you don’t miss my notes on his presentation further down this post.

(If you’re interested, have a look at yesterday’s blog post on the first day of this workshop: The more things change, the more they stay the same.)

Half a day was a perfect amount of time to get the ball rolling, but we could have talked all day and into the next. Other workshops are planned for the coming months, and it will be very interesting to see what happens as things progress, both in person and via remote discussions.

Once again, for the time constrained among us, here are my favorite sentences from the presentations and discussions of the day:

  1. Dick Kitney: Synthetic biology is already important in industry, and if you want to work with major industrial companies, you need to get acceptance for your standards, making the existing standard (DICOM) very relevant to what we do here.
  2. Matthew Pocock: Divide your nascent standard into a continuum of uniqueness, from the components of your standard which are completely unique to your field, through to those which are important but have overlap with a few other related fields, and finally to the components which are integral to the standard but which are also almost completely generic.
  3. Discussion 1: Modelling for the purposes of design is very different from modelling for the purposes of analysis and explanation of existing biology.
  4. Discussion 2: I learnt that, just as in every other field I’ve been involved in, there are terms in synthetic biology so overloaded with meaning (for example, “part”) that it is better to use a new word when you want to add those concepts to an ontology or controlled vocabulary.

Dick Kitney – Imperial College London: “Systematic Design and Standards in Synthetic Biology”

Dick Kitney discussed how SynBIS, a synthetic biology web-based information system with an integrated BioCAD and modelling suite, was developed and how it is currently used. There are three parts to the CAD in SynBIS: DNA assembly, characterization, and chassis (data for SynBIS). They are using automation in the lab as much as possible. With BioCAD, you can use a parallel strategy for both computer modelling and the synthetic biology itself.

With SynBIS, you can get inputs from other systems as well as part descriptions, models and model data from internal sources. SynBIS has 4 layers: an interface/HTML layer, a communication layer, an application layer and a database layer.

Information can be structured into four types: the biological “continuum” (or the squishy stuff), modalities (experimental types, standards relating to such), (sorry – missed this one), and ontologies. SynBIS incorporates the DICOM standard for its biological information. DICOM can be used and modified to store/send parts and associated metadata, related images, and related/collected data. They are interested in DICOM because of the industrialization of synthetic biology. Most major industries and companies already use the DICOM standard. If you want to work with major industrial companies, you need to get acceptance for your standards, making DICOM very important. The large number of users of DICOM is a result of the large amount of effort that went into the creation of this modular, modality-friendly standard.

Images are getting more and more important for synthetic biology. If you rely on GFP fluorescence, for example, then you need high levels of accuracy in order to replicate results. DICOM helps you do this. It isn’t just a file format; it also includes transfer protocols and the like. Each image in DICOM has its own metadata.
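
Purely as an illustrative sketch (this assumes the pydicom library and a hypothetical fluorescence image file; the private-tag numbers and the part identifier are invented, not part of any synthetic biology profile of DICOM), here is what reading per-image metadata and stashing a part annotation alongside it could look like:

```python
import pydicom

# Read a hypothetical fluorescence image exported as DICOM.
ds = pydicom.dcmread("gfp_timecourse.dcm")

# Standard attributes: every DICOM object carries its own acquisition metadata.
print(ds.Modality)            # imaging modality code
print(ds.StudyDate)           # when the data were acquired
print(ds.Rows, ds.Columns)    # image dimensions

# Community/vendor extensions go in private tags; the group/element numbers
# and the "SYNBIO" creator string below are purely hypothetical.
ds.add_new(0x00110010, "LO", "SYNBIO")       # private creator block
ds.add_new(0x00111001, "LO", "BBa_E0040")    # part identifier (illustrative)
ds.save_as("gfp_timecourse_annotated.dcm")
```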

What are the downsides of DICOM? DICOM is very complex, and most academics might not have the resources to make use of it (its specification runs to some 3,000 pages). In actuality, however, it is a lot easier to use than you might think. There are libraries, viewers and standard packages that hide most of the complexity. What are the most popular uses of DICOM right now? MR/CT, ultrasound, light microscopy, lab data, and many other modalities. In a hospital, most machines’ outputs are compliant with DICOM.

As SBOL develops and expands, they plan to incorporate it into SynBIS.

Issues relating to the standard – Run by Matthew Pocock

The rest of the workshop was structured discussion on the practical aspects of building this standard. Matthew Pocock corralled us all and made sure we remained useful, and also provided the discussion points.

To start, Matt provided some background. What does he ponder when he thinks about standards? Adoption of the standard, for one, and who your adopters might be: they may be providers of data, consumers of data, or both. Also, both machines and humans will interact with the standard. The standard should be easy to implement, with a low cost of buy-in.

You need to think about copyright and licensing issues: who owns the standard, and who maintains it. Are people allowed to change it for their own or public use? Your standard needs to have a clearly-defined scope: you don’t want it to force you to think about what you’re not interested in. To do this, you should have a list of competency questions.

You want the standard to be orthogonal to other standards, and to compose with any related standards you wish to use but which don’t belong in your new standard. You should also define a minimal level of compliance in order for your data to be accepted.

Finally, above all, users of your standard would like it to be lightweight and agile.

What are the technical areas that standards often cover? You should have domain-specific models of what you’re interested in (terminologies, ontologies, UML): essentially, what your data looks like. You also need to have a method of data persistence and protocols, e.g. how you write it down (format, XML, etc.). You also need to think about transport of the data, or how you move it about (SOAP, REST, etc.). Access has to be thought about as well, or how you query for some of the data (SQL, DAS, custom API, etc.).
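
To make that distinction between the data model, the persistence format and the access layer concrete, here is a toy sketch in Python: a hypothetical part record (the field names are my own invention, not any published standard) written down as JSON and queried in memory.

```python
import json

# A hypothetical part record; the field names are illustrative only.
part = {
    "id": "BBa_B0034",                      # a Registry-style identifier
    "type": "ribosome_binding_site",
    "sequence": "aaagaggagaaa",
    "chassis": ["Escherichia coli"],
    "provenance": {"lab": "example-lab", "date": "2011-06-01"},
}

# Persistence: "how you write it down" (here JSON; it could equally be XML or RDF).
serialized = json.dumps(part, indent=2)
print(serialized)

# Access: "how you query for some of the data" (a stand-in for SQL/REST/DAS).
def find_by_type(records, part_type):
    return [r for r in records if r["type"] == part_type]

print(find_by_type([json.loads(serialized)], "ribosome_binding_site"))
```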

Within synthetic biology, there is a continuum from incredibly generic, useful standards through to things that are absolutely unique to our (synthetic biology) use case, and then in between is stuff that’s really important, but which might be shared with some other areas such as systems biology. For example, LIMS and generic metadata are completely generic and can be taken care of by things like Dublin Core. DNA sequence and features are important to synthetic biology, but are not unique to it. Synthetic biology’s peculiar constraints include things like a chassis. You could say that host is synonymous with chassis, but in fact they are completely different roles. Chassis is a term used to describe something very specific in synthetic biology.

Some fields relevant to synthetic biology: microscopy, all the ‘omics, genetic and metabolic engineering, bioinformatics.

Discussion 1

Consider the unique ↔ generic continuum: where do activities in the synthetic biology lifecycle lie on the diagram? What standards already exist for these? What standards are missing?

The notes that follow are a merge of the results from the two groups, but it may be an imperfect merge and as a consequence, there may be some overlap.

UNIQUE (to synthetic biology)

  • design: the composition of behaviour (rather than of DNA, for example).
    • modelling a novel design is different than modelling for systems biology, which seeks to discover information about existing pathways and interactions
    • quantification for design
  • Desired behaviour: higher-level design, intention. I am of the opinion that other fields also have an intention when performing an experiment, which may or may not be realized during the course of an experiment. I may be wrong in this, however. And I don’t mean an expected outcome – that is something different again.
  • Device (reusable) / parts / components
  • Multi-component, multiple-stage assembly
    • biobricks
    • assembly and machine-automated characterization, experiments and protocols (some of this might be covered in more generic standards such as OBI)
  • Scale and scaling of design
  • engineering approaches
  • characterization
  • computational accessibility
  • positional information
  • metabolic load (burden)
  • evolutionary stability

IMPORTANT

  • modelling (from systems biology): some aspects of both types of modelling are common.
    • you use modelling tools in different ways when you are starting from a synbio viewpoint
    • SBML, CellML, BioPAX
  • module/motifs/components – reusable models
  • Biological interfaces (RiPS, PoPS)
  • parts catalogues
  • interactions between parts (and hosts)
  • sequence information
  • robustness to various conditions
  • scaling of production

GENERIC

  • Experimental (Data, Protocols)
    • OBI + FuGE
  • sequence and feature metadata
    • SO, GO
  • LIMS
  • success/performance metrics (comparison with specs)
  • manufacturing/production cost

Discussion 2

From the components of a synthetic biology standard identified in discussion 1, choose two and answer:

  • what data must be captured by the standard?
  • What existing standards should it leverage?
  • Where do the boundaries lie?

Parts and Devices

What data must be captured by the standard?

  • Part/device definition and nomenclature, sequence data, and type (from an enumerated list).
  • Relationships between parts (enumerated list/ontology) and part aggregation (the ordering and composition of nested parts).
  • Incompatibilities/contraindications, including the range of hosts in which the chassis is viable.
  • Part buffers and interfaces/input/output (as a sub-type of part).
  • Any improvements, including what changes were made and why they were made (e.g. mCherry with the linkers removed), plus versioning information (version number, release notes, feature list, and known issues).
  • Equivalent parts which are customized for other chassis (codon optimization and usage, chassis-agnostic part).
  • Provenance and curation level: authorship, originating lab, and the date/age of the part (much of this is covered by the SBOL-seq standard), plus the derivation of the part from other parts or other biological sequence databases, with a human- and machine-readable description of the derivation.
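
As a thought experiment only, the list above maps fairly naturally onto a simple data structure. The sketch below (Python, with field names that are my own guesses rather than anything agreed at the workshop) just shows how compact such a record could be.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PartRecord:
    display_id: str                       # part/device nomenclature
    part_type: str                        # from an enumerated list (promoter, RBS, ...)
    sequence: str
    sub_parts: List[str] = field(default_factory=list)        # ordered, possibly nested composition
    relationships: List[str] = field(default_factory=list)    # typed links to other parts
    incompatibilities: List[str] = field(default_factory=list) # contraindicated hosts/contexts
    chassis: List[str] = field(default_factory=list)          # hosts in which the part is viable
    version: str = "1.0"
    release_notes: str = ""
    derived_from: Optional[str] = None    # provenance: parent part or database accession
    curation_level: str = "uncurated"
    authors: List[str] = field(default_factory=list)
    created: Optional[str] = None         # date/age of the part
```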

What existing standards? SBOL, DICOM, SO, EMBL, MIBBI

Boundaries: device efficiency (a device only works in the biological contexts in which it has been described); the chassis and its environment; related parts could be organized into part ‘families’ (perhaps using GO for some of this); and it might be possible to attach other quantitative information that is common across some parts.

Characterization

We need to state the type of the device, and we would need a new specification for each type of device, e.g. a promoter is not a GFP. We need to know some measurement information such as statistics, experimental conditions required to record, lab, protocols. Another important value is whether or not you’re using a reference part or device. The context information would include the chassis, in vitro/in vivo, conditions, half-life, and interactions with other devices/hosts.
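
Again purely as an illustration, a characterization record to go with the PartRecord sketch above might look something like this; every field name here is an assumption based on the discussion, not an agreed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Characterization:
    device_id: str                                        # the part/device being characterized
    device_type: str                                      # a promoter is not a GFP: type-specific spec
    reference_device: Optional[str] = None                # reference part/device used, if any
    chassis: str = "E. coli"
    context: str = "in vivo"                              # in vitro / in vivo
    conditions: Dict[str, str] = field(default_factory=dict)  # temperature, media, ...
    protocol: str = ""                                    # or a link to a protocol document
    lab: str = ""
    measurements: List[Dict[str, float]] = field(default_factory=list)  # recorded statistics
    half_life: Optional[float] = None
    interactions: List[str] = field(default_factory=list) # with other devices/hosts
```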

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences Standards

The more things change, the more they stay the same

…also known as Day 1 of the BBSRC Synthetic Biology Standards Workshop at Newcastle University, and musings arising from the day’s experiences.

In my relatively short career (approximately 12 years – wait, how long?) in bioinformatics, I have been involved to a greater or lesser degree in a number of standards efforts. It started in 1999 at the EBI, where I worked on the production of the protein sequence database UniProt. Now, I’m working with systems biology data and beginning to look into synthetic biology. I’ve been involved in the development (or maintenance) of a standard syntax for protein sequence data; standardized biological investigation semantics and syntax; standardized content for genomics and metagenomics information; and standardized systems biology modelling and simulation semantics.

(Bear with me – the reason for this wander through memory lane becomes apparent soon.)

How many standards have you worked on? How can there be multiple standards, and why do we insist on creating new ones? Doesn’t the definition of a standard mean that we would only need one? Not exactly. Take the field of systems biology as an example. Some people are interested in describing a mathematical model, but have no need for storing either the details of how to simulate that model or the results of multiple simulation runs. These are logically separate activities, yet they fall within a single community (systems biology) and are broadly connected. A model is used in a simulation, which then produces results. So, when building a standard, you end up with the same separation: have one standard for the modelling, another for describing a simulation, and a third for structuring the results of a simulation. All that information does not need to be stored in a single location all the time. The separation becomes even more clear when you move across fields.

But this isn’t completely clear cut. Some types of information overlap within standards of a single domain and even among domains, and this is where it gets interesting. Not only do you need a single community talking to each other about standard ways of doing things, but you also need cross-community participation. Such efforts result in even more high-level standards which many different communities can utilize. This is where work such as OBI and FuGE sit: with such standards, you can describe virtually any experiment. The interconnectedness of standards is a whole job (or jobs) in itself – just look at the BioSharing and MIBBI projects. And sometimes standards that seem (at least mostly) orthogonal do share a common ground. Just today, Oliver Ruebenacker posted some thoughts on the biopax-discuss mailing list where he suggests that at least some of BioPAX and SBML share a common ground and might be usefully “COMBINE“d more formally (yes, I’d like to go to COMBINE; no, I don’t think I’ll be able to this year!). (Scroll down that thread for a response by Nicolas Le Novère as to why that isn’t necessarily correct.) So, orthogonality, or the extent to which two or more standards overlap, is sometimes a hard thing to determine.

So, what have I learnt? As always, we must be practical. We should try to develop an elegant solution, but it really, really should be one which is easy to use and intuitive to understand. It’s hard to get to that point, especially as I think that point is (and should be) a moving target. From my perspective, group standards begin with islands of initial research in a field, which then gradually develop into a nascent community. As a field evolves, ‘just-enough’ strategies for storing and structuring data become ‘nowhere-near-enough’. Communication with your peers becomes more and more important, and it becomes imperative that standards are developed.

This may sound obvious, but the practicalities of creating a community standard mean that such work requires a large amount of effort and continued goodwill. Even with the best of intentions, with every participant working towards the same goal, it can take months – or years – of meetings, document revisions and conference calls to hash out a working standard. This isn’t necessarily a bad thing, though. All voices do need to be heard, and you cannot have a viable standard without input from the community you are creating that standard for. You can have the best structure or semantics in the world, but if it’s been developed without the input of others, you’ll find people strangely reluctant to use it.

Every time I take part in a new standard, I see others like me who have themselves been involved in the creation of standards. It’s refreshing and encouraging. Hopefully the time it takes to create standards will drop as the science community as a whole gets more used to the idea. When I started, the only real standards in biological data (at least that I had heard of) were the structures defined by SWISS-PROT and EMBL/GenBank/DDBJ. By the time I left the EBI in 2006, I could have given you a list a foot long (GO, PSI, and many others), and that list continues to grow. Community engagement and cross-community discussions continue to be popular.

In this context, I can now add synthetic biology standards to my list of standards I’ve been involved in. And, as much as I’ve seen new communities and new standards, I’ve also seen a large overlap in the standardization efforts and an even greater willingness for lots of different researchers to work together, even taking into account the sometimes violent disagreements I’ve witnessed! The more things change, the more they stay the same…

At this stage, it is just a limited involvement, but the BBSRC Synthetic Biology Standards Workshop I’m involved in today and tomorrow is a good place to start with synthetic biology. I describe most of today’s talks in this post, and will continue with another blog post tomorrow. Enjoy!

For those with less time, here is a single sentence for each talk that most resounded with me:

  1. Mike Cooling: Emphasise the ‘re’ in reusable, and make it easier to build and understand large models from reusable components.
  2. Neil Wipat: For a standard to be useful, it must be computationally amenable as well as useful for humans.
  3. Herbert Sauro: Currently there is no formal ontology for synthetic biology, but one will need to be developed.

This meeting is organized by Jen Hallinan and Neil Wipat of Newcastle University. Its purpose is to set up key relationships in the synthetic biology community to aid the development of a standard for that community. Today, I listened to talks by Mike Cooling, Neil Wipat, and Herbert Sauro. I was – unfortunately – unable to be present for the last couple of talks, but will be around again for the second – and final – day of the workshop tomorrow.

Mike Cooling – Bioengineering Institute Auckland, New Zealand

Mike uses CellML (it’s made where he works, but that’s not the only reason…) in his work with systems and synthetic biology models. Among other things, it wraps MathML and partitions the maths, variables and units into reusable pieces. Although many of the parts seem domain specific, CellML itself is actually not domain specific. Further, unlike other modelling languages such as SBML, components in CellML are reusable and can be imported into other models. (Yes, a new package called comp in SBML Level 3 is being created to allow the importing of models into other models, but it isn’t mature – yet.)

How are models stored? There is the CellML repository, but what is out there for synthetic biology? The Registry of Standard Biological Parts was available, but only described physical parts. Therefore they created a Registry of Standard Virtual Parts (SVPs) to complement the original registry. This was developed as a group effort with a number of people including Neil Wipat and Goksel Misirli at Newcastle University.

They start with template mathematical structures (which are little parts of CellML), and then use the import functionality available as part of CellML to combine the templates into larger physical things/processes (‘SVPs’) and ultimately to combine things into system models.
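
For readers who haven’t met CellML’s import mechanism, here is a rough sketch (assuming CellML 1.1 files and the standard CellML and XLink namespaces; the file name is hypothetical) of how you might list which components a model pulls in from other files.

```python
import xml.etree.ElementTree as ET

CELLML = "http://www.cellml.org/cellml/1.1#"
XLINK = "http://www.w3.org/1999/xlink"

def list_imports(path):
    """Return (href, component name) pairs for every imported component."""
    root = ET.parse(path).getroot()
    pairs = []
    for imp in root.findall(f"{{{CELLML}}}import"):
        href = imp.get(f"{{{XLINK}}}href")
        for comp in imp.findall(f"{{{CELLML}}}component"):
            pairs.append((href, comp.get("name")))
    return pairs

print(list_imports("system_model.cellml"))
```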

They extended the CellML Repository to hold the resulting larger multi-file models, which included adding distributed version control and allowing models to be shared between projects through embedded workspaces.

What can these pieces be used for? Some of this work included creating a CellML model of the biology represented in Levskaya et al. 2005 and depositing all of the pieces of the model in the CellML repository. Another example is a model he’s working on about shear stress and multi-scale modelling for aneurysms.

Modules are being used and are growing in number, which is great, but he wants to concentrate more at the moment on the ‘re’ of the reusable goal, and make it easier to build and understand large models from reusable components. Some of the integrated services he’d like to have: search and retrieval, (semi-automated) visualization, semantically-meaningful metadata and annotations, and semi-automated composition.

All this work converges on the importance of metadata. Not many people used version 1.0 of the CellML Metadata Framework. With version 2.0 they have developed a core specification which is very simple, and then provide many additional satellite specifications. For example, there is a biological information satellite, where you use the biomodels qualifiers as relationships between your data and MIRIAM URNs. The main challenge is to find a database that is at the right level of abstraction (e.g. canonical forms of your concept of interest).
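
As a sketch of what such an annotation can look like in practice, the snippet below (using the rdflib library; the model URI is hypothetical, while the biomodels.net qualifier and the MIRIAM URN scheme are real) links a model component to a canonical UniProt record.

```python
from rdflib import Graph, Namespace, URIRef

BQBIOL = Namespace("http://biomodels.net/biology-qualifiers/")

g = Graph()
component = URIRef("http://example.org/models/repressilator#LacI")  # hypothetical model element
g.add((component, BQBIOL["is"], URIRef("urn:miriam:uniprot:P03023")))  # LacI repressor, E. coli

print(g.serialize(format="turtle"))
```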

Neil Wipat – Newcastle University

Please note Neil Wipat is my PhD supervisor.

Neil spoke about data standards, tool interoperability, data integration and synthetic biology, a.k.a. “Why we need standards”. They would like to promote interoperability and data exchange between their own tools (important!) as well as with other tools. They’d also like to facilitate data integration to inform the design of biological systems, both from a manual designer’s perspective and from the POV of what is necessary for computational tool use. They’d also like to enable the iterative exchange of data and experimental protocols in the synthetic biology life cycle.

A description of some of the tools developed in Neil’s group (and elsewhere) exemplifies the differences in data structures present within synthetic biology. BacilloBricks was created to help get, filter and understand the information from the MIT Registry of Standard Parts. They also created the Repository of Standard Virtual Biological Parts. This SVP repository was then extended with parts from Bacillus and extended to make use of SBML as well as CellML. This project is called BacilloBricks Virtual. All of these tools use different formats.

It’s great having a database of SVPs, but you need a way of accessing and utilizing the database. Hallinan and Wipat have started a collaboration with the people at Microsoft Research who created GEC, a programming language (and simulator) for the genetic engineering of living cells. A summer student’s project produced a GEC compiler for SVPs from BacilloBricks Virtual. Goksel has also created the MoSeC system, where you can automatically go from a model to a graph to an EMBL file.

They also have BacillusRegNet, which is an information repository about transcription factors for Bacillus spp. It is also a source of orthogonal transcription factors for use in B. subtilis and Geobacillus. Again, it is very important to allow these tools to communicate efficiently.

The data warehouse they’re using is ONDEX. They feed information from the ONDEX data store to the biological parts database. ONDEX was created for systems biology to combine large experimental datasets. ONDEX views everything as a network, and is therefore a graph-based data warehouse. ONDEX has a “mini-ontology” to describe the nodes and edges within it, which makes querying the data (and understanding how the data is structured) much easier. However, it doesn’t include any information about the synthetic biology side of things. Ultimately, they’d like an integrated knowledgebase using ONDEX to provide information about biological virtual parts. Therefore they need a rich data model for synthetic biology data integration (perhaps including an RDF triplestore).
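
To illustrate the graph-plus-mini-ontology idea (this is not ONDEX’s actual API, just a sketch using networkx with invented node and edge classes), every node and edge carries a class from a small controlled vocabulary, which is what makes querying, and understanding how the data is structured, easier.

```python
import networkx as nx

g = nx.MultiDiGraph()

# Nodes typed with concept classes from a small "mini-ontology".
g.add_node("sigB", concept_class="Gene")
g.add_node("SigB", concept_class="Protein")
g.add_node("ctc_promoter", concept_class="Promoter")

# Edges typed with relation classes.
g.add_edge("sigB", "SigB", relation="encodes")
g.add_edge("SigB", "ctc_promoter", relation="activates")

# A trivial "query": everything a given protein activates.
regulated = [v for u, v, d in g.edges(data=True)
             if u == "SigB" and d["relation"] == "activates"]
print(regulated)
```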

Interoperability, Design and Automation: why we need standards.

Requirement 1. There needs to be interoperability and data exchange among these tools, and between these tools and other external tools. Requirement 2. Standards for data integration aid the design of synthetic systems; the format must be both computationally amenable and useful for humans. Requirement 3. Automation of the design and characterization of synthetic systems also requires standards.

The requirements of synthetic biology research labs such as Neil Wipat’s make it clear that standards are needed.

KEYNOTE: Herbert Sauro – University of Washington, US

Herbert Sauro described the developing community within synthetic biology, the work on standards that has already begun, and the Synthetic Biology Open Language (SBOL).

He asks us to remember that Synthetic Biology is not biology – it’s engineering! Beware of sending synthetic biology grant proposals to a biology panel! It is a workflow of design-build-test. He’s mainly interested in the bit between building and testing, where verification and debugging happens.

What’s so important about standards? It’s critical in engineering, where it increases productivity and lowers costs. In order to identify the requirement you must describe a need. There is one immediate need: store everything you need to reconstruct an experiment within a paper (for more on this see the Nature Biotech paper by Peccoud et al. 2011: Essential information for synthetic DNA sequences). Currently, it’s almost impossible to reconstruct a synthetic biology experiment from a paper.

There are many areas requiring standards to support the synthetic biology workflow: assembly, design, distributed repositories, laboratory parts management, and simulation/analysis. From a practical POV, the standards effort needs to allow researchers to electronically exchange designs with round tripping, and much more.

The standardization effort for synthetic biology began with a grant from Microsoft in 2008 and the first meeting was in Seattle. The first draft proposal was called PoBoL but was renamed to SBOL. It is a largely unfunded project. In this way, it is very similar to other standardization projects such as OBI.

DARPA mandated 2 weeks ago that all projects funded from Living Foundries must use SBOL.

SBOL is involved in the specification, design and build part of the synthetic biology life cycle (but not in the analysis stage). There are a lot of tools and information resources in the community where communication is desperately needed.

SBOL has three parts: SBOL Semantic, SBOL Visual, and SBOL Script. SBOL Semantic is the one that’s going to be doing all of the exchange between people and tools. SBOL Visual is a controlled vocabulary and symbols for sequence features.

Have you been able to learn anything from SBML/SBGN, as you have a foot in both worlds? SBGN doesn’t address any of the genetic side, and is pretty complicated. You ideally want a very minimalistic design. SBOL semantic is written in UML and is relatively small, though has taken three years to get to this point. But you need host context above and beyond what’s modelled in SBOL Semantic. Without it, you cannot recreate the experiment.

Feature types such as operator sites, promoter sites, terminators, restriction sites etc can go into the sequence ontology (SO). The SO people are quite happy to add these things into their ontology.

SBOLr is a web front end for a knowledgebase of standard biological parts that they used for testing (not publicly accessible yet). TinkerCell is a drag-and-drop CAD tool for design and simulation. There is a lot of semantic information underneath to determine what is/isn’t possible, though there is no formal ontology. However, you can semantically annotate all parts within TinkerCell, allowing the plugins to interpret a given design. A TinkerCell model can be composed of sub-models, which makes it easy to swap in new bits of models to see what happens.

WikiDust is a TinkerCell plugin written in Python which searches SBPkb for design components, and ultimately uploads them to a wiki. LibSBOLj is a library for developers to help them connect software to SBOL.

The physical and host context must be modelled to make all of this useful. By using semantic web standards, SBOL becomes extensible.

Currently there is no formal ontology for synthetic biology but one will need to be developed.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences

Special Session 4: Adam Arkin on Synthetic Biology (ISMB 2009)

Running the Net: Finding and Employing the OPerating Principles of Cellular Systems
Adam Arkin
Part of the Advances and Challenges in Computational Biology, hosted by PLoS Computational Biology

The need for scientific standards and cooperation: synthetic biology is very much data driven. We’ve been doing genetic engineering since the dawn of agriculture (teosinte, cows, etc.), and with dogs, which started around 10,000 years ago, leading to the extremely different breeds we have today. That such differences would cause survival effects in the “wild” doesn’t bother many people. Next is the classic example of the cane toad, which destroyed environmental diversity.

Synthetic biology is dedicated to making the engineering of new complex functions in cells vastly more transparent, and that openness is a really important part. It is trying to find solutions to problems in health, energy, environment, and security.

How can we reduce the time and improve the reliability of biosynthesis? Engineering is all about well-characterized, standard parts and devices. You need standards in parts, protocols, repositories, registries, publications, data and metadata. This helps a lot when you have groups and need to perform coordinated science: Linux is an example of this working. But is design scalable? While applications will always have application-specific parts, there are sets of functions common or probable in all applications.

You can have structures that regulate most parts of gene expression. In talking about the probability of elongation, they use an antisense-RNA-mediated transcription attenuator, which has a recognition motif, a possible terminator, and a coding sequence. Through a series of steps, if the antisense RNA is absent, then you get transcription (and the opposite is true too): this is a NOT gate. For transcriptional attenuators, it is possible to design orthogonal mutant lock-key mechanisms. You can obtain orthogonal pairs by rational design, but there is a certain attenuation loss. They can’t explain everything about the functioning of these devices, and want to improve communication in this respect. If you put two attenuators on the same transcript, it behaves about as you expect: a NOT-OR (NOR) gate.
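
A toy truth table makes that logic explicit: a single attenuator behaves as a NOT gate on its antisense input, and two attenuators in series on one transcript give a NOT-OR (NOR) gate. This is only a logical sketch, not a model of the actual RNA mechanics.

```python
def attenuator_not(antisense_present: bool) -> bool:
    # Transcription proceeds only if the antisense input is absent.
    return not antisense_present

def tandem_nor(a: bool, b: bool) -> bool:
    # Two attenuators on one transcript: both must be open for transcription.
    return attenuator_not(a) and attenuator_not(b)

for a in (False, True):
    for b in (False, True):
        print(f"A={a!s:5} B={b!s:5} -> transcription={tandem_nor(a, b)}")
```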

Bacteria can be engineered as pathogens to target particular human tissue (e.g. tumors). To do that, you have to build many different modules, each with its own computational and culture unit tests. These different modules/models can be re-used, e.g. in the iGEM competition. The problem is that the complexity of the engineering problem is greatly increased beyond that found in chemical production/bioreactors.

Absolute requirements: openness, transparency, standards, team-science approaches.

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences

Keynote: Towards Scalable Synthetic Biology and Engineering Beyond the Bioreactor (BioSysBio 2009)

Adam Arkin
UC Berkeley

People have been doing "Old School" synbio for a long time, of course: take corn (which came from teosinte), or dogs. But is selective breeding actually equivalent, in some sense, to "old school" synthetic biology? He argues that they are like synbio because they are human-designed. He further argues that the main difference is that in synbio, you know what you're doing. Non-synthetic biology: the artificial introduction of cane toads in Australia, which is a gigantic mess. His point is that the biggest threat to biodiversity and human health is generally things that already exist.

So the point of synbio is that it could make things more transparent, efficient, reliable, predictable and safe. How can we reduce the time and improve the reliability of biosynthesis? Standardized parts, CAD, methods for quickly assembling parts, etc. But is design scalable? Applications will always have application-specific parts, but there are sets of functions common or probable in all applications.

Transcriptional Logics. Why RNA transcripts? There are lots of different shapes, it avoids promoter limitations (physical homogeneity), and many are governed by Watson-Crick base pairing (and therefore designable). You can put multiple attenuators in series. You can also put different antisenses together to make different logic gates.

Protein Logics: increasing flux through a biosynthetic pathway. Different activities of various enzymes – different turnovers. Loss of substrate through runoff to other pathways. Solution: build a scaffold to localize the enzymes and substrates (an import from eukaryotes). He then spent some time describing recombinases and invertase dynamics.

Evolved systems are complex and subtle. Synbio organisms need to deal with the same uncertainty and competition as existing organisms. He spent some time talking about treating cancer with bacteria. Why do bacteria grow preferentially in tumors? Better nutrient concentrations, reduced immune surveillance, differential growth rates, and differential clearance rates. In humans, the bacteria that have been tried are pathogens, which make you sick, and you need LOADS of them in the body. There is one that's used for bladder cancer, and it has an 85% success rate.

Wednesday Session 3
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences Software and Tools

Building a New Biology (BioSysBio 2009)

Drew Endy
Stanford University, and BioBricks Foundation

Overview: Puzzle related to SB and informing some of his engineering work. Then a ramble through the science of genetics. Last part is a debrief on BioBrick public agreements.

Part 1. If SB is going to scale, we really need to think about the underlying "physics engine"; you could do worse than look to Gillespie's work on a well-mixed system. This underlies many of the stochastic systems that underpin SB, such as the differentiation of stem cells. A lot of work is based on this idea. Another good system is phage lambda: a phage infects a cell, leading to two outcomes: lysogeny plus dormancy, or lysis of the cell. If you infect 100 cells with exactly 1 phage each, you get a distribution of behaviour. How is the physics working here? How does an individual cell decide which fate is in store? About 10 years ago, A. Arkin took this molecular biology and mapped it to a physics model. From this model it became clear how this variability arises. Can you predetermine what cell fate will occur before lambda infects it? Endy looked into this. They collected different types of cells: both tiny and large (e.g. the latter about to divide and the former just after division). They then scored each cell for the different fates. In the tiny cells, lysogeny is favored 4 to 1, whereas in big cells, lysis is favored 4 to 1. In the end, this is a deterministic model. There might be some discrete transition where certain parts of the cell cycle favor certain fates. They found, however, that there was a continuous distribution of lysis/lysogeny. Further examination found that there was a third, mixed fate: the cell divides before it decides what to do, and the daughter cells then decide what to do.
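
For anyone who hasn't met the "physics engine" being referred to, the sketch below is a minimal version of Gillespie's direct method for a toy birth-death process (not the lambda model itself; the rate constants are arbitrary illustrations).

```python
import random

def gillespie_birth_death(k_make=2.0, k_decay=0.1, x0=0, t_end=50.0):
    """Direct-method SSA for: 0 -> X at rate k_make, X -> 0 at rate k_decay*X."""
    t, x, trace = 0.0, x0, [(0.0, x0)]
    while t < t_end:
        a1, a2 = k_make, k_decay * x      # propensities: production, decay
        a0 = a1 + a2
        if a0 == 0:
            break
        t += random.expovariate(a0)       # waiting time to the next reaction
        if random.random() < a1 / a0:     # pick which reaction fires
            x += 1
        else:
            x -= 1
        trace.append((t, x))
    return trace

print(gillespie_birth_death()[-5:])       # last few (time, copy-number) points
```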

They have looked at this process in time, and how it works at the single-cell level. N is a protein made almost immediately upon infection – its activity is not strongly correlated with cell fate. CII *is* strongly associated, however. The Q protein was also studied. In a small bacterium, 100 molecules of repressor are constrained more in the physical sense, so you need 400 of Cro to balance; while in a bigger bacterium there is more space and only 100 Cro are needed. However, this theory may not work, as the things may take too long to be built.

Part 2. How much DNA is there on earth? Well, it must be finite. He's not sure about these numbers: 1E10 tons of bacteria (5% DNA)… 5E35 bp on the planet. How long would it take us to sequence it? A conservative estimate – and a little out of date – is about 5E23 months – one mole of months! If current trends hold, a typical R01 (grant) in 2090 could have: sequence all DNA on earth in the first month of the project. 🙂
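
A quick back-of-envelope check of those numbers (the throughput figure is my own assumption, back-calculated from the totals quoted, not something the speaker gave):

```python
total_bp = 5e35                      # DNA on the planet, as quoted
throughput_bp_per_month = 1e12       # assumed sequencing rate (my back-calculation)
months = total_bp / throughput_bp_per_month
print(f"{months:.0e} months")        # ~5e23 months, i.e. roughly a mole of months
```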

If there is a finite amount of DNA on the planet, could we finish the science of genetics or SB? If true, could we then finish early? Is genetics bounded? Well, if these three things hold true, perhaps yes: genomes have finite lengths; fixation rates of mutants in populations are finite; atrophy rates of functional genetic elements are > 0.

Is the underlying math equal to perturbation design? Take the bacteriophage T7 (he references a 1969 paper about it from Virology): in that, 19 genes had been identified by isolating mutants, and 10 more were expected. By 1989, when the sequence came out, there were actually 50 genes. So, mutagenesis and screening only got some of the genes. About 40% of the elements didn't have a function assigned.

Could a biologist fix a radio? Endy's question is: could an engineer fix an evolved radio (see Koza et al.)?

Part 3. Who owns BioFAB? What legal things do we need to do for BioBricks? Patents are slow and expensive, copyright is cheap but does not apply, and various other things have other problems. Therefore they have drafted the BioBrick Public Agreements document. He then showed the actual early draft document. They're trying to create a commons of free parts. Open Technology Platform for BioBricks.

Personal Comments: Best statement from Endy: "Really intelligent design would have documentation." (Not sure if it is his statement, or attributed to someone else).

Wednesday Session 3
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences

Programming RNA Devices to Control Cellular Information Processing (BioSysBio 2009)

C Smolke
Caltech

This talk is more focused on synbio. There are many natural chemicals and materials with useful properties, and it would be great to be able to do things with them. Examples are taxol from the Pacific yew, codeine and morphine from opium poppies, butanol from Clostridium, spider silk, abalone shell, and the rubber tree. It is much more efficient to get these useful chemicals grown inside a bacterium rather than in their natural source. These microbial factories are a useful application area for synbio. Similarly, intelligent therapeutics (IT) is another application area for synbio. In IT, two biomarkers together would (via other steps) produce a programmed output. You could link these programs to biosensors, perform metabolic reprogramming, perform programmed growth and more. The ultimate goal is to be able to engineer systems. These systems generally need to interface with their environment.

Synbio *also* has circuitry, sensors and actuators, just like more traditional forms of engineering. Foundational technologies (synthesis) -> engineering frameworks (standardization and composition) -> engineered biological systems (environment, health and medicine). An information processing control (IPC) molecule would have three functions, as mentioned earlier: a sensor, computation (process information from the sensor and regulate the activity of the actuator), and an actuator. There is a variety of inputs for the sensor (small molecules, proteins, RNA, DNA, metal ions, temperature, pH, etc). The actuator could link to various mechanisms like transcription, translation, degradation, splicing, enzyme activity, complex formation, etc. Key engineering properties to think about are scalability, portability, utility, composability, and reliability.

What type of substrate should we build these IPC systems on? What about RNA synthetic biology? You'd go from RNA parts -> RNA devices -> engineered systems. Experimental frameworks provide general rules for assembling the parts into higher order devices. Then you organize devices into systems, which use in silico design frameworks for programming quantitative device performance. Why RNA? The biology of functional RNAs is one reason: noncoding regulatory RNA pathways are very useful. You can also have RNA sensor elements (aptamers), which bind a wide range of ligands with high specificity and affinity. Thirdly, RNA is a very programmable molecule.

They've developed a number of modular frameworks for assembling RNA devices, and she then gave a good explanation of one of them. In this explanation, she mentions that the transmitter can be modified to achieve desired gate function. The remaining nodes (or points of integration) can be used to assemble devices that exhibit desired information processing operations. A sensor + transmitter + actuator = device. The transmitter component for a buffer gate works via competitive binding between two strands. As the input increases in the cell a particular conformation is favored and gene expression is turned on. An inverter gate is the exact opposite. They wanted to make sure these sorts of frameworks are modular. They can do this by using a different receptor for the sensor to make it responsive to a different molecule.
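
As a caricature of the buffer and inverter gates just described, the sketch below uses Hill functions as a stand-in for the competitive strand-binding mechanism; the parameter values are arbitrary illustrations, not measured device properties.

```python
def buffer_gate(ligand, k=1.0, n=2.0, basal=0.05):
    # Expression rises as the input ligand increases.
    return basal + (1 - basal) * ligand**n / (k**n + ligand**n)

def inverter_gate(ligand, k=1.0, n=2.0, basal=0.05):
    # Expression falls as the input ligand increases (the exact opposite response).
    return basal + (1 - basal) * k**n / (k**n + ligand**n)

for ligand in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"input={ligand:>4}: buffer={buffer_gate(ligand):.2f} "
          f"inverter={inverter_gate(ligand):.2f}")
```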

You can also build higher-order information processing devices using these simpler modular devices. For instance, you might want to separate a gradient of an input signal into discrete parts. Another example would be the processing of multiple inputs, or cooperativity of the inputs.

The first architecture they proposed (SI 1): signal integration within the 3' UTR – multiple devices in series. They can build AND and NOR gates, as well as bandpass signal filters and others. In the output signal filter device, devices result in shifts in basal expression levels and output swing. Independent function is supported by matches to predicted values – the two devices linked in tandem are acting independently.

SI 2: a different type of architecture where signal integration is being performed at a single ribozyme core through both stems. You can make a NAND gate by coupling two inverter gates.

SI 3: Two sensor transmitter components are coupled onto a single ribozyme stem. This allows them to work in series. You can perform signal gain (cooperativity) as well as some gate types. With cooperativity, input A will modulate the second component which allows a second input A to bind to the second component.

Modularity of the actuator domain: using an shRNA switch – this exhibits similar properties to the ribozyme device.

How do we take these components and put them into real applications? One application is immune system therapies, where RNA-based systems offer the potential for tight, programmable regulation over target protein levels. She had a really nice example of how she used a series of ribozymes to tune T-cell proliferation with RNA signal filters. After you get the right response, you need to create stable cell lines. She showed this working in mice.

Personal Comments: A very clear, very interesting talk on her work. Thanks very much!

Wednesday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else's. I'm happy to correct any errors you may spot – just let me know!

Categories
CISBAN Meetings & Conferences

CSBE Symposium Day 2: From Systematic to Synthetic Biology

Notes CSBE Symposium Day 2: From Systematic to Synthetic Biology
September 5, 2007

The last day of the symposium was also very good. My notes aren’t as long this time, which may or may not be a good thing, depending on your point of view.

Today was a half-day with the last two talks containing information not really in my discipline. This meant that I didn’t follow them as well as the others, and therefore my notes aren’t much use. However, I have included the names of the authors and titles of the talks as indicators of what was discussed.

Jussi Taipale, University of Helsinki

“Systems Biology of Cancer”

How do growth factors and oncogenes regulate cell proliferation? Questions include:
+ Multicellularity: how is cell cycle regulation integrated with signals and transcriptional networks controlling differentiation?
+ organ-specific growth control
+ specificity of oncogenes to particular tissues/tumor types

Many oncogenes regulate the same processes. Cancer is a highly multigenic disease. There are only a few phenotypes to cancer; the main ones are unrestricted growth, invasion of other organs, and metastasis: ~350 genes controlling essentially 3 phenotypes. They use computational (prediction of targets of oncogenic TFs) and experimental (expression profiling of cancers with known mutations) methods to identify transcriptional targets of oncogenic signalling pathways. They needed to determine the affinity of all single-base mismatch oligos for all three GLI TFs. Very often the highest-affinity site is known, but not the lower-affinity sites.

Regulatory SNPs (rSNPs): they placed all known SNPs onto the human genome and aligned it against mouse to discover the impact of SNPs on binding sites and regulatory areas. rSNPs are thought to explain much of the individual variation in the human population, and thus are likely to contribute to predisposition to diseases such as cancer. Application of EEL to the prediction of regulatory SNPs: initial analysis against HapMap data looks promising; however, other data sets need to be analysed to confirm the results.

Also, they look at transcriptional circuits regulating TFs. For screening, they initially started with flow cytometry analysis looking at Drosophila S2 cells, as they have similar cell cycles to human cells. They found that DNA-content phenotypes are detectable with flow cytometry. They also did genome-wide pooling to analyze functional redundancy: the closest homologues for all Drosophila proteins were identified using BLASTP. It doesn’t look like there’s much redundancy.

+ Systems biology of the metazoan cell cycle
They have identified approximately 600 genes which affect the cell cycle in S2 cells. They get an 80% hit rate of known strong effectors based on analysis of 19 different protein complexes and pathways. Approx 650 genes have been cloned into Gateway vectors for the analysis of overexpression phenotypes, enzyme-substrate relationships (half-life etc), PPIs (TAP-tag, fragment2hybrid), and subcellular localizations. They also did an analysis of the transcriptional network. The transcriptional analysis includes: identification of the target genes of all TFs affecting the cell cycle (whole-genome profiling after RNAi of all TFs affecting cell cycle or cell size, and determination of the binding specificities of the TFs followed by EEL analysis in Drosophila species), identification of pathways affecting the activities of the TFs (whole-genome profiling of all strong hits, and clustering), and identification of signalling inputs to the cell cycle machinery and of unstable proteins that are transcriptionally regulated.

Mark Bradley, University of Edinburgh

“High-throughput chemical biology”

+ Encoded Libraries
A way to interrogate 10000 molecules on a DNA microarray: 10000 peptide compounds and 10000 tags, attached to each other via a linker. The tags allow us to identify the compound each is attached to, and make it possible to deliver the compound to a specific location on a 2D DNA microarray. The peptide is attached to a linker, which is attached to a tag, which is attached to PNA, which can attach to the DNA on the microarray. It is better to have PNA/DNA than DNA/DNA.
The peptides all contain a quencher and a fluorescein donor. When a protease comes along it will cleave the peptide, liberating the quencher and giving us fluorescence.
They have a 10000-member FRET-based library. They then treat it with protease (3D) and put it onto a 2D microarray. This is a transformation of 10000 solution assays into a 2D microarray. These are high-density, clean arrays made with a custom OGT DNA microarray. Every PNA has a preferential “home” to go to in the array. There are 22,500 oligos on the array for replicates plus 2,500 controls. DNA is printed in random locations by OGT (Agilent), and BlueGnome software is used for analysis. All binding duplicates are compared.
They display the data using 40 cube plots with 1000 peptides per cube and one position defined: x, y and z are three different amino acids, with the 4th amino acid being fixed.
Peptide Arrays and Cell Binding: they have also started using this method to identify ligands for cells.

+ Cellular Chips and Polymer Manipulation
A polymer coating provides specificity for white blood cells when removing them using filters (Sepacell) from whole blood. They have a program to identify new bio-compatible polymers for topics like prevention of binding. One approach they like is ink-jet printing. They want to do the same thing but rather than 3 colors, they want to do it with polymers or monomers.

+ Microwell Array Technology: single-cell loading and transfection
You can get 4000 wells on a microscope slide. If you seed with about 10000 cells per mL, you get >85% of wells with one cell per well. You can then propagate within the wells.

+ Future Directions
encoded proteomes for arraying all proteins; peptide arrays via inkjet printing; and more.

Jamal Tazi, CNRS, Montpellier

“Small molecule screens for splicing inhibitors”

Paul Ko Ferrigno, Leeds Institute of Molecular Medicine

“Label-free protein microarrays for systems biology”

Categories
CISBAN Meetings & Conferences

CSBE Symposium Day 1: From Systematic to Synthetic Biology

I have been fortunate enough to be invited to go (as a CISBAN representative) to the two-day symposium on systems and synthetic biology organized by CSBE in Edinburgh, UK, for 4-5 September.

Before I get to the nitty-gritty, here are my awards for….

…Most Fun Talk: Drew Endy, MIT. He gave his talk at the beginning of the first session, directly after Andrew Millar's introduction. The projector was still broken at this point, so he demonstrated some fantastic skills as a lecturer and did the entire thing on the blackboards. An interesting speaker well able to think on his feet!
…Best-Organized Slide Presentation: Angelika Amon, MIT. She is a fantastic speaker and had beautiful slides: no slide had more than one sentence on it, and she generally followed the formula 1) Ask a question 2) answer the question in one sentence 3) answer the question with pictures, generally with one slide per step. It was a beautiful thing. She also had a really interesting talk about aneuploidy, which definitely helped!

And no, I have no bias towards MIT – in fact I have no professional relationships to them at present – it just turned out that way!

Here are my notes from Day 1. My fellow CISBAN representative (hi Steve!) and I both had a great time, and were extremely well-fed at lunchtime. Please note that these are my notes, and I may not have understood some things, and therefore made a mistake. Please let me know if there should be any corrections!

Notes CSBE Symposium Day 1: From Systematic to Synthetic Biology
September 4, 2007

Andrew Millar, CSBE

He provided background on the development of CSBE and the main focus of their work. They have received £11 million in startup funding over 5 years, together with a £7 million grant from the University of Edinburgh. This is primarily infrastructure funding, and they are currently working on getting grants for research work. In 2009 they will be moving into the new building (the Waddington Building) that is being built.

The research focus is not on a particular biological question, but instead on the process of systems modelling. It is very difficult for experimentalists to engage with SB and to transform their data into real models. Initially, there are three biological areas that will inform the larger systems modelling theme. All theoretical/informatics research will be integrated into the systems biology software infrastructure core (SBSI). Experimental projects will hang off this core as well, with the Kinetic Parameter Facility (KPF) included. The three projects are the RNA metabolism project (yeast), macrophage project (using human cell cultures), and the circadian clock project (arabidopsis). These projects are intentionally diverse. The wet lab projects differ in size/scale and in current levels of understanding.

Wet-lab biologists are generally neither rigorous about nor interested in providing kinetic parameters. However, the KPF will help resolve this problem. They are also working with new theoretical tools, e.g. ones that allow you to deal seamlessly with both discrete and stochastic models. In this, they're working with biological stochastic process algebra (PEPA adapted to create BioSPA). Network inference and network analysis are also important areas of research. To improve the interface between the experimental biologist and the networks, they are using and developing the Edinburgh Pathway Editor (EPE).

They also strongly feel that systems biology naturally leads to work in synthetic biology. For instance, to get a particular model tested, it may be necessary to create a synthetic system *just* containing the steps in the model to be tested. It is a biological test for a bioinformatics experiment.

Centres such as the CISBs have a particular role to play in the long game, fostering community organizations and standards development and usage. They're planning extra collaboration with other centres within Europe.

Drew Endy, MIT:

"Synthetic Biology"

Drew is one of the organizers of the iGEM competition, an undergraduate competition in synthetic biology. In 1999, he was looking at changing the genetic architecture of the phage he was working on, creating an autogene architecture where you get positive feedback. By doing this, he thought he would get a phage that grew faster than the wild type. However, the model wasn't right: when the lab work was done, the growth wasn't as fast as the wild type's. He then had a problem publishing, as his model didn't agree with his experiment. Eventually, he got it published in PNAS. He thinks that perhaps natural systems haven't evolved to be optimized for modelling. *Really* Intelligent Design would have documentation! 🙂 And yet, we have no such insight yet.

So basically he works on trying to refactor natural biological systems to make them easier to model and manipulate. What should the theoretical "Dept of Biological Engineering" look like? The three lessons learned from engineering history are as follows: standards (to support reliable physical and functional composition, as a resulting product may end up with emergent properties), abstraction (borrowed primarily from computer science; it allows you to implement much more powerful functions without having to bother with the nitty-gritty details: the machine language is ATCG in this case), and decoupling (separate complicated problems into simpler separate ones: in biology you could take as an example the automated construction of DNA).

Can you really make "biology" reliable? A lot of good process engineering would have to be done first. What other problems would be anticipated in making biology easier to engineer?
    + Noise: the rest of the talk focuses on this.
    + Evolution

Combining synthetic biology and systems biology to try to combat noise. There has been, on average, one paper published per week addressing the concept of noise in biological data sets and in dealing with individual molecules in modelled reactions. He uses the example of a signalling pathway in yeast from a paper he contributed to, published in Nature: the experiment *actually* reports on fluorescent protein levels, rather than on the levels of the important proteins in the pathway – i.e. it's all indirect. What's interesting is not the variation in expression, but the fact that there's no observable phenotype for the vast majority of it. He appreciates the noise in biology, and the work that has gone into reporting it, but is not sure why it's relevant. Is that noise really important to us?

He mentions a paper on phage lambda from 1997 (published in 1998) by Adam Arkin, published in Genetics. How does phage lambda decide what to do? The two primary decisions are to either lyse the cell or to integrate its genetic material with that of the cell. In the book A Genetic Switch it is admitted that there is no "perfect understanding" of what drives lambda to one decision over the other. The Arkin paper argues that a model should be made of the lambda phage, and that a cartoon is not enough. Further, it models discrete reaction events, not continuous ones. At low multiplicity of infection you almost always get lysis; if you infect with many phage per cell, you almost always get chromosomal integration. There are also conditions where you can infect a genetically identical set of cells with 1 phage per cell and get a 50/50 split in the decision. Thus stochastic chemical kinetics is a relevant physical framework for describing the behaviour of the system: it's all driven by the noise, i.e. the stochastic chemical kinetics. Arkin makes explicit that the molecular biology community had not found all the answers about lambda, creates a model, and makes a directly testable hypothesis. The lambda phage "rolls the dice": it doesn't know what to do when it gets into the cell. This means the hypothesis is running blind and doesn't know the state of the cell. However, what we know of lambda biology points to it actually knowing the state of the cell.
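
To make the stochastic-kinetics idea concrete, here is a toy Gillespie-style sketch (Python): genetically identical "cells" with identical rate constants can still split between two fates purely because of the random timing of individual reactions. The species, rates and threshold are invented for illustration; this is not Arkin's actual lambda model.

    # Toy Gillespie-style simulation: identical cells, identical rate constants,
    # yet stochastic reaction timing alone splits the population between fates.
    # Not the published lambda model -- species, rates and threshold are made up.
    import random

    def decide_fate(k_ci=1.0, k_cro=1.0, threshold=50, rng=random):
        """Race two birth processes; the first species to reach the threshold wins."""
        ci, cro, t = 0, 0, 0.0
        while ci < threshold and cro < threshold:
            total = k_ci + k_cro                 # combined reaction propensity
            t += rng.expovariate(total)          # exponential waiting time to next event
            if rng.random() < k_ci / total:      # choose which reaction fired
                ci += 1
            else:
                cro += 1
        return "lysogeny" if ci >= threshold else "lysis"

    fates = [decide_fate() for _ in range(1000)]
    print("fraction lysogenic:", fates.count("lysogeny") / len(fates))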

Therefore the alternative (deterministic) model is as follows: if you knew what to look for, you could theoretically segregate the genetically identical population of cells into two sub-populations. In other words, it's not stochastic at all, but determined by the state of the cell at the time of infection. What, then, is the physical variable that you look for? He tried all sorts of things, including looking at the genetic architecture of the phage more carefully. The lambda CI repressor protein binds in an antiparallel fashion with the Cro protein to the OR1, OR2 and OR3 operator sites. If the repressor binds first, it shuts out Cro and you get genetic integration; if the reverse, you get lysis. The repressor binds with more cooperativity across the OR1–OR3 region than Cro does. In small cells the repressor will be more successful, as it binds with more cooperativity, and in larger cells Cro will be more fit. So the 3 independent variables are the abundances of repressor and Cro, and the volume of the cell. What they find is that the binding energies of the proteins are matched to the volume of the E. coli cell at various stages in its life cycle. In a small cell, 100 CI molecules balance 400 Cro, while in a big cell 100 CI can be balanced by 100 Cro.

So, he took samples of independently collected cell fractions based on width (one fraction had average volume "1", the other "1.6"), so he could plot the probability of lysis and lysogeny: the average volume of each fraction plotted against the % lysogeny. In smaller cells it is about 80%, and it then declines roughly linearly with cell size to about 25% at the largest volumes. Lysis behaves in the opposite way. This means that at the intermediate fraction (intermediate cell size) you still get a 50/50 split. So the 50/50 split could be noise, but it might also just be the distribution of cell sizes. From this you can determine the critical volume, which at about 1.4 is pretty close to the middle of the cell division cycle in E. coli.
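
As a back-of-the-envelope check on those numbers, a simple linear interpolation between the two fractions puts the 50/50 point in the same range as the quoted critical volume. The fraction volumes (1 and 1.6) and the lysogeny percentages below are just the approximate values from my notes:

    # Linear interpolation of % lysogeny against average fraction volume to find
    # the "critical volume" at which lysis and lysogeny are equally likely.
    # Input values are the rough numbers from my notes, not measured data.
    def critical_volume(v_small, p_small, v_large, p_large, p_crit=0.5):
        slope = (p_large - p_small) / (v_large - v_small)
        return v_small + (p_crit - p_small) / slope

    print(critical_volume(1.0, 0.80, 1.6, 0.25))  # ~1.33, close to the ~1.4 quoted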

So, the behaviour of such natural systems may not be stochastic (he means "noisy"), but actually deterministic. The next step is to make the appropriate mutants that would remove the asymmetry in the OR1–OR3 region. Also, can you determine whether it's absolute versus relative volume that matters? Well, in exponentially growing cells you get about the same slope of the line as before, but it is still unclear, as there is such a range of results (as you can tell, I didn't quite hear the whole answer).

Mike Tyers, University of Edinburgh

"Size control: a systems-level problem"

Focusing on the dissection of a growth-dependent switch element in budding yeast by fusing GFP to Sic1. The protein is degraded by the ubiquitin system and its recognition depends on phosphorylation at multiple sites. Elimination of Sic1 allows the onset of B-type cyclin–CDK activity. Elements that control the cell cycle were highly enriched in this system. Sic1 is recognized by Cdc4 in a phosphorylation-dependent manner. A threshold of G1 cyclin–CDK activity is required for Sic1 elimination. Most individual Sic1 CDK sites are not required for degradation in vivo, but 6 of 9 CDK phosphorylation sites appear necessary for efficient Sic1 recognition by Cdc4. A SPOTS peptide array defines the Cdc4 phospho-degron (CPD). However, a single optimal CPD site is sufficient for Cdc4 binding and degradation in vivo. He then showed a video of precocious elimination of Sic1(CPD). Why multi-site phosphorylation? For ultrasensitivity. Might electrostatic repulsion lower Cdc4 affinity for natural CPD sites? Re-engineering Cdc4 to reduce the phosphorylation threshold was their next step. A sharp transition in the affinity of Sic1 for Cdc4 was discovered using surface plasmon resonance analysis. They also performed NMR analysis of the Sic1–Cdc4 interaction.
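
As an aside, a toy calculation shows why requiring several of the 9 CDK sites to be occupied gives a switch-like (ultrasensitive) response. This assumes independent sites with equal occupancy, which is certainly not the real Sic1/Cdc4 situation, but it illustrates the principle:

    # Toy illustration of ultrasensitivity from multi-site phosphorylation:
    # if each of 9 sites is independently phosphorylated with probability p
    # (a crude proxy for kinase activity), the chance that at least 6 sites are
    # occupied rises much more steeply with p than single-site occupancy does.
    from math import comb

    def p_at_least(n, k, p):
        """P(at least k of n independent sites are phosphorylated)."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    for p in (0.3, 0.5, 0.7, 0.9):
        print(f"site occupancy {p:.1f}: single site {p:.2f}, "
              f">=6 of 9 sites {p_at_least(9, 6, p):.2f}")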

We understand the equilibrium engagement of Sic1 with Cdc4. It is tuneable, evolvable and adaptable, and is ultra-sensitive. 30-40% of the proteome contains disordered regions.
L. Landau: There are two kinds of models in this world: those that are trivial and those that are wrong. (Paraphrase of a quote).

Joerg Stelling, ETH, Zurich

"Analysis and synthesis of biological networks"

Alternative quote: All models are wrong, but some are useful.

The challenges of biological networks include complexity and uncertainty. Approaches for creating mathematical models include graph theory (topology), structural analysis (stoichiometry), and dynamic analysis (biochemistry). Use the right level of description to catch the phenomena of interest: don't model bulldozers with quarks. Synthetic biology is a new dimension of biological engineering. It is a case of forward engineering. It promotes the creation of standardized interfaces to biology/wet-lab work. Derived characteristics of synthetic circuit performance should meet the following criteria: robustness, tunability, feasibility, and stability. An example he used was a synthetic time-delay circuit. You begin with a simple electrical engineering circuit for a time-delay function. The biological analogy is as follows: biotin as a chemical signal (input), covalent protein modification (rectifier), protein accumulation (buffer), protein degradation (resistor), genetic switch (switch) and protein production (output).

Complexity is due to side reactions / coupling between the activating input, internal components and the inactivating input. They made an ODE model of this system. You can fine-tune the circuit by using the non-linear dependencies of the performance characteristics on parameters, inputs and component features (e.g. protein stability). This helps identify targets for fine-tuning of circuit functions. There were discrepancies between model and experiment: qualitatively the model matched, quantitatively it did not.
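
I didn't note down the actual equations, but a minimal ODE sketch of the kind of delay element described above might look like this (toy rates and thresholds, not Stelling's published model): a step input drives accumulation of a "buffer" protein that is constantly degraded, and the output only switches once the buffer crosses a threshold, which is what produces the delay.

    # Minimal ODE sketch of a time-delay element: a step input drives accumulation
    # of a buffer protein (production vs. first-order degradation); the output
    # switches on only once the buffer crosses a threshold. Toy parameters only.
    import numpy as np
    from scipy.integrate import odeint

    def circuit(y, t, k_prod=1.0, k_deg=0.2, signal_on=5.0):
        buffer_protein = y[0]
        signal = 1.0 if t >= signal_on else 0.0          # step input at t = 5
        return [k_prod * signal - k_deg * buffer_protein]

    t = np.linspace(0, 60, 601)
    buffer_levels = odeint(circuit, [0.0], t)[:, 0]
    threshold = 3.0                                      # arbitrary switching threshold
    delay = t[np.argmax(buffer_levels >= threshold)] - 5.0
    print(f"output switches on ~{delay:.1f} time units after the input arrives")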

Are there any possible shortcuts?

Structural analysis does not need kinetic parameters, only the structure and stoichiometry of the network. Feinberg's work covers chemical reaction network theory; Gatermann's covers algebraic geometry.
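
The appeal of structural analysis is that the stoichiometric matrix alone already constrains steady-state behaviour: any steady-state flux vector v must satisfy N·v = 0. Here is a tiny numerical sketch of that idea, using a toy pathway rather than any example from the talk:

    # Structural analysis needs only the stoichiometric matrix N, not kinetics:
    # steady-state fluxes v satisfy N v = 0, so the nullspace of N already tells
    # you which flux patterns are possible. Toy pathway: uptake -> A -> B -> out.
    import numpy as np
    from scipy.linalg import null_space

    # rows = internal metabolites (A, B); columns = reactions (uptake, A->B, B->out)
    N = np.array([[ 1, -1,  0],
                  [ 0,  1, -1]])
    basis = null_space(N)          # one steady-state flux mode for this network
    print(basis / basis.max())     # all three fluxes equal, as expected for a linear chain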

Angelika Amon, MIT

"Systematic analysis of aneuploidy"

Studying the mechanisms that control chromosomal segregation, and specifically what prevents mis-segregation from occurring. What actually happens to cells that end up with an extra chromosome? She's discussing the effects of aneuploidy on cell growth and division in yeast and mouse. Two take-home messages:
+ Aneuploidy causes a proliferation disadvantage (it is bad for the cell!).
+ Additionally, there is a set of phenotypes/consequences of aneuploidy that is independent of which chromosome is the extra one.

Yeast:

Comparative genome hybridization analyses confirm the karyotype, and can show you the stretch of the genome that is present twice. They have about 20 strains that carry extra chromosomes, and they studied a variety of properties of these strains:

+ Cell cycle properties of the aneuploid strains: many aneuploid yeast strains are delayed in G1.
Cells disomic for chr 13 are delayed in cell cycle entry: budding and DNA replication are both delayed by about 15 minutes. A large number of the disome strains had their delays in the G1 phase. There appears to be a correlation: the amount of extra DNA present in the cells contributes to the length of the G1 delay. It doesn't seem to be the only factor, but it is still an important one. All aneuploid strains seem to have growth defects via problems with the G1–S transition, and this occurs upstream of the Cln/CDK pathways.

+ An aneuploidy signature: all such strains shared a common gene expression pattern.
This pattern is not seen when you just grow up the strains normally (the extra transcription from the disomic chromosome will mask any patterns of similarity between strains), so you have to correct for the extra transcription. The pattern is seen in two clusters of genes, one up- and the other down-regulated (via a phosphate-limiting experiment). The upregulated cluster consists of rRNA transcription and processing genes; they don't know the significance of this cluster yet. The downregulated genes seem to have something to do with amino acid metabolism.
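
The correction step is essentially just dividing out the expected dosage effect before looking for the shared signature. A sketch of what I understood that to mean (the data structures and names here are invented for illustration, not theirs):

    # Sketch of the dosage correction: before clustering, halve the expression
    # ratios of genes sitting on the duplicated chromosome so the doubled
    # transcription doesn't dominate the comparison between strains.
    # Column/field names are invented for illustration.
    import pandas as pd

    def correct_for_dosage(expr: pd.DataFrame, gene_chrom: pd.Series, disomic_chrom: str):
        """expr: genes x strains matrix of ratios (disome / wild type)."""
        corrected = expr.copy()
        on_disome = gene_chrom == disomic_chrom        # genes on the extra chromosome
        corrected.loc[on_disome] = corrected.loc[on_disome] / 2.0
        return corrected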

+ Metabolic properties of aneuploid cells: Aneuploid strains stop dividing at a lower OD.
There is something in the media that the aneuploid cells need more of than the WT cells do in order to keep growing. They checked glucose amounts, and found that these strains take up more glucose and produce less biomass per unit of glucose than the WT. All aneuploid strains show an amplification of two glucose transporters, HSD6 and HSD7. They tried knocking out these genes, though, and it didn't make a difference, so there is still some work to do.

+ What is the extra glucose needed for?: Most genes on the additional chromosome are made into proteins (e.g. more than 90% when chr 2 is the extra one).
Cells devote about 60% of their chemical energy to making proteins. So, check to see whether the extra proteins are actually produced, or just transcribed. 3 of 16 proteins analyzed showed an increase in protein levels in accordance with RNA levels; 13 of 16 did not show a corresponding increase. Rather than concluding that they are not translated, it seems that feedback mechanisms kick in to re-create the right stoichiometries in the cell: the extra proteins are made but quickly degraded.

+ Most aneuploids are sensitive to drugs that interfere with RNA synthesis, protein synthesis, or the proteasome; this raises the possibility that the extra proteins create imbalances in the cell which the cells try to fix. The phenotypes of aneuploids should not show up in cells that carry large amounts of extra DNA of which very little is normally translated. They tested this next.

+ What are the consequences of foreign (non-translated) DNA on yeast cells?: No cell cycle delays or sensitivity to conditions interfering with protein synthesis, folding and turnover.

In summary, there is a set of phenotypes that is independent of the identity of the additional chromosomes.
Hypothesis: cellular homeostasis is disrupted in aneuploids due to the RNAs and proteins synthesized from the extra chromosome. Also, some of the phenotypes shared by aneuploids may represent the cell's effort to re-establish homeostasis.

+ A strategy to isolate mutations that allow cells to tolerate aneuploidy: select mutations that improve the growth rates.
They evolved strains disomic for chr 5, choosing strains that (via CGH) showed both copies of chr 5 to be intact. The doubling time shortened (3.9 hours rather than 5.1 hours in the original disomic chr 5 strain). They are still sensitive to cycloheximide; however, they are significantly less temperature sensitive.
    + SNP analysis showed 4 mutations in the evolved strains, though they haven't checked if it is those 4 mutations that have caused the changes in phenotype.
        + truncation of ubiquitin-specific protease
        + point mutation in RAD3
        + point mutation in SNT1
        + promoter mutation in the putative ribosome associated factor YJL213W.

+ Analysis of aneuploid mouse cells: analyzed trisomy 1, 13, and 16 in mouse embryonic fibroblasts.
First, transcript arrays confirm the genotype (trisomy). Trisomy 16 cells exhibit proliferation defects; for example, such cells are bigger (is it an increase in growth, or a decrease in apoptosis or other mechanisms?).

+ Analysis of human Ts21 cells (Down's syndrome: foreskin fibroblasts, work from another group in 1979)
These cells are also bigger than the "WT" human cells.

+ There was also an increase in glutamine uptake by their mouse trisomy fibroblasts. They also seem to produce more lactate than WT cells. This indicates a shift from oxidative phosphorylation to glycolysis. This shift is often seen in primary cancer cells. Perhaps the aneuploid state itself somehow contributes to this metabolic change (still wild speculation 😉 ).

+ Effects on immortalization: Immortalization is delayed/not occurring in Trisomy 16 cells.
Virtually every solid human tumor has its karyotype completely messed up. Having an extra copy of any of these 3 chromosomes inhibits immortalization. WT fibroblasts, once immortalized, become neotetraploid. The trisomy cell lines seemed quite different – perhaps hexaploid. They are interested in continuing to look at the growth and immortalization properties of these trisomy cells. Immortalization in itself does not cause a switch from oxidative phosphorylation to glycolysis (i.e. it does not cause an increase in lactate production).

Final Summary:
+ Aneuploidy causes a proliferation disadvantage
+ Loss of a tumor suppressor gene or gain of an oncogene comes with baggage: a whole extra chromosome.
+ During transformation (immortalization), such a disadvantage needs to be overcome.

Hans Lehrach, MPI of Molecular Genetics

"Vertebrate genomics"

He discussed the work required to finish the euchromatic region of the human genome.
Can we understand why certain genes are expressed? They are trying to understand gene regulation by systematically knocking down TFs using RNAi. The project is progressing rapidly. They have selected 200 human TFs to work with that have endogenous expression in human cell lines. Among other things, they are looking at the effects of Chr21 when trisomic (i.e. Down's Syndrome).

They are also doing a pilot study of a Monte Carlo simulation of large metabolic networks. He also mentioned ConsensusPathDB.

Christina Smolke, California Institute of Technology

"Biomolecular engineering and Riboswitches"

Working on engineering a scalable communication and control system, namely a sensor–actuator control system in biological networks. Such a system would comprise an actuator, an information transmission element, and a sensor (aptamer) element. It can be either an open- or closed-loop system; closed-loop systems are often seen in pathways that include feedback processes.

Synthetic riboswitch engineering: 3 methods in recent years:
+ trial and error integration of an aptamer into target places of a transcript
+ direct coupling of a regulatory element to an aptamer
+ screening randomized linker sequences for switching activity

They have been attempting to build a framework for constructing such systems. This includes the creation of composition rules for ribozyme-based regulatory systems. Design strategies should support:
+ portability and reliable functional composition,
+ integrated RNA component systems (e.g. "direct coupling", where the function of the regulatory element is maintained, instead of loop replacement, where one of two loops is replaced, eliminating tertiary interactions)
+ reliable functional switch composition
+ programmable ribozyme switch platforms

Design strategies for synthetic regulatory ribozymes (necessary for a universal ribozyme switch platform).

They have built "on" and "off" switches to up- and down-regulate gene expression.
Applications:
+ non-invasive concentration measurement
+ integrating RNA devices with survival genes
+ integrating RNA devices for programming cell behaviour (e.g. engineering T-cell proliferation)

David Baulcombe, University of Cambridge

"Small RNA silencing networks in plants"

+ RNA silencing as an antiviral defense
As the virus replicates and accumulates, this RNA-based system activates, slowing the accumulation of the virus. However, it isn't *quite* a virus defense system. Rather than just defense, it is really a virus regulatory system: yes, it is used to defend the plant, but the virus also "uses it" to prevent itself from damaging its host.

+ The basics
ssRNA -> (via RNA-dependent RNA polymerases, RdRP/RDR) -> dsRNA -> (via a Dicer) -> 21 and 24 nt RNAs -> Argonaute (AGO)/slicer enzymes use the short RNA as a guide to the enzyme's target -> nuclease action on the target RNA
There is negative feedback in RNA silencing: silencing of the cis-target RNA inhibits the creation of the ssRNA that feeds the pathway.
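
To see how that feedback closes the loop, here is a toy ODE of the cycle just sketched (ssRNA copied to dsRNA by RdRP, dsRNA diced into siRNA, siRNA guiding cleavage of the source ssRNA). The rate constants are invented purely to illustrate that the feedback holds the target RNA well below its unsilenced level; this is not a model from the talk.

    # Toy ODE of the RNA silencing feedback loop. Rates are invented; the point
    # is only that siRNA-guided cleavage of the source ssRNA limits its level.
    import numpy as np
    from scipy.integrate import odeint

    def silencing(y, t, k_tx=1.0, k_rdrp=0.2, k_dice=0.5, k_slice=0.05, k_decay=0.1):
        ss, ds, si = y
        dss = k_tx - k_rdrp * ss - k_slice * si * ss   # transcription minus copying and slicing
        dds = k_rdrp * ss - k_dice * ds                # dsRNA made by RdRP, consumed by Dicer
        dsi = k_dice * ds - k_decay * si               # siRNA from Dicer, slow decay
        return [dss, dds, dsi]

    t = np.linspace(0, 200, 2001)
    ss = odeint(silencing, [0.0, 0.0, 0.0], t)[:, 0]
    print(f"target ssRNA settles near {ss[-1]:.2f} "
          f"(it would reach {1.0/0.2:.1f} without the siRNA feedback)")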

+ Silencing spreads in two senses:
    + silencing can move beyond the originating cell
    + silencing can also spread along the gene that is the target: the *effective* silencing eventually involved the whole of the transcribed sequence, in both directions (3'<->5')

A primary siRNA is recruited by an AGO protein. The cleaved target then becomes the template for the RdRP. Once you have the secondary siRNAs, they can take over from the primary siRNA. This is why the whole process can be maintained, and why the primary siRNA is only needed transiently. This means there is an epigenetic process, i.e. one completely independent of the DNA. There is also an amplification process (one initial siRNA -> many produced siRNAs).

Is siRNA a transcriptional mechanism? If so, there would be methylation of the target DNA. However, this is not a transcriptional mechanism, even though there *is* methylation. It is RNA virus-induced DNA methylation. Either directly or indirectly, there is some interaction between the viral RNA and the target DNA.

They did an experiment where one virus had GFP added and the other had a promoter sequence (the former gave post-transcriptional silencing and the latter gave transcriptional silencing). Studying the progeny shows a genetic imprint that persists through several generations; therefore RNA silencing can induce trans-generational effects.

+ screening for RNA silencing signal mutants
In theory, the signal for silencing travels along the veins of the plant. They mutagenized these plants and found some that had lost the ability to silence, and others that had enhanced silencing. This means the amount of silencing is related to the negative feedback effect. If you knock out the cis-targeting you get increased silencing, or you could knock out the trans-targeting effect, which reduces the silencing. (It is unclear to me which is the trans-targeting part of the pathway.)

They did deep sequencing of Arabidopsis siRNA and miRNA. When you align the sequences of the sRNAs against the whole genome, the alignment is not random: there are certain areas of the genome that have a propensity for producing the endogenous sRNAs (i.e. siRNAs and miRNAs). A minimum of 1% of the Arabidopsis genome has the potential to generate siRNA. siRNAs: one initiating siRNA gives rise to many siRNAs; miRNAs: made from precursor molecules that can fold back on themselves, so one precursor gives one miRNA. In his opinion this difference is not profound, and is essentially moot.
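
The "not random" claim comes down to counting: bin the mapped sRNA positions into genomic windows and see which windows carry far more reads than a uniform scattering would predict. A sketch of that kind of windowed count (inputs invented; not their actual pipeline):

    # Flag sRNA-producing hotspots: bin mapped sRNA start positions into fixed
    # windows and keep windows whose read count is well above the uniform
    # expectation. Inputs are placeholders, not the real Arabidopsis data.
    from collections import Counter

    def srna_hotspots(positions, genome_length, window=10_000, fold=5.0):
        counts = Counter(pos // window for pos in positions)
        expected = len(positions) * window / genome_length   # uniform expectation per window
        return {w * window: c for w, c in counts.items() if c >= fold * expected}

    # positions: mapped sRNA start coordinates on one chromosome
    # returns {window_start: read_count} for windows at least 5x the uniform expectation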

tasiRNA: trans-acting small interfering RNAs.

We think of the following types of RNAs: initiators (foldback RNAs that form miRNAs, or transcribed on both strands; a perfect match to sequenced small RNA; sRNA loaded into AGO slicer complex), node RNAs (like the secondary siRNAs), and end point RNAs (perfect or imperfect match to siRNAs, no evidence for dsRNA, e.g. micro RNA or tasiRNA).

+ computational assembly of sRNA networks.
These networks are large and have non-random characteristics. Many nodes have a very low degree of connectivity (lower than expected for random networks), and a few are very highly connected. They are now working on an empirical analysis to see whether these networks do indeed exist.
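
The "non-random" comparison is typically made against a random graph of the same size. A sketch of that comparison using networkx (the three edges here are placeholders standing in for the predicted sRNA–target pairs):

    # Compare the degree distribution of an inferred sRNA-target network with an
    # Erdos-Renyi random graph of the same size. The edge list is a placeholder
    # for the real predicted sRNA-target pairs.
    import networkx as nx
    from collections import Counter

    def degree_profile(graph):
        return Counter(dict(graph.degree()).values())

    observed = nx.Graph([("sRNA_1", "gene_A"), ("sRNA_1", "gene_B"), ("sRNA_2", "gene_A")])
    random_graph = nx.gnm_random_graph(observed.number_of_nodes(), observed.number_of_edges())

    print("observed degree counts:", degree_profile(observed))
    print("random   degree counts:", degree_profile(random_graph))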

+ if the networks exist, what could they be doing?
Influencing the growth and development of the plant; influencing epigenetic effects taking place during flowering and the transition from juvenile to adult growth phases, etc.; heritable silencing by endogenous sRNA loci?
Possibly: altered expression of endogenous RNA -> novel RNA-directed DNA methylation and transcriptional silencing of the target locus -> maintenance of the imprint through meiosis ("heritable epimutation") -> natural selection -> meC to T transition by deamination, resulting in the transformation of the epimutation into a mutation -> further rounds of natural selection.
Alternatively: because many sRNA loci are associated with genome repeats and transposons, the sRNAs from inverted repeats have the potential to affect mRNA and therefore the natural variation between species/strains.

+ They plan to move into some work with Chlamydomonas reinhardtii, which can be considered a model system for silencing (a green alga grown in liquid culture). It produces microRNAs and siRNAs. They hope to use this organism to do a "truly" quantitative systems-level measurement experiment.
