Archive

Posts Tagged ‘bbsrc’

What should you think about when you think about standards?

July 12, 2011

The creation of a new standard is very exciting (yes, really). You can easily get caught up in the fun of the moment, and just start creating requirements and minimal checklists and formats and ontologies…. But what should you be thinking about when you start down this road? Today, the second and final day of the BBSRC Synthetic Biology Standards Workshop, was about discussing what parts of a synthetic biology standard are unique to that standard, and what can be drawn from other sources. And, ultimately, it was about reminding ourselves not to reinvent the wheel and not to require more information than the community was willing to provide.

Matthew Pocock gave a great introduction to this topic when he summarized what he thinks about when he thinks about standards. Make sure you don’t miss my notes on his presentation further down this post.

(If you’re interested, have a look at yesterday’s blog post on the first day of this workshop: The more things change, the more they stay the same.)

Half a day was a perfect amount of time to get the ball rolling, but we could have talked all day and into the next. Other workshops are planned for the coming months, and it will be very interesting to see what happens as things progress, both in person and via remote discussions.

Once again, for the time constrained among us, here are my favorite sentences from the presentations and discussions of the day:

  1. Dick Kitney: Synthetic biology is already important in industry, and if you want to work with major industrial companies, you need to get acceptance for your standards, making the existing standard (DICOM) very relevant to what we do here.
  2. Matthew Pocock: Divide your nascent standard into a continuum of uniqueness, from the components of your standard which are completely unique to your field, through to those which are important but have overlap with a few other related fields, and finally to the components which are integral to the standard but which are also almost completely generic.
  3. Discussion 1: Modelling for the purposes of design is very different from modelling for the purposes of analysis and explanation of existing biology.
  4. Discussion 2: I learnt that, just as in every other field I’ve been involved in, there are terms in synthetic biology so overloaded with meaning (for example, “part”) that it is better to use a new word when you want to add those concepts to an ontology or controlled vocabulary.

Dick Kitney – Imperial College London: “Systematic Design and Standards in Synthetic Biology”

Dick Kitney discussed how SynBIS, a synthetic biology web-based information system with an integrated BioCAD and modelling suite, was developed and how it is currently used. There are three parts to the CAD in SynBIS: DNA assembly, characterization, and chassis (data for SynBIS). They are using automation in the lab as much as possible. With BioCAD, you can use a parallel strategy for both computer modelling and the synthetic biology itself.

With SynBIS, you can get inputs from other systems as well as part descriptions, models and model data from internal sources. SynBIS has 4 layers: an Interface/HTML layer, a communication layer, an application layer and a database layer.

Information can be structured into four types: the biological “continuum” (or the squishy stuff), modalities (experimental types, and the standards relating to them), (sorry – missed this one), and ontologies. SynBIS incorporates the DICOM standard for their biological information. DICOM can be used and modified to store/send parts and associated metadata, related images, and related/collected data. They are interested in DICOM because of the industrialization of synthetic biology. Most major industries and companies already use the DICOM standard. If you want to work with major industrial companies, you need to get acceptance for your standards, making DICOM very important. The large number of users of DICOM is the result of the large amount of effort that went into the creation of this modular, modality-friendly standard.

Images are getting more and more important for synthetic biology. If you rely on GFP fluorescence, for example, then you need high levels of accuracy in order to replicate results. DICOM helps you do this. It isn’t just a file format, and includes transfer protocols etc. Each image in DICOM has its own metadata.

What are the downsides of DICOM? DICOM is very complex, and most academics might not have the resources to make use of it (its specification is a huge 3,000-page document). In actuality, however, it is a lot easier to use than you might think. There are libraries, viewers and standard packages that hide most of the complexity. What is the most popular use of DICOM right now? MR/CT, ultrasound, light microscopy, lab data, and many other modalities. In a hospital, most machines’ outputs are compliant with DICOM.
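
In case it helps to picture what that hidden complexity looks like from the user side, here is a minimal sketch using the pydicom Python library (my own illustration, not something shown in the talk; the file name is hypothetical and the named elements are only there if the file provides them):

```python
# Minimal sketch: inspecting DICOM metadata with pydicom.
# The file name is hypothetical; any DICOM object carries tagged metadata
# (modality, acquisition details, comments, ...) alongside the pixel data.
import pydicom

ds = pydicom.dcmread("fluorescence_image.dcm")

# Each piece of metadata is a tagged data element on the dataset.
print(ds.Modality)                                    # imaging modality code
print(ds.StudyDate)                                   # when the image was acquired
print(ds.get("ImageComments", "no comments stored"))  # optional free-text element

pixels = ds.pixel_array   # the image itself, as a NumPy array
print(pixels.shape)
```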

As SBOL develops and expands, they plan to incorporate it into SynBIS.

Issues relating to the standard – Run by Matthew Pocock

The rest of the workshop was a structured discussion of the practical aspects of building this standard. Matthew Pocock corralled us all and made sure we remained useful, and also provided the discussion points.

To start, Matt provided some background. What does he ponder when he thinks about standards? Adoption of the standard, for one, and who your adopters might be. Such people might be providers of data, consumers of data, or both. Also, both machines and humans will interact with the standard. The standard should be easy to implement, with a low cost of buy-in.

You need to think about copyright and licensing issues: who owns it, maintains it. Are people allowed to change it for their own or public use? Your standard needs to have a clearly-defined scope: you don’t want it to force you to think about what you’re not interested in. To do this, you should have a list of competency questions.

You want the standard to be orthogonal with other standards and compose into it any other related standards you wish to use but which don’t belong in your new standard. You should have a minimal level of compliance in order for your data to be accepted.

Finally, above all, users of your standard would like it to be lightweight and agile.

What are the technical areas that standards often cover? You should have domain-specific models of what you’re interested in (terminologies, ontologies, UML): essentially, what your data looks like. You also need to have a method of data persistence and protocols, e.g. how you write it down (format, XML, etc.). You also need to think about transport of the data, or how you move it about (SOAP, REST, etc.). Access has to be thought about as well, or how you query for some of the data (SQL, DAS, custom API, etc.).
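
To make that separation of layers concrete, here is a toy sketch of my own (the field names and the URL are invented) showing one record at the domain-model, persistence and transport/access layers:

```python
# Toy illustration of the layers a standard typically covers.
# All names and the URL below are made up purely for illustration.
from dataclasses import dataclass, asdict
import json

# 1. Domain model: what the data *is* (terminologies/ontologies/UML live here).
@dataclass
class SequenceFeature:
    name: str
    feature_type: str   # ideally a term from an ontology such as SO
    start: int
    end: int

# 2. Persistence: how you write it down (XML, JSON, RDF, ...).
feature = SequenceFeature("pLac", "promoter", 1, 35)
serialised = json.dumps(asdict(feature))

# 3. Transport/access: how you move it about and query it (REST, SOAP, SQL,
#    a custom API, ...). A hypothetical REST call might be:
#        GET https://example.org/parts/pLac   ->   returns `serialised`
print(serialised)
```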

Within synthetic biology, there is a continuum from incredibly generic, useful standards through to things that are absolutely unique to our (synthetic biology) use case, and then in between is stuff that’s really important, but which might be shared with some other areas such as systems biology. For example, LIMS, and generic metadata are completely generic and can be taken care of by things like Dublin Core. DNA sequence and features are important to synthetic biology, but are not unique to it. Synthetic biology’s peculiar constraints include things like a chassis. You could say that host is synonymous with chassis, but in fact they are completely different roles. Chassis is a term used to describe something very specific in synthetic biology.

Some fields relevant to synthetic biology: microscopy, all the ‘omics, genetic and metabolic engineering, bioinformatics.

Discussion 1

Consider the unique ↔ generic continuum: where do activities in the synthetic biology lifecycle lie on the diagram? What standards already exist for these? What standards are missing?

The notes that follow are a merge of the results from the two groups, but it may be an imperfect merge and as a consequence, there may be some overlap.

UNIQUE (to synthetic biology)

  • design (the composition of behaviour (rather than of DNA, for example)).
    • modelling a novel design is different than modelling for systems biology, which seeks to discover information about existing pathways and interactions
    • quantification for design
  • Desired behaviour: higher-level design, intention. I am of the opinion that other fields also have an intention when performing an experiment, which may or may not be realized during the course of an experiment. I may be wrong in this, however. And I don’t mean an expected outcome – that is something different again.
  • Device (reusable) / parts / components
  • Multi-component, multiple-stage assembly
    • biobricks
    • assembly and machine-automated characterization, experiments and protocols (some of this might be covered in more generic standards such as OBI)
  • Scale and scaling of design
  • engineering approaches
  • characterization
  • computational accessibility
  • positional information
  • metabolic load (burden)
  • evolutionary stability

IMPORTANT

  • modelling (from systems biology): some aspects of both types of modelling are common.
    • you use modelling tools in different ways when you are starting from a synbio viewpoint
    • SBML, CellML, BioPAX
  • module/motifs/components – reusable models
  • Biological interfaces (rips, pops)
  • parts catalogues
  • interactions between parts (and hosts)
  • sequence information
  • robustness to various conditions
  • scaling of production

GENERIC

  • Experimental (Data, Protocols)
    • OBI + FuGE
  • sequence and feature metadata
    • SO, GO
  • LIMS
  • success/performance metrics (comparison with specs)
  • manufacturing/production cost

Discussion 2

From the components of a synthetic biology standard identified in discussion 1, choose two and answer:

  • what data must be captured by the standard?
  • What existing standards should it leverage?
  • Where do the boundaries lie?

Parts and Devices

What data must be captured by the standard? Part/device definition/nomenclature, sequence data, type (enumerated list), relationships between parts (enumerated list / ontology), part aggregation (ordering and composition of nested parts), incompatibilities/contraindications (including range of hosts where the chassis is viable), part buffers and interfaces/Input/Output (as a sub-type of part), provenance, curation level. Any improvements (include what changes were made, and why they were made (e.g. mcherry with the linkers removed)); versioning information (version number, release notes, feature list, and known issues); equivalent parts which are customized for other chassis (codon optimization and usage, chassis-agnostic part); Provenance information including authorship, originating lab, and the date/age of the part (much covered by the SBOL-seq standard); the derivation of the part from other parts or other biological sequence databases, and a human- and machine-readable description of the derivation.
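
Purely as a thought experiment of my own (nothing like this was drafted at the workshop, and every field name is hypothetical), the list above could be roughed out as a record structure along these lines:

```python
# Hypothetical sketch of a part/device record covering the fields listed above.
# This is not an agreed schema; it only shows how the discussion points might
# map onto a concrete data structure.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Provenance:
    authors: List[str]
    originating_lab: str
    created: str                                            # ISO 8601 date
    derived_from: List[str] = field(default_factory=list)   # parts or sequence accessions
    derivation_notes: str = ""                              # human-readable derivation

@dataclass
class PartRecord:
    identifier: str
    part_type: str                                          # from an enumerated list / ontology
    sequence: str
    relationships: List[str] = field(default_factory=list)  # typed links to other parts
    subparts: List[str] = field(default_factory=list)       # ordered, nested composition
    incompatibilities: List[str] = field(default_factory=list)
    viable_chassis: List[str] = field(default_factory=list)
    version: str = "1.0"
    release_notes: str = ""
    curation_level: Optional[str] = None
    provenance: Optional[Provenance] = None
```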

What existing standards? SBOL, DICOM, SO, EMBL, MIBBI

Boundaries: Device efficiency (only works in the biological contexts it’s been described in), chassis and its environment, related parts could be organized into part ‘families’ (perhaps use GO for some of this), also might be able to attach other quantitative information that could be common across some parts.

Characterization

We need to state the type of the device, and we would need a new specification for each type of device, e.g. a promoter is not a GFP. We need to know some measurement information such as statistics, experimental conditions required to record, lab, protocols. Another important value is whether or not you’re using a reference part or device. The context information would include the chassis, in vitro/in vivo, conditions, half-life, and interactions with other devices/hosts.
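
Again purely as my own sketch (invented field names, not an agreed specification), a characterization record tied to a part or device might look something like this:

```python
# Hypothetical characterization record for a part/device; all field names
# are invented for illustration and do not come from the discussion.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Characterization:
    device_id: str                    # the part/device being characterized
    device_type: str                  # a promoter is not a GFP: specs are type-specific
    lab: str
    protocol: str
    chassis: str                      # context: host strain, or "in vitro"
    conditions: Dict[str, str] = field(default_factory=dict)      # temperature, media, ...
    measurements: Dict[str, float] = field(default_factory=dict)  # summary statistics
    is_reference_device: bool = False                             # measured against a reference?
    interactions: List[str] = field(default_factory=list)         # with other devices/hosts
    half_life: Optional[float] = None
```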

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

The more things change, the more they stay the same

July 11, 2011

…also known as Day 1 of the BBSRC Synthetic Biology Standards Workshop at Newcastle University, and musings arising from the day’s experiences.

In my relatively short career (approximately 12 years – wait, how long?) in bioinformatics, I have been involved to a greater or lesser degree in a number of standards efforts. It started in 1999 at the EBI, where I worked on the production of the protein sequence database UniProt. Now, I’m working with systems biology data and beginning to look into synthetic biology. I’ve been involved in the development (or maintenance) of a standard syntax for protein sequence data; standardized biological investigation semantics and syntax; standardized content for genomics and metagenomics information; and standardized systems biology modelling and simulation semantics.

(Bear with me – the reason for this wander through memory lane becomes apparent soon.)

How many standards have you worked on? How can there be multiple standards, and why do we insist on creating new ones? Doesn’t the definition of a standard mean that we would only need one? Not exactly. Take the field of systems biology as an example. Some people are interested in describing a mathematical model, but have no need for storing either the details of how to simulate that model or the results of multiple simulation runs. These are logically separate activities, yet they fall within a single community (systems biology) and are broadly connected. A model is used in a simulation, which then produces results. So, when building a standard, you end up with the same separation: have one standard for the modelling, another for describing a simulation, and a third for structuring the results of a simulation. All that information does not need to be stored in a single location all the time. The separation becomes even more clear when you move across fields.

But this isn’t completely clear cut. Some types of information overlap within standards of a single domain and even among domains, and this is where it gets interesting. Not only do you need a single community talking to each other about standard ways of doing things, but you also need cross-community participation. Such efforts result in even more high-level standards which many different communities can utilize. This is where work such as OBI and FuGE sit: with such standards, you can describe virtually any experiment. The interconnectedness of standards is a whole job (or jobs) in itself – just look at the BioSharing and MIBBI projects. And sometimes standards that seem (at least mostly) orthogonal do share a common ground. Just today, Oliver Ruebenacker posted some thoughts on the biopax-discuss mailing list where he suggests that at least some of BioPAX and SBML share a common ground and might be usefully “COMBINE“d more formally (yes, I’d like to go to COMBINE; no, I don’t think I’ll be able to this year!). (Scroll down that thread for a response by Nicolas Le Novère as to why that isn’t necessarily correct.) So, orthogonality, or the extent to which two or more standards overlap, is sometimes a hard thing to determine.

So, what have I learnt? As always, we must be practical. We should try to develop an elegant solution, but it really, really should be one which is easy to use and intuitive to understand. It’s hard to get to that point, especially as I think that point is (and should be) a moving target. From my perspective, group standards begin with islands of initial research in a field, which then gradually develop into a nascent community. As a field evolves, ‘just-enough’ strategies for storing and structuring data become ‘nowhere-near-enough’. Communication with your peers becomes more and more important, and it becomes imperative that standards are developed.

This may sound obvious, but the practicalities of creating a community standard mean that such work requires a large amount of effort and continued goodwill. Even with the best of intentions, with every participant working towards the same goal, it can take months – or years – of meetings, document revisions and conference calls to hash out a working standard. This isn’t necessarily a bad thing, though. All voices do need to be heard, and you cannot have a viable standard without input from the community you are creating that standard for. You can have the best structure or semantics in the world, but if it’s been developed without the input of others, you’ll find people strangely reluctant to use it.

Every time I take part in a new standard, I see others like me who have themselves been involved in the creation of standards. It’s refreshing and encouraging. Hopefully the time it takes to create standards will drop as the science community as a whole gets more used to the idea. When I started, the only real standards in biological data (at least that I had heard of) were the structures defined by SWISS-PROT and EMBL/GenBank/DDBJ. By the time I left the EBI in 2006, I could have given you a list a foot long (GO, PSI, and many others), and that list continues to grow. Community engagement and cross-community discussions continue to be popular.

In this context, I can now add synthetic biology standards to my list of standards I’ve been involved in. And, as much as I’ve seen new communities and new standards, I’ve also seen a large overlap in the standardization efforts and an even greater willingness for lots of different researchers to work together, even taking into account the sometimes violent disagreements I’ve witnessed! The more things change, the more they stay the same…

At this stage, it is just a limited involvement, but the BBSRC Synthetic Biology Standards Workshop I’m involved in today and tomorrow is a good place to start with synthetic biology. I describe most of today’s talks in this post, and will continue with another blog post tomorrow. Enjoy!

For those with less time, here is a single sentence for each talk that most resounded with me:

  1. Mike Cooling: Emphasising the ‘re’ in reusable, and making it easier to build and understand large models from reusable components.
  2. Neil Wipat: For a standard to be useful, it must be computationally amenable as well as useful for humans.
  3. Herbert Sauro: Currently there is no formal ontology for synthetic biology, but one will need to be developed.

This meeting is organized by Jen Hallinan and Neil Wipat of Newcastle University. Its purpose is to set up key relationships in the synthetic biology community to aid the development of a standard for that community. Today, I listened to talks by Mike Cooling, Neil Wipat, and Herbert Sauro. I was – unfortunately – unable to be present for the last couple of talks, but will be around again for the second – and final – day of the workshop tomorrow.

Mike Cooling – Bioengineering Institute Auckland, New Zealand

Mike uses CellML (it’s made where he works, but that’s not the only reason…) in his work with systems and synthetic biology models. Among other things, it wraps MathML and partitions the maths, variables and units into reusable pieces. Although many of the parts seem domain specific, CellML itself is actually not domain specific. Further, unlike other modelling languages such as SBML, components in CellML are reusable and can be imported into other models. (Yes, a new package called comp in SBML Level 3 is being created to allow the importing of models into other models, but it isn’t mature – yet.)

How are models stored? There is the CellML repository, but what is out there for synthetic biology? The Registry of Standard Biological Parts was available, but only described physical parts. Therefore they created a Registry of Standard Virtual Parts (SVPs) to complement the original registry. This was developed as a group effort with a number of people including Neil Wipat and Goksel Misirli at Newcastle University.

They start with template mathematical structures (which are little parts of CellML), and then use the import functionality available as part of CellML to combine the templates into larger physical things/processes (‘SVPs’) and ultimately to combine things into system models.

They extended the CellML repository to hold the resulting larger multi-file models, which included adding a method of distributed version control and allowing the sharing of models between projects through embedded workspaces.

What can these pieces be used for? Some of this work included the creation of a CellML model of the biology represented in Levskaya et al. 2005, and the deposition of all of the pieces of the model in the CellML repository. Another example is a model he’s working on about shear stress and multi-scale modelling for aneurysms.

Modules are being used and are growing in number, which is great, but he wants to concentrate more at the moment on the ‘re’ of the reusable goal, and make it easier to build and understand large models from reusable components. Some of the integrated services he’d like to have: search and retrieval, (semi-automated) visualization, semantically-meaningful metadata and annotations, and semi-automated composition.

All this work converges on the importance of metadata. Not many people used version 1.0 of the CellML Metadata Framework. For version 2.0 they have developed a core specification which is very simple, and then provide many additional satellite specifications. For example, there is a biological information satellite, where you use the BioModels qualifiers as relationships between your data and MIRIAM URNs. The main challenge is to find a database that is at the right level of abstraction (e.g. canonical forms of your concept of interest).
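
To picture what such a satellite annotation might look like, here is a small sketch of my own using rdflib; the model URI, the qualifier chosen and the MIRIAM URN are illustrative rather than taken from the talk:

```python
# Sketch of a biology-qualifier annotation in RDF, using rdflib. The component
# URI and the MIRIAM URN are invented examples; the qualifier namespace is the
# biomodels.net biology-qualifiers vocabulary mentioned above.
from rdflib import Graph, Namespace, URIRef

BQBIOL = Namespace("http://biomodels.net/biology-qualifiers/")

g = Graph()
g.bind("bqbiol", BQBIOL)

component = URIRef("http://example.org/models/my_model#component_1")
target = URIRef("urn:miriam:uniprot:P12345")   # illustrative MIRIAM URN

# "component_1 is a version of the concept identified by the URN"
g.add((component, BQBIOL.isVersionOf, target))

print(g.serialize(format="turtle"))
```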

Neil Wipat – Newcastle University

Please note Neil Wipat is my PhD supervisor.

Speaking about data standards, tool interoperability, data integration and synthetic biology, a.k.a “Why we need standards”. They would like to promote interoperability and data exchange between their own tools (important!) as well as other tools. They’d also like to facilitate data integration to inform the design of biological systems both from a manual designer’s perspective and from the POV of what is necessary for computational tool use. They’d also like to enable the iterative exchange of data and experimental protocols in the synthetic biology life cycle.

A description of some of the tools developed in Neil’s group (and elsewhere) exemplifies the differences in data structures present within synthetic biology. BacilloBricks was created to help get, filter and understand the information from the MIT registry of standard parts. They also created the Repository of Standard Virtual Biological Parts. This SVP repository was then extended with parts from Bacillus, and to make use of SBML as well as CellML; this project is called BacilloBricks Virtual. All of these tools use different formats.

It’s great having a database of SVPs, but you need a way of accessing and utilizing the database. Hallinan and Wipat have started a collaboration with the people at Microsoft Research who created GEC (genetic engineering of cells), a programming language and simulator for the genetic engineering of living cells. A summer student’s project produced a GEC compiler for SVPs from BacilloBricks Virtual. Goksel has also created the MoSeC system, where you can automatically go from a model to a graph to an EMBL file.

They also have BacillusRegNet, which is an information repository about transcription factors for Bacillus spp. It is also a source of orthogonal transcription factors for use in B. subtilis and Geobacillus. Again, it is very important to allow these tools to communicate efficiently.

The data warehouse they’re using is ONDEX. They feed information from the ONDEX data store to the biological parts database. ONDEX was created for systems biology to combine large experimental datasets. ONDEX views everything as a network, and is therefore a graph-based data warehouse. ONDEX has a “mini-ontology” to describe the nodes and edges within it, which makes querying the data (and understanding how the data is structured) much easier. However, it doesn’t include any information about the synthetic biology side of things. Ultimately, they’d like an integrated knowledgebase using ONDEX to provide information about biological virtual parts. Therefore they need a rich data model for synthetic biology data integration (perhaps including an RDF triplestore).
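
I find this kind of graph-based warehouse easiest to picture with a toy example; the sketch below is entirely my own (made-up node and edge types, built with networkx) and only shows the flavour of the typed nodes and edges that a “mini-ontology” constrains:

```python
# Toy sketch of a typed graph in the style of a graph-based data warehouse.
# The node/edge types here are invented; in ONDEX they come from its internal
# "mini-ontology", which is what makes querying tractable.
import networkx as nx

g = nx.MultiDiGraph()

g.add_node("gene:sigB", kind="Gene", organism="Bacillus subtilis")
g.add_node("protein:SigB", kind="Protein")
g.add_node("part:PsigB", kind="Promoter")

g.add_edge("gene:sigB", "protein:SigB", relation="encodes")
g.add_edge("protein:SigB", "part:PsigB", relation="regulates")

# A query becomes a typed graph traversal, e.g. "which promoters are
# regulated by proteins encoded by genes in this graph?"
promoters = [
    tgt
    for src, tgt, data in g.edges(data=True)
    if data["relation"] == "regulates" and g.nodes[tgt]["kind"] == "Promoter"
]
print(promoters)
```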

Interoperability, Design and Automation: why we need standards.

  • Requirement 1: there needs to be interoperability and data exchange among these tools, as well as between these tools and other external tools.
  • Requirement 2: standards for data integration aid the design of synthetic systems. The format must be both computationally amenable and useful for humans.
  • Requirement 3: automation of the design and characterization of synthetic systems also requires standards.

The requirements of synthetic biology research labs such as Neil Wipat’s make it clear that standards are needed.

KEYNOTE: Herbert Sauro – University of Washington, US

Herbert Sauro described the developing community within synthetic biology, the work on standards that has already begun, and the Synthetic Biology Open Language (SBOL).

He asks us to remember that Synthetic Biology is not biology – it’s engineering! Beware of sending synthetic biology grant proposals to a biology panel! It is a workflow of design-build-test. He’s mainly interested in the bit between building and testing, where verification and debugging happens.

What’s so important about standards? Standardization is critical in engineering, where it increases productivity and lowers costs. In order to identify the requirement, you must describe a need. There is one immediate need: store everything you need to reconstruct an experiment within a paper (for more on this see the Nature Biotech paper by Peccoud et al. 2011: Essential information for synthetic DNA sequences). Currently, it’s almost impossible to reconstruct a synthetic biology experiment from a paper.

There are many areas requiring standards to support the synthetic biology workflow: assembly, design, distributed repositories, laboratory parts management, and simulation/analysis. From a practical POV, the standards effort needs to allow researchers to electronically exchange designs with round tripping, and much more.

The standardization effort for synthetic biology began with a grant from Microsoft in 2008 and the first meeting was in Seattle. The first draft proposal was called PoBoL but was renamed to SBOL. It is a largely unfunded project. In this way, it is very similar to other standardization projects such as OBI.

DARPA mandated 2 weeks ago that all projects funded from Living Foundries must use SBOL.

SBOL is involved in the specification, design and build part of the synthetic biology life cycle (but not in the analysis stage). There are a lot of tools and information resources in the community where communication is desperately needed.

SBOL has three parts: SBOL Semantic, SBOL Visual, and SBOL Script. SBOL Semantic is the one that’s going to be doing all of the exchange between people and tools. SBOL Visual is a controlled vocabulary and set of symbols for sequence features.

Have you been able to learn anything from SBML/SBGN, as you have a foot in both worlds? SBGN doesn’t address any of the genetic side, and is pretty complicated. You ideally want a very minimalistic design. SBOL Semantic is written in UML and is relatively small, though it has taken three years to get to this point. But you need host context above and beyond what’s modelled in SBOL Semantic. Without it, you cannot recreate the experiment.

Feature types such as operator sites, promoter sites, terminators, restriction sites etc. can go into the Sequence Ontology (SO). The SO people are quite happy to add these things into their ontology.

SBOLr is a web front end for a knowledgebase of standard biological parts that they used for testing (not publicly accessible yet). TinkerCell is a drag-and-drop CAD tool for design and simulation. There is a lot of semantic information underneath to determine what is and isn’t possible, though there is no formal ontology. However, you can semantically annotate all parts within TinkerCell, allowing the plugins to interpret a given design. A TinkerCell model can be composed of sub-models, which makes it easy to swap in new bits of models to see what happens.

WikiDust is a TinkerCell plugin written in Python which searches SBPkb for design components, and ultimately uploads them to a wiki. LibSBOLj is a library for developers to help them connect software to SBOL.

The physical and host context must be modelled to make all of this useful. By using semantic web standards, SBOL becomes extensible.

Currently there is no formal ontology for synthetic biology but one will need to be developed.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Summary thoughts on the BBSRC Systems Biology Workshop

December 17, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

I really enjoyed this workshop – met new people, chatted about systems biology, clinical genetics, surname-DNA associations, The Princess Bride and Spinal Tap. From a combination of presentations and chats, two defining topics of discussion in this workshop emerged:

  • social challenges, or getting the different disciplines within systems biology to understand one another. Alternatively, people also mentioned the challenge in getting different collaborating groups to work together;
  • stable infrastructure funding, or getting money for supporting software and for building and supporting data standards.

In my opinion, the former is much less of a current challenge than the latter. From my personal experiences within CISBAN (which contains a variety of experimental biologists as well as different types of theoretical biologists, mathematicians and statisticians), we have progressed to the point that I really feel that each "group" understands what the others do. In other words, in a local context, I think that social challenges are minimal. Longer-distance social challenges will remain around a little longer, but with the increasing use of online social networking tools (1, 2, 3, 4, 5, 6), I think much of this could be minimized. In contrast, I don't think that funding for stable infrastructure (software and data standards) is advancing as quickly as it should. The production and maintenance of life-science data standards are vital to more efficient data sharing and collaboration. People should make room in their grants for the development of data standards (e.g. MIBBI guidelines, syntaxes or semantics – see Frank's excellent discussion on the issue) that will benefit them. Core institutes such as the EBI do a lot of this work, but can't get funding for everything.

I started thinking about all this stuff on Wednesday morning, and writing this did somewhat affect the notes I took in some of the talks, and for that I apologise! :)

And, in conclusion, some light entertainment. There was a third category of discussion which many will be familiar with:

  • acronyms

I'm as guilty as the rest of them. Here's a small selection of examples of how much we scientists love our acronyms, and those things which are very close to true acronyms: APPLE, BASIS, CRISP, EMMAS, PRESTA, PheroSys, Phyre, PiMS, SToMP, SyMBA (mine), SysMO, SUMO, ROBuST and others. For a guide to how to build acronyms, see the PhD Comics' excellent summary of the topic (and the related FriendFeed discussion).


CISBAN and telomere maintenance and shortening, BBSRC Systems Biology Workshop

December 17, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

Amanda Greenall: Telomere binding proteins are conserved between yeast and higher eukaryotes. The capping proteins are very important, because they prevent the telomeres from being recognized as double-strand breaks. They work on cdc13, which is the functional homologue of POT1 in humans. A point mutation, cdc13-1, allows them to study telomere uncapping. When grown above 27 degrees Celsius, the cdc13-1 protein becomes non-functional and falls off. This uncapping causes telomere loss and cell-cycle arrest. So, they do further study of the checkpoint response that happens when telomeres are uncapped. Yeast is a good model, as many of the proteins involved in humans have direct analogs in yeast. They did a series of transcriptomics experiments to determine how gene expression is affected when telomeres are uncapped. They did 30 arrays, and the data was analysed using limma. 647 differentially-expressed genes were identified (418 upregulated (carbohydrate metabolism, energy generation, response to OS) and 229 downregulated (amino acid and ribosome biogenesis, RNA metabolism, etc.)). The number of differentially-expressed genes increases with time. For example, 259 of the genes were involved in the DNA damage response.

They became quite interested in BNA2, an enzyme which catalyses de novo NAD+ biosynthesis. Why is it upregulated? It seems over-expression of BNA2 enhances survival of cdc13-1 strains (using spot tests). Nicotinamide biosynthetic genes are altered when telomeres are uncapped in yeast and humans. The second screen was a robotic screen to identify ExoX and/or pathways affecting responses to telomere uncapping. Robots were used to do large-scale screens that can measure systematic cdc13-1 genetic interactions. One of the tests was the up-down assay, which allows them to distinguish Exo1-like and Rad9-like suppressors. They carry on with the spot tests until they have worked through the entire library of strains.

Darren Wilkinson: A discrete stochastic kinetic model has been built to model the cellular response to uncapping (J Royal Soc Interface, 4(12):73-90); it is also available in BioModels. It is encoded in SBML and simulated in BASIS (a web-based simulation engine). You can use the microarray data to infer networks of interactions. Such top-down modelling can often be done with Dynamic Bayesian Networks (DBNs) for discretised data and sparse Dynamic Linear Models (DLMs) for (normalized) continuous data. A special case of DLM is the sparse vector auto-regressive model of order 1, known as the sparse VAR(1) model, and this appears to be effective for uncovering dynamic network interactions (see Opgen-Rhein and Strimmer, 2007). They use a simple version of this model, and an RJ-MCMC algorithm to explore both the graphical structure and the model parameters. The output of the RJ-MCMC is quite hard to visualize, so they plot the marginal probability that each edge exists. This can also be summarised by choosing an arbitrary threshold and then plotting the resulting network; you can set the thickness of the edges to match the marginal probability associated with each edge. This picture is then easier for biologists to analyse, and allows them to narrow down their search for important genes.

He also performed analysis of the robotic genetic screens. There are usually about 1000 images per experiment, each with 384 spots, so image analysis needs to be automated. They want to pick out those strains that genetically interact with the query mutation. For interactions to be a useful concept in practice, you need the networks to be sparse, and with HTP data there is enough information to re-scale the data to enforce this sparsity. A scatter-plot of double-mutant against single-mutant fitness will show the strains lying along a straight line (under a model of genetic independence). Points above and below the regression line are phenotypic enhancers and suppressors, respectively.
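
As a rough illustration of that last step (my own sketch with random placeholder data, not Darren's actual analysis), fitting the line and splitting strains by which side of it they fall on could look like this:

```python
# Rough sketch (not the speaker's code): fit double-mutant fitness against
# single-mutant fitness, then flag strains lying well off the regression line.
# Following the convention in the notes above, points above the line are
# treated as enhancers and points below as suppressors. Data are placeholders.
import numpy as np

rng = np.random.default_rng(0)
single = rng.uniform(0.2, 1.0, size=200)            # single-mutant fitness
double = 0.8 * single + rng.normal(0, 0.05, 200)    # double-mutant fitness

slope, intercept = np.polyfit(single, double, deg=1)
residuals = double - (slope * single + intercept)

threshold = 2 * residuals.std()                     # arbitrary cut-off
enhancers = np.where(residuals > threshold)[0]
suppressors = np.where(residuals < -threshold)[0]

print(f"{len(enhancers)} putative enhancers, {len(suppressors)} putative suppressors")
```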

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. :)


Marco Morelli and the micro-evolution of RNA viruses, BBSRC Systems Biology Workshop

December 17, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

More fully, he's talking about the micro-evolutionary dynamics of RNA viruses. They want to get a full picture of what happens from the infection of a single cell to an entire outbreak, and all the intermediate scales. The levels of granularity he's looking at go as follows: within cell, within host (not all viral particles in one host are genetically identical), within group (physical proximity of a host to others), and between groups (long-distance spreading). The data at each stage are different, ranging from molecular data to epidemiological data. They looked at foot-and-mouth disease (FMD) and plum pox virus (PPV, transmitted by vectors), both RNA viruses. 10,000 farms were culled in the 2001 UK FMD outbreak. However, during this time, modellers were consulted. Samples were taken from every infected farm and are stored at the IAH Pirbright, which means there's a lot of data available. He then described a genetic tree that was built from the FMD viruses found on farms in County Durham during the outbreak. However, many transmission patterns are compatible with the tree. With some basic parameters, you can estimate how likely it is that one farm infected another. Among the total set of transmission trees (~2000), only 4 matched the values properly, so they could choose the most likely tree (which accounted for about 50% of the likelihood), and therefore the most likely transmission pattern. Some of the movements cover very large distances (about 15 km). Is that a fault of the model, or the signature of some extrinsic event like transmission via car travel (human) or delivery of infected material? They have more data (e.g. timing of transmissions) that they still have to use.

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. :)


Scott Grandison, leaf growth and form, BBSRC Systems Biology Workshop

December 17, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

Talking about the physical changes occurring in leaves, and looking at the different levels of granularity and orders of magnitude you need to think about (e.g. DNA-scale up to the macro, leaf-scale). Multidisciplinary team, and, as with other centres described at this workshop, the lines are beginning to blur. This was a really great talk, but had many videos that just cannot be reproduced here. There was a nice picture of someone viewing, in proper 3-d, bits of a plant, which went with the argument that transposing 3-d objects to 2-d can often cause problems with your visual analysis. They've been able to get parameters for the rate of growth of individual areas of a leaf – many areas, with many rates. They have made the Growth Factor Toolbox (GFtbox). The models they show using the GFtbox are very nice, and show the development of, for example, the specialized leaf of the pitcher plant or the growth of a "standard" leaf shape for Arabidopsis.

Great talk! :)

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. :)


OCISB: From small to large networks and back, BBSRC Systems Biology Workshop

December 17, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

Judy Armitage: Bacterial sensory networks. The E. coli chemotaxis system is probably the best-understood "system" in biology, where biases in swimming direction are produced by regulating motor switching. The chemotaxis pathway is a paradigm for HPK-RR (histidine protein kinase – response regulator) pathways. There can be over 100 HPK pathways in a single species. OCISB projects include: extending E. coli models to species with two or more chemosensory pathways, and extending these to HPK-RR pathways in general to allow prediction of partners. They started with R. sphaeroides, her "favorite" bacterium. This bacterium has 2 targeted pathways preventing crosstalk. They gave the generated data sets to the modelling groups and asked whether the proteins operate in parallel or in a linear pathway. The control theory people came up with 4 models that fit the data, but 3 could be excluded based on perturbation tests in vivo. The same data was given to mathematical biologists.

Modelling was with ODEs (for temporal dynamics) and partial DEs (for spatiotemporal dynamics). Porter et al. (2008), PNAS online, showed how the histidine kinase CheA3 is also a specific phosphatase for CheY6-P, one of the 6 motor binding proteins – tuning the kinase:phosphatase ratio will control motor switching. Further, there must be a link between the cytoplasmic cluster and the polar kinase. The CheB2~P phosphorelay allows the response to the environment to be tuned to metabolic need. How common is this and how is discrimination achieved? CheA (HPK), CheY/CheB (RR). They are modelling MCP helix mutants with the Sidekick tool – a coarse-grain transmembrane (TM) pipeline.
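
To give a flavour of the temporal (ODE) side of that modelling, here is a minimal, entirely illustrative two-state phosphorylation cycle of my own, in which the kinase:phosphatase ratio sets the steady-state level of the phosphorylated response regulator; the rate constants are made up:

```python
# Minimal, illustrative ODE sketch of a kinase/phosphatase cycle acting on a
# response regulator (e.g. CheY <-> CheY-P). The rate constants are invented;
# the point is only that the kinase:phosphatase balance sets the steady state.
import numpy as np
from scipy.integrate import odeint

k_kin = 0.5    # effective phosphorylation rate (1/s)
k_phos = 1.0   # effective dephosphorylation rate (1/s)
total = 1.0    # conserved total response regulator (arbitrary units)

def dydt(yp, t):
    y = total - yp                      # unphosphorylated pool
    return k_kin * y - k_phos * yp      # d[CheY-P]/dt

t = np.linspace(0, 10, 200)
yp = odeint(dydt, 0.0, t)[:, 0]

print(f"steady-state CheY-P ~ {yp[-1]:.3f}")
print(f"analytical k_kin/(k_kin + k_phos) = {k_kin / (k_kin + k_phos):.3f}")
```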

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. :)


Michael White and the NF-kappaB signalling system, BBSRC Systems Biology Workshop

December 17, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

Michael White: Dynamics and function of the NF-kappaB signalling system. NF-kappaB controls cell division and cell death in all cells. How can a simple signal carry so much information (the cell cannot afford to make a mistake!)? It is a complex network with multiple feedback loops (high dynamic complexity). People think that the IkappaB holds it in the cytoplasm, but this doesn't look to be correct. Living cell imaging shows that NF-kappaB oscillates asynchronously between the cytoplasm and nucleus in single cells (i.e. doesn't happen at the same time in multiple cells). However, each cell is cycling with the same amplitude etc, so they're doing the same thing, just not at the same time.

Can we synchronise the oscillations? You can do a repeat pulse protocol and then check to see whether synchronisation has happened. When you stimulate at 100-150 minute intervals, you can synchronise the cells without getting damped oscillations. They have built a stochastic model. There is a nice set of pictures of pathways, but obviously I cannot reproduce those here.

Here go the batteries again…(rest of notes from the paper notes I took, which are generally much lower quality)

Some of this work is funded under SABR, where they will focus on dynamic live cell imaging, quantitative proteomics/phosphoproteomics, genomics/bioinformatics, data analysis, deterministic/stochastic modelling and databases. What are the causes of differential expression? Oscillation dynamics is one possibility (and what he describes in this talk). Others could be signal-specific IkappaB processing, differential NF-kappaB dimer formation, and differential protein modifications. Is degradation of IkappaBs regulated by Rel protein binding? NF-kappaB could be differentially phosphorylated.

Finally, one last note on outreach: they've had quite the success with biologists interacting with mathematicians in the group. Biologists are now taking weekly math courses, and it was their idea. That's great :)

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. :)


Jim Beynon and PRESTA, BBSRC Systems Biology Workshop

December 16, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

PRESTA stands for Plant Responses to Environmental STress in Arabidopsis. Even though the environment is changing rapidly, investment in plant research has declined. Abiotic and biotic stresses function via core response networks embellished with stress-specific pathways. A fundamental component of these responses is transcriptional change. It seems that hormones are key in many of the components of stress responses; also, everything seems to be focused through key pathways. Two approaches are used: top-down modelling via network inference, or bottom-up modelling via already extant knowledge of key genes. This talk focused on the former.

They used high-resolution time-course microarrays which use 31,000 genome sequence tags (you need these to get the information to the modellers). Then, they use a range of different stress responses to reveal commonalities (developmental, e.g. senescence, pathogens, and abiotic stress). One example: over 48 hours there were 24 time points taken, with 4 biological and 3 technical replicates. Two-color arrays allow complex loop designs. They've been using the MAANOVA program, and even altered it to make it more efficient. You basically end up with an F-test that tells you which genes have changed over time. How to select genes for network inference modelling? GO annotations, genes known to be involved in stress-related processes, transcription factors known to be involved, early response genes and prior knowledge.
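
For anyone less familiar with that kind of test, the sketch below shows the shape of the per-gene calculation: a one-way F-test across time points (my own illustration using scipy rather than MAANOVA, with random placeholder numbers):

```python
# Illustrative only: a per-gene one-way F-test across time points, using scipy
# rather than MAANOVA. The expression values are random placeholders; a real
# analysis would also model the loop design and correct for multiple testing.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
n_genes, n_timepoints, n_reps = 100, 24, 4

# expression[g, t, r] = log-ratio for gene g at time point t, replicate r
expression = rng.normal(size=(n_genes, n_timepoints, n_reps))

changed = []
for g in range(n_genes):
    groups = [expression[g, t, :] for t in range(n_timepoints)]
    stat, p = f_oneway(*groups)
    if p < 0.01:
        changed.append(g)

print(f"{len(changed)} genes flagged as changing over time (before any correction)")
```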

There goes the battery again! Grrr…. transcribed paper notes follow, which aren't generally as detailed in my case…

Variation of network models: 4 out of the 12 prospective genes were shown to have an altered pathogen growth phenotype. Knockouts in a hub gene showed both up- and down-regulation of senescence. They want to add validation to the network model, and have validated various genes via experimental work. They developed APPLE, which is the Analysis of Plant Promoter-Linked Elements. They discovered that if HSF3 is overexpressed, the plants are more tolerant to drought and show increased seed yield. HSF3 is part of the stress response but has a wide range of interactions, which is a good thing for building parameterized models. In the future, he wants to look at the genetic diversity in the crops, and try to express a more robust response to the environment.

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. :)


From genes to jam: modelling A. thaliana root growth, BBSRC Systems Biology Workshop

December 16, 2008

BBSRC Systems Biology Grantholder Workshop, University of Nottingham, 16 December 2008.

Presented by 4 people from CPIB.

Malcolm Bennett: This plant is a good choice because the morphology is simple, the development is well understood, the imaging technology required for the work is available, and multi-scale modelling is possible. Microfibrils stop the cells from growing radially. What are the mechanisms of plant cell expansion? Cell walls are made up of 3 components: the cellulose microfibril skeleton, the hemicellulose and GAX which cross-link the microfibrils, and the pectins and RGI/RGII which form the cell wall matrix.

Tara Holman: They divide the root into 5 developmental zones: meristem, accelerating elongation, decelerating elongation, mature, rest of root / lateral root emergence zone. The XET/XTR family function in the loosening of cell walls by allowing slippage of hemicellulose relative to cellulose microfibrils. There are two distinct clades of this family that are elongation specific (based on microarray data). They have transcriptomic data on all 5 areas, and are currently analysing it. They can track changes in expression of cell wall-related loci, such as XETs, and have a large amount of molecular-scale data.

Rosemary Dyson: But how do these changes contribute to root growth? Which factors are actually important? On the mechanics of root cell growth: the cell has high turgor pressure, which is regulated very quickly by the osmotic potential. The turgor pressure also exerts a tension in the cell wall; if that tension is greater than a certain yield stress, the wall will creep and exhibit irreversible growth. The degree of creep is controlled by varying the cell wall properties (e.g. viscosity). Current models are variations on the 1965 Lockhart model. The modelling assumptions are: approximate the cell as a pressurized hollow cylinder with rigid end plates; model the cell wall as consisting of fibres embedded within a ground matrix; assume the wall is permanently yielded, and therefore a viscous fluid; and exploit the geometry – the cell wall is much thinner than the radius of the cell, so asymptotic analysis can be employed. It's just like glass blowing… :) She wrote everything in terms of a moving curvilinear coordinate system, fixed within the moving sheet. Where the centre surface is and the thickness of the sheet form part of the solution. She decomposes the total fluid velocity, U, into the velocity of the centre-surface, v, and the fluid velocity relative to the moving centre-surface, u, so that U = v + u. Everything is a function of the length along the cylinder, s, and time, t, only. Initial conditions are height, radius, angle of fibres, and length of fibres. There were many more functions here, which were very nicely described, but which are impossible to reproduce in these notes :)
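
For context, and purely from my own memory rather than her slides, the classic Lockhart relation that these models build on says a cell elongates only when the turgor pressure exceeds a yield threshold, with the relative growth rate set by the wall extensibility:

$$\frac{1}{L}\frac{dL}{dt} = \Phi\,(P - Y) \quad \text{for } P > Y, \qquad \frac{1}{L}\frac{dL}{dt} = 0 \quad \text{otherwise,}$$

where L is the cell length, P the turgor pressure, Y the yield stress and Φ the wall extensibility (inversely related to the wall viscosity mentioned above).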

Darren Wells: takes the equation/model produced by Rosemary, validates it and looks for best-fit numbers. One of the variables that can be measured directly is turgor pressure (via a micropipette filled with silicon oil and some fancy shenanigans). Pressure is normally about 3 – 3.5 bar (roughly tyre pressure). Experimental evidence generally shows that you can assume a constant turgor pressure across the 5 areas of the root, though there may be some slight variation. Conventional fixation techniques can lead to errors of up to 100% in the estimation of cell wall thickness. You can solve that with freeze fracture, but it gives no spatial localization. Instead, they use cryo FIB-SEM and high-pressure freeze substitution (beautiful electron microscopy image!). They discovered that walls are thinner next to another cell and thicker at the corners, so thickness is difficult to measure. The growth rate (relative to tip velocity) parameter requires dynamic measurement at cellular resolution. They can use confocal microscopy and image analysis techniques, which give cell lengths and diameters "for free", with vertical imaging under physiologically relevant conditions. The final parameter is viscosity, where direct measurement would require novel techniques, e.g. the development of a micro-rheometer (they haven't made it, but might build it). A couple of indirect estimation techniques are possible, however. Here's where the jam comes in, with mimetic cell walls made using pectin.

All 4 presenters were clear and interesting and their talks joined together nicely; however, I particularly liked the modelling section (3rd part) of this talk. Nice use of LaTeX!

These are just my notes and are not guaranteed to be correct. Please feel free to let me know about any errors, which are all my fault and not the fault of the speaker. :)

