Categories
Meetings & Conferences Software and Tools

TT42: Computational Biology in the cloud, towards a federative and collaborative R-based platform (ISMB 2009)

Eamonn Maguire talking on behalf of Karim Chine

BIOCEP-R offers advanced graphics – more than regular R does. It is a universal platform for scientific and statistical computing, intended to create an open, federative and collaborative environment for the production, sharing, and reuse of all the artifacts of computing. It puts new analytical, numerical and processing capabilities in the hands of everyone (open science). BIOCEP is a Java app built on top of R and Scilab: anything that you can do within those environments is accessible through BIOCEP. It has a RESTful API.

The BIOCEP computational open platform ecosystem: computational data sources, resources, components, GUIs, web services and scripts. The R virtualization gives you something like a mini-desktop – a virtual R workbench. There is also a plugin repository, including GUI plugins, and a Firefox plugin called ElasticFox.

Here comes another demo – so fewer notes now… (but FriendFeed is made for this sort of thing, so look there – link below) 🙂 But the R console looks much easier to use than trying to use R on its own, with your own data only. The web services part means you can use BIOCEP to connect to a cloud instance.
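As a rough sketch of what driving R through a RESTful API looks like, the snippet below just builds a request URL in Python. The endpoint path (`rvirtual/eval`) and parameter name are my invention for illustration, not the documented BIOCEP API:

```python
from urllib.parse import urlencode, urljoin

def build_r_eval_url(base, expression):
    """Build a request URL for evaluating an R expression on a remote server.

    The 'rvirtual/eval' path and 'expression' parameter are assumptions."""
    query = urlencode({"expression": expression})
    return urljoin(base, "rvirtual/eval") + "?" + query

url = build_r_eval_url("http://example.org/biocep/", "mean(c(1, 2, 3))")
print(url)
```

The R expression is percent-encoded so it survives the round trip through HTTP intact.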

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences Software and Tools

TT40: BioCatalogue: A Curated Web Service Registry for the Life Science Community (ISMB 2009)

Franck Tanoh

They estimate 3000+ web services in life sciences, and we need information about them beyond just where they can be found. People who have an interest in such services: users, developers, service providers (big and small), and tool developers. Their curation consists of: free text, tags, controlled vocabularies, automated WSDL ripping and analytics, automated monitoring and testing, and partner feeds.

Next came a demo of BioCatalogue. You can bookmark lots of services, even without signing up. Categories are created based on service function and discipline. There is also a history of who adds what and when, to aid attribution. The state of the service is shown with an icon. You can find the description and information on any costs or licensing restrictions. The inputs and outputs of the services have their own descriptions. Soon, they’ll support batch services.

They’ll have test scripts that monitor the services, and they’d love to get loads of people involved.
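A monitoring script like the ones they describe ultimately has to collapse a series of up/down checks into the status icon shown per service. Here is a minimal Python sketch; the three-state classification and its thresholds are my assumptions, not BioCatalogue's actual rules:

```python
def service_status(recent_checks):
    """Classify a service from its last few up/down checks (True = up).

    The categories and thresholds here are illustrative assumptions."""
    if not recent_checks:
        return "unchecked"
    up_fraction = sum(recent_checks) / len(recent_checks)
    if up_fraction == 1.0:
        return "up"
    if up_fraction == 0.0:
        return "down"
    return "unstable"

print(service_status([True, True, True]))   # all recent checks passed
```

A registry could then pick the icon to display directly from the returned label.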

FriendFeed Discussion

Categories
Meetings & Conferences Software and Tools Standards

TT26: BioModels Database, a database of curated and annotated quantitative models with Web Services and analysis tools (ISMB 2009)

Nicolas Le Novère

Lots of things are called models. He’s NOT going to talk about HMMs, Bayesian models, sailboat models, supermodels 🙂 For him, a model is computer-readable, simulatable, and covers biological pathways. Models and their description/metadata need to be accessible. The models in BioModels are from the peer-reviewed literature. They check each model is OK and simulate it before accepting it into the database. Models can be submitted by the curators themselves (e.g. re-implemented from the literature), submitted directly by authors, or arrive in a few other ways.

Models also have to be encoded in SBML and follow the MIRIAM guidelines, which are reporting guidelines for the encoding and annotation of models; at the moment this is limited to models that can be quantitatively evaluated. There are seven basic requirements for MIRIAM compliance, which are available online. Within the model, MIRIAM annotations are identified by URIs and are stored as RDF. There’s been a steady increase in the number of models in BioModels. There are about 35000 reactions and about 400 models. Standard search functionality is available from their website at the EBI (http://www.ebi.ac.uk/biomodels).
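To give a flavour of MIRIAM-style annotation, here is a Python sketch in which model elements carry (qualifier, URI) pairs using the `urn:miriam` scheme of the era. The specific elements and URNs are illustrative, not taken from a real BioModels entry:

```python
# Each model element maps to (qualifier, identifier URI) pairs; the
# entries below are invented examples in the urn:miriam style.
annotations = {
    "species_P53": [("bqbiol:is", "urn:miriam:uniprot:P04637")],
    "reaction_R1": [("bqbiol:isVersionOf", "urn:miriam:obo.go:GO%3A0006915")],
}

def identifiers_for(element):
    """Return just the identifier URIs attached to a model element."""
    return [uri for _, uri in annotations.get(element, [])]

print(identifiers_for("species_P53"))
```

In a real model these pairs live inside the SBML element's RDF annotation block rather than a Python dict, but the lookup a tool performs is essentially this.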

Can export in CellML, BioPAX and others (though the SBML is the curated, perhaps more “trusted”, version). There are also two simple simulators available directly from the entry’s webpage, and if you want to change parameters you can click through to JWS online. You can also just extract portions of the models: these will end up as valid SBML models in their own right.

FriendFeed Discussion

Categories
Meetings & Conferences Semantics and Ontologies Software and Tools

TT16: Ontology Services for Semantic Applications in Healthcare and Life Sciences (ISMB 2009)

Patricia Whetzel, Outreach Coordinator for NCBO

Trish has recorded her talk as a screencast as she wanted to do a demo, and she can’t trust the wireless – true enough! RESTful web services have been developed at the NCBO within BioPortal: http://rest.bioontology.org/bioportal (note this is the prefix for all services; if you just go to this URL there isn’t anything visible). They chose RESTful services as they are lightweight and easy to use. The main BioPortal website is http://bioportal.bioontology.org. All information on the BioPortal site is retrieved using those web services. It can store ontologies in OWL, OBO and Protege frames formats.

You can search ontologies based on a number of parameters. Much help information is available via mouseover text. You can also download ontologies that are available on BioPortal. When browsing your ontologies you can see the structure, the metadata, definitions and more. There are also ontology widgets that you can put on your own site, including jump-to feature and term selection widget. This latter one is very useful because it allows your web app to use term auto-complete without having to code it yourself!

To go into the search web services a little bit more, take for instance a search for “protocol”. The search can be parameterized and filtered in many ways: which ontology to use, exact or non-exact matching, etc. The search function is especially important for ontology re-use. For instance, if you’re developing a new domain ontology, you want to make sure you don’t reinvent the wheel, and this is a good way to find out what’s out there. The next bit of the video showed using these searches via programmatic means.
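Programmatic searching then boils down to building parameterized REST URLs against the prefix above. This Python sketch shows the idea; the `/search/` path and the parameter names are my guesses at the shape of the call, so check the NCBO documentation for the real signature:

```python
from urllib.parse import urlencode

BASE = "http://rest.bioontology.org/bioportal"

def search_url(query, exact=False, ontology_ids=None):
    """Build a BioPortal-style search URL.

    Path and parameter names here are assumptions, not the documented API."""
    params = {"query": query, "isexactmatch": int(exact)}
    if ontology_ids:
        params["ontologyids"] = ",".join(str(i) for i in ontology_ids)
    return f"{BASE}/search/?{urlencode(params)}"

print(search_url("protocol", exact=True, ontology_ids=[1123]))
```

The same pattern (one query string, filters as extra parameters) covers the exact/non-exact and per-ontology filtering mentioned above.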

BioPortal also allows you to annotate, or add notes to, ontologies. There is also an annotation tag/term cloud in the interface, which is nice 🙂 You may see duplicates in the tag cloud – it is designed this way to show that more than one ontology has that term. There are also hierarchy services. You can view the parent terms of a particular term, and do other sorts of queries that allow you to explore the hierarchy around a term programmatically. On the web app, they have a dynamic visualization of the hierarchy that you can play with.
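The hierarchy services are essentially transitive-closure queries over is_a links. A toy Python version of "give me all parents of a term", with invented terms:

```python
# Toy is_a hierarchy standing in for an ontology; terms are invented.
IS_A = {
    "apoptosis": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["biological process"],
}

def ancestors(term):
    """Collect all transitive parents of a term via is_a links (breadth-first)."""
    seen, queue = [], list(IS_A.get(term, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(IS_A.get(parent, []))
    return seen

print(ancestors("apoptosis"))
```

A real hierarchy service answers the same question over the stored ontology graph instead of an in-memory dict.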

FriendFeed Discussion

Categories
Meetings & Conferences Software and Tools

TT13: Reflect: Augmented Browsing for the Life Scientist (ISMB 2009)

Sean O’Donoghue

A pragmatic approach to web semantics. What can end-users do? Wait for all publishers to tag content? Systematically tag all of it, with all the features users would like? We can wait for this Semantic Web to appear, or we can help make it happen. They use two ideas to make it happen: augmented browsing and real-time tagging. The primary design decision is that it should be easy to install and fast to use. Using Reflect is as simple as pressing one button: put a URL into a field and press Enter, and the server will serve a tagged version of that page. Alternatively, you can download a plugin for Firefox and IE.
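Real-time tagging of a served page can be sketched as a dictionary-driven substitution over the HTML text. The entity list and the span markup below are illustrative, not Reflect's actual dictionary or output format:

```python
import re

# Invented mini-dictionary of entities; Reflect itself recognises
# proteins, genes, and small molecules on a much larger scale.
ENTITIES = {"p53": "protein", "ATP": "small molecule"}

def tag_text(text):
    """Wrap known entity names in a highlighting span (markup is illustrative)."""
    for name, kind in ENTITIES.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, f'<span class="reflect {kind}">{name}</span>', text)
    return text

print(tag_text("p53 binding requires ATP."))
```

Doing this server-side on a fetched page is what lets a single button (or browser plugin) return the augmented version.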

Allyson says: I missed a large part of the end of this talk because Firefox failed part way through and I tried to get a workaround going.

FriendFeed Discussion

Categories
Meetings & Conferences Semantics and Ontologies Software and Tools

Prototyping a Biomedical Ontology Recommender Service (ISMB Bio-Ont SIG 2009)

Clement Jonquet et al.

It’s hard for people to find data – annotating data with ontologies is a solution, but which ontology to use? There are many different formats, platforms, and versions. Which is relevant to you? What happens if you get it wrong? Here’s where the recommender comes in. The NCBO Annotator workflow extracts annotations from text by concept recognition, expands annotations using knowledge in the ontologies, and then scores annotations according to their context and returns them to the user. They use a dictionary approach: a list of strings that identify ontology concepts.
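The dictionary approach can be sketched in a few lines of Python: a map from strings to ontology concepts, scanned against the input text. The entries are invented for illustration:

```python
# Invented dictionary entries mapping strings to ontology concept IDs.
DICTIONARY = {
    "cell cycle": "GO:0007049",
    "mitosis": "GO:0007067",
}

def recognise(text):
    """Return (string, concept) pairs for dictionary entries found in text."""
    lowered = text.lower()
    return [(s, c) for s, c in DICTIONARY.items() if s in lowered]

print(recognise("Genes regulating the cell cycle during mitosis"))
```

The real Annotator's concept recogniser works against dictionaries built from whole ontologies, but the matching step is conceptually this lookup.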

For the semantic expansion, they use the original ontologies’ is_a hierarchy, mappings in the UMLS Metathesaurus and NCBO BioPortal, and semantic similarity algorithms based on the is_a graph (ongoing work). It has a nice-looking web interface with a resulting ontology tag cloud – a good way of displaying the results. In the results, big ontologies get high scores, and you can identify key ontologies. Some ontologies appear only with a specific type of data, which makes it important to have an appropriate recommendation. The score does not scale linearly with the number of annotations. In future, they want to enhance the backend annotation workflow and have different types of scoring methods. They’d also like to parameterize the scoring methods.
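Semantic expansion and scoring might look like the following Python sketch: direct matches score higher than matches reached by walking up the is_a hierarchy. The weights and the toy hierarchy are my assumptions, not the published Annotator scoring:

```python
# Toy hierarchy: mitosis is_a cell cycle (illustrative, single-parent).
IS_A = {"GO:0007067": "GO:0007049"}

def expand_and_score(direct_hits):
    """Score concepts: direct matches weigh more than is_a-expanded ones.

    Weights (10 vs 5) are arbitrary illustration values."""
    scores = {}
    for concept in direct_hits:
        scores[concept] = scores.get(concept, 0) + 10   # direct match
        parent = IS_A.get(concept)
        while parent:
            scores[parent] = scores.get(parent, 0) + 5  # expanded match
            parent = IS_A.get(parent)
    return scores

print(expand_and_score(["GO:0007067"]))
```

Aggregating such per-concept scores by ontology is what would let a recommender rank whole ontologies for a given piece of text.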

http://obs.bioontology.org

FriendFeed discussion: http://ff.im/4x1mo

Categories
Meetings & Conferences Semantics and Ontologies Software and Tools

Increasingly accurate biochemical knowledge representation with precise, structure-based chemical identifiers (ISMB Bio-Ont SIG 2009)

Michael Dumontier et al.

Problem: identifiers are a name for some biochemical entity. Records offer a rich description of the named entity. When viewing data, sometimes it’s difficult to know which form of a chemical the site is referring to. People use identifiers when reporting experimental results, but it’s often unclear which species they’re referring to, and there can be erroneous or underspecified reporting of results. They’d like to generate stable identifiers based on explicit, machine-understandable descriptions which are unchanging and fully self-describing. With this style, different molecules must have different identifiers. For example, InChI strings are good, but you need specialized software to parse an InChI string.

Some formats that already exist are SDF and CML, whereas existing identifiers that contain chemical information are InChI and SMILES. So, what happens if you ask CML for the differences between three very similar chemical species that only differ in their stereochemistry? It isn’t really possible. He’d like to reason over relations and class membership, and to perform classification tasks.

In the vein of functional groups, they’d like to capture some form of generalisation: experimental conditions necessitate a certain level of structural (un)certainty. So, more flexible and accurate representation of biochemical knowledge beyond the exact structure. Classes would include: specification, minimum, combination, possibilities/uncertainties, exclusion.

Ultimately, what we want to do is to generate a useful identifier that points to an accurate and unchanging description. So, take what was done with InChI and generate something that is self-describing. We need to go from an OWL description to an identifier. So they have a prototype service that allows you to submit an OWL snippet and get back an identifier. This means that if the description changes, the identifier changes. They will add new knowledge into the linked data web through Bio2RDF.
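One way to picture such a service: canonicalise the description, hash it, and use the digest as the identifier, so any change to the description changes the identifier. This Python sketch uses a trivial whitespace canonicalisation and an invented prefix; the real service's scheme will differ:

```python
import hashlib

def make_identifier(owl_snippet):
    """Derive a stable identifier from a description by hashing it.

    The whitespace-collapsing canonicalisation and 'CHEMID:' prefix are
    assumptions for illustration only."""
    canonical = " ".join(owl_snippet.split())
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:12]
    return "CHEMID:" + digest

a = make_identifier("Molecule and hasPart some Hydroxyl")
b = make_identifier("Molecule  and  hasPart some Hydroxyl")  # same after canonicalisation
c = make_identifier("Molecule and hasPart some Carboxyl")
print(a == b, a == c)
```

The key property shown is the one the talk asks for: equivalent descriptions yield the same identifier, and different descriptions yield different ones.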

Benefits of this system include that no curation is required, and that identifiers can be made for knowledge at various levels of granularity. Situational modeling enables the careful separation of what is known under particular circumstances.

FriendFeed discussion: http://ff.im/4wZax

Categories
Housekeeping & Self References Papers Research Blogging Software and Tools Standards

Modeling and Managing Experimental Data Using FuGE

ResearchBlogging.org

Want to share your umpteen multi-omics data sets and experimental protocols with one common format? Encourage collaboration! Speak a common language! Share your work! How, you might ask? With FuGE, and this latest paper (citation at the end of the post) tells you how.

In 2007, FuGE version 1 was released (website, Nature Biotechnology paper). FuGE allows biologists and bioinformaticians to describe any life science experiment using a single format, making collaboration and repeatability of experiments easier and more efficient. However, if you wanted to start using FuGE, until now it was difficult to know where to start. Do you use FuGE as it stands? Do you create an extension of FuGE that specifically meets your needs? What do the developers of FuGE suggest when taking your first steps using it? This paper focuses on best practices for using FuGE to model and manage your experimental data. Read this paper, and you’ll be taking your first steps with confidence!

[Aside: Please note that I am one of the authors of this paper.]

What is FuGE? I’ll leave it to the authors to define:

The approach of the Functional Genomics Experiment (FuGE) model is different, in that it attempts to generalize the modeling constructs that are shared across many omics techniques. The model is designed for three purposes: (1) to represent basic laboratory workflows, (2) to supplement existing data formats with metadata to give them context within larger workflows, and (3) to facilitate the development of new technology-specific formats. To support (3), FuGE provides extension points where developers wishing to create a data format for a specific technique can add constraints or additional properties.

A number of groups have started using FuGE, including MGED, PSI (for GelML and AnalysisXML), MSI, flow cytometry, RNA interference and e-Neuroscience (full details in the paper). This paper helps you get a handle on how to use FuGE by presenting two running examples of capturing experimental metadata, drawn from flow cytometry and from gel electrophoresis. Part of Figure 2 from the paper is shown on the right, and describes one section of the flow cytometry FuGE extension from FICCS.

The flow cytometry equipment created as subclasses of the FuGE equipment class.

FuGE covers many areas of experimental metadata including the investigations, the protocols, the materials and the data. The paper starts by describing how protocols are designed in FuGE and how those protocols are applied. In doing so, it describes not just the protocols but also parameterization, materials, data, conceptual molecules, and ontology usage.

Examples of each of these FuGE packages are provided in the form of either the flow cytometry or the GelML extensions. Further, clear scenarios are provided to help the user determine when it is best to extend FuGE and when it is best to re-use existing FuGE classes. For instance, it is best to extend the Protocol class with an application-specific subclass when all of the following are true: when you wish to describe a complex Protocol that references specific sub-protocols, when the Protocol must be linked to specific classes of Equipment or Software, and when specific types of Parameter must be captured. I refer you to the paper for scenarios for each of the other FuGE packages such as Material and Protocol Application.
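The extension pattern itself is plain subclassing: a generic Protocol class specialised with technique-specific equipment and parameters. As a hedged illustration (the class and field names here are mine, not the FuGE object model):

```python
class Protocol:
    """Generic protocol, standing in for FuGE's Protocol class."""
    def __init__(self, name):
        self.name = name
        self.sub_protocols = []   # complex protocols reference sub-protocols

class FlowCytometryProtocol(Protocol):
    """Application-specific subclass tied to specific equipment and parameters.

    Field names are illustrative assumptions."""
    def __init__(self, name, cytometer, laser_wavelength_nm):
        super().__init__(name)
        self.cytometer = cytometer
        self.laser_wavelength_nm = laser_wavelength_nm

p = FlowCytometryProtocol("cell sorting", "FACSAria", 488)
print(p.name, p.cytometer, p.laser_wavelength_nm)
```

The subclass exists precisely because the three conditions above hold: it nests sub-protocols, binds specific equipment, and captures a typed parameter.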

The paper makes liberal use of UML diagrams to help you understand the relationship between the generic FuGE classes and the specific sub-classes generated by extensions. A large part of the paper is concerned expressly with helping the user understand how to model an experiment type using FuGE, and also to understand when FuGE on its own is enough. But it also does more than that: it discusses the current tools that are already available for developers wishing to use FuGE, and it discusses the applicability of other implementations of FuGE that might be useful but do not yet exist. Validation of FuGE-ML and the storage of version information within the format are also described. Implementations of FuGE, including SyMBA and sysFusion for the XML format and ISA-TAB for compatibility with a spreadsheet (tab-delimited) format, are also summarised.

I strongly believe that the best way to solve the challenges in data integration faced by the biological community is to constantly strive to simply use the same (or compatible) formats for data and for metadata. FuGE succeeds in providing a common format for experimental metadata that can be used in many different ways, and with many different levels of uptake. You don’t have to use one of the provided STKs in order to make use of FuGE: you can simply offer your data as a FuGE export in addition to any other omics formats you might use. You could also choose to accept FuGE files as input. No changes need to be made to the underlying infrastructure of a project in order to become FuGE compatible. Hopefully this paper will flatten the learning curve for developers, and get them on the road to a common format. Just one thing to remember: formats are not something that the end user should see. We developers do all this hard work, but if it works correctly, the biologist won’t know about all the underpinnings! Don’t sell your biologists on a common format by describing the intricacies of FuGE to them (unless they want to know!), just remind them of the benefits of a common metadata standard: cooperation, collaboration, and sharing.

Jones, A., Lister, A.L., Hermida, L., Wilkinson, P., Eisenacher, M., Belhajjame, K., Gibson, F., Lord, P., Pocock, M., Rosenfelder, H., Santoyo-Lopez, J., Wipat, A., & Paton, N. (2009). Modeling and Managing Experimental Data Using FuGE. OMICS: A Journal of Integrative Biology. DOI: 10.1089/omi.2008.0080

Categories
Meetings & Conferences Software and Tools

Building a New Biology (BioSysBio 2009)

Drew Endy
Stanford University, and BioBricks Foundation

Overview: Puzzle related to SB and informing some of his engineering work. Then a ramble through the science of genetics. Last part is a debrief on BioBrick public agreements.

Part 1. If SB is going to scale, we really need to think about the underlying "physics engine", and you could do worse than look to Gillespie's work on well-mixed systems. This underlies many of the stochastic systems in SB, such as the differentiation of stem cells. A lot of work is based on this idea. Another good system is phage lambda: a phage infects a cell, leading to one of two outcomes: lysogeny and dormancy, or lysis of the cell. If you infect 100 cells with exactly one phage each, you get a distribution of behaviour. How is the physics working here? How does an individual cell decide which fate is in store? About 10 years ago, A. Arkin took this molecular biology and mapped it to a physics model. From this model it became clear how this variability arises. Can you predetermine what cell fate will occur before lambda infects it? Endy looked into this. They collected different types of cells, both tiny and large (with the latter about to divide and the former just after division). They then scored each cell for the different fates. In tiny cells, lysogeny is favored 4 to 1, whereas in big cells, lysis is favored 4 to 1. In the end, this is a deterministic model. There might be some discrete transition where certain parts of the cell cycle favor certain fates. They found, however, that there was a continuous distribution of lysis/lysogeny. Further examination found that there was a third, mixed fate: the cell divides before it decides what to do, and the daughter cells then decide for themselves.
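The well-mixed stochastic treatment Gillespie's algorithm gives can be sketched in a few lines. Here is a minimal birth-death simulation in Python; the rates are arbitrary illustration values, not numbers from the talk:

```python
import random

def gillespie(birth_rate, death_rate, x0, t_end, rng):
    """Minimal Gillespie SSA for a birth-death process: 0 -> X, X -> 0."""
    t, x = 0.0, x0
    while t < t_end:
        a_birth = birth_rate          # propensity of the birth reaction
        a_death = death_rate * x      # propensity of the death reaction
        a_total = a_birth + a_death
        if a_total == 0:
            break
        t += rng.expovariate(a_total)          # time to the next reaction
        if rng.random() * a_total < a_birth:   # choose which reaction fires
            x += 1
        else:
            x -= 1
    return x

rng = random.Random(42)
print(gillespie(1.0, 0.1, 0, 50.0, rng))
```

Running many such trajectories from identical initial conditions gives exactly the kind of distribution of outcomes described for the 100 infected cells.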

They have looked at this process in time, and how it works at the single-cell level. N is a protein made almost immediately upon infection – its activity is not strongly correlated with cell fate. CII *is* strongly associated, however. The Q protein was also studied. In a small bacterium, 100 molecules of repressor are more physically constrained, so you need 400 molecules of Cro to balance; in a bigger bacterium there is more space and only 100 Cro are needed. However, this theory may not work, as the components may take too long to be built.

Part 2. How much DNA is there on earth? Well, it must be finite. He's not sure about these numbers: 1E10 tons of bacteria (5% DNA)… 5E35 bp on the planet. How long would it take us to sequence it all? A conservative estimate – and a little out of date – is about 5E23 months – one mole of months! If current trends hold, a typical R01 (grant) in 2090 could state: sequence all DNA on earth in the first month of the project. 🙂
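Reproducing the arithmetic as a quick calculation (the sequencing rate is my assumption, chosen only so that the talk's figures come out mutually consistent):

```python
# Back-of-the-envelope version of the talk's numbers.
total_bp = 5e35                  # estimated base pairs on Earth (from the talk)
rate_bp_per_month = 1e12         # assumed global sequencing throughput
months = total_bp / rate_bp_per_month
avogadro = 6.022e23
print(f"{months:.1e} months, or {months / avogadro:.2f} moles of months")
```

At that rate the job takes 5E23 months, i.e. roughly a mole of months, matching the estimate above.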

If there is a finite amount of DNA on the planet, could we finish the science of genetics, or SB? If so, could we then finish early? Is genetics bounded? Well, if these three things hold true, perhaps yes: genomes have finite lengths; fixation rates of mutants in populations are finite; atrophy rates of functional genetic elements are > 0.

Is the underlying math equal to perturbation design? Take the bacteriophage T7 (he references a 1969 paper about it from Virology): in that, 19 genes had been identified by isolating mutants, and 10 more were expected. By 1989 the sequence came out, and there were actually 50 genes. So, mutagenesis and screening only found some of the genes. About 40% of the elements didn't have a function assigned.

Could a biologist fix a radio? Endy's question is: could an engineer fix an evolved radio (see Koza et al.)?

Part 3. Who owns BioFAB? What legal things do we need to do for BioBricks? Patents are slow and expensive, copyright is cheap but does not apply, and various other things have other problems. Therefore they have drafted the BioBrick Public Agreements document. He then showed the actual early draft document. They're trying to create a commons of free parts. Open Technology Platform for BioBricks.

Personal Comments: Best statement from Endy: "Really intelligent design would have documentation." (Not sure if it is his statement, or attributed to someone else).

Wednesday Session 3
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio

Categories
Meetings & Conferences Software and Tools

An Intuitive Automated Modelling Interface for Systems Biology (BioSysBio 2009)

O Kahramanogullari et al.
Imperial College London

He works on improving the modelling and inference step. He makes use of SPiM (the Stochastic Pi Machine), a process-algebra tool from Microsoft Research. Process algebras are used to study complex reactive systems, and are therefore well-suited to modelling biological systems. They have used this technique to build a process model of Rho GTPases with GDIs (Kahramanogullari et al. 2009, Theoretical Computer Science, in press). They also created a process model for actin polymerisation (Kahramanogullari et al. 2009, Proc. of FBTC08, Elsevier). Such structures can be written in process algebra when they would be extremely difficult to express with differential-equation techniques.

Process algebra is very difficult for anyone to use directly. So, they've developed an intuitive language interface for modelling with SPiM. The assumption in this is that biochemical species are stateful entities with connectivity interfaces to other species. Further, a species can have a number of sites through which it interacts with other species, and changes its state as a result of these interactions. So, they allow descriptions of the model in a natural-language-like narrative language. Their tool is available for download from their website: http://www.doc.ic.ac.uk/~ozank/pim.html .
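The narrative interface idea – sentence patterns translated into process/reaction rules – can be caricatured in Python. This toy grammar is vastly simpler than their SPiM-backed language, and the sentences are invented:

```python
import re

def parse_narrative(lines):
    """Translate 'A binds B' / 'A releases B' sentences into reaction tuples.

    A two-pattern toy grammar, not the actual narrative language."""
    reactions = []
    for line in lines:
        m = re.match(r"(\w+) binds (\w+)", line)
        if m:
            reactions.append((m.group(1), m.group(2), "bind"))
            continue
        m = re.match(r"(\w+) releases (\w+)", line)
        if m:
            reactions.append((m.group(1), m.group(2), "release"))
    return reactions

print(parse_narrative(["RhoA binds GDI", "GDI releases RhoA"]))
```

The real tool compiles such narrative descriptions down to stochastic pi-calculus processes for SPiM; here the output is just a list of reaction tuples to show the mapping.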

Wednesday Session 1
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio
