In my relatively short career (approximately 12 years – wait, how long?) in bioinformatics, I have been involved to a greater or lesser degree in a number of standards efforts. It started in 1999 at the EBI, where I worked on the production of the protein sequence database UniProt. Now, I’m working with systems biology data and beginning to look into synthetic biology. I’ve been involved in the development (or maintenance) of a standard syntax for protein sequence data; standardized biological investigation semantics and syntax; standardized content for genomics and metagenomics information; and standardized systems biology modelling and simulation semantics.
(Bear with me – the reason for this wander through memory lane becomes apparent soon.)
How many standards have you worked on? How can there be multiple standards, and why do we insist on creating new ones? Doesn’t the definition of a standard mean that we would only need one? Not exactly. Take the field of systems biology as an example. Some people are interested in describing a mathematical model, but have no need for storing either the details of how to simulate that model or the results of multiple simulation runs. These are logically separate activities, yet they fall within a single community (systems biology) and are broadly connected. A model is used in a simulation, which then produces results. So, when building a standard, you end up with the same separation: have one standard for the modelling, another for describing a simulation, and a third for structuring the results of a simulation. All that information does not need to be stored in a single location all the time. The separation becomes even more clear when you move across fields.
But this isn’t completely clear cut. Some types of information overlap within standards of a single domain and even among domains, and this is where it gets interesting. Not only do you need a single community talking to each other about standard ways of doing things, but you also need cross-community participation. Such efforts result in even more high-level standards which many different communities can utilize. This is where work such as OBI and FuGE sit: with such standards, you can describe virtually any experiment. The interconnectedness of standards is a whole job (or jobs) in itself – just look at the BioSharing and MIBBI projects. And sometimes standards that seem (at least mostly) orthogonal do share a common ground. Just today, Oliver Ruebenacker posted some thoughts on the biopax-discuss mailing list where he suggests that at least some of BioPAX and SBML share a common ground and might be usefully “COMBINE“d more formally (yes, I’d like to go to COMBINE; no, I don’t think I’ll be able to this year!). (Scroll down that thread for a response by Nicolas Le Novère as to why that isn’t necessarily correct.) So, orthogonality, or the extent to which two or more standards overlap, is sometimes a hard thing to determine.
So, what have I learnt? As always, we must be practical. We should try to develop an elegant solution, but it really, really should be one which is easy to use and intuitive to understand. It’s hard to get to that point, especially as I think that point is (and should be) a moving target. From my perspective, group standards begin with islands of initial research in a field, which then gradually develop into a nascent community. As a field evolves, ‘just-enough’ strategies for storing and structuring data become ‘nowhere-near-enough’. Communication with your peers becomes more and more important, and it becomes imperative that standards are developed.
This may sound obvious, but the practicalities of creating a community standard mean that such work requires a large amount of effort and continued goodwill. Even with the best of intentions, with every participant working towards the same goal, it can take months – or years – of meetings, document revisions and conference calls to hash out a working standard. This isn’t necessarily a bad thing, though. All voices do need to be heard, and you cannot have a viable standard without input from the community you are creating that standard for. You can have the best structure or semantics in the world, but if it’s been developed without the input of others, you’ll find people strangely reluctant to use it.
Every time I take part in a new standard, I see others like me who have themselves been involved in the creation of standards. It’s refreshing and encouraging. Hopefully the time it takes to create standards will drop as the science community as a whole gets more used to the idea. When I started, the only real standards in biological data (at least that I had heard of) were the structures defined by SWISS-PROT and EMBL/GenBank/DDBJ. By the time I left the EBI in 2006, I could have given you a list a foot long (GO, PSI, and many others), and that list continues to grow. Community engagement and cross-community discussions continue to be popular.
In this context, I can now add synthetic biology standards to my list of standards I’ve been involved in. And, as much as I’ve seen new communities and new standards, I’ve also seen a large overlap in the standardization efforts and an even greater willingness for lots of different researchers to work together, even taking into account the sometimes violent disagreements I’ve witnessed! The more things change, the more they stay the same…
At this stage, it is just a limited involvement, but the BBSRC Synthetic Biology Standards Workshop I’m attending today and tomorrow is a good place to start with synthetic biology. I describe most of today’s talks in this post, and will continue with another blog post tomorrow. Enjoy!
For those with less time, here is a single sentence for each talk that most resounded with me:
- Mike Cooling: Emphasise the ‘re’ in reusable, and make it easier to build and understand large models from reusable components.
- Neil Wipat: For a standard to be useful, it must be computationally amenable as well as useful for humans.
- Herbert Sauro: Currently there is no formal ontology for synthetic biology, but one will need to be developed.
This meeting is organized by Jen Hallinan and Neil Wipat of Newcastle University. Its purpose is to set up key relationships in the synthetic biology community to aid the development of a standard for that community. Today, I listened to talks by Mike Cooling, Neil Wipat, and Herbert Sauro. I was – unfortunately – unable to be present for the last couple of talks, but will be around again for the second – and final – day of the workshop tomorrow.
Mike Cooling – Bioengineering Institute Auckland, New Zealand
Mike uses CellML (it’s made where he works, but that’s not the only reason…) in his work with systems and synthetic biology models. Among other things, it wraps MathML and partitions the maths, variables and units into reusable pieces. Although many of the parts seem domain specific, CellML itself is actually not domain specific. Further, unlike other modelling languages such as SBML, components in CellML are reusable and can be imported into other models. (Yes, a new package called comp in SBML Level 3 is being created to allow the importing of models into other models, but it isn’t mature – yet.)
How are models stored? There is the CellML repository, but what is out there for synthetic biology? The Registry of Standard Biological Parts was available, but only described physical parts. Therefore they created a Registry of Standard Virtual Parts (SVPs) to complement the original registry. This was developed as a group effort with a number of people including Neil Wipat and Goksel Misirli at Newcastle University.
They start with template mathematical structures (which are little parts of CellML), and then use the import functionality available as part of CellML to combine the templates into larger physical things/processes (‘SVPs’) and ultimately to combine things into system models.
They extended the CellML Repository to hold the resulting larger multi-file models, which included adding a method of distributed version control and allowing the sharing of models between projects through embedded workspaces.
What can these pieces be used for? Some of this work included the creation of a CellML model of the biology represented in Levskaya et al. 2005 and the deposition of all of the pieces of the model in the CellML repository. Another example is a model he’s working on about shear stress and multi-scale modelling for aneurysms.
Modules are being used and are growing in number, which is great, but he wants to concentrate more at the moment on the ‘re’ of the reusable goal, and make it easier to build and understand large models from reusable components. Some of the integrated services he’d like to have: search and retrieval, (semi-automated) visualization, semantically-meaningful metadata and annotations, and semi-automated composition.
All this work converges on the importance of metadata. Version 1.0 of the CellML Metadata Framework saw little uptake. With version 2.0 they have developed a core specification which is very simple, and then provide many additional satellite specifications. For example, there is a biological information satellite, where you use the BioModels qualifiers as relationships between your data and MIRIAM URNs. The main challenge is to find a database that is at the right level of abstraction (e.g. canonical forms of your concept of interest).
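To make the satellite idea concrete, here is a minimal sketch in toy Python (my own illustration, not the CellML Metadata Framework itself): a biological annotation is just a triple linking a model element to a MIRIAM URN via a BioModels qualifier. The element names and URNs below are illustrative assumptions, not values from the talk.

```python
# Toy annotation store: (model element, BioModels qualifier, MIRIAM URN).
# Element names and URNs are made up for illustration.
annotations = [
    ("my_model#glucose", "bqbiol:is", "urn:miriam:obo.chebi:CHEBI%3A17234"),
    ("my_model#glucokinase", "bqbiol:isVersionOf", "urn:miriam:uniprot:P35557"),
]

def annotations_for(element, triples):
    """Return the qualifier/URN pairs attached to one model element."""
    return [(q, urn) for (e, q, urn) in triples if e == element]

print(annotations_for("my_model#glucose", annotations))
```

The point of the satellite design is exactly this separation: the core specification only has to say "elements can carry triples", while the biological satellite fixes which qualifiers and which identifier scheme the triples use.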
Neil Wipat – Newcastle University
Please note Neil Wipat is my PhD supervisor.
He spoke about data standards, tool interoperability, data integration and synthetic biology, a.k.a. “Why we need standards”. They would like to promote interoperability and data exchange between their own tools (important!) as well as with other tools. They’d also like to facilitate data integration to inform the design of biological systems, both from a manual designer’s perspective and from the POV of what is necessary for computational tool use. Finally, they’d like to enable the iterative exchange of data and experimental protocols in the synthetic biology life cycle.
A description of some of the tools developed in Neil’s group (and elsewhere) exemplifies the differences in data structures present within synthetic biology. BacilloBricks was created to help get, filter and understand the information from the MIT registry of standard parts. They also created the Repository of Standard Virtual Biological Parts. This SVP repository was then extended with parts from Bacillus and adapted to make use of SBML as well as CellML; this project is called BacilloBricks Virtual. All of these tools use different formats.
It’s great having a database of SVPs, but you need a way of accessing and utilizing the database. Hallinan and Wipat have started a collaboration with the people at Microsoft Research who created GEC, a programming language (and simulator) for the genetic engineering of living cells. A summer student project produced a GEC compiler for SVPs from BacilloBricks Virtual. Goksel has also created the MoSeC system, where you can automatically go from a model to a graph to an EMBL file.
They also have BacillusRegNet, which is an information repository about transcription factors for Bacillus spp. It is also a source of orthogonal transcription factors for use in B. subtilis and Geobacillus. Again, it is very important to allow these tools to communicate efficiently.
The data warehouse they’re using is ONDEX. They feed information from the ONDEX data store to the biological parts database. ONDEX was created for systems biology to combine large experimental datasets. ONDEX views everything as a network, and is therefore a graph-based data warehouse. ONDEX has a “mini-ontology” to describe the nodes and edges within it, which makes querying the data (and understanding how the data is structured) much easier. However, it doesn’t include any information about the synthetic biology side of things. Ultimately, they’d like an integrated knowledgebase using ONDEX to provide information about biological virtual parts. Therefore they need a rich data model for synthetic biology data integration (perhaps including an RDF triplestore).
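The "everything is a typed network" idea behind ONDEX can be sketched in a few lines of toy Python (my own illustration, not ONDEX code; the gene/protein names are made-up examples): every node and edge carries a type drawn from a small mini-ontology, which is what makes the data both queryable and self-describing.

```python
# A mini-ontology of permitted node and edge types (illustrative only).
NODE_TYPES = {"Gene", "Protein", "Pathway"}
EDGE_TYPES = {"encodes", "member_of"}

# A tiny typed graph in the warehouse.
nodes = {"sigB": "Gene", "SigB": "Protein", "stress_response": "Pathway"}
edges = [("sigB", "encodes", "SigB"), ("SigB", "member_of", "stress_response")]

# The mini-ontology lets us validate data as it is loaded.
assert all(t in NODE_TYPES for t in nodes.values())
assert all(et in EDGE_TYPES for (_, et, _) in edges)

def neighbours(node, edge_type):
    """Typed traversal: follow only edges of the requested type."""
    return [dst for (src, et, dst) in edges if src == node and et == edge_type]

print(neighbours("sigB", "encodes"))  # → ['SigB']
```

A richer data model for synthetic biology would extend the mini-ontology with part-specific node and edge types (promoter, RBS, terminator, and so on), which is precisely the gap noted above.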
Interoperability, Design and Automation: why we need standards.
- Requirement 1: interoperability and data exchange among their own tools, as well as between those tools and external ones.
- Requirement 2: standards for data integration to aid the design of synthetic systems. The format must be both computationally amenable and useful for humans.
- Requirement 3: automation of the design and characterization of synthetic systems, which also requires standards.
The requirements of synthetic biology research labs such as Neil Wipat’s make it clear that standards are needed.
KEYNOTE: Herbert Sauro – University of Washington, US
Herbert Sauro described the developing community within synthetic biology, the work on standards that has already begun, and the Synthetic Biology Open Language (SBOL).
He asks us to remember that Synthetic Biology is not biology – it’s engineering! Beware of sending synthetic biology grant proposals to a biology panel! It is a workflow of design-build-test. He’s mainly interested in the bit between building and testing, where verification and debugging happens.
What’s so important about standards? They’re critical in engineering, where standardization increases productivity and lowers costs. In order to identify a requirement you must describe a need, and there is one immediate need: storing everything required to reconstruct an experiment within a paper (for more on this see the Nature Biotech paper by Peccoud et al. 2011: Essential information for synthetic DNA sequences). Currently, it’s almost impossible to reconstruct a synthetic biology experiment from a paper.
There are many areas requiring standards to support the synthetic biology workflow: assembly, design, distributed repositories, laboratory parts management, and simulation/analysis. From a practical POV, the standards effort needs to allow researchers to electronically exchange designs with round tripping, and much more.
The standardization effort for synthetic biology began with a grant from Microsoft in 2008 and the first meeting was in Seattle. The first draft proposal was called PoBoL but was renamed to SBOL. It is a largely unfunded project. In this way, it is very similar to other standardization projects such as OBI.
DARPA mandated 2 weeks ago that all projects funded from Living Foundries must use SBOL.
SBOL is involved in the specification, design and build part of the synthetic biology life cycle (but not in the analysis stage). There are a lot of tools and information resources in the community where communication is desperately needed.
SBOL has three parts: SBOL Semantic, SBOL Visual, and SBOL Script. SBOL Semantic is the one that’s going to be doing all of the exchange between people and tools. SBOL Visual is a controlled vocabulary and set of symbols for sequence features.
Q: Have you been able to learn anything from SBML/SBGN, as you have a foot in both worlds? A: SBGN doesn’t address any of the genetic side, and is pretty complicated. You ideally want a very minimalistic design. SBOL Semantic is written in UML and is relatively small, though it has taken three years to get to this point. But you need host context above and beyond what’s modelled in SBOL Semantic. Without it, you cannot recreate the experiment.
Feature types such as operator sites, promoter sites, terminators, restriction sites etc. can go into the Sequence Ontology (SO). The SO people are quite happy to add these things into their ontology.
SBOLr is a web front end for a knowledgebase of standard biological parts that they used for testing (not publicly accessible yet). TinkerCell is a drag-and-drop CAD tool for design and simulation. There is a lot of semantic information underneath to determine what is and isn’t possible, though there is no formal ontology. However, you can semantically annotate all parts within TinkerCell, allowing the plugins to interpret a given design. A TinkerCell model can be composed of sub-models, which makes it easy to swap in new bits of models to see what happens.
WikiDust is a TinkerCell plugin written in Python which searches SBPkb for design components, and ultimately uploads them to a wiki. LibSBOLj is a library for developers to help them connect software to SBOL.
The physical and host context must be modelled to make all of this useful. By using semantic web standards, SBOL becomes extensible.
Currently there is no formal ontology for synthetic biology but one will need to be developed.
Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!
Larisa Soldatova et al.
OBI was created to meet the need for a standardised vocabulary for experiments that can be shared across many experiment types. OBI is community driven, with over 19 communities participating. It is a candidate OBO Foundry ontology, is complementary to existing bio-ontologies, and reuses existing ontologies where possible. It uses various ULOs for interoperability: BFO, RO, and IAO. The material_entity class, for instance, was introduced into BFO at the request of the OBI developers.
OBI uses relations from BFO, RO, and IAO as well as creating relations specific to OBI. OBI relations could be merged with other relations ontologies in future. They try to have as few relations as possible. Two use cases were outlined in this paper. Firstly, analyte measuring assay, where you draw blood from a mouse and determine the concentration of glucose in it. Use case 2 was a vaccine protection study, where you measure how efficiently a vaccine induces protection against virulent pathogen infection in vivo.
Allyson’s thoughts: Disclosure: I am involved in the development of OBI.
FriendFeed Discussion: http://ff.im/4xoIA
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!
According to the rules set down by Greg Laden over at Science Blogs, I have had a trawl through the blasts from the past that are my blog posts of 18 months or older to find one that is “exactly in lie [sic] with the writing or research in which they are currently engaged”. I thought about my Visiting With Enigma post, which has a special place in my heart, but didn’t choose it in the end as it didn’t have anything to do with my current research. Instead, I ended up choosing my very first post on WordPress: FuGO Workshop Day 1. It may not sound like much, but there are a number of things recommending this particular post.
- FuGO was the original name for the OBI project, of which I’m still a part and therefore it fits with the requirement that I still am involved.
- This was my first introduction to ontologies, and happened just as I was leaving one job (at the EBI) and starting a new one (at CISBAN). Such an important change deserves another mention.
- I notice an earlier incarnation of my “be sensible” statement in this post, where I say that I learned from Richard Scheuerman that it is always a good idea to use “only those fields which would be of most use to the biologist, rather than those that would make us bioinformaticians most happy”.
- FuGO wasn’t the only thing that has since undergone a name change. This post also contained information about the “new” MIcheck registry of minimal checklists: this has continued to gain in popularity, and is now MIBBI.
- There was a long discussion at the FuGO workshop about multiple versus single inheritance in ontologies, a topic which resurfaced just last week at the CBO workshop, and again in a short discussion on FriendFeed that led to longer real-life conversations (Phillip Lord’s paper deals with this topic). This was also my first introduction to Robert Stevens and Barry Smith, who both took center stage in the MI/SI discussion. Listening to Barry and Robert speak was really informative and interesting and fun!
What a fantastic day that was: a crash course in ontology development and best-practices, as well as introductions to some of the most well-known people in the biological / biomedical ontology world. In many ways, those first few days of my current job / last few days of my old job shaped where I am now.
Read that entire post, and Happy Blogging is Hard Day! Thanks to Greg Laden for the great idea.
This post is part of the PLoS One synchroblogging day, as part of the PLoS ONE @ Two birthday celebrations. Happy Synchroblogging! Here’s a link to the paper on the PLoS One website.
Biological data: vitally important, determinedly unruly. This challenge facing the life-science community has been present for decades, as witnessed by the often exponential growth of biological databases (see the classic curve in the current graphs of UniProt1 and EMBL if you don’t believe me). It’s important to me, as a bioinformatics researcher whose main focus is semantic data integration, but it should be important to everyone. Without manageable data that can be easily integrated, all of our work suffers. Nature thinks it’s important: it recently devoted an entire issue to Big Data. Similarly, the Journal of Biomedical Informatics just had a Semantic Mashup special issue. Deus et al. (the paper I’m blogging about, published in PLoS One this summer) agree, beginning with “Data, data everywhere”, nicely encapsulating both the joy and the challenge in one sentence.
This paper describes work on a distributed management system that can link disparate data sources using methodologies commonly associated with the semantic web (or is that Web 3.0?). I’m a little concerned (not at the paper, just in general) that we seem to already have a 3.0 version of the web, especially as I have yet to figure out a useful definition for semantic web vs Web 2.0 vs Web 3.0. Definitions of Web 3.0 seem to vary wildly: is it the semantic web? Is it the rwx to Web 1.0’s r-- and Web 2.0’s rw- (as described here, and cited below)? Are these two definitions one and the same? Perhaps these are discussions for another day… Ultimately, however, I have to agree with the authors that “Web 3.0” is an unimaginative designation2.
So, how can the semantic web help manage our data? That would be a post in itself, and is the focus of many PhD projects (including mine). Perhaps a better question is how does the management model proposed by Deus et al. use the semantic web, and is it a useful example of integrative bioinformatics?
Their introduction focuses on two types of integration: data integration as an aid to holistic approaches such as mathematical modelling, and software integration which could provide tighter interoperability between data and services. They espouse (and I agree) the semantic web as a technology which will allow the semantically-meaningful specification of desired properties of data in a search, rather than retrieving data in a fixed way from fixed locations. They want to extend semantic data integration from the world of bioinformatics into clinical applications. Indeed, they want to move past “clandestine and inefficient flurry of datasets exchanged as spreadsheets through email”, a laudable goal.
Their focus is on a common data management and analysis infrastructure that does not place any restrictions on the data stored. This also means multiple instances of light-weight applications are part of the model, rather than a single central application. The storage format is of a more general, flexible nature. Their way of getting the data into a common format, they say, is to break down the “interoperable elements” of the data structures into RDF triples (subject-predicate-object statements). At its most basic, their data structure has two types of triples: Rules and Statements. Rules are phrases like “sky has_color”, while statements add a value to the phrase, e.g. “today’s_sky has_color blue”.
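The Rule/Statement split can be rendered as a few lines of toy Python (my own sketch of the idea, not S3DB’s actual API): a Rule fixes a subject–predicate pattern, and a Statement instantiates that pattern with a value. Checking whether a statement conforms to some rule then becomes a trivial lookup.

```python
# Toy rendering of the paper's two triple types (illustrative only).
rules = [("sky", "has_color")]                       # Rule: "sky has_color"
statements = [("today's_sky", "has_color", "blue")]  # Statement adds a value

def conforms(statement, rules):
    """A statement conforms if some rule declares its predicate.

    (A simplification: real conformance would also check that the
    statement's subject instantiates the rule's subject class.)
    """
    _, predicate, _ = statement
    return any(p == predicate for (_, p) in rules)

print(conforms(("today's_sky", "has_color", "blue"), rules))  # True
```

Because both layers are just triples, the same storage and query machinery serves the schema (Rules) and the data (Statements) alike, which is much of the appeal of the approach.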
They make the interesting point that the reclassification of data from flat files to XML to RDF to Description Logics starts to dilute “the distinction between data management and data analysis”. While it is true that if you are able to store your data in formats such as OWL-DL3, the format is much more amenable to direct computational reasoning and inference, perhaps a more precise statement would be that the distinction between performance of data management tasks and data analysis tasks will blur with richer semantic descriptions of both the data and their applications. As they say later in the paper, once the data and the applications are described in a way that is meaningful for computation, new data being deposited online could automatically trigger a series of appropriate analysis steps without any human input.
A large focus of the paper was on identity, both of the people using it (and therefore addressing the user requirement of a strong permissions system) and of the entities in the model and database (each identified with some type of URI). This theme is core to ensuring that only those with the correct permissions may access possibly-sensitive data, and that each item of information can be unambiguously defined. I like that the sharing of “permissions between data elements in distinct S3DB deployments happens through the sharing the membership in external Collections and Rules…not through extending the permission inheritance beyond the local deployment”. It seems a useful and straightforward method of passing permissions.
I enjoyed the introduction, background, and conclusions. Their description of the Semantic Web and how it could be employed in the life sciences is well-written and useful for newcomers to this area of research. Their description of the management model as composed of subject-predicate-object RDF triples plus membership and access layers was interesting. Their website was clear and clean, and they had a demo that worked even when I was on the train4. It’s also rather charming that “S3DB” stands for Simple Sloppy Semantic Database – they have to get points for that one5! However, the description of their S3DB prototype was not extensive, and as a result I have some basic questions, which can be summarized as follows:
- How do they determine what the interoperable elements of different data structures are? Manually? Computationally? Is this methodology generic, or does it have to be done with each new data type?
- The determination of the maturity of a data format is not described, other than that it should be a “stable representation which remains useful to specialized tools”. For instance, the mzXML format is considered mature enough to use as the object of an RDF triple. What quality control is there in such cases? In theory, someone could make a bad mzXML file. Or is it not the format which is considered mature, but instead specific data sets that are known to be high quality?
- I would have liked to have seen more detail in their practical example. Their user testing was performed together with the Lung Cancer SPORE user community. How long did the trial last? Was there some qualitative measurement of how happy they were with it (e.g. a questionnaire)? The only requirement gathered seems to have been that of high-quality access control.
- Putting information into RDF statements and rules in an unregulated way will not guarantee data sets that can be integrated with other S3DB implementations, even if they are of the same experiment type. This problem is exemplified by a quote from the paper (p. 8): “The distinct domains are therefore integrated in an interoperable framework in spite of the fact that they are maintained, and regularly edited, by different communities of researchers.” The framework might be identical, but that doesn’t ensure that people will use the same terms and share the same rules and statements. Different communities could build different statements and rules, and use different terms to describe the same concept. Distributed implementations of S3DB databases, where each group can build their own data descriptions, do not lend themselves well to later integration unless they start by sharing the same ontology/terms and core rules. And, as the authors encourage the “incubation of experimental ontologies” within the S3DB framework, chances are that there will be multiple terms describing the same concept, or even one word that has multiple definitions in different implementations. While they state that data elements can be shared across implementations, it isn’t a requirement and could lead to the problems mentioned. I have the feeling I may have gotten the wrong end of the stick here, and it would be great to hear if I’ve gotten something wrong.
- Their use of the rdfs:subClassOf relation is not ideal. A subclass relation is a bit like saying “is a” (defined here as a transitive property where “all the instances of one class are instances of another”), therefore what their core model is saying with the statement “User rdfs:subClassOf Group” is “User is a Group”. The same thing happens with the other uses of this relation, e.g. Item is a Collection. A user is not a group, in the same way that a single item is not a collection. There are relations between these classes of object, but rdfs:subClassOf is simply not semantically correct. A SKOS relation such as skos:narrower (defined here as “used to assert a direct hierarchical link between two SKOS concepts”) would be more suitable, if they wished to use a “standard” relationship. I feel I may have misinterpreted this section of their paper, but couldn’t immediately find any extra information on their website. I would really like to hear if I’ve gotten something wrong here, too.
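The problem with rdfs:subClassOf can be demonstrated with a toy RDFS reasoner (my own sketch, not their model; “alice” is a hypothetical user). The standard RDFS entailment rule says that an instance of a subclass is an instance of the superclass, so under their schema any individual user is literally inferred to be a Group:

```python
# Their schema statement, plus one hypothetical instance fact.
schema = [("User", "rdfs:subClassOf", "Group")]
facts = [("alice", "rdf:type", "User")]

def rdfs_type_entailment(schema, facts):
    """Apply the RDFS rule: (x type C) + (C subClassOf D) => (x type D)."""
    inferred = set(facts)
    for (x, _, c) in facts:
        for (sub, _, sup) in schema:
            if c == sub:
                inferred.add((x, "rdf:type", sup))
    return inferred

entailed = rdfs_type_entailment(schema, facts)
print(("alice", "rdf:type", "Group") in entailed)  # True: alice "is a" Group
```

A reasoner consuming their data would draw exactly this unintended conclusion, which is why a hierarchical-but-not-subsumption relation like skos:narrower (or a plain membership property) fits better.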
Also, although this is not something that should have been included in the paper, I would be curious to discover what use they could make of OBI, which would seem to suit them very well6: an ontology for biological and biomedical investigations could be a real boon to them. Further, such a connection could be two-way: the S3DB people probably have a large number of terms, gathered from the various users who created terms to use within the system. It would be great to work with the S3DB people to add these to the OBI ontology. Let’s talk!
Thanks for an interesting read!
1. Yes, I’ve mentioned to the UniProt gang that they need to re-jig their axes in the first graph in this link. They’re aware of it!
2. Although I shouldn’t talk, I am horrible at naming things, as the title of this blog shows
3. A format for ontologies using Description Logics that may be saved as RDF. See the official OWL docs.
4. Which is a really flaky connection, believe me!
5. Note that this expanded acronym is *not* present in this PloS One paper, but is on their website.
6. Note on personal bias: I am one of the core developers of OBI
Helena F. Deus, Romesh Stanislaus, Diogo F. Veiga, Carmen Behrens, Ignacio I. Wistuba, John D. Minna, Harold R. Garner, Stephen G. Swisher, Jack A. Roth, Arlene M. Correa, Bradley Broom, Kevin Coombes, Allen Chang, Lynn H. Vogel, Jonas S. Almeida (2008). A Semantic Web Management Model for Integrative Biomedical Informatics PLoS ONE, 3 (8) DOI: 10.1371/journal.pone.0002946
Z. Zhang, K.-H. Cheung, J. P. Townsend (2008). Bringing Web 2.0 to bioinformatics Briefings in Bioinformatics DOI: 10.1093/bib/bbn041
Last week I attended the first RSBI (Reporting Structure for Biological Investigations) Workshop, carrying with me a multitude of hats. RSBI is a working group committed to the progression of standardization in multi-omics investigations. The purpose of the workshop was to examine and offer suggestions on the initial draft of ISA-TAB (more on that in a moment).
My first hat was a FuGE-user's hat, as the triumvirate of standards upon which RSBI is built is the Functional Genomics Experiment Model (FuGE), the Minimum Information for Biological and Biomedical Investigations (MIBBI) Project, and the Ontology for Biomedical Investigations (OBI). I was asked to give a current status update on FuGE itself, and on any communities that have already built extensions to FuGE. Andy Jones from Liverpool provided me with all of the hot-off-the-press information (my FuGE slides) – thanks Andy!
My second hat was a SyMBA-developer's hat. SyMBA uses FuGE to build a database and web front-end for storing data and experimental metadata. We use it in-house to store all of our large, high-throughput 'omics data. The use of FuGE in the system made it relevant for the workshop (my SyMBA slides, more SyMBA slides).
My final hat was a CISBAN-employee's hat. I work in the Wipat group there, and CISBAN is one of the "leading groups" involved in RSBI. As such, I was CISBAN's representative to the workshop.
The reason for the workshop, as stated earlier, was the evaluation of ISA-TAB, a proposed tabular format whose purpose is to provide a standard format for data and metadata submission into the formative BioMAP database at the EBI. ISA-TAB would have two uses:
- Humans: As a tabular format, it is quite easy for people to view and manipulate such templates within spreadsheet software such as Excel.
- Computers: As an interim solution only, ISA-TAB would be used as a computational exchange format until each of the FuGE-based community extensions is complete for Metabolomics, Proteomics, and Transcriptomics. At that point, ISA-TAB would remain available for human use, but there would be a conversion step into "FuGE-ML".
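As a sketch of the dual human/computer use above: a tab-delimited format can be read with nothing more than a standard CSV parser. The column names below are invented for illustration and are not taken from the ISA-TAB draft:

```python
import csv
import io

# A toy ISA-TAB-like table. Real ISA-TAB files have a richer, multi-sheet
# layout; these column names are invented purely for illustration.
table = (
    "Sample Name\tOrganism\tAssay Type\n"
    "sample1\tSaccharomyces cerevisiae\ttranscriptomics\n"
    "sample2\tSaccharomyces cerevisiae\tproteomics\n"
)

rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))
assays = {row["Sample Name"]: row["Assay Type"] for row in rows}
print(assays)  # {'sample1': 'transcriptomics', 'sample2': 'proteomics'}
```

The same file opens directly in spreadsheet software, which is the point: one artefact serves both the human and the interim computational use.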
The scope for ISA-TAB is large, and this was reflected in the attendees of the meeting. Representatives from ArrayExpress, PRIDE, and BioMAP were of course present, but also attending were people from the Metabolomics community, the MIACA project, toxico- and environmental genomics, and the FDA's NCTR.
A full write-up of the results of the workshop will soon be available online at the project's RSBI Google Group, so I'll leave it there. It was an exciting meeting, with fantastic food and even better discussions on getting public databases organized quickly for simple, straightforward multi-omics investigation data and metadata submission.
You can contact the RSBI via firstname.lastname@example.org.
Once again, the best way to view these highly discussion-centric notes is via the combined notes of me and Helen, which you can find on the OBI Wiki. Enjoy! We got through lots of agenda items, and also made a dozen or so milestones, which are now up on the OBI Google Calendar, as well as in the meeting notes. These milestones will eventually be put up on the official Milestones page.
Today it isn't worth showing just my notes and then directing you to the wiki, as the wiki notes are far more complete. So please just visit the OBI Wiki to find out all about the work we did today.
A full version of the notes (which means both my notes and Helen Parkinson's, plus any changes made by the group), can be found on the OBI Wiki.
10 July 2007 – OBI Workshop Day 2
The morning session was all about summarizing the work done in the branches since the last workshop.
This session was about the relationships we want to introduce. BS states that we should be careful not to add too many relations: the DL version of SNOMED currently has 108 relations. BB says that text definitions for classes and properties are just there to help the humans: restrictions should also be present in OWL, in a manner that matches the text definition as much as possible.
BS: Will OBI, like GO, have class-level relationships, or will it only have individual-level relationships? AR and others: we would like to have no class-level relationships. GF: In OWL, you can define class-level relations, but it's not a good idea (it brings us towards OWL Full). BS: A simple OBI statement like "PA_Chromium_release has_participant radioactive_chromium" doesn't sound like a relation between instances, but instead between "any old" chromium etc. What it should say is "Every instance of chromium_release_assay has, on the instance level, some instance of radioactive_chromium." That means two relations for everything. CS: You're talking syntactic sugar: saying the same things, but wanting "your" way of saying it to be the correct one. AR: The disagreement is that he thinks it's syntactic sugar and BS doesn't. BS actually agrees that it's syntactic sugar, but only if we really know what we're doing: i.e. if we decide to only do instance-level relations, we need to be VERY sure that that is what we're always doing. General agreement. AR: you can think of these as extra axioms. We should audit all relations to make sure that they're all instance-level relations. E.g. have an axiom on chromium_release_assay that it must always have a reagent, and then make the appropriate notes in the relations used to express that. BS: there is no "problem" in OBO in that you can only use class-level relations. HP made a list of action items for this.
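BS's reading of the chromium example corresponds to an OWL someValuesFrom restriction. Here is a minimal sketch, using plain Python tuples to stand in for RDF triples (not a real OWL toolkit; the term names are just those from the example):

```python
def instance_level_restriction(cls, prop, filler):
    """Encode 'every instance of cls is related by prop to some
    instance of filler' as an owl:someValuesFrom restriction."""
    r = f"_:restriction_{cls}"  # blank node standing for the restriction
    return {
        (cls, "rdfs:subClassOf", r),
        (r, "rdf:type", "owl:Restriction"),
        (r, "owl:onProperty", prop),
        (r, "owl:someValuesFrom", filler),
    }

# "Every instance of chromium_release_assay has, on the instance level,
# some instance of radioactive_chromium as a participant."
triples = instance_level_restriction(
    "chromium_release_assay", "has_participant", "radioactive_chromium"
)
```

The class-level shorthand and the restriction say the same thing only if the restriction is what you always mean, which is exactly the audit BS is asking for.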
Data Transformation Branch (Presented by Tina Boussard)
A large number of their initial terms went into PA. AR says that some classes should be under linear and non-linear transformation, whereas now they are siblings. There's also the fact that mathematical functions aren't necessarily PAs (parent of Data Transformation). They are functions/methods which might be better elsewhere, e.g. functionX has_role linear transformation. BS: OBI shouldn't be "doing mathematics", as it isn't central to OBI's mission. We should find somebody who can take care of the central maths that we need. Also, don't use the same noun at the end of a long list of children: it shows that we should make a parent class for that group. Also, ensure all classes are singular! Also, we need to be very careful about child/parent relationships: a Transformation is NOT a Transformation Method: it is a transformation! LF: Methods aren't PAs, they're plans/protocols. Also, she says errors are not PAs; they might be qualities instead. JF says the developers have agreed that errors should be measurements.
BB: The resource ontology/CV that Peter Lister and Karen Skinner oversee at the NCBC started out as just the stuff we need to use somewhere. As that comes in, what is being done in data transformation will have to be reconciled with it. BB will send around the URL to its current state. Daniel Rubin worked on it. It is intended for use by certain NIH places as a core resource. Don't forget to constantly check for other people who are working on these child ontologies: the data transformation workshop has invited these people.
DS: Can we indicate such helper classes / administrative classes in a way that we immediately see that they 'live' on another level? E.g. '_unclassified', using the underscore at the beginning? BB: it should be a new enumeration in the curation_status field. AL: we should add it to the Wiki now.
CS: Is normalization a role or a transformation? He thinks it is a transformation. It consists of various types of transformation that are constrained in particular ways, with parameters. There is a particular goal to be achieved. Most people who use normalization think of it as a data transformation. AR: The issue is how to define arcsine_transformation from _function and _plotting. Will this be a problem? BS: I think it will be a problem, and we hope we can find someone who's solving it on the mathematical ontology side, as it should only be imported from another ontology elsewhere. AR: say arcsine is imported from somewhere else. How do we make arcsine transformation and normalization (two different things)? BS: The classification of applications is something OBI could reasonably do. AR: This would look like "normalization" and "plotting", and then arcsine_normalization would be a defined term. But then, where do normalization and plotting live? Processes? HP says yes, processes. BS: Are you saying normalization is a mathematical abstract, but for OBI's purposes it isn't that abstract thing, it's the process of getting the new data? AR: If you make them defined classes, then, when classified, they would end up here. However, they shouldn't be explicitly asserted here, as then we would end up with multiple inheritance. AR will put all of the metadata annotation properties into their own OWL file. AL: we need an action item to add a couple of new phrases to the enumerated list for curation_status.
Instrument Branch (Presented by Ally Lister)
Notes from the workshop: what are the boundaries on this term? Are there instruments that fit this definition? What's a good use-case, or can we just toss it as it equals the instrument definition? BS: the granularity issue: the OBI domain encompasses several granularities, and we may deal with multiple granularities in the same annotation project, so there will be problems. We know the appropriate granularity for dealing with instruments. There are some things that are truly object aggregates. There are objects, parts of objects, and object aggregates in everything we do. It sounds odd but should be that way. Most people using OBI will not be using the whole thing. AR: there should be a distinction between what you can buy as a single item, as opposed to something that takes up multiple rooms and has to be put up specially. CS: we don't call the latter a platform, but instead a laboratory, and therefore it wouldn't be needed. We need two types of platform: one that you're not going to take apart, and another that allows you to adjust the parts. However, some things you build yourself only because the technology isn't mature yet – it would become a platform when it's mature. AR: it should only be called a platform when it's mature. BB: some platforms are only software. PA: the difference between a platform and a group of instruments is that a platform is a group that has been put together for a specific purpose. A plate reader is an instrument that is used in the context of many different platforms. Included in the definition is some link to either Plan or PA.
General consensus that we should promote everything and remove artefact_object. Also, remove device and move up instrument and labware.
BS: Having many children is fine, just make sure you have no redundancy. Then, after they're in, see if there's any way to bundle them. AR has seen people put defined classes under owl:Thing, but this isn't very good. You could make "bogus" classes under instrument. There are no good answers – we don't really want ugly pseudo-classes. AR: someone will find it useful to have a liquid handler term. RS: The instrument's function may change, but its structure might not, so we definitely shouldn't use their function.
Role and Digital Entities Branches (Presented by Jen Fostel)
Ontology of Clinical Investigations (OCI) (Presented by Jen Fostel)
Covers clinical trials and clinical research. Other groups in this area include CDISC, RCT, IFOMIS, Epoch (the only one that calls itself an ontology), BRIDG, and HL7 (an ANSI standard). The scope of OCI would encompass: the legal terms and minimum information (CONSORT) for clinical trials, clinical research, and administrative terms. They would like to align terms from other efforts with OBI. In OCI, all of the subjects are human. Other groups are doing non-human (e.g. pre-clinical or non-clinical) efforts. These efforts should be considered – we want them all to use "subject", for example, and mean the same thing. An ontology for clinical trials is in production by the UK cancergrid project. Their aim is to develop a usable ontology by 4Q07 that can be deposited in OBO. JF applauds the effort and hopes that they can all work together. As good as all these efforts are, there is definitely a feeling of "sporting competitiveness" too.
OCI will focus for now on translational research, e.g. clinical research as opposed to trials. The collected terms are from the CDISC glossary (a 35-page PDF file), SDTM (the Study Data Tabulation Model: how you would share your files with the FDA), UTSW, MUSC, and RCT. They've organized them all, removed duplicates, and loosely categorized them within the OBI hierarchy. They're now in the process of refining definitions, and have shared terms with the Role and Digital Entities branches.
OCI would be part of OBI, but there would be a document which contains all of OCI for the benefit of that community. This matches as a 4th-tier owl file as discussed yesterday, or does it? Would OCI be a subset of OBI, or a superset of OBI? CC will show us their OWL file during the next talk. Plan a workshop next year to bring together all efforts for discussions. They have a google group, OCInv@googlegroups.com. CONSORT is already in MIBBI.
Discussion of the Current OCI File (Presented by Christian Cocos)
Should OCI be developed in the OBI namespace, or should it be developed separately? We can see from the OWL file that there is overlap with the branches currently in development. The idea is to eventually move *everything* to OBI. No end-user of OCI will even see the namespace; they will just be working with a UI. The OCI WG should be a working community in OBI, and there should not be two independent efforts. But, in the end, should OBI and OCI "appear" to be separate entities, e.g. in papers?
Plan Branch (Presented by Phillippe Aldebert)
Protocols, Algorithms, and Study Design. However, Protocols were left out of their work due to the overlap with PA. We will need to come back to this later on, though.
What to do, in general, about adding terms quickly? Many of the terms are suffixed with "parent" words, like _design and _study. BS doesn't like this sort of naming; however, some of these suffixes are very important. What should be done? Well, just ensure that the suffix clarifies and is not redundant. JF: If you have a design that includes something that was part of the trial with the role of "placebo", then you don't need "placebo_design" as a term, as this could be inferred. You don't need to explicitly say it twice. AL: this is the same problem as we face with the Instrument branch and terms like "liquid_handler". BS: In the definitions, sometimes you use the word "trial" and sometimes "study", and this needs to be cleaned up. Offspring study and parent study are the same study with two different subjects. Therefore, what there should be instead is a good classification of the subject type, and then just link a study to a particular subject type.
BS: Identify the 7 or 9 or 3 essential features that every study should have, e.g. subjects. Pick one that is central, and then assert a single-inheritance hierarchy on that basis. All other features should be put into their own single-inheritance hierarchies. Then use the reasoner to generate all of the appropriate associations and multiple inheritance on the fly. We may end up with the bottomless pit problem, though. We have to find a way of making it clear it isn't a bottomless pit. It should be clear that there is a principled way of finding places for these terms. JF: different people structure these things with different "primary" classifications. AR: use "faceted" browsers. The classic example is travel destinations, where you may want to browse by sport, location, or family friendliness. Each of these facets is a different relation whose target is one of these other single hierarchies BS mentioned. AL: Where do we start the defined classes, and where do we end the "standard" classes? AR: We should avoid "hardness" as long as possible. We could have no asserted isA until the last step, then infer all isA's, see how it plays out, and *then* choose the primary asserted hierarchy. A lot of this work will be integral to the Function Branch work, which BB will cover shortly.
PA: PATO will deal with biological qualities, but not non-biological ones like randomized or control, or qualities of instrumentation like "switched on". Such terms should go in OBI, at least for now.
Biomaterial Branch (Presented by Susanna Sansone)
72/315 terms were actually relevant to the biomaterial branch. There is a dispatcher sheet that is now on Google Docs. They have started refining definitions, adding examples, and making terms compliant with naming conventions (the latter is still to be done). Most of the information is in an Excel spreadsheet. They don't know what to do about "quasi" material terms like lot number and serial number. What to do if a term is present in more than one external resource? How do we point at multiple sources of terms? Where should we put genetic information like allele and diplotype? They are also thinking that they should probably extend biomaterial to other types of material.
The first division was between experimental and natural biomaterials, but after a little time it was clear this wasn't the right line to draw: for example, where would you put a transgenic organism? Also, they just binned things like allele and haplotype into a single class, even though they don't really know where to put them. Many of the genotype specifications are already in PATO. Currently, things like dominant and recessive are not yet in PATO. And, even though we can get PATO into OBI really easily, ontologies like SO are harder to fit in, as their structure doesn't mesh with OBI yet, so we can't just import them directly. AR: if we can get it in promptly, we do it. If we can't, then we put a specialist ontology term directly within OBI. Also, for links out to other ontologies, we don't use an OBI term ID but the ID from the other ontology. Note that this does NOT mean that we import the entirety of the hierarchy above or below that particular external term, just that (for now) we need to represent that particular, single term.
AR: Population and cohort's current placement is not their final location. Diplotype is a quality of a sequence.
AR: If you have a defined class for whole_mount, you can say whole_mount organism is exactly those organisms that are the output of PA_XYZ. BS: A whole_mount organism is an organism playing the role of whole_mount. CS: But once it goes through PA XYZ, it is an entirely new entity. This entity can take on other roles, such as garbage, but it is a new entity. BS: It is simple to do what I propose, as you just have to add a phrase about the role to the defined class definition (a necessary & sufficient statement). BB: What we're trying to do is figure out where to put in that a biomaterial has been experimentally manipulated. BS: We need to have this role as, for example, you need the role to properly classify patients. You can't just say patients are people who have registered, as that would fit people who once registered but are no longer true patients (e.g. they went home). RS: whole_mount is irreversible – once you are a whole_mount, you can't go back. For this reason, you should not use roles, as roles are for states that can be easily changed from one thing to another and back again. BS: You can have an organ section that does not play the role of the specimen (e.g. part of a liver served as your dinner), therefore specimen is a role. As such, you need to define whole_mount with such role restrictions. CS: whole_mount cannot be a role, but in its definition I'm OK with it indicating that it plays the role / always plays the role of specimen. BB: to reinforce this, you could rename ExperimentalBiomaterial to BiomaterialSpecimen. JF: thinks biomaterial is actually a role, e.g. a fly before it is in an experiment is not a biomaterial. AL: A fly is always a biomaterial, irrespective of whether or not it is in an experiment.
Function Branch (Presented by Bill Bug)
Meant to provide a BFO-based definition of function for investigational artefacts, including instruments, reagents, and assays. They analyzed the BFO definitions of the related realizable dependent continuants (function, disposition, and role), and defined how they are going to work with closely-related branches like Role. They have a few examples/use-cases for function on the wiki page. They use as a primary example how to create the function of an HPLC (high pressure liquid chromatography) system. He then used a well-written slide to show what minimum relations would be required to get the function correct for HPLC. See his slides on the OBI Wiki for more information.
BS: How are you distinguishing functions and roles from this particular example slide? BB: I don't think there is a clear distinction – it's not clear that function and role are distinct. BS: This is a BFO responsibility. BB: Well, the separation process is distinct. BS: whenever you have a function, you should describe the process that is the functioning. A crucial feature of function is that you can have a function without realizing it (this statement applies equally to role). One test of a function is to see if the bearer can still be itself when it isn't functioning. When we say an algorithm has a function, we're not using the word in the BFO sense. Functions have to involve realizations as a possibility, which means the thing that has the function has to be such that it can engage in causal processes. BB: I don't think that will work. BS: A laptop has a function, a heart has a function. The pumping of your heart is the exercise of the functioning of your heart. You cannot realize the pumping because the pumping is itself a realization. I think there are assays, but assays themselves do not have functions in the BFO sense. (Assessment is also a problem for him.) Generally speaking, occurrents do not have functions. (And then BS had to leave the workshop.)
CS: I think of an algorithm as a plan. AR: But you can't have a plan that has a function, as they are both realizable entities. So perhaps an algorithm is a plan, but you couldn't give it a function.
BB's slides include various cardinal parts of instrument, which CS describes as not necessary for the "molecular separation function", just required for the HPLC instruments.
BB: Should look at CC's proposal to use Systems Theory to provide a framework for defining function. It would be a relatively superficial application of ST.
CC says that in order to specify functions you don't need context, but for roles you do need context. BB: Functions imply some primacy, so perhaps each thing only has a single function?
CC was drafted by BS last year to write the "RO 2" paper. Some paradigm examples include:
- kidney UNDERGOES excretion process. There is disagreement on this term, as a kidney does not undergo that process. Blood undergoes filtration, and urine undergoes a creation process. In the sense CC has written it, it means "participates", or has_function.
- excretion process HAS_PARTICIPANT nephron
- excretion function IMPLEMENTED_BY kidney/kidney IMPLEMENTS excretion function.
- kidney HAS_OUTPUT urine / urine OUTPUT_OF kidney. RS: We've defined processes as having inputs and outputs, and CC has a continuant with inputs and outputs. Urine is the output of a process, in the same way as a chocolate bar is the output of the chocolate production process, whereas others would say it is the output of the factory.
BB: we want your help implementing functions and relations in OWL. RS: There is confusion in this example, as many organs don't have outputs. CC: Actually, I think all organs have outputs.
BB: As people in other branches hit functions (especially the Instrument branch) please go to the function wiki page and add the example to that page.
Protocol Application Branch (Presented by Alan Ruttenberg)
The group started out trying to figure out what relations should be used. HP: the clinical_diagnosis definition should not have "determination", but instead "assessment", as you may not always get your diagnosis right. RS: Doesn't think tumor grading is an assay, as there is an interpretive step that's not being captured. AR: The output is not material, but instead is information. RS: In tumor grading, the input and output are both data. Perhaps should be moved to Data transformation.
CS: Could delineate material combinations based on whether or not it is a pooling of samples. RS: You should structure according to pooling, partitioning, and transformation. BB: Perhaps shouldn't use transformation in material_transformation – should use a word that more precisely meets the definition.
Today was the first day of the OBI workshop. Here are my personal notes on the day. The official notes can be found on the OBI Wiki.
The first talk was by Bill, and described OBI itself.
The main questions raised by CS in this discussion section were: Who's missing from OBI that should be involved? Any criteria to decide who to target? What incentives should we be trying to provide to join us?
The audience wondered what OBI means by the "genomics" community, as it's a very broad topic. Further, many of the communities described overlap. CS replied with the following examples: the eventual replacement of the MGED Ontology and BIRNLex with OBI, and the RADLex project for the MRI community, e.g. Daniel Rubin.
It is difficult to get money for funding this work, as grant people won't generally give money for ontology curators. AR mentioned that money should be provided to develop the *skill* of ontology creation and curation. He wants to establish a teaching program.
Someone in the audience later made the statement that they (or a subset of users) might want to use only a minimal set of the ontology. BB mentions the MIA* efforts, and using them in the context of the ontologies. Also, members of the audience suggested that OBI could be used in a number of efforts, including the CaBIG (Cancer Bioinformatics) community and the NCI Thesaurus. AR also says you could try to invest part of people's time, rather than getting specific funding for an entire FTE (full-time equivalent). Another topic brought up was the Clinical Trials community – what can we show management? Does OBI have any good examples for them? This was brought up again later in the day, when the OBI developers thought of a number of good OBI use-cases (see below).
Next Talk: Chris and the "ecosystem" of biomedical standards.
Next Talk: Susanna and MIBBI.
The main questions raised by CS in this discussion section were: How should efforts such as OBI be funded? Encourage communities to make it a budget item? Put it in an OBI-focused resource grant? Development, infrastructure, and training are three separate funding areas. What is the role of the NCBO? It currently serves as an advisor, and provides tools and methodologies, not support for building.
The audience asked: what is the real point of OBI? How do you use it? There are plenty of examples: Science Commons and Neurocommons, journals (e.g. tagging articles or sections of articles), Alan's work, the CISBAN DPI, and the ArrayExpress-to-GEO mapping, which will be a lot easier with the core of OBI developed. There was a suggestion of the AHA, etc. for funding, as long as we can give good examples of the usefulness of OBI to these communities.
How will the world be different when OBI is complete? It provides a method for data exchange, and for correct analysis and searching over a large corpus of investigations. People will use MIBBI to discover whether there is already a minimum information checklist. If there isn't anything there, they have to make their own MIA*. But how will they know how to do this? Look at MIBBI and get started: this sort of thing needs to be written up. There is a need for guidelines on how to do this sort of work. Publication costs money, but if you treat putting data into FuGE format as a publication of your data in electronic format, it could be a useful way of adding such work into grant proposals.
AR: Changing incentives requires pushing from either journals or funding agencies to say this must be done. Secondly, a workforce that is able to do this sort of encoding does not yet exist. OBI is the start of this training. OBI promises (with a common language for describing results etc) a situation where integration and searching of genomic-scale datasets will be quicker than before. The interest isn't in the individual investigators, but in the people who fund the investigators, knowing they will get more for their money by using OBI.
Do we want to model raw data or "final research data"? Makes a big difference to the cost of using something like OBI. Everything should be included in the long term, including LIMS.
Reiteration of importance of use-cases (how to use OBI) from the point of view of the people who would use OBI. Inevitably, the response to "Our institute should use OBI" is, "What is the benefit to us"?
Formalizing how the advisors get credit for OBI. Have it "offline" as a little subgroup and present the results.
Other discussion topics for this week: SOP and reasoning, svn and branching "clinic" (Alan), how to organize OBI when mature, to make it easier for users to use it.
GOING THROUGH PREVIOUS MILESTONES:
Then we went through the milestones that we had created at the last workshop. Most milestones have been completed, but a few are not ready yet. The April 1st milestone of community submission of terms is complete (even though individual branch editors are adding terms during the development process as they see fit, which is important). An April 14th deadline was to review preliminary community OBI versions, but this is dependent upon the submission of terms being completed, which in this case will probably happen with the first release of the OBI core.
Then we talked about our policy on multiple inheritance. Alan said one possibility would be to make a defined (necessary & sufficient) class that is not necessarily in the "real" hierarchy. Inference would place it in the right location. An example is Diploid Cell, which could go into multiple locations in the single hierarchy. Further, it may go into an external ontology (diploid could be a quality, which equals PATO). This will be discussed in its own session later in the week.
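A toy illustration of Alan's point, with an invented condition standing in for the Diploid Cell definition: state the necessary & sufficient conditions once, and let the reasoner (here, a trivial function) place individuals, rather than asserting the class in several spots in the hierarchy:

```python
# Defined (necessary & sufficient) classes as predicates. The ploidy
# condition here is invented for illustration, not taken from OBI.
defined_classes = {
    "diploid_cell": lambda ind: ind.get("kind") == "cell" and ind.get("ploidy") == 2,
}

def classify(individual):
    """Trivial 'reasoner': return every defined class whose condition holds."""
    return {name for name, cond in defined_classes.items() if cond(individual)}

print(classify({"kind": "cell", "ploidy": 2}))  # {'diploid_cell'}
print(classify({"kind": "cell", "ploidy": 1}))  # set()
```

A real OWL reasoner does the same job over equivalentClass axioms, which is why the asserted hierarchy can stay single-inheritance.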
May 1st had a couple of milestones. The first was to present the proposal for environmental/medical/other history. Jen reported. Barry mentions Geo.obo (thought up by Michael Ashburner), which is an OBO Foundry ontology. There is also EnvO (the Environment Ontology), which Dawn Field, for instance, plans to use within the GSC framework. They are both OBO Foundry, and are primarily devoted to children of the BFO class Site. If they are both OBO, then how will we keep them orthogonal? Geo is for annotating *real* geographic locations (already begun, with large-scale things like "Poland"), and EnvO (planned with funding, but not started) is for terms like habitat and oral cavity (small-scale entities where organisms live). Geo has a workshop at the end of August. Michael's Geo sort of popped up suddenly. A lot of Jen's terms have been subsumed into EnvO. Laboratory or clinical artefact may go back to OBI. However, ultimately, those laboratory terms are still environments. We may develop them initially, but then submit them to EnvO. Further discussion of this has been added to the agenda for this week.
The second May 1st milestone was the proposal for the process of how to link to ontologies / terms / free-text entries apart from canonical OBI links. The main point here is that we should/must reuse other ontologies where available. We will probably have a breakout session about this. Perhaps it is a task best given to the Relations Branch.
The June 1 milestone of review of placement of community terms will be covered with the branch updates given tomorrow.
July 1st was the finalizing of terms into branches, which hasn't quite been reached as we are still working on the branches – it took a while to get Subversion sorted. The July 9th milestone of re-merging branches will no longer be necessary, as we'll be keeping the branches for a while.
Another July 9th milestone was to have the deprecation policy finalized. Alan had a proposal about where to put deprecated terms – into a separate import file – so that "normally" you wouldn't see them, but could import them if you want to. We will talk about the deprecation policy this week too.
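To make the proposal concrete, here is a hedged sketch of what such a separate import file might look like in OWL. The file name, IRIs, and the deprecated class are all hypothetical illustrations – the actual policy and file layout were still under discussion:

```xml
<!-- Hypothetical obi-deprecated.owl: kept out of the main OBI file,
     so deprecated terms are only visible if you owl:import this file. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Ontology rdf:about="http://example.org/obi-deprecated.owl"/>
  <!-- owl:DeprecatedClass marks the term as deprecated without deleting it -->
  <owl:DeprecatedClass rdf:about="http://example.org/obi#OldAssayTerm">
    <rdfs:comment>Deprecated; see the replacement term in the main file.</rdfs:comment>
  </owl:DeprecatedClass>
</rdf:RDF>
```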
This led into a longer discussion of versioning, history, and deprecation. Versioning is a lot more complex than deprecation, but Alan argues that you can't have deprecation without history. GO has a versioning policy. We should be documenting ANY change – a spelling fix, an added annotation, etc. – recording both what the change was and why it was made. Barry suggests that each time any change is made, you should create a new ID. Alan says that this imposes a larger burden on the user. Bill and Ally agree that only semantic changes should trigger ID changes – syntactic changes shouldn't. Alan points out the case where you have a closure axiom over a group of terms, and then you need to add a term, or remove the closure axiom: is that a semantic change? Alan suggested that we not worry about it until we have a stable core. Perhaps a subgroup should set up a proposed policy and send it around. Bill suggests an intermediate milestone of 3 weeks in which *everyone* would submit any use-cases / examples they want considered when building the requirements list for this policy. The policy should be ready for the next workshop.
Philippe made the point that the first OBI core will be a beta, and should be announced as such. However, we should also present use-cases of how to use OBI, as this was a major point made by the guests this morning. This should definitely be added as a new milestone. Bill mentioned BrainMap.org, which uses controlled vocabularies to try to get information from neuroimaging studies.
Examples of use-cases: data annotation, text mining, data acquisition, and querying/searching (SPARQL?). Alan has a triple store at Science Commons. We can load OBI, together with data that has been annotated with OBI, into it, and then Alan can write queries against it – another good example use-case.
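As a toy illustration of that querying use-case, here is a minimal sketch in Python. The triples, term names, and IDs below are all invented for illustration; a real deployment would load OBI into an actual triple store and query it with SPARQL:

```python
# Minimal in-memory sketch of the "query OBI-annotated data" use-case.
# All term names and IDs here are hypothetical; a real setup would use
# a proper triple store and SPARQL rather than this toy pattern matcher.

triples = [
    ("sample:001", "rdf:type", "obi:Specimen"),              # hypothetical OBI class
    ("sample:001", "obi:has_role", "obi:EvaluantRole"),      # hypothetical annotation
    ("assay:042", "obi:has_specified_input", "sample:001"),  # hypothetical relation
    ("assay:042", "rdf:type", "obi:Assay"),
]

def query(pattern):
    """Match a (subject, predicate, object) pattern; None acts as a wildcard."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which assays used sample:001 as a specified input?"
hits = query((None, "obi:has_specified_input", "sample:001"))
print([subj for subj, _, _ in hits])  # ['assay:042']
```

The point of the sketch is only that once data is annotated with shared ontology terms, the same query works across any dataset that uses those terms.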
Barry: equipment/instrument branches should be made in tandem with the vendors. This is already happening now via the PSI community, which already has links to vendors. Alan already has info on plasmids that's "waiting for an OBI makeover".
Over the next few days, we will do agenda/discussion items if we hit walls after working on the ontology. Items which have time constraints on them (based on when specific people arrive and leave) have been placed in the agenda at appropriate times. The updated agenda, as well as the combined minutes of Helen and myself, are up on the OBI Wiki (https://wiki.cbil.upenn.edu/obiwiki/index.php/Meeting_notes_and_report).
In the remaining 20 minutes, we talked a little about Matt Pocock's proposal for the final organization of the OWL files for OBI. His email can be read here: http://sourceforge.net/mailarchive/message.php?msg_name=200707091513.52789.matthew.pocock%40ncl.ac.uk
This leads to larger questions of what we want OBI to do, and which users we're aiming at. Programmers? People who want to reason and assert? Biologists who only want to browse? OLS (the Ontology Lookup Service) only works with OBO format, and the NCBO's portal (an ontology browser) only works with single OWL files. Bill will send around the email he used when he contacted the NCBO to get BIRNLex to work with their browser (BIRNLex also has multiple OWL files), and we can send a similar one specific to OBI, to let the NCBO know that we would really appreciate being able to use their service for OBI.
It would be good to have multiple versions of the "Tier 3". It would also be good to have simple text files that just have tab-delimited class and definition pairs. It would be nice, then, to have a set of scripts that could generate any of the "simple" files we want, perhaps at every svn commit.
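A minimal sketch of what one such export script might look like, assuming classes carry rdfs:label and rdfs:comment annotations – the inline OWL snippet, its IRI, and the definition text are invented for illustration:

```python
# Sketch of a script that dumps tab-delimited class/definition pairs
# from an OWL file, as might be run at every svn commit.
# The inline OWL document below is a made-up example.
import xml.etree.ElementTree as ET

OWL = "{http://www.w3.org/2002/07/owl#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

owl_doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                      xmlns:owl="http://www.w3.org/2002/07/owl#"
                      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://example.org/obi#Assay">
    <rdfs:label>assay</rdfs:label>
    <rdfs:comment>A process with the objective to produce data.</rdfs:comment>
  </owl:Class>
</rdf:RDF>"""

def class_definition_pairs(xml_text):
    """Yield 'label<TAB>definition' lines for every annotated owl:Class."""
    root = ET.fromstring(xml_text)
    for cls in root.iter(OWL + "Class"):
        label = cls.findtext(RDFS + "label")
        definition = cls.findtext(RDFS + "comment")
        if label and definition:
            yield f"{label}\t{definition}"

for line in class_definition_pairs(owl_doc):
    print(line)
```

A hook script like this could regenerate all the "simple" derived files automatically, so they never drift out of sync with the OWL sources.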
We should have 15 minutes covering the status of Protégé 4, and what it looks like.