Today was the first day of the OBI workshop. Here are my personal notes on the day. The official notes can be found on the OBI Wiki.
First talk was by Bill, and described OBI itself.
The main questions raised by CS in this discussion section were: Who's missing from OBI that should be involved? Any criteria to decide who to target? What incentives should we be trying to provide to join us?
The audience wondered what OBI means by the "genomics" community, as it's a very broad topic. Further, many of the communities described overlap. CS replied with the following examples: the eventual replacement of the MGED Ontology and BIRNLex with OBI, and the RADLex project for the MRI community, e.g. Daniel Rubin.
It is difficult to get money for funding this work, as grant people won't generally give money for ontology curators. AR mentioned that money should be provided to develop the *skill* of ontology creation and curation. He wants to establish a teaching program.
Someone in the audience later made the statement that they (or a subset of users) might want to only use a minimal set of the ontology. BB mentioned the MIA* efforts, and using them in the context of the ontologies. Also, members of the audience suggested that OBI could be used in a number of efforts, including the caBIG (cancer Biomedical Informatics Grid) community, and the NCI Thesaurus. AR also said you could try to invest part of people's time, rather than getting specific funding for the entirety of an FTE (full-time equivalent). Another topic brought up was the Clinical Trials community – what can we show management? Does OBI have any good examples for them? This was brought up again later in the day, when the OBI developers thought of a number of good OBI use-cases (see below).
Next Talk: Chris and the "ecosystem" of biomedical standards.
Next Talk: Susanna and MIBBI.
The main questions raised by CS in this discussion section were: How should efforts such as OBI be funded? Encourage communities to make it a budget item? Put it in an OBI-focused resource grant? Development, infrastructure, and training are three separate funding areas. What is the role of the NCBO? It currently serves as an advisor, and provides tools and methodologies, not support for building.
The audience asked: what is the real point of OBI, and how would it be used? There are plenty of examples, like Science Commons and Neurocommons, journals (e.g. tagging articles or sections of articles), Alan's work, and CISBAN DPI; ArrayExpress to GEO mapping will also be a lot easier with the core of OBI developed. The AHA and similar bodies were suggested as funding sources, as long as we can give good examples of the usefulness of OBI to these communities.
How will the world be different when OBI is complete? It provides a method for data exchange and for correct analysis and searching over a large corpus of investigations. People will use MIBBI to discover if there is already a minimum information checklist. If there isn't anything there, they have to make their own MIA*. But how will they know how to do this? Look at MIBBI and get started: this sort of thing needs to be written up. There is a need for guidelines on how to do this sort of work. Publication costs money, but if you treat putting data into FuGE format as a publication of your data in electronic format, it could be a useful way of adding such work into the grant proposals.
AR: Changing incentives requires pushing from either journals or funding agencies to say this must be done. Secondly, a workforce that is able to do this sort of encoding does not yet exist. OBI is the start of this training. OBI promises (with a common language for describing results etc) a situation where integration and searching of genomic-scale datasets will be quicker than before. The interest isn't in the individual investigators, but in the people who fund the investigators, knowing they will get more for their money by using OBI.
Do we want to model raw data or "final research data"? Makes a big difference to the cost of using something like OBI. Everything should be included in the long term, including LIMS.
Reiteration of importance of use-cases (how to use OBI) from the point of view of the people who would use OBI. Inevitably, the response to "Our institute should use OBI" is, "What is the benefit to us"?
Formalizing how the advisors get credit for OBI. Have it "offline" as a little subgroup and present the results.
Other discussion topics for this week: SOP and reasoning, svn and branching "clinic" (Alan), how to organize OBI when mature, to make it easier for users to use it.
GOING THROUGH PREVIOUS MILESTONES:
Then we went through the milestones that we had created at the last workshop. Most milestones have been completed, but a few are not ready yet. The April 1st milestone of community submission of terms is complete (even though individual branch editors are adding terms during the development process as they see fit, which is important). An April 14th deadline was to review preliminary community OBI versions, but this is dependent upon the submission of terms being completed, which in this case will probably happen with the first release of the OBI core.
Then we talked about our policy in terms of multiple inheritance. Alan said one possibility would be to make a defined (necessary & sufficient) class that is not necessarily in the "real" hierarchy. Inference would place it in the right location. Example is Diploid Cell, which could go into multiple locations in the single hierarchy. Further, it may go into an external ontology (diploid could be a quality, which equals PATO). This will be discussed in its own session later in the week.
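The defined-class idea can be sketched with a toy classifier. This is only an illustration under my own assumptions, not an OWL reasoner and not OBI's actual modelling: the class names, the `has_quality` relation and the data layout are all hypothetical. Each class has one asserted parent, and the defined class "diploid cell" is placed by inference rather than by hand.

```python
# Toy classifier, NOT an OWL reasoner. All class names, relations and
# conditions below are hypothetical and only illustrate the idea.

# Primitive classes: one asserted parent each (single inheritance),
# plus necessary conditions written as (relation, value) pairs.
CLASSES = {
    "cell": {"parent": None, "conditions": set()},
    "hepatocyte": {"parent": "cell", "conditions": set()},
    "diploid hepatocyte": {
        "parent": "hepatocyte",
        "conditions": {("has_quality", "diploid")},
    },
    "haploid gamete": {
        "parent": "cell",
        "conditions": {("has_quality", "haploid")},
    },
}

# Defined classes: necessary AND sufficient conditions. They are kept out
# of the asserted hierarchy; membership is worked out by inference.
DEFINED = {
    "diploid cell": {
        "ancestor": "cell",
        "conditions": {("has_quality", "diploid")},
    },
}


def ancestors(name):
    """Return the class itself followed by all of its asserted ancestors."""
    found = []
    while name is not None:
        found.append(name)
        name = CLASSES[name]["parent"]
    return found


def all_conditions(name):
    """Collect the conditions of a class, including inherited ones."""
    result = set()
    for cls in ancestors(name):
        result |= CLASSES[cls]["conditions"]
    return result


def inferred_subclasses(defined_name):
    """List primitive classes that satisfy a defined class's conditions."""
    definition = DEFINED[defined_name]
    return sorted(
        name
        for name in CLASSES
        if definition["ancestor"] in ancestors(name)
        and definition["conditions"] <= all_conditions(name)
    )
```

Here "diploid hepatocyte" keeps its single asserted parent, and the classifier additionally places it under "diploid cell", so the multiple inheritance is inferred rather than asserted.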
May 1st had a couple of milestones. The first is to present the proposal for environmental/medical/other history. Jen reported. Barry mentions Geo.obo (thought up by Michael Ashburner), which is an OBO Foundry ontology. There is also EnvO (Environment Ontology), which Dawn Field, for instance, plans to use within the GSC framework. They are both OBO Foundry ontologies, and are primarily devoted to children of the BFO class Site. If they are both OBO, then how will we keep them orthogonal? Geo is for annotating *real* geographic locations (already begun, large-scale things like "Poland"), and EnvO (planned with funding, but not started) is for terms like habitat and oral cavity (small-scale things, the kinds of entities where organisms live). Geo has a workshop at the end of August. Michael's Geo sort of popped up suddenly. A lot of Jen's terms have been subsumed into EnvO. Laboratory or clinical artefact terms may go back to OBI. However, ultimately, those laboratory terms are still environments. We may develop them initially, but then submit them to EnvO. Further discussion of this has been added to the agenda for this week.
The second May 1 milestone is the proposal for process – how to link to ontologies / terms / free text entries apart from canonical OBI links. The main point here is that we should/must reuse other ontologies where available. Will probably have a breakout session about this. Perhaps this task is best given to the Relations Branch.
The June 1 milestone of review of placement of community terms will be covered with the branch updates given tomorrow.
July 1 was the finalizing of terms into branches, which hasn't quite been reached as we are still working on the branches – it took a while to get subversion sorted. The July 9th milestone of re-merging branches will no longer be necessary as we'll be keeping the branches for a while.
Another 9 July milestone was to have the deprecation policy finalized. Alan had a proposal about where to put deprecated terms – into a separate import file – so that "normally" you wouldn't see them, but could import them if you want to see them. Will talk about the deprecation policy this week too.
This led into a longer discussion of versioning, history, and deprecation. Versioning is a lot more complex than deprecation, but Alan argues that you can't have deprecation without history. GO has a versioning policy. We should be documenting ANY change – spelling, adding an annotation, etc. – recording both what the change was and why it was made. Barry suggests that each time any change is made, you should create a new ID. Alan says that this imposes a larger burden on the user. Bill and Ally agree that only semantic changes should make ID changes – syntactic changes shouldn't. Alan points out what happens if you have a closure axiom over a group of terms, and then you need to add a term, or remove the closure axiom. Is that a semantic change? Alan suggested that we not worry about it until we have a stable core. Perhaps a subgroup should set up a proposed policy and send it around. Bill suggests an intermediate milestone of 3 weeks where *everyone* would submit any use-cases / examples they want considered when building the requirements list for this policy. The policy should be ready for the next workshop.
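To make the "semantic changes get a new ID, syntactic changes don't" position concrete, here is a minimal sketch. Nothing in it is agreed OBI policy: the `EX:` prefix, the `TermRegistry` class and the choice of which fields count as semantic are all my own assumptions. The closure-axiom question raised by Alan is deliberately left out, since it was not resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str
    label: str
    definition: str
    history: list = field(default_factory=list)


# Assumption for illustration only: edits to these fields are "semantic".
SEMANTIC_FIELDS = {"definition"}


class TermRegistry:
    """Hypothetical registry that logs every change with a reason."""

    def __init__(self, prefix="EX"):
        self.prefix = prefix
        self.counter = 0
        self.terms = {}

    def _mint_id(self):
        self.counter += 1
        return f"{self.prefix}:{self.counter:07d}"

    def add(self, label, definition):
        term = Term(self._mint_id(), label, definition)
        self.terms[term.term_id] = term
        return term

    def change(self, term_id, field_name, new_value, why):
        old = self.terms[term_id]
        # Record both what changed and why, whatever the kind of change.
        record = {
            "field": field_name,
            "old": getattr(old, field_name),
            "new": new_value,
            "why": why,
        }
        if field_name in SEMANTIC_FIELDS:
            # Semantic change: mint a new ID and keep the old term as-is.
            new = Term(
                self._mint_id(),
                old.label,
                old.definition,
                history=old.history + [record],
            )
            setattr(new, field_name, new_value)
            self.terms[new.term_id] = new
            old.history.append(dict(record, replaced_by=new.term_id))
            return new
        # Syntactic change (e.g. spelling): edit in place, same ID.
        setattr(old, field_name, new_value)
        old.history.append(record)
        return old
```

For example, fixing a misspelt label with `change(term_id, "label", ..., why="spelling")` keeps the ID, while rewriting the definition returns a term with a fresh ID and leaves the old one in place with a `replaced_by` pointer in its history.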
Phillippe made the point that the first OBI core will be a beta, and should be announced as such. However, we should also present use-cases of how to use OBI, as this was a major point made by the guests this morning. Should definitely be added as a new milestone. Bill mentioned BrainMap.org, which uses CVs to try to get info from neuroimaging studies.
Examples of use-cases: data annotation, text mining, data acquisition, querying/searching (SPARQL?). Alan has a triple-store in Science Commons. We can load up OBI and data that has been annotated with OBI into it, and then Alan can write queries against it – another good example use-case.
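As a rough picture of the querying use-case, here is a toy in-memory triple store. It is a stand-in I made up, not Alan's triple store and not SPARQL: the `obi:` and `data:` identifiers are hypothetical, and `match` and `instances_of` are illustrative helpers. The point is only that once data is annotated with ontology terms, a query for a parent class also finds data annotated with its subclasses.

```python
# Toy in-memory triple store; a stand-in for a real store queried with SPARQL.
# All identifiers below are hypothetical, not real OBI terms or datasets.
TRIPLES = [
    ("obi:microarray_assay", "rdfs:subClassOf", "obi:assay"),
    ("obi:sequencing_assay", "rdfs:subClassOf", "obi:assay"),
    ("data:experiment_1", "rdf:type", "obi:microarray_assay"),
    ("data:experiment_2", "rdf:type", "obi:sequencing_assay"),
    ("data:experiment_3", "rdf:type", "obi:sample_collection"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def instances_of(triples, cls):
    """Find items typed with cls or with any direct or indirect subclass."""
    classes = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, _, _ in match(triples, predicate="rdfs:subClassOf", obj=current):
            if s not in classes:
                classes.add(s)
                frontier.append(s)
    return sorted(
        s for s, _, o in match(triples, predicate="rdf:type") if o in classes
    )
```

Calling `instances_of(TRIPLES, "obi:assay")` returns both annotated experiments even though neither is typed directly as an assay.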
Barry: equipment/instrument branches should be made in tandem with the vendors. This is already happening now via the PSI community, which already has links to vendors. Alan already has info on plasmids that's "waiting for an OBI makeover".
Over the next few days, we will do agenda/discussion items if we hit walls after working on the ontology. Items which have time constraints on them (based on when specific people arrive and leave) have been placed in the agenda at appropriate times. The updated agenda, as well as the combined minutes of Helen and myself, are up on the OBI Wiki (https://wiki.cbil.upenn.edu/obiwiki/index.php/Meeting_notes_and_report).
In the remaining 20 minutes, we talked a little about Matt Pocock's proposal for the final organization of the OWL files for OBI. His email can be read here: http://sourceforge.net/mailarchive/message.php?msg_name=200707091513.52789.matthew.pocock%40ncl.ac.uk
This leads to larger questions of what we want OBI to do, and what users we're aiming at. Programmers? People who want to reason and assert? Biologists who only want to browse? OLS (ontology lookup service) only works with OBO, and the NCBO's portal (an ontology browser) only works with single OWL files. Bill will send around the email he used when he contacted the NCBO to get BIRNLex to work with their browser (BIRNLex also has multiple OWL files), and we can send a similar one that is specific to OBI, to let NCBO know that we would really appreciate being able to use their service for OBI.
It would be good to have multiple versions of the "Tier 3". It would also be good to have simple text files that just have tab-delimited class and definition pairs. This means it would be nice to run a set of scripts that would make any of the "simple" files we want, perhaps at every svn commit.
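A script for the tab-delimited file could look roughly like the sketch below. It is not an existing OBI script, and it makes assumptions: the ontology is serialized as RDF/XML, classes are top-level `owl:Class` elements, and `rdfs:comment` stands in for whatever annotation property ends up holding definitions. The sample ontology and the `example.org` URIs are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs used in RDF/XML serializations of OWL.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Hypothetical sample, not real OBI content. rdfs:comment is an assumed
# stand-in for the property that holds a class definition.
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/obi#Instrument">
    <rdfs:label>instrument</rdfs:label>
    <rdfs:comment>A device used in an investigation.</rdfs:comment>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/obi#Protocol">
    <rdfs:label>protocol</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def class_definition_pairs(owl_xml):
    """Extract sorted (label, definition) pairs from top-level owl:Class elements."""
    root = ET.fromstring(owl_xml)
    pairs = []
    for cls in root.findall("owl:Class", NS):
        label = cls.findtext("rdfs:label", default="", namespaces=NS)
        definition = cls.findtext("rdfs:comment", default="", namespaces=NS)
        if label:
            # A class without a definition still gets a row, with an empty cell.
            pairs.append((label.strip(), definition.strip()))
    return sorted(pairs)


def to_tsv(pairs):
    """Render the pairs as tab-delimited text, one class per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()
```

Running something like this from a commit hook would regenerate the simple file on each svn commit; classes with no definition show up as rows with an empty second column, which doubles as a quick check for missing definitions.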
We should have 15 minutes that covers what Protege 4's status is, and what it looks like.