Meetings & Conferences Semantics and Ontologies Standards

Review of OBO Foundry Principles at the OBO Foundry Workshop 2009

After the recent posts (listed here) in the lead-up to the OBO Foundry workshop, Duncan Hull, Melanie Courtot, and Frank Gibson led a discussion about the current state of the OBO Foundry principles yesterday.

The results of the discussion can be found on the OBO Foundry Wiki page.  It looks like there was a really positive outcome for this section of the workshop, with a lot of good points being raised. I encourage you all to go to this page, and then scroll down to the section entitled “Review of OBO Foundry Principle – Duncan Hull, Frank Gibson, Melanie Courtot”.

Thanks to Susanna-Assunta Sansone for taking the fabulous notes for both days!

Meetings & Conferences Semantics and Ontologies Standards

Rules or Checklist? Which would you prefer from the OBO Foundry?

[Update: Duncan’s written a call for comments on the OBO Foundry criteria on his blog. Also posting on this are Melanie and Frank. Take a look! Update 2: I should have called the 10 criteria “principles” rather than “rules”. My apologies. I think the title may be a little bit of a misnomer for the post. I’m not sure you need to choose between principles and checklists. It’s nice to have the “short and sweet” and the detailed.]

The OBO Foundry Workshop (OBO Foundry paper) is coming up this weekend, and Duncan Hull and I were talking about the 10 criteria the Foundry has for member ontologies. We had been wondering what sort of questions we would ask the OBO Foundry people if we wanted to see the 10 criteria “upgraded” to a minimal checklist for OBO Foundry ontologies in the style of MIBBI. As a result of that, here are my thoughts on each criterion. Perhaps some of these have been answered in mailing lists or elsewhere, but they’re not visible on the OBO Foundry site. Hopefully this post will be useful as a starting point for a discussion on more complete definitions and explanations for the minimal requirements of an OBO Foundry ontology.

Each criterion is reproduced in bold, with my opinions after in italicised text. For any further text present in the criteria list, please see the list page itself.

  1. The ontology must be open and available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed under the original name or with the same identifiers.
    This is a license without a name or a strong structure. Is it a first attempt at an OBO-specific license? If so, it is too generic to be of much use. Alternatively, is it a requirements list for choosing an existing license? Or, as another option, are they suggesting that people choose their own licenses along these lines? I believe strongly that already-extant licenses should be used in biological research wherever possible. You can see a summary of a FriendFeed discussion and an email discussion with Science Commons in my blog post on Choosing a License for Your Ontology for my opinion on the subject.  Therefore I would suggest option 2, with the Foundry choosing an appropriate license (or shortlist of compatible licenses) as soon as they could.
  2. The ontology is in, or can be expressed in, a common shared syntax. This may be either the OBO syntax, extensions of this syntax, or OWL.
    Firstly, I would like clarification of what “extensions of the [OBO] syntax” means. Secondly, just saying “OWL” as a syntax is too vague; there’s OWL-Full, OWL-DL, and OWL-Lite, to name a few. Are all acceptable, or is the most commonly-used (OWL-DL) the one they want people to use?
  3. The ontologies possesses a unique identifier space within the OBO Foundry.
    Aside from the (nitpicky) statement that it should be either “The ontologies possess” or “Each ontology possesses”, this is one of the most useful criteria. However, a little more detail would be useful here. What should come after the prefix? An underscore or some other dividing character? The rest of the identifier without a dividing character? Should the OBO Foundry assign a prefix to avoid confusion? By the way, a paper has just been published about the *naming* conventions for the OBO Foundry which is interesting. This isn’t the same thing as this criterion, which is about unique identifiers, but it’s still worth a read.
  4. The ontology provider has procedures for identifying distinct successive versions.
    A little vague, but that probably cannot be helped, as you probably don’t want to legislate the type of versioning that takes place with each ontology. Links out to GO’s procedures or OBI’s procedures might provide some ideas to people who don’t know what versioning to use.
  5. The ontology has a clearly specified and clearly delineated content.
    The “domain” of the ontology, used in the further description of this criterion, is a vague term. Yes, we all want orthogonality, but that is difficult to achieve in practice and a clearer description of how people can achieve it might be useful. How are two terms expressing the same concept in different ontologies resolved? Via the mailing list? Is there an established procedure? It’s easy to say that no two terms should be covering the same concept, but harder to check. There have been some recent papers on finding similar concepts within a single ontology (e.g. 10.1093/bioinformatics/btp195) that might be applicable to multiple ontologies.
  6. The ontologies include textual definitions for all terms.
    Good point. It would also be nice to say formal logic statements for classes would be useful (but not required), as it might help ensure the internal consistency of Foundry ontologies.
  7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
    This says you have to define your relations “following the pattern” from the RO. Does this mean all your relations must be children of relations in RO, or just that you follow their style? Probably the latter, but this is unclear at the moment.
  8. The ontology is well documented.
    Definitely! But how? Where? In the ontology file? On a website? Does the OBO website provide the ability to have lots of documentation, or should it just be links out?
  9. The ontology has a plurality of independent users.
    I’m a bit of a failure here, as I don’t know what this means. I can think of at least 2-3 different ways of interpreting this. What are users in this context? What makes them independent? How can you tell what your users are?
  10. The ontology will be developed collaboratively with other OBO Foundry members.
    Great idea. But what if you can’t find anyone who wants to help? Does that mean you can’t develop? Again, perhaps this just means regular reviews of the developing ontology by other OBO members, but could be made clearer.
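To make the questions under criterion 3 a bit more concrete, here is a minimal sketch of what a checkable identifier convention might look like. Everything about the convention is an assumption for illustration: the shape (alphabetic prefix, a colon, then a seven-digit local number, as in GO identifiers such as GO:0008150) is not something the criterion specifies, and the function names and the registry layout are hypothetical.

```python
import re

# Assumed convention: PREFIX:NNNNNNN (e.g. GO:0008150).
# The Foundry criterion itself does not mandate this shape.
ID_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z]+):(?P<local>\d{7})$")


def split_identifier(identifier):
    """Return (prefix, local number) or None if the identifier does not fit."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None
    return match.group("prefix"), match.group("local")


def find_prefix_clashes(registry):
    """Find prefixes claimed by more than one ontology.

    registry is a hypothetical mapping of ontology name -> claimed prefix.
    Prefixes are compared case-insensitively.
    """
    owners = {}
    for ontology, prefix in registry.items():
        owners.setdefault(prefix.upper(), []).append(ontology)
    return {p: sorted(names) for p, names in owners.items() if len(names) > 1}
```

If the Foundry assigned prefixes centrally, a check like `find_prefix_clashes` would only need to run once, at registration time, rather than being something each ontology group has to worry about.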

Most of these opinions don’t try to provide an answer, but instead just raise some questions that the attendees at this week’s workshop might like to have in their minds. If the OBO Foundry, which exists to “align ontology development efforts” (quote from the Nature Biotech paper), doesn’t provide clear guidance, there is a risk that each member ontology will come up with its own answers, thus negating some of the benefits provided by membership.

Have a great workshop – wish I had the time to attend this year!

Housekeeping & Self References Papers Research Blogging Software and Tools Standards

Modeling and Managing Experimental Data Using FuGE

Want to share your umpteen multi-omics data sets and experimental protocols with one common format? Encourage collaboration! Speak a common language! Share your work! How, you might ask? With FuGE, and this latest paper (citation at the end of the post) tells you how.

In 2007, FuGE version 1 was released (website, Nature Biotechnology paper). FuGE allows biologists and bioinformaticians to describe any life science experiment using a single format, making collaboration and repeatability of experiments easier and more efficient. However, if you wanted to start using FuGE, until now it was difficult to know where to start. Do you use FuGE as it stands? Do you create an extension of FuGE that specifically meets your needs? What do the developers of FuGE suggest when taking your first steps using it? This paper focuses on best practices for using FuGE to model and manage your experimental data. Read this paper, and you’ll be taking your first steps with confidence!

[Aside: Please note that I am one of the authors of this paper.]

What is FuGE? I’ll leave it to the authors to define:

The approach of the Functional Genomics Experiment (FuGE) model is different, in that it attempts to generalize the modeling constructs that are shared across many omics techniques. The model is designed for three purposes: (1) to represent basic laboratory workflows, (2) to supplement existing data formats with metadata to give them context within larger workflows, and (3) to facilitate the development of new technology-specific formats. To support (3), FuGE provides extension points where developers wishing to create a data format for a specific technique can add constraints or additional properties.

A number of groups have started using FuGE, including MGED, PSI (for GelML and AnalysisXML), MSI, flow cytometry, RNA interference and e-Neuroscience (full details in the paper). This paper helps you get a handle on how to use FuGE by presenting two running examples of capturing experimental metadata, one from flow cytometry and one from proteomics (gel electrophoresis). Part of Figure 2 from the paper is shown on the right, and describes one section of the flow cytometry FuGE extension from FICCS.

The flow cytometry equipment created as subclasses of the FuGE equipment class.

FuGE covers many areas of experimental metadata including the investigations, the protocols, the materials and the data. The paper starts by describing how protocols are designed in FuGE and how those protocols are applied. In doing so, it describes not just the protocols but also parameterization, materials, data, conceptual molecules, and ontology usage.

Examples of each of these FuGE packages are provided in the form of either the flow cytometry or the GelML extensions. Further, clear scenarios are provided to help the user determine when it is best to extend FuGE and when it is best to re-use existing FuGE classes. For instance, it is best to extend the Protocol class with an application-specific subclass when all of the following are true: when you wish to describe a complex Protocol that references specific sub-protocols, when the Protocol must be linked to specific classes of Equipment or Software, and when specific types of Parameter must be captured. I refer you to the paper for scenarios for each of the other FuGE packages such as Material and Protocol Application.
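As a rough illustration of the "extend Protocol" scenario above, here is a small Python sketch. It is not the FuGE object model or the API of any FuGE toolkit: the classes below are simplified stand-ins I have made up for this post, and only the idea (a subclass that references sub-protocols and is tied to a specific class of Equipment and to typed Parameters) comes from the paper's scenario.

```python
from dataclasses import dataclass, field
from typing import List


# Simplified stand-ins for generic FuGE-style classes (hypothetical, not the real model).
@dataclass
class Equipment:
    name: str


@dataclass
class Parameter:
    name: str
    value: float
    unit: str


@dataclass
class Protocol:
    name: str
    equipment: List[Equipment] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


# Hypothetical extension classes for one technique.
@dataclass
class FlowCytometer(Equipment):
    pass


@dataclass
class FlowCytometryProtocol(Protocol):
    # A complex protocol that references specific sub-protocols.
    sub_protocols: List[Protocol] = field(default_factory=list)

    def __post_init__(self):
        # The extension constrains which class of Equipment may be linked.
        for item in self.equipment:
            if not isinstance(item, FlowCytometer):
                raise TypeError(f"{item.name!r} is not a FlowCytometer")
```

If none of the three conditions applied (no sub-protocols, no equipment constraint, no special parameters), the plain `Protocol` stand-in would do the job, which is the "re-use existing classes" side of the decision.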

The paper makes liberal use of UML diagrams to help you understand the relationship between the generic FuGE classes and the specific sub-classes generated by extensions. A large part of the paper is concerned expressly with helping the user understand how to model an experiment type using FuGE, and also to understand when FuGE on its own is enough. But it also does more than that: it discusses the current tools that are already available for developers wishing to use FuGE, and it discusses the applicability of other implementations of FuGE that might be useful but do not yet exist. Validation of FuGE-ML and the storage of version information within the format are also described. Implementations of FuGE, including SyMBA and sysFusion for the XML format and ISA-TAB for compatibility with a spreadsheet (tab-delimited) format, are also summarised.

I strongly believe that the best way to solve the challenges in data integration faced by the biological community is to constantly strive to simply use the same (or compatible) formats for data and for metadata. FuGE succeeds in providing a common format for experimental metadata that can be used in many different ways, and with many different levels of uptake. You don’t have to use one of the provided STKs in order to make use of FuGE: you can simply offer your data as a FuGE export in addition to any other omics formats you might use. You could also choose to accept FuGE files as input. No changes need to be made to the underlying infrastructure of a project in order to become FuGE compatible. Hopefully this paper will flatten the learning curve for developers, and get them on the road to a common format. Just one thing to remember: formats are not something that the end user should see. We developers do all this hard work, but if it works correctly, the biologist won’t know about all the underpinnings! Don’t sell your biologists on a common format by describing the intricacies of FuGE to them (unless they want to know!), just remind them of the benefits of a common metadata standard: cooperation, collaboration, and sharing.

Jones, A., Lister, A.L., Hermida, L., Wilkinson, P., Eisenacher, M., Belhajjame, K., Gibson, F., Lord, P., Pocock, M., Rosenfelder, H., Santoyo-Lopez, J., Wipat, A., & Paton, N. (2009). Modeling and Managing Experimental Data Using FuGE OMICS: A Journal of Integrative Biology, 2147483647-13 DOI: 10.1089/omi.2008.0080

Housekeeping & Self References Semantics and Ontologies Standards

“Blogging is Hard” Day: Repost of 2006 FuGO Workshop Day 1

According to the rules set down by Greg Laden over at Science Blogs, I have had a trawl through the blasts from the past that are my blog posts of 18 months or older to find one that is “exactly in lie [sic] with the writing or research in which they are currently engaged”. I thought about my Visiting With Enigma post, which has a special place in my heart, but didn’t choose it in the end as it didn’t have anything to do with my current research. Instead, I ended up choosing my very first post on WordPress: FuGO Workshop Day 1. It may not sound like much, but there are a number of things recommending this particular post.

  1. FuGO was the original name for the OBI project, of which I’m still a part and therefore it fits with the requirement that I still am involved.
  2. This was my first introduction to ontologies, and happened just as I was leaving one job (at the EBI) and starting a new one (at CISBAN). Such an important change deserves another mention.
  3. I notice an earlier incarnation of my “be sensible” statement in this post, where I say that I learned from Richard Scheuerman that it is always a good idea to use “only those fields which would be of most use to the biologist, rather than those that would make us bioinformaticians most happy”.
  4. FuGO wasn’t the only thing that has since undergone a name change. This post also contained information about the “new” MIcheck registry of minimal checklists: this has continued to gain in popularity, and is now MIBBI.
  5. There was a long discussion at the FuGO workshop about Multiple versus Single inheritance in ontologies. The same topic came up just last week at the CBO workshop, and again in a short discussion on FriendFeed that led to longer real-life conversations (Phillip Lord’s paper that deals with this topic). This was also my first introduction to Robert Stevens and Barry Smith, who both took center stage in the MI/SI discussion. Listening to Barry and Robert speak was really informative and interesting and fun!

What a fantastic day that was: a crash course in ontology development and best-practices, as well as introductions to some of the most well-known people in the biological / biomedical ontology world. In many ways, those first few days of my current job / last few days of my old job shaped where I am now.

Read that entire post, and Happy Blogging is Hard Day! Thanks to Greg Laden for the great idea.

Meetings & Conferences Semantics and Ontologies Standards

Morning Use-Case Talks (CBO 2009)

Nick Monk, Sheffield/Nottingham – wants to develop a formalism for multicellular models of plant roots. There are many model types out there – they’re all encoding the same thing: the way cells interact with each other and with the environment. He’s familiar with this type of problem via the history of dealing with reaction kinetics. We need to write down information about reaction kinetics in a simulation-independent manner. Therefore they need to write down the multicell models in a way that does not depend on the simulation environment. For reaction kinetics, it was fairly straightforward to do this as there was already a good list of terms describing reaction kinetics.

For cell behaviours, when we talk about them we tend to talk about them at a subjective / qualitative level. Humans use their pattern recognition skills to identify the behaviours – there are no real quantitative metrics for determining behaviour. What would be most useful is a way to abstract out information from images of cells that would allow us to determine the behaviours they’re exhibiting.

If we generate time-course image data, what are we going to do with them? Therefore we need a way to annotate these images == the annotation case study. They want to have a session on multicellular modelling standards at the next international systems biology conference (ICSB, Edinburgh summer 2010).

Then Rusty Lansford (CalTech) described a set of images he had put up on the screens. They’ve generated some modified quail (FP_expressing Tg quail) that they’re using – the eggs are easy to work with. They put different fluorescent proteins into different quail, and then breed them together. He had a very nice video of quail development with endothelial cells marked. Brighter cells are those about to enter M phase. There are also some great “4D” videos that track the movements of the cells to form the aorta. They’re pretty confident that they can follow cell division and cell orientation, and shortly cell polarity. They’re happy to hear what data other people would be interested in, and they’ll collect it for your models. Some words he used were: rolling, flow, differentiating, kiss-and-fuse, formation of organs and more. Many of the terms were subcellular and others were higher up (e.g. organs or tissue-level).

Nadine Peryeras (CNRS) then discussed the Embryonics EC project, which reconstructs the cell lineage tree as the core of the “embryome”. They looked at 4 organisms, including the zebrafish. When computationally determining the cell shape, you (virtually) cut the embryo into bits to figure out the size. They have a number of algorithmic strategies for determining the position of each cell in the xyz axes. They can convert from total cell number to cell density if they have volume information. They have a video of the zebrafish virtual embryo, where the color shows the direction of migration. Very nice.

There were two other presentations about what their use-cases would be, but I was working on the list of terms from CBO.

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Meetings & Conferences Semantics and Ontologies Standards

Use Cases and top level: Afternoon Discussion (CBO 2009)

After the presentations finished, the discussion of which use cases to use started. What is the scope of the cell behaviour ontology? Specifically, it describes how cells behave when seen as agents (deliberately neglecting subcellular and tissue/organ details). Examples include cell adhesion, cell-cell adhesion and others.

I had a lot of fragmented notes during this discussion, but discovered that I was contributing so much that I didn’t have good notes. Luckily, Benjamin has been taking excellent notes which I think should shortly appear on the CBO wiki. I’ll link them from this post as soon as I have them.

There was a really interesting discussion within today’s session about whether or not cell shapes should be included in CBO. It wasn’t so much the cell shape example that was interesting to me (as it is my opinion that shapes are not behaviours – it is the changing from one shape to the next that is a behaviour), but the way that it exposed the differences in thinking that the members of the workshop had about what constituted a behaviour, and hence what the scope of the ontology should be.

This is my interpretation of the top level and a very rough binning of the other terms with respect to that top level:

  • division
  • cell fragmentation
  • cell movement (linear rate, persistence)
    • movement of single cell
    • movement of clusters of cells
    • movement of sheets of cells
    • follow field (chemotaxis, haptotaxis)
    • polarize – movement within oneself?
    • cell traction
      • between other cells
      • ecm (rearrange ecm); basement membrane
  • shape change
    • cell contraction (apical, area change, in epithelia)
    • shape changes that result in a reduction in volume (defined class?)
    • shape changes that result in an increase in volume (defined class?)
    • shape changes that do not result in a volume change
    • assembly of ecm
    • cell protrusion (lifetime; orientation; duration; lamellipodia (directed; random); filopodia; retraction fibers)
    • length, width, anterior, posterior changes
    • cilia direction
    • flagella, microvilli
    • ruffle membrane
    • restructure cytoskeleton?
  • exert force / pull
  • delaminate
  • interact with other cells
    • the process of contacting with another cell
    • the process of contacting with something that isn’t a cell
    • cell-cell communication
  • secrete (export)
    • vesicle secretion
    • molecule secretion
  • excrete (export)
  • absorb (import)
    • digestion (e.g. osteoclast)
  • adsorb (import)
  • cell rearrangement – is this always with >1 cell?
    • change neighbours
    • directed rearrangement
    • random rearrangement
  • disappear
    • cell death
    • extinction
  • fusion
  • give off heat
  • change electrical field
  • interact with ecm
    • pull on ecm (also a child of force / pull)
  • alter subcellular distribution
  • alter extracellular distribution
  • alter mechanical properties
  • remodel EC environment

Personal Note: There needs to be a delineation between the behavior of a single cell and the behaviors that are only relevant in the context of other cells. Many of the above should probably become defined classes to prevent multiple asserted hierarchies. This is just a representation of what was discussed this afternoon, and is not how it is meant to be in a final form. Particularly, some things that are presented as top-level (e.g. the two types of interactions) are actually children of a not-yet-extant parent term.
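To show what I mean by a defined class, here is a toy Python sketch using “pull on ecm”, which is listed above as also being a child of force / pull. Each term keeps a single asserted parent, and membership of “exert force / pull” is worked out from a property rather than asserted by hand. The `exerts_force` property and its values are assumptions I have invented for the example; they were not discussed at the workshop, and a real ontology would express this as a logical definition checked by a reasoner rather than as a Python function.

```python
# Each behaviour has exactly one asserted parent plus some properties.
# The property name and values are illustrative assumptions only.
behaviours = {
    "pull on ecm": {"asserted_parent": "interact with ecm", "exerts_force": True},
    "cell traction": {"asserted_parent": "cell movement", "exerts_force": True},
    "vesicle secretion": {"asserted_parent": "secrete", "exerts_force": False},
}

# A defined class is described by a membership condition instead of
# by listing its children.
defined_classes = {
    "exert force / pull": lambda props: props["exerts_force"],
}


def classify(terms, definitions):
    """Return, for each defined class, the terms that satisfy its condition."""
    return {
        class_name: sorted(
            term for term, props in terms.items() if condition(props)
        )
        for class_name, condition in definitions.items()
    }


inferred = classify(behaviours, defined_classes)
```

Here `inferred["exert force / pull"]` picks up both “cell traction” and “pull on ecm” without either term needing a second asserted parent.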

Some terms that didn’t fit in these lists but which were suggested: live cell, cell activation, response to external stimuli, cell metabolism. They may not belong, or they may belong but haven’t been binned.

Meetings & Conferences Semantics and Ontologies Standards

Dan Cook (U Washington): Ontology design: added value of organizing principles (CBO 2009)

Theory-based ontologies (FMA, OPB, SemSim semantic biosimulation models, GO, Cell Type ontology) for multiscale structure. The OPB is the Ontology of Physics for Biology, where domains include fluids, solids, chemical kinetics, electrochemistry, diffusion, heat transfer.

They have created OPB:physical_property as a child of continuant. These include terms like force, resistance, flow, etc.

Personal Note: I was really glad to see someone else using BFO for their terms in a practical sense like OBI does, rather than in a theoretical sense like most of the other OBO Foundry ontologies.

They have developed SemSim, which is a lightweight mapping schema in OWL from the physics biosimulation code to the semantic knowledge (Gennari JH et al 2008 PSB: Integration of multiscale biosimulation models via lightweight semantics; and another PSB article from 2009 about merging/recombining models – Neal et al.). Very interesting.

You could say that any change in property values is a consequence of thermodynamically driven state property changes. A change in property can have structural and existential consequences.

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Meetings & Conferences Semantics and Ontologies Standards

Alexander Diehl (GO/MGI): Biological Processes subtree in GO (CBO 2009)

Gene Ontology (GO) has developed over time to become more of a true ontology. Its purpose is to serve as a common language for sharing knowledge and to support cross-referencing.

Terms of interest for CBO in GO include cellular processes and their regulation, cell differentiation, cellular extravasation, among other things. Cellular process, multicellular organismal process and multi-organism process are disjoint from each other. Some of these terms can be problematic: for instance, localization is a subtype of multi-organism process, which could also be one of the other types, depending on the definition: it all comes down to the definition…

Terms in GO may have multiple parents, some of which are from other ontologies such as the Cell Ontology. These links to external ontologies will not be present in the standard download, but you can download a version of GO that has the links (you’ll probably have to download the additional external ontologies separately).

There are 16419 terms in the biological process ontology. They don’t just develop GO terms as annotators need them or users request them. They also have domain workshops that focus on getting a particular type of domain covered (e.g. lung development and muscle development). GO developers use OBO-Edit 2.0, which isn’t as fully-functional as Protege and OWL, but which is useful for people only developing in OBO.

Annotations of gene products to GO are genome specific. With regards to the CBO and GO, we shouldn’t reinvent the wheel. We also need to think very carefully about the definition of behaviour, which in GO means “the specific actions or reactions of an organism in response to external or internal stimuli…”

Basically, they are just cellular processes, which might be a little more restrictive than we want. It would be really useful to make as much use of GO as possible because you get a lot of benefits: you get automatic linking to all the rest of GO, and all the analyses etc that people do and then annotate with GO. You might also want to look into the extracellular matrix organisation terms.

Question: why did you decide to put cell type outside of GO? Well originally, it was created to describe aspects of particular gene products, and cell type doesn’t seem to be within scope. In a longer term, they want to bring the Cell Ontology under the auspices of the GO Consortium.

MGI website.

In other news, during lunch James Glazier has said that CBO should only be for behaviour of cells, not behaviour in cells, and that we will not be attempting to hang CBO underneath the biological process section of GO.

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Meetings & Conferences Semantics and Ontologies Standards

Herbert Sauro/Michael Galdzicki (Washington): Building Ontologies and Standards in Systems Biology (CBO 2009)

Herbert Sauro and others in the systems biology communities started with the modelling language and then went into ontologies. SBML is used to represent homogeneous multi-compartmental biochemical systems. You can have discrete events that either come from the outside or are generated internally. SBML started in ’99/’00, and now over 160 tools support SBML, and SBML files are accepted at a number of journals including Nature, Science and PLoS. CellML is philosophically different from SBML, as the former is math-centric and the latter is biology-centric.

In systems biology, SBML and related tools have allowed useful collaborations that were not available before. However, SBML is a common syntax, and what was also needed was a common semantics. The SemGen Annotator software is used to attach meaning to mathematical models, which can be loaded into a database such as BioModels.

Galdzicki had a reference to SED-ML, which would allow semantically-enriched publications to aid the interpretation of results. For instance, you could click on a figure of a model and be taken to a web application that can run the simulation for you. (Personal Note: there is an interesting paper about semantically marking-up publications: .)

In conclusion, remember that the use of an ontology must be an important criterion in its design.

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Meetings & Conferences Semantics and Ontologies Standards

Olivier Bodenreider (NLM): Best-Practices, pitfalls and positives (CBO 2009)

If ontologies are the solutions, what is the problem? Think use cases. Uses of biomedical ontologies include knowledge management (annotating data, accessing information, mapping across ontologies), data integration and exchange, semantic interoperability and decision support (Bodenreider YBMI 2008).

The ontology you’re going to build will be different depending on your use cases: different structure, different focus, etc. Finding an agreement and settling on what your use cases are is an important part of the meeting. Collection and prioritization is very important.

Showed an image of the “ontology spectrum”, available at The amount of semantics you want to put in an ontology varies along a spectrum. At the “weak semantics” end you have taxonomies and Thesauri, whereas at the “strong semantics” end you have Conceptual Models and Logical Theory (with Description Logics being the formalism du jour).

MeSH is a hierarchical controlled vocabulary – it is not an ontology. MeSH provides descriptors for indexing biomedical literature. Here, the “entry terms” may or may not be synonymous with the MeSH heading. What the entry terms mean is that anything talking about these terms will get classified according to those terms’ MeSH heading. This is enough for particular goals, such as annotation of literature. However, it may not be enough, depending on your use case. You need to figure out your level of granularity. The hierarchy in MeSH states that if you’re interested in term X (e.g. cell movement), you might also be interested in X’s child terms (e.g. ). It is NOT an “IS A” hierarchy, more of an “IS RELATED TO” hierarchy. In GO, synonyms are either exact or related. Cell movement in GO is a child of cellular process and also of localization of cell. GO is more precise.
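The practical difference between the two kinds of hierarchy shows up when you start inferring things. With a true “IS A” hierarchy you can safely walk upwards: anything annotated to a term is also an instance of every ancestor. The sketch below (mine, not from the talk) does this for the GO example above. The two parents of “cell movement” come from the notes; the extra edge from “localization of cell” to “localization” is an assumption added only to show the walk covering more than one level.

```python
def ancestors(term, parents):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# IS A links: the first entry is from the notes above,
# the second is an illustrative assumption.
IS_A = {
    "cell movement": ["cellular process", "localization of cell"],
    "localization of cell": ["localization"],
}

go_ancestors = ancestors("cell movement", IS_A)
```

Running the same walk over an “IS RELATED TO” hierarchy would still return a set of terms, but you could only treat them as suggestions for browsing, not as statements that the child is a kind of the parent.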

When defining use cases, you need to think about typical situations in which the resource to be created is expected to contribute to the solution (resource annotation, resource classification, inference based on attributes of biological entities). You need to think about competency questions. The rule is usually to go with the minimal ontological commitment. The last thing you want to do is to put too much into your ontology.

“Ontologies are for ontologists.” What is the difference between an ontology and a car? You wouldn’t think of building a car, but you do think about building an ontology. Eventually, you’ll run into roadblocks, e.g. trying to deal with terms from upper-level ontologies (ULOs) such as the BFO dependent continuants and the differences between function, role and disposition. He then used SNOMED as an example knowledge representation.

From the OntoClean people, he mentioned that you shouldn’t have a single class with more than one IS A relationship. E.g. if you use apple and place it under both food and fruit, then you run into problems when trying to describe that an apple is toxic to another animal. Another example is “lmo-2 interacts with Elf-2”. There are many possible understandings of this statement: “one individual lmo-2 molecule interacts with one individual Elf-2 molecule”, or any other number of instances or groupings.

CBO is a domain ontology, a low-level ontology. ULOs can have lower-level ontologies hung off them, but you won’t be developing ULOs. There are lots of power tools for ontologies: Protege and OBO-Edit, but these tend to be more complex than biologists wish to use. Semantic wikis are more simplified, intermediate representations that allow collaborative development. They hide part of the complexity.

You can collect terms from experts, textual corpora, and from existing terminologies and ontologies. Good resources include NCBO’s bioportal and the UMLS semantic navigator. You should try to link to and borrow from existing ontologies. On the other hand, by borrowing terms you are also borrowing the ontological commitment of these ontologies, which may or may not align with your goals/scope.

With the help of experienced ontologists, you should decide on: the knowledge representation (e.g. OWL-DL), what to use as an editor (e.g. Protege), and what the ontological commitment should be (e.g. top-level ontologies). You could consider the OBO Foundry.

BiomedGT is from the NCI and they use a semantic wiki. The IDO uses the OBO Foundry approach. The Int’l Classification of Diseases uses a semantic wiki approach combined with a Protege background. A final example is the Neuroscience Information Framework (NIF).

Conclusions. Start by defining use cases, not ontologies. You should also define how you would measure success. Also, let the biologists be biologists, and seek out ontologists where needed. Follow experience/guidelines, not gurus. Finally, think prospectively, such as maintenance and funding.

Olivier’s website:
IDO imports many terms from GO.

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!