There are a few big questions that need to be kept firmly in mind when starting down the road of ontology building. These are questions of:
- Goals: What are you trying to achieve with this ontology?
- Competency/Scope: What are you trying to describe?
- Granularity: To what depth will you need to go?
The rest of this post relates directly to and is organised around these three topics. These topics have a lot of overlap, and aren’t intended to be mutually exclusive: they’re just ideas to get the brain going. I use the upcoming Cell Behavior Ontology (CBO) workshop to illustrate the points. The questions I single out below may already have been answered by the workshop organizers, but haven’t been published on the CBO wiki yet. I’ll be attending this workshop, and will aim to post my notes each day. It should be fun!
If a main goal is eventual incorporation within another ontology (e.g. Gene Ontology (GO) for the case of CBO) or even just alignment with the other ontology’s tenets, you have to be sure you’re happy with the limitations this may put on your own ontology. It may be that these limitations are not acceptable, and as a result you choose to reduce the dependencies on the other ontologies.
For CBO, the important questions relate to possible alignment to GO and therefore, ultimately, Basic Formal Ontology (BFO):
Question: Do you wish to ultimately include some CBO terms under, for example, biological processes of GO? GO contains only canonical/non-pathological terms. How does this fit with the goals of CBO?
GO has the express intent of creating terms covering only canonical / non-pathological biology. Therefore, would cell behavior during cancer (e.g. uncontrolled cell proliferation or metastatis, which aren’t in GO) be appropriate if CBO is meant to, in its entirety, be included within GO? They are important terms, so if some amount of incorporation with GO is appropriate, would it only end up being a partial alignment?
Question: Are there any plans to use an Upper Level Ontology (ULO) such as the OBO Foundry-recommended BFO? Though BFO may not need to be considered immediately, it does place certain restrictions on an ontology. Are you happy with those restrictions?
One example of the restrictions placed by the use of BFO is that within BFO, qualities cannot be linked via the Relations Ontology to processes. That is, if you have a property called has_rate which is a child property of “bears”, then you are not allowed to make a statement such as “cell division has_rate some_rate”, where cell division is a process, and some_rate is a quality. There is a good post available about ULOs by Phil Lord.
Question: How richly do we want to describe cell behaviors?
Another important general goal is the level of richness that is needed with CBO. Competency questions, discussed later, will answer this to some extent. We can think about richness using GO as an example. The goal of the GO developers is the integration of multiple resources through the use of shared terms. GO does this very well. But, if you want rich descriptions and semantic interoperability, then this is something that is not a goal of GO.
While it is often a tempting idea to start from the top of an ontology and work downward, consideration should be given to an initial listing of leaf terms that you are sure that you need in the ontology. Not only does this ensure you have terms that people need from the start, the bagging and grouping exercises you would then go through to create the hierarchy will often highlight any potential problems with your expected hierarchy. If you have clear use-cases, then a bottom-up approach, at least in the early stages, can be useful in figuring out what the scope of your ontology is.
This brings us to the importance of having scope – and a set of competency questions – ready from the beginning of ontology development. What do you want to describe?
Question: What is the definition of cell behavior in the context of CBO?
For instance, for CBO, what is meant by the word “behavior”? A specific description of what is, and isn’t, a behavior that the CBO is interested in, is an important first step.
The last thing that would be relevant to the overall goals (but which could equally well be considered in the Granularity section below) is the type of terms to be added:
Question: Should the terms be biological terms only, or also bioinformatics/clinical terms?
To better explain the above question, you could consider the stages of cancer progression. “Stage 2” is a fictitious name for a clinical/bioinformatics description of a stage of a cancer. This is not a biological term. Which type of term should go into CBO? I would guess that the biological term should go in which describes the biology of a cell at “stage 2”, and then perhaps use synonyms to link to bioinformatics/clinical terms. There probably shouldn’t be a mix of the two types of terms as the primary labels.
Additionally, competency questions can help determine the scope. You can make a list of descriptive sentences that you want the ontology to be able to describe, such as “The behavior of asymmetric division (e.g. stem cell division)”. By listing a number of such sentences, you can determine which are out of scope and which must be included, thus building up a clear definition of the scope.
For me the granularity question has two aspects: first, and more generally, is how fine-grained do you want to be with your terms; second, and more interestingly, is in the context of CBO, are we interested in the behavior of cells and/or the behavior in cells? The examples given in the workshop material seem to come from both of these areas (see http://cbo.compucell3d.org/index.php/Breakout_Session_Groups).
Question: Should CBO deal with the behavior OF cells and/or the behavior IN cells?
For the above question we can use as examples cell polarization and cell movement. Both are listed in the link to the wiki provided just above, so both are considered within the scope of CBO. However, cell movement is a characteristic behavior of a cell, while polarization is something that happens in a cell (e.g. polarization within a S.cerevisiae cell with regards to the budscar). Both of these types of behaviors are relevant, but they are different classes of behavior and may be an appropriate separation within the CBO hierarchy.
As an aside, is cell division a behavior? It is covered in the CBO material, so with respect to CBO, it is. I think that the CBO is intended to deal with single cells, so I’m not sure where cell division fits in.
These questions should be considered, but you should also try not to let them reduce the effectiveness and efficiency of ontology development. However, as with many biological domains, try to ensure that everyone is on the same page with their goals, scope, and granularity and there will be (I believe!) fewer arguments and more results.
Also, I am positive I’ve missed stuff out, so please add your suggestions in the comments!
With special thanks to Phil Lord for the useful discussions surrounding ontology building that formed the basis for this post.