Pre-Building an Ontology: What to think about before you start

There are a few big questions that need to be kept firmly in mind when starting down the road of ontology building. These are questions of:

  1. Goals: What are you trying to achieve with this ontology?
  2. Competency/Scope: What are you trying to describe?
  3. Granularity: To what depth will you need to go?

The rest of this post relates directly to and is organised around these three topics. These topics have a lot of overlap, and aren’t intended to be mutually exclusive: they’re just ideas to get the brain going. I use the upcoming Cell Behavior Ontology (CBO) workshop to illustrate the points. The questions I single out below may already have been answered by the workshop organizers, but haven’t been published on the CBO wiki yet. I’ll be attending this workshop, and will aim to post my notes each day. It should be fun!

Goals

If a main goal is eventual incorporation within another ontology (e.g. Gene Ontology (GO) for the case of CBO) or even just alignment with the other ontology’s tenets, you have to be sure you’re happy with the limitations this may put on your own ontology. It may be that these limitations are not acceptable, and as a result you choose to reduce the dependencies on the other ontologies.

For CBO, the important questions relate to possible alignment to GO and therefore, ultimately, Basic Formal Ontology (BFO):

Question: Do you wish to ultimately include some CBO terms under, for example, biological processes of GO? GO contains only canonical/non-pathological terms. How does this fit with the goals of CBO?

GO has the express intent of covering only canonical / non-pathological biology. If CBO is meant to be included within GO in its entirety, would terms for cell behavior during cancer (e.g. uncontrolled cell proliferation or metastasis, which aren’t in GO) still be appropriate? They are important terms, so if some amount of incorporation into GO is appropriate, would it only end up being a partial alignment?

Question: Are there any plans to use an Upper Level Ontology (ULO) such as the OBO Foundry-recommended BFO? Though BFO may not need to be considered immediately, it does place certain restrictions on an ontology. Are you happy with those restrictions?

One example of the restrictions placed by the use of BFO is that within BFO, qualities cannot be linked via the Relations Ontology to processes. That is, if you have a property called has_rate which is a child property of “bears”, then you are not allowed to make a statement such as “cell division has_rate some_rate”, where cell division is a process, and some_rate is a quality. There is a good post available about ULOs by Phil Lord.
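To make the restriction concrete, here is a toy Python sketch. It is not BFO and not an OWL reasoner: the category tree is a heavily simplified, hypothetical slice of BFO, and the constraint that `bears` takes an independent continuant as its subject and a quality as its object is an assumption stated here for illustration. Because `has_rate` is a child property of `bears`, it inherits that constraint, so “cell division has_rate some_rate” is flagged. Given such a domain axiom, an OWL reasoner would reach the same conclusion by a different route: it would infer that cell division must be an independent continuant and then report a clash, since continuants and occurrents are disjoint in BFO.

```python
# Toy model only: a heavily simplified, hypothetical slice of the BFO category tree.
# Each category maps to its parent (None marks the root).
CATEGORY_PARENT = {
    "entity": None,
    "continuant": "entity",
    "independent_continuant": "continuant",
    "dependent_continuant": "continuant",
    "quality": "dependent_continuant",
    "occurrent": "entity",
    "process": "occurrent",
}

# Example terms from the post, assigned to categories for illustration.
TERMS = {
    "cell": "independent_continuant",
    "cell division": "process",
    "some_rate": "quality",
}

# Property hierarchy: has_rate is a child property of bears.
PROPERTY_PARENT = {"bears": None, "has_rate": "bears"}
# Assumption: bears links an independent continuant (subject) to a quality (object).
PROPERTY_SIGNATURE = {"bears": ("independent_continuant", "quality")}


def is_a(category, ancestor):
    """True if category is ancestor or lies below it in CATEGORY_PARENT."""
    while category is not None:
        if category == ancestor:
            return True
        category = CATEGORY_PARENT[category]
    return False


def inherited_signatures(prop):
    """Yield (domain, range) constraints declared on prop or any parent property."""
    while prop is not None:
        if prop in PROPERTY_SIGNATURE:
            yield PROPERTY_SIGNATURE[prop]
        prop = PROPERTY_PARENT[prop]


def check(subject, prop, obj):
    """List the constraint violations in the statement 'subject prop some obj'."""
    s_cat, o_cat = TERMS[subject], TERMS[obj]
    problems = []
    for domain, rng in inherited_signatures(prop):
        if not is_a(s_cat, domain):
            problems.append(f"{prop}: subject '{subject}' ({s_cat}) is not under {domain}")
        if not is_a(o_cat, rng):
            problems.append(f"{prop}: object '{obj}' ({o_cat}) is not under {rng}")
    return problems


print(check("cell division", "has_rate", "some_rate"))
# ["has_rate: subject 'cell division' (process) is not under independent_continuant"]
print(check("cell", "has_rate", "some_rate"))
# []
```

The same statement with a cell (an independent continuant) as its subject passes, which is why BFO-aligned ontologies have to find another way to attach rates to processes.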

Question: How richly do we want to describe cell behaviors?

Another important general goal is the level of richness that is needed with CBO. Competency questions, discussed later, will answer this to some extent. We can think about richness using GO as an example. The goal of the GO developers is to integrate multiple resources through the use of shared terms, and GO does this very well. If, however, you want rich descriptions and semantic interoperability, GO does not set out to provide them.

Competency/Scope

While it is often tempting to start from the top of an ontology and work downward, consider first listing the leaf terms you are sure you need in the ontology. Not only does this ensure you have the terms people need from the start, but the bagging and grouping exercises you then go through to create the hierarchy will often highlight potential problems with your expected hierarchy. If you have clear use cases, a bottom-up approach, at least in the early stages, can be useful in figuring out the scope of your ontology.
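A rough sketch of the bagging exercise in Python. The leaf terms, the features attached to them, and the draft parent classes are all hypothetical placeholders rather than CBO content, and in practice the grouping is done by people (or by a reasoner over full class descriptions), not by a feature lookup. The sketch only shows the kind of problem bottom-up grouping surfaces: leaves that fit no draft parent (a gap in the draft) and leaves that fit several draft parents meant to be separate branches (overlapping axes).

```python
# Hypothetical leaf terms, each tagged with a few descriptive features (illustrative only).
LEAF_TERMS = {
    "cell migration": {"changes_position"},
    "chemotaxis": {"changes_position", "responds_to_signal"},
    "cell polarization": {"changes_internal_organization"},
    "asymmetric cell division": {"produces_new_cells", "changes_internal_organization"},
    "apoptosis": {"ends_cell_life"},
}

# Hypothetical parents from a draft top-down hierarchy, each keyed to the
# feature a member must carry; the draft intends them to be separate branches.
DRAFT_PARENTS = {
    "cell motility behavior": "changes_position",
    "cell reorganization behavior": "changes_internal_organization",
    "cell proliferation behavior": "produces_new_cells",
}


def place_leaves(leaves, parents):
    """Bag each leaf under every draft parent whose defining feature it has."""
    return {
        term: [parent for parent, feature in parents.items() if feature in features]
        for term, features in leaves.items()
    }


def hierarchy_problems(placements):
    """Return leaves with no parent (gaps) and leaves with several (overlaps)."""
    orphans = [term for term, found in placements.items() if not found]
    overlaps = {term: found for term, found in placements.items() if len(found) > 1}
    return orphans, overlaps


placements = place_leaves(LEAF_TERMS, DRAFT_PARENTS)
orphans, overlaps = hierarchy_problems(placements)
print("No parent:", orphans)
print("Several parents:", overlaps)
```

Here apoptosis has no home in the draft, and asymmetric cell division lands under two parents that were meant to be separate. Findings like these are a prompt to rethink the draft's top levels before committing to them.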

This brings us to the importance of having a defined scope, and a set of competency questions, ready from the beginning of ontology development. What do you want to describe?

Question: What is the definition of cell behavior in the context of CBO?

For instance, for CBO, what is meant by the word “behavior”? A precise description of what does and does not count as a behavior of interest to CBO is an important first step.

The last thing that would be relevant to the overall goals (but which could equally well be considered in the Granularity section below) is the type of terms to be added:

Question: Should the terms be biological terms only, or also bioinformatics/clinical terms?

To better explain the above question, consider the stages of cancer progression. “Stage 2” is a fictitious name for a clinical/bioinformatics description of a stage of a cancer; it is not a biological term. Which type of term should go into CBO? My guess is that CBO should contain the biological term describing the biology of a cell at “stage 2”, perhaps with synonyms linking it to the bioinformatics/clinical terms. There probably shouldn’t be a mix of the two types of terms as primary labels.

Additionally, competency questions can help determine the scope. You can make a list of descriptive sentences that you want the ontology to be able to describe, such as “The behavior of asymmetric division (e.g. stem cell division)”. By listing a number of such sentences, you can determine which are out of scope and which must be included, thus building up a clear definition of the scope.

Granularity

For me the granularity question has two aspects. The first, and more general, is how fine-grained you want your terms to be. The second, and more interesting in the context of CBO, is whether we are interested in the behavior of cells, the behavior in cells, or both. The examples given in the workshop material seem to come from both of these areas (see http://cbo.compucell3d.org/index.php/Breakout_Session_Groups).

Question: Should CBO deal with the behavior OF cells and/or the behavior IN cells?

For the above question we can use cell polarization and cell movement as examples. Both are listed on the wiki page linked just above, so both are considered within the scope of CBO. However, cell movement is a characteristic behavior of a cell, while polarization is something that happens in a cell (e.g. polarization within an S. cerevisiae cell with regard to the bud scar). Both types of behavior are relevant, but they are different classes of behavior, and separating them within the CBO hierarchy may be appropriate.

As an aside, is cell division a behavior? It is covered in the CBO material, so with respect to CBO, it is. However, I think CBO is intended to deal with single cells, and cell division produces two cells from one, so I’m not sure where it fits in.

These questions should be considered, but try not to let them reduce the effectiveness and efficiency of ontology development. As with many biological domains, though, if you make sure everyone is on the same page about goals, scope, and granularity, there will (I believe!) be fewer arguments and more results.

Also, I am positive I’ve missed stuff out, so please add your suggestions in the comments!

With special thanks to Phil Lord for the useful discussions surrounding ontology building that formed the basis for this post.

3 thoughts on “Pre-Building an Ontology: What to think about before you start”

  1. Defining behaviour is a tough one. My take on it is this:
    A behaviour is the label assigned to an observed process or processes, realised by (carried out by) a material under certain conditions and/or environments.
    This is not to say that “behaviors” do not occur unless they are observed; it is to say that assigning a behavior, or stating that material X has a behaviour, is based either on an observation of the material under certain conditions or on an assumption of homology.

    As you say, it is not clear at the moment whether this is behavior of, or in, cells. The definition above accounts for behaviour_of_cells. To describe the behavior of other material entities in cells, you would explicitly state that the “certain conditions or environment” is the cell, i.e. has_environment some cell. For example:

    mitochondria_behaviour has_environment some cell

    Don’t be tempted to call a mitochondrion a cell_component. Instead, create a defined class cell_component whose members are inferred from their environment.

    The next stage would be to define what a cell is (probably using the OLS to get all the definitions of cell and picking one that suits). cell_behaviour is then the intersection of the two classes, or a defined class encompassing cell + behaviour.

    Allyson! tut tut! – biological terms? there is no such thing. If you come back next week and the suffix or prefix, bio_, biological appears anywhere I will be extremely disappointed.

    Some other points I would add

    Loosely related to goals and scope: “why bother creating a CBO?” How would people/machines use it? Let the case studies (a description of my experiment/observation) and the use cases (how the ontology will actually be used: data annotation, inference) drive the decision-making process. As you well know, discussing ontology building can be endless.

    In the first instance I would definitely recommend not building a hierarchy. Instead, create a simple separation of “process”, “material” and “information”, create a flat list of classes under the appropriate one, describe each class in full, and then let the reasoner build the hierarchy.

    In the early stages it may also be useful to state incompetency questions: what we will not represent in the ontology.

    Building an ontology is a cyclical process and as you reach the end of the evaluation process, you start again, re-assessing your scope all the way through to evaluation again.

    I really should get a preprint of my book chapter out somewhere. Do you think it would be useful if I gave you a copy for next week? It’s rather long, though; your audience may be better suited to your description above.

    1. Thanks, Frank 🙂 Yes, I completely agree with your comments on definition of behavior and on the comment about cell_components. I’m not sure if the people at the workshop will be interested in naming (e.g. mitochondria), as that is something for another ontology, but I understand the sentiment.

      OK, OK, I should have been clearer about the whole biological terms thing. This was a convenience that came from the discussions with Phil. I was just trying to separate out different types of terminology. No, we don’t want anything called bio_something!

      I like your addition to the goals / scope of “why bother creating an ontology XXX”? The use of case studies and use cases is really important, and I would like to see if the smaller working groups that we break into during the workshop could all build off the same set of case studies…

      As we’re meant to limit ourselves to behaviors, I’m not sure that we’ll need anything other than the “process” part – I would expect that we’ll use links out to other ontologies for the materials etc.

      A list of what we’re not describing is definitely an important part of defining the scope.

      As for a preprint of your chapter, that would be great if you’re allowed. 🙂

      Thanks for all the comments!