Categories
Semantics and Ontologies Software and Tools

Software Ontology – a New Release and a Shiny New Build Procedure

I had noticed that it had been a while since we had last updated SWO – the Software Ontology. To be honest, it was a little more than “a while”, but…

  • we’re a merry band of volunteers, primarily Robert Stevens (blog, Computer Science at Manchester), Helen Parkinson (EBI) and James Malone (blog, SciBite), which means we are time-limited
  • our build process was outdated, slow, and tricky. I’ll admit, I had to ask James to finish our 1.6 release as it just wasn’t working for me!
A small snippet of SWO – see EBI’s OLS for the full graph

Does your release spark joy?

We all enjoy talking about software, and I have particularly enjoyed beginning to work on the lovely Licence Hierarchy within SWO that’s been coming along nicely. But every time I thought about updating the external ontologies we imported, or building the release files, I got a bit of a sinking feeling. Then, feeling like I was the last in the class to notice, I had a good read about ROBOT (website, publication), an ontology build tool that lots of projects had been using. I say build tool, but it does all sorts of lovely things. I use it for the following purposes:

  • SPARQL queries: I use SELECT to create summary statistics of quite complex subdivisions of my ontology.
  • Bulk annotation: UPDATE commands can also be run, allowing me to add bulk annotations to my file.
  • Bulk imports via spreadsheets: a separate project I’ve been involved in began its ontology development with a spreadsheet, and we then bulk converted it to OWL with ROBOT.
  • Merging imports: going from a development file with multiple imports to a single release file.
  • Release building: checking and building a release file with appropriate annotation and versioning.
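To make the merging and release-building steps a little more concrete, here is a minimal Python sketch that only assembles ROBOT command lines: a chained `merge` then `annotate` call, and a `query --query <file> <output>` call for SELECT-based statistics, both in the forms shown in the ROBOT documentation. The file names, IRIs and the SPARQL query are placeholders invented for illustration, not SWO's real build, so check the flags against the ROBOT version you install.

```python
import shlex
from typing import List

# Hypothetical names for illustration only; these are not SWO's real files or IRIs.
EDIT_FILE = "swo-edit.owl"
RELEASE_FILE = "swo.owl"
ONTOLOGY_IRI = "http://example.org/swo.owl"
VERSION_IRI = "http://example.org/swo/releases/1.7/swo.owl"

# A simple summary statistic: how many named classes the ontology declares.
COUNT_CLASSES_QUERY = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT (COUNT(DISTINCT ?cls) AS ?classes)
WHERE { ?cls a owl:Class . FILTER(isIRI(?cls)) }
"""


def release_command(edit_file: str, release_file: str,
                    ontology_iri: str, version_iri: str) -> List[str]:
    """Chain two ROBOT commands: merge the development file and its imports,
    then stamp the ontology IRI and version IRI onto the merged result."""
    return [
        "robot",
        "merge", "--input", edit_file,
        "annotate",
        "--ontology-iri", ontology_iri,
        "--version-iri", version_iri,
        "--output", release_file,
    ]


def query_command(ontology_file: str, query_file: str, results_file: str) -> List[str]:
    """Run a SPARQL SELECT query file and write its results table."""
    return ["robot", "query", "--input", ontology_file,
            "--query", query_file, results_file]


if __name__ == "__main__":
    print(shlex.join(release_command(EDIT_FILE, RELEASE_FILE, ONTOLOGY_IRI, VERSION_IRI)))
    print(shlex.join(query_command(RELEASE_FILE, "count-classes.rq", "class-count.csv")))
```

To actually run these, write `COUNT_CLASSES_QUERY` to `count-classes.rq` and pass each list to `subprocess.run(cmd, check=True)` with ROBOT on your PATH. Keeping commands as argument lists avoids shell-quoting surprises with IRIs.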

And to top it all off, ROBOT suggests that you use a Makefile to control your build. What joy! The last time I used one was during my time at the EBI, and I really do enjoy using them. They are a lightweight, fun way to control a set of commands and dependencies that you need to run, and it was awesome to get back to it.
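Much of what makes a Makefile pleasant for a build like this is one rule: only rebuild a target if it is missing or older than something it depends on. The Python sketch below illustrates that rule alone, not make itself (no pattern rules, no automatic ordering of prerequisites); the file names are hypothetical.

```python
import os
import tempfile
from typing import Iterable


def needs_rebuild(target: str, prerequisites: Iterable[str]) -> bool:
    """make's core rule: rebuild when the target is missing or when any
    prerequisite has a newer modification time than the target."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        edit = os.path.join(d, "edit.owl")        # hypothetical development file
        release = os.path.join(d, "release.owl")  # hypothetical release file
        open(edit, "w").close()
        print(needs_rebuild(release, [edit]))  # True: no release file yet
        open(release, "w").close()
        os.utime(edit, (1000, 1000))
        os.utime(release, (2000, 2000))
        print(needs_rebuild(release, [edit]))  # False: release is newer
        os.utime(edit, (3000, 3000))
        print(needs_rebuild(release, [edit]))  # True: edit file changed since
```

A missing prerequisite raises `FileNotFoundError` here, which loosely mirrors make stopping with "No rule to make target".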

Decluttering

As it had been a while since we released SWO, it needed a spring clean. With MIREOT and Ontofox, I wasn’t tied to a simple (but crowded) import of entire ontologies. In previous versions, ontologies like EDAM were imported en masse, and this caused major versioning issues when releases got out of step. MIREOT solves that by outlining a procedure (implemented by Ontofox) that allows for the selective import of classes and hierarchies of interest from external ontologies.
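As I understand the MIREOT guidelines, the minimum per imported term is small (roughly the term, its source ontology, and where it sits in your own hierarchy). The toy Python sketch below only illustrates the "hierarchies of interest" idea: starting from the terms you actually need, keep them plus the ancestors required to place them, and leave the rest of the external ontology behind. The EDAM-flavoured term names are invented for illustration and are not real EDAM content.

```python
from typing import Dict, Iterable, List


def mireot_subset(seeds: Iterable[str],
                  parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the seed terms plus every ancestor needed to place them,
    along with their parent links; everything else is left behind."""
    keep: Dict[str, List[str]] = {}
    stack = list(seeds)
    while stack:
        term = stack.pop()
        if term in keep:
            continue  # already collected via another path
        keep[term] = list(parents.get(term, []))
        stack.extend(keep[term])
    return keep


# Hypothetical, EDAM-flavoured hierarchy (child -> parents), invented for illustration.
EXTERNAL = {
    "sequence alignment": ["sequence analysis"],
    "sequence analysis": ["operation"],
    "structure prediction": ["operation"],
    "operation": [],
}

if __name__ == "__main__":
    subset = mireot_subset(["sequence alignment"], EXTERNAL)
    for term, term_parents in subset.items():
        print(term, "->", term_parents)
```

"structure prediction" never enters the import, so a later change to it in the source ontology cannot put your release out of step.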

So, we stripped out all of our external classes, and re-imported just the ones we needed. We also took the opportunity to resolve a number of inconsistencies with our IRI naming scheme (and a bunch of other housekeeping issues listed in our GitHub milestone).
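Resolving IRI inconsistencies is easier with an automated check. The sketch below assumes a hypothetical scheme (a local base IRI followed by `SWO_` and seven digits, with `example.org` standing in for the real base); substitute your ontology's actual pattern. Imported terms keep their source IRIs, so they are skipped.

```python
import re
from typing import Iterable, List

# Hypothetical scheme for illustration: a local base IRI followed by "SWO_" and seven digits.
LOCAL_BASE = "http://example.org/swo/"
LOCAL_ID = re.compile(r"SWO_\d{7}")


def nonconforming_iris(iris: Iterable[str]) -> List[str]:
    """Return local IRIs whose identifier part breaks the naming scheme."""
    bad = []
    for iri in iris:
        if not iri.startswith(LOCAL_BASE):
            continue  # imported terms are not subject to the local scheme
        if not LOCAL_ID.fullmatch(iri[len(LOCAL_BASE):]):
            bad.append(iri)
    return bad


if __name__ == "__main__":
    print(nonconforming_iris([
        "http://example.org/swo/SWO_0000001",
        "http://example.org/swo/swo_42",
        "http://example.org/swo/SWO_12345",
        "http://example.org/other/EXT_0000001",
    ]))
```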

Release and Indexing

We released 1.7 at the end of October, and our lovely friends at OLS, BioPortal and Ontobee quickly indexed it. Please feel free to browse it at any of these locations, or to say hello over at our GitHub repo (you’ll always find our latest release here). And with our build procedure now as streamlined as our ontology, updates will be easier and quicker – so let us know what you’d like!

Categories
Semantics and Ontologies

How does your ontologizing style compare with the style of others?

Do you run the reasoner after every axiom addition, or do you bravely go minutes or even hours before clicking on “Synchronize Reasoner”? Do you add synonyms, definitions, definition sources and other annotation avidly, or lazily (at least compared with your compatriots)?

Find this image in all its original glory at http://what-if.xkcd.com/3/

Ever wondered if you built your ontology the same way as everyone else? Not in a competitive way (OK, maybe a little bit in a competitive way), but in your stylistic choices and natural rhythms? I have wondered exactly this, and last month I got the chance to provide some data to some researchers who are studying the styles and behaviours of ontologists while they are, well, ontologizing (Robert Stevens says it’s a word, and I believe him!). Markel Vigo (work page, blog site, Twitter) and Robert Stevens (work page, blog site) at the University of Manchester are looking for more ontologists to do the same as me and load up Protege 4 in The Name of Science (well, more science than you were already doing by producing the ontology in the first place).

[If you’re already sold, download the information you need from this Dropbox folder or email Markel Vigo.]

And, for only approximately 90 non-consecutive minutes of your time, you can contribute to their research too! You can pick it up and put it down as you have time; I did a few minutes here and there in about 5 or 6 sessions. You simply download their version of Protege with their event recorder built in, load up your favorite ontology and just work exactly as you would normally work. Although, saying that, I did feel like I was working with Robert sitting beside me, which did make me sit up straighter and feel vaguely like I was in an exam – in a good way…!

As Robert originally told me:

Protege4US (the name of their version of Protege which contains the event recorder) is a standard version of Protege 4, but it logs what people are doing – button presses, menu options used, axioms written etc. They then analyse these logs for patterns of activity. You can see a blog post that describes a paper about a recent study they did with Protege4US that used a pre-determined, defined task.

The study you would take part in, if you’re interested, does more or less the same thing (although with no screen capture and no eye-tracking), but this time participants (that’s you) will be doing their own ontology task in their own time, as they would usually do it.
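As a toy illustration of what "patterns of activity" might mean, and of the reasoner question at the top of this post, here is a Python sketch that counts actions in a session and how many axioms were added before each reasoner run. The event names and the list-of-strings log shape are my own invention, not the real Protege4US log format or the researchers' analysis.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical event names; the real Protege4US log format is not described here.
ADD_AXIOM = "add_axiom"
RUN_REASONER = "run_reasoner"


def event_counts(events: Iterable[str]) -> Counter:
    """How often each kind of action occurred in a session."""
    return Counter(events)


def axioms_per_reasoner_run(events: Iterable[str]) -> List[int]:
    """Number of axiom additions made before each reasoner run.
    Additions after the final run are not counted, since no run closes them."""
    stretches: List[int] = []
    pending = 0
    for event in events:
        if event == ADD_AXIOM:
            pending += 1
        elif event == RUN_REASONER:
            stretches.append(pending)
            pending = 0
    return stretches


if __name__ == "__main__":
    session = [ADD_AXIOM, ADD_AXIOM, RUN_REASONER,
               "add_annotation", ADD_AXIOM, RUN_REASONER, ADD_AXIOM]
    print(event_counts(session)[ADD_AXIOM])    # 4
    print(axioms_per_reasoner_run(session))     # [2, 1]
```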

Markel Vigo is running the study and you can ask him or Robert Stevens any questions you might have. Markel has supplied the Protege4US extensions and a readme and so on in a Dropbox folder. Markel has been very quick to answer whenever I had any questions, and even made an Ubuntu version for me as soon as I asked.

If you regularly work with ontologies, please consider donating some of your expertise to this task – I’m sure the results will be very interesting!

Categories
Papers Semantics and Ontologies

Distributed Ontology Development

Last Friday, while I was discussing ontologies and the decisions that need to be made in ontology development with some work colleagues, one of the phrases that cropped up more than once was “be sensible”. Being sensible isn’t always as easy as it seems, but one way to be sensible is to choose an ontology development methodology and make use of it before you even write down your first ontology class name. If you want lots of people to use an ontology, you need to involve at least some of those people in its development.

As a timely accompaniment to this thought, in the past week Frank Gibson has published a pre-print version of a methodology for distributed ontology development called Developing ontologies in decentralised settings (by Alexander Garcia, Kieran O’Neill, Leyla J. Garcia, Phillip Lord, Robert Stevens, Oscar Corcho, & Frank Gibson).

While Frank himself has referred to it as “dry”, I think that does it a disservice (but perhaps I’m biased because I know him and also because I like methodologies and standards!). This paper would better be described as comprehensive. I’d like to cover a few sections of the paper that I found the most interesting, to whet your appetite for reading the whole thing.

Firstly, Garcia et al. mention one overriding focus of the bio-ontology community: ontology development without any accompanying ontology development methodology:

‘The research focus for the bio-ontology community to date has typically centred on the development of domain specific ontologies for particular applications, as opposed to the actual “how to” of building the ontology or the “materials and methods”[…] This has resulted in a proliferation of bio-ontologies, developed in different ways, often presenting overlap in terminology or application domain.’

Both in programming and in ontology development, I find it very hard not to head straight for working on the “interesting” bits without thinking through the best way to go about it. However, even though I find it difficult to follow a particular methodology, the benefits outweigh the downsides.

Garcia et al. also list a kind of minimal set of requirements for an ontology methodology:

‘A general purpose methodology should aim to provide ontology engineers with a sufficient perspective of the stages of the development process and the components of the ontology life cycle, and account for community development. In addition, detailed examples of use should be included for those stages, outcomes, deliverables, methods and techniques; all of which form part of the ontology life cycle.’

So far, these are useful statements for anyone building an ontology, but this paper concentrates on distributed ontology development, and presents Melting Point (MP), an ontology methodology specifically designed for distributed, community-driven ontology development. It was created as a “convergence of existing methodologies, with the addition of new aspects” as “no methodology completely satisfies all the criteria for collaborative development” (pg. 2). A useful overview of MP is available from Figure 3 in the paper, which describes the life cycle of the MP methodology including its processes and activities.

This paper has a thorough review of nine existing ontology and knowledge engineering methodologies (see Table 1 and Section 4.2 particularly), and clearly explains why MP was important to develop. I encourage anyone interested in building ontologies to read this paper for its background information, and especially encourage anyone interested in distributed, community-driven development of ontologies to read this and determine if MP might be the right methodology for you.

I’ll finish as Garcia et al. do, with their concluding paragraph. Enjoy!

‘As we increasingly build large ontologies against complex domain knowledge in a community and collaborative manner there is an identified need for a methodology to provide a framework for this process. A shared methodology tailored for the decentralized development environment, facilitated by the internet should increasingly enable and encourage the development of ontologies fit for purpose. The Melting point methodology provides this framework which should enable the ontology community to cope with the escalating demands for scalability and repeatability in the representation of community derived knowledge bases, such as those in biomedicine and the semantic web.’

Categories
CISBAN Meetings & Conferences Semantics and Ontologies Standards

Pre-Building an Ontology: What to think about before you start

There are a few big questions that need to be kept firmly in mind when starting down the road of ontology building. These are questions of:

  1. Goals: What are you trying to achieve with this ontology?
  2. Competency/Scope: What are you trying to describe?
  3. Granularity: To what depth will you need to go?

The rest of this post relates directly to and is organised around these three topics. These topics have a lot of overlap, and aren’t intended to be mutually exclusive: they’re just ideas to get the brain going. I use the upcoming Cell Behavior Ontology (CBO) workshop to illustrate the points. The questions I single out below may already have been answered by the workshop organizers, but haven’t been published on the CBO wiki yet. I’ll be attending this workshop, and will aim to post my notes each day. It should be fun!

Goals

If a main goal is eventual incorporation within another ontology (e.g. Gene Ontology (GO) for the case of CBO) or even just alignment with the other ontology’s tenets, you have to be sure you’re happy with the limitations this may put on your own ontology. It may be that these limitations are not acceptable, and as a result you choose to reduce the dependencies on the other ontologies.

For CBO, the important questions relate to possible alignment to GO and therefore, ultimately, Basic Formal Ontology (BFO):

Question: Do you wish to ultimately include some CBO terms under, for example, biological processes of GO? GO contains only canonical/non-pathological terms. How does this fit with the goals of CBO?

GO has the express intent of creating terms covering only canonical/non-pathological biology. Therefore, would cell behavior during cancer (e.g. uncontrolled cell proliferation or metastasis, which aren’t in GO) be appropriate if CBO is meant to be included, in its entirety, within GO? They are important terms, so if some amount of incorporation with GO is appropriate, would it only end up being a partial alignment?

Question: Are there any plans to use an Upper Level Ontology (ULO) such as the OBO Foundry-recommended BFO? Though BFO may not need to be considered immediately, it does place certain restrictions on an ontology. Are you happy with those restrictions?

One example of the restrictions placed by the use of BFO is that within BFO, qualities cannot be linked via the Relations Ontology to processes. That is, if you have a property called has_rate which is a child property of “bears”, then you are not allowed to make a statement such as “cell division has_rate some_rate”, where cell division is a process, and some_rate is a quality. There is a good post available about ULOs by Phil Lord.
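In OWL a property domain is not checked as such: given "cell division SubClassOf has_rate some rate", a reasoner would infer that cell division is an independent continuant, and because BFO makes continuants and occurrents disjoint, cell division would become unsatisfiable. The plain-Python sketch below flags the same problem directly. The tiny hierarchy and the single domain declaration are a simplified stand-in I wrote for illustration, not BFO or the Relations Ontology themselves.

```python
from typing import Dict, Optional, Set

# A heavily simplified, BFO-flavoured class hierarchy (child -> parent), for illustration.
IS_A: Dict[str, str] = {
    "continuant": "entity",
    "occurrent": "entity",
    "independent continuant": "continuant",
    "specifically dependent continuant": "continuant",
    "quality": "specifically dependent continuant",
    "process": "occurrent",
    "cell": "independent continuant",
    "cell division": "process",
    "rate": "quality",
}
# Simplified: "bears" applies only to independent continuants; has_rate inherits that.
DOMAIN: Dict[str, str] = {"bears": "independent continuant"}
SUB_PROPERTY_OF: Dict[str, str] = {"has_rate": "bears"}


def ancestors(cls: str) -> Set[str]:
    """The class itself plus everything above it."""
    found = {cls}
    while cls in IS_A:
        cls = IS_A[cls]
        found.add(cls)
    return found


def domain_of(prop: str) -> Optional[str]:
    """Walk up the property hierarchy until a declared domain is found."""
    while prop not in DOMAIN and prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
    return DOMAIN.get(prop)


def domain_violation(subject_cls: str, prop: str) -> bool:
    """True if the subject class falls outside the property's (inherited) domain."""
    domain = domain_of(prop)
    return domain is not None and domain not in ancestors(subject_cls)


if __name__ == "__main__":
    print(domain_violation("cell division", "has_rate"))  # True: a process cannot bear a quality
    print(domain_violation("cell", "has_rate"))           # False
```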

Question: How richly do we want to describe cell behaviors?

Another important general goal is the level of richness that is needed with CBO. Competency questions, discussed later, will answer this to some extent. We can think about richness using GO as an example. The goal of the GO developers is the integration of multiple resources through the use of shared terms, and GO does this very well. But if you want rich descriptions and semantic interoperability, those are not goals of GO.

Competency/Scope

While it is often a tempting idea to start from the top of an ontology and work downward, consideration should be given to an initial listing of leaf terms that you are sure that you need in the ontology. Not only does this ensure you have terms that people need from the start, the bagging and grouping exercises you would then go through to create the hierarchy will often highlight any potential problems with your expected hierarchy. If you have clear use-cases, then a bottom-up approach, at least in the early stages, can be useful in figuring out what the scope of your ontology is.
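The bottom-up "bagging and grouping" exercise can be partly mechanised: list the leaf terms, note which candidate parents each one seems to belong under, then look at what falls out. The Python sketch below inverts that listing into bags and reports leaves that land in several bags (possible multiple inheritance worth a second look) or in none (a scope or hierarchy gap). The leaf terms and groupings are hypothetical examples, not CBO content.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_leaves(assignments: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert leaf -> candidate parents into parent -> leaves (the 'bags')."""
    bags: Dict[str, List[str]] = defaultdict(list)
    for leaf, parents in assignments.items():
        for parent in parents:
            bags[parent].append(leaf)
    return dict(bags)


def grouping_problems(assignments: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Leaves in several bags, and leaves that fit no bag yet."""
    multi = sorted(leaf for leaf, parents in assignments.items() if len(parents) > 1)
    orphans = sorted(leaf for leaf, parents in assignments.items() if not parents)
    return multi, orphans


# Hypothetical leaf terms and groupings, for illustration only.
LEAVES = {
    "cell migration": ["cell movement"],
    "chemotaxis": ["cell movement", "response to stimulus"],
    "asymmetric cell division": ["cell division"],
    "cell polarization": [],
}

if __name__ == "__main__":
    print(group_leaves(LEAVES))
    print(grouping_problems(LEAVES))  # (['chemotaxis'], ['cell polarization'])
```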

This brings us to the importance of having scope – and a set of competency questions – ready from the beginning of ontology development. What do you want to describe?

Question: What is the definition of cell behavior in the context of CBO?

For instance, for CBO, what is meant by the word “behavior”? A precise description of what does, and does not, count as a behavior of interest to CBO is an important first step.

The last thing that would be relevant to the overall goals (but which could equally well be considered in the Granularity section below) is the type of terms to be added:

Question: Should the terms be biological terms only, or also bioinformatics/clinical terms?

To better explain the above question, you could consider the stages of cancer progression. “Stage 2” is a fictitious name for a clinical/bioinformatics description of a stage of a cancer. This is not a biological term. Which type of term should go into CBO? I would guess that the biological term should go in which describes the biology of a cell at “stage 2”, and then perhaps use synonyms to link to bioinformatics/clinical terms. There probably shouldn’t be a mix of the two types of terms as the primary labels.

Additionally, competency questions can help determine the scope. You can make a list of descriptive sentences that you want the ontology to be able to describe, such as “The behavior of asymmetric division (e.g. stem cell division)”. By listing a number of such sentences, you can determine which are out of scope and which must be included, thus building up a clear definition of the scope.
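Competency questions become more useful once each one is tied to the terms needed to answer it, because you can then see at a glance which questions the ontology cannot yet address. The Python sketch below does that bookkeeping. The first question is the example sentence above; the second question and all the term sets are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def uncovered_questions(questions: Dict[str, Set[str]],
                        ontology_terms: Iterable[str]) -> Dict[str, List[str]]:
    """For each competency question, list the terms it needs that the
    ontology does not (yet) have; fully covered questions are omitted."""
    have = set(ontology_terms)
    gaps: Dict[str, List[str]] = {}
    for question, needed in questions.items():
        missing = sorted(needed - have)
        if missing:
            gaps[question] = missing
    return gaps


# The first sentence comes from the post; the term sets and second question are hypothetical.
QUESTIONS = {
    "The behavior of asymmetric division (e.g. stem cell division)":
        {"asymmetric cell division", "stem cell"},
    "How do cells move towards a chemical signal?":
        {"chemotaxis", "cell movement"},
}
TERMS = ["asymmetric cell division", "chemotaxis", "cell movement"]

if __name__ == "__main__":
    for question, missing in uncovered_questions(QUESTIONS, TERMS).items():
        print(f"{question!r}: missing {missing}")
```

Each reported gap is a scoping decision: add the term, reuse it from another ontology, or declare the question out of scope.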

Granularity

For me the granularity question has two aspects: the first, more general one is how fine-grained you want your terms to be; the second, more interesting in the context of CBO, is whether we are interested in the behavior of cells and/or the behavior in cells. The examples given in the workshop material seem to come from both of these areas (see http://cbo.compucell3d.org/index.php/Breakout_Session_Groups).

Question: Should CBO deal with the behavior OF cells and/or the behavior IN cells?

For the above question we can use cell polarization and cell movement as examples. Both are listed in the link to the wiki provided just above, so both are considered within the scope of CBO. However, cell movement is a characteristic behavior of a cell, while polarization is something that happens in a cell (e.g. polarization within an S. cerevisiae cell with regard to the bud scar). Both of these types of behaviors are relevant, but they are different classes of behavior and may be an appropriate separation within the CBO hierarchy.

As an aside, is cell division a behavior? It is covered in the CBO material, so with respect to CBO, it is. I think that the CBO is intended to deal with single cells, so I’m not sure where cell division fits in.

These questions should be considered, but try not to let them reduce the effectiveness and efficiency of ontology development. As with many biological domains, if you ensure that everyone is on the same page with their goals, scope, and granularity, there will be (I believe!) fewer arguments and more results.

Also, I am positive I’ve missed stuff out, so please add your suggestions in the comments!

With special thanks to Phil Lord for the useful discussions surrounding ontology building that formed the basis for this post.