Olivier Bodenreider (NLM): Best-Practices, pitfalls and positives (CBO 2009)
If ontologies are the solutions, what is the problem? Think use cases. Uses of biomedical ontologies include knowledge management (annotating data, accessing information, mapping across ontologies), data integration and exchange, semantic interoperability and decision support (Bodenreider YBMI 2008).
The ontology you’re going to build will differ depending on your use cases: different structure, different focus, etc. Agreeing on what your use cases are is an important part of the meeting, and so is collecting and prioritizing them.
He showed an image of the “ontology spectrum”, available at http://www.mathiswebs.com/ontology.htm. The amount of semantics you want to put in an ontology varies along a spectrum. At the “weak semantics” end you have taxonomies and thesauri, whereas at the “strong semantics” end you have conceptual models and logical theories (with Description Logics being the formalism du jour).
MeSH is a hierarchical controlled vocabulary; it is not an ontology. MeSH provides descriptors for indexing biomedical literature. Here, the “entry terms” may or may not be synonymous with the MeSH heading. What the entry terms mean is that anything talking about these terms will get classified according to those terms’ MeSH heading. This is enough for particular goals, such as annotation of literature. However, it may not be enough, depending on your use case. You need to figure out your level of granularity. The hierarchy in MeSH states that if you’re interested in term X (e.g. cell movement), you might also be interested in X’s child terms (e.g. ). It is NOT an “IS A” hierarchy, more of an “IS RELATED TO” hierarchy. In GO, by contrast, synonyms are marked as either exact or related. Cell movement in GO is a child of cellular process and also of localization of cell. GO is more precise.
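The practical difference between the two kinds of hierarchy can be sketched in a few lines of Python. This is only a toy illustration of the point above, not how GO or MeSH are actually stored: the `Term` class, the `ancestors` and `synonyms_with_scope` helpers, and the placeholder synonym strings are all made up for this example. Only the two parents of cell movement come from the talk. The idea is that when every link really is an IS A link, you can safely walk up the hierarchy and infer things, and when synonyms carry a scope you can choose to trust only the exact ones.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A toy GO-like term (hypothetical structure, for illustration only)."""
    name: str
    is_a: list = field(default_factory=list)      # names of IS A parents
    synonyms: dict = field(default_factory=dict)  # synonym text -> "exact" or "related"


def ancestors(terms, name):
    """Collect every term reachable by following IS A links upward."""
    seen = set()
    stack = list(terms[name].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms[parent].is_a)
    return seen


def synonyms_with_scope(term, scope):
    """Return the synonyms of a term that carry the given scope."""
    return sorted(text for text, s in term.synonyms.items() if s == scope)


# The two parents are the ones mentioned in the talk; the synonym
# strings are placeholders, not real GO synonyms.
terms = {
    "cellular process": Term("cellular process"),
    "localization of cell": Term("localization of cell"),
    "cell movement": Term(
        "cell movement",
        is_a=["cellular process", "localization of cell"],
        synonyms={
            "placeholder exact synonym": "exact",
            "placeholder related synonym": "related",
        },
    ),
}

print(sorted(ancestors(terms, "cell movement")))
print(synonyms_with_scope(terms["cell movement"], "exact"))
```

In a MeSH-style “IS RELATED TO” hierarchy the same upward walk would only tell you what else might be of interest, not what the term actually is.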
When defining use cases, you need to think about typical situations in which the resource to be created is expected to contribute to the solution (resource annotation, resource classification, inference based on attributes of biological entities). You need to think about competency questions. The rule is usually to go with the minimal ontological commitment. The last thing you want to do is to put too much into your ontology.
“Ontologies are for ontologists.” What is the difference between an ontology and a car? You wouldn’t think of building a car, but you do think about building an ontology. Eventually, you’ll run into roadblocks, e.g. trying to deal with terms from upper-level ontologies (ULOs) such as the BFO dependent continuants and the differences between function, role and disposition. He then used SNOMED as an example knowledge representation.
Citing the OntoClean people, he mentioned that you shouldn’t have a single class with more than one IS A relationship. E.g. if you use apple and place it under both food and fruit, then you run into problems when trying to describe that an apple is toxic to another animal. Another example is “lmo-2 interacts with Elf-2”. There are many possible understandings of this statement: “one individual lmo-2 molecule interacts with one individual Elf-2 molecule”, or any other number of instances or groupings.
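To make the ambiguity concrete, here are three of the possible readings written out in first-order logic. These formulas are my own illustration and were not shown in the talk; the predicate names Lmo2, Elf2 and interacts are simply labels chosen for this sketch.

```latex
\begin{align*}
&\text{(1) some--some:} && \exists x\,\exists y\,\bigl[\mathrm{Lmo2}(x) \land \mathrm{Elf2}(y) \land \mathrm{interacts}(x,y)\bigr]\\
&\text{(2) all--some:}  && \forall x\,\bigl[\mathrm{Lmo2}(x) \rightarrow \exists y\,\bigl(\mathrm{Elf2}(y) \land \mathrm{interacts}(x,y)\bigr)\bigr]\\
&\text{(3) all--all:}   && \forall x\,\forall y\,\bigl[\mathrm{Lmo2}(x) \land \mathrm{Elf2}(y) \rightarrow \mathrm{interacts}(x,y)\bigr]
\end{align*}
```

Reading (1) matches the “one individual molecule” interpretation quoted above. Readings (2) and (3) make much stronger claims about every lmo-2 molecule, so which one you mean needs to be settled before the statement goes into an ontology.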
CBO is a domain ontology, a low-level ontology. ULOs can have lower-level ontologies hung off them, but you won’t be developing ULOs. There are lots of power tools for ontologies, such as Protege and OBO-Edit, but these tend to be more complex than biologists wish to use. Semantic wikis offer simpler, intermediate representations that allow collaborative development. They hide part of the complexity.
You can collect terms from experts, textual corpora, and from existing terminologies and ontologies. Good resources include NCBO’s BioPortal (http://bioportal.bioontology.org) and the UMLS semantic navigator. You should try to link to and borrow from existing ontologies. On the other hand, by borrowing terms you are also borrowing the ontological commitment of those ontologies, which may or may not align with your goals/scope.
With the help of experienced ontologists, you should decide on: the knowledge representation (e.g. OWL-DL), what to use as an editor (e.g. Protege), and what the ontological commitment should be (e.g. top-level ontologies). You could consider the OBO Foundry.
BiomedGT is from the NCI and they use a semantic wiki. The IDO uses the OBO Foundry approach. The Int’l Classification of Diseases uses a semantic wiki approach combined with a Protege background. A final example is the Neuroscience Information Framework (NIF).
Conclusions. Start by defining use cases, not ontologies. You should also define how you would measure success. Also, let the biologists be biologists, and seek out ontologists where needed. Follow experience/guidelines, not gurus. Finally, think prospectively about matters such as maintenance and funding.
Olivier’s website: http://mor.nlm.nih.gov
IDO imports many terms from GO.
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!