PTO4: Alignment of the UMLS Semantic Network with BioTop Methodology and Assessment (ISMB 2009)

Stefan Schulz

Ontology alignment is the linking of two ontologies by detecting semantic correspondences between their representational units (RUs), e.g. classes. This is mainly done via equivalence and subsumption. BioTop is a recent development created to provide formal definitions of upper-level types and relations for the biomedical domain. It is compatible with both BFO and DOLCE lite, and it links to OBO ontologies. The UMLS Semantic Network (SN) is an upper-level semantic categorization framework for all concepts of the UMLS Metathesaurus. It has remained largely unchanged over the last 20 years: a tree of 135 semantic types.
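To make the idea of an alignment concrete, here is a minimal sketch (my own illustration, not the authors' tooling) that treats an alignment as a list of correspondences and asks which classes of the second ontology end up subsuming a given SN type. The class names, the `sn:`/`bt:` prefixes and the single correspondence are all hypothetical and are not taken from the actual SN/BioTop mapping.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    source: str    # RU in the first ontology (here a UMLS SN semantic type)
    target: str    # RU in the second ontology (here a BioTop class)
    relation: str  # "equivalent" or "subsumed_by"

def ancestors(cls, parents):
    """All superclasses of cls reachable via is_a edges (parents: child -> list of parents)."""
    seen, stack = set(), [cls]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def implied_superclasses(sn_type, sn_parents, bt_parents, alignment):
    """Classes of the second ontology that subsume sn_type once the alignment is applied.
    Both relation kinds make the target a superclass, since equivalence implies subsumption."""
    result = set()
    for c in {sn_type} | ancestors(sn_type, sn_parents):
        for corr in alignment:
            if corr.source == c:
                result.add(corr.target)
                result |= ancestors(corr.target, bt_parents)
    return result

# Hypothetical fragments: names and the single correspondence are illustrative only.
sn_parents = {"sn:Bacterium": ["sn:Organism"], "sn:Organism": ["sn:Physical Object"]}
bt_parents = {"bt:Organism": ["bt:MaterialObject"], "bt:MaterialObject": ["bt:MaterialEntity"]}
alignment = [Correspondence("sn:Organism", "bt:Organism", "equivalent")]

print(sorted(implied_superclasses("sn:Bacterium", sn_parents, bt_parents, alignment)))
# prints ['bt:MaterialEntity', 'bt:MaterialObject', 'bt:Organism']
```

Note that this only follows told is_a edges; a real alignment would be checked by a DL reasoner, which can also derive subsumptions that are not asserted anywhere.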

If you compare the two, the main difference is in the semantics: BioTop's semantics are explicit and use Description Logics (DL), which means you're also subscribing to the open-world assumption (OWA). The semantics of the UMLS SN are more implicit and frame-like, and may be closed-world. The SN also makes it possible to block the inheritance of relations, which isn't possible with DL.
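The two semantic differences can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not how the SN or a DL reasoner is actually implemented: the type names `TypeA`/`TypeB`, the relations `r1`/`r2` and the per-triple `inheritable` flag are hypothetical stand-ins for the SN's inheritance tags.

```python
from enum import Enum

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

def ask(facts, fact, closed_world):
    """Look up an asserted fact: what is not asserted is false under CWA, unknown under OWA.
    (A DL reasoner can still derive FALSE from axioms such as disjointness; not modelled here.)"""
    if fact in facts:
        return Answer.TRUE
    return Answer.FALSE if closed_world else Answer.UNKNOWN

def ancestors(cls, parents):
    """All superclasses of cls reachable via is_a edges."""
    seen, stack = set(), [cls]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def frame_relations(cls, parents, triples):
    """Frame-like reading: a triple (domain, relation, range, inheritable) applies to cls
    if declared on cls itself, or declared on an ancestor and not blocked."""
    ancs = ancestors(cls, parents)
    return {(r, rng) for dom, r, rng, inheritable in triples
            if dom == cls or (dom in ancs and inheritable)}

def dl_relations(cls, parents, triples):
    """DL reading: subsumption is monotonic, so every restriction on a superclass
    also holds for cls and the inheritable flag has no effect."""
    ancs = ancestors(cls, parents)
    return {(r, rng) for dom, r, rng, _ in triples if dom == cls or dom in ancs}

# Hypothetical types and relations.
parents = {"TypeB": ["TypeA"]}
triples = [("TypeA", "r1", "TypeX", True), ("TypeA", "r2", "TypeY", False)]
print(sorted(frame_relations("TypeB", parents, triples)))  # [('r1', 'TypeX')]
print(sorted(dl_relations("TypeB", parents, triples)))     # [('r1', 'TypeX'), ('r2', 'TypeY')]
print(ask(set(), ("TypeB", "r2", "TypeY"), closed_world=False).value)  # unknown
```

The blocked triple disappears for the subtype only under the frame-like reading, which is why the SN's inheritance tags need a different encoding once the SN is given DL semantics.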

The methodology is, first, to provide DL semantics for the UMLS SN and, second, to build the bridge between BioTop and the UMLS SN. How is the first step done? For semantic types: types are interpreted as classes of individuals; subsumption hierarchies are assumed to be is_a hierarchies; and there are no explicit disjoint partitions. For semantic relations: they are reified as classes, NOT represented as OWL object properties. For triples: they are transformed into OWL classes with domain and range restrictions. Why were relations converted to classes? The authors didn't want to inflate the number of BioTop relations, and there were other structural reasons too. If you reify a relation, you can provide complex restrictions on it. It also means that UMLS SN tags such as “defined not inherited” can be represented formally and more rigorously.
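A rough sketch of what reification could look like, emitted as OWL Manchester syntax strings. This is my guess at the general pattern, not the encoding used in the paper: the root class `SNRelation`, the object properties `hasDomain`/`hasRange`, the class-naming scheme and the example triple are all hypothetical.

```python
def reify_relation(rel, root="SNRelation"):
    """Turn an SN relation into an OWL class under a hypothetical root class."""
    return f"Class: {rel}\n    SubClassOf: {root}"

def reify_triple(domain, rel, rng):
    """Turn an SN triple into an OWL class whose members are instances of the
    reified relation linked to the given domain and range.
    hasDomain / hasRange are hypothetical object properties."""
    name = f"{domain}_{rel}_{rng}"
    return (f"Class: {name}\n"
            f"    EquivalentTo: {rel} and (hasDomain some {domain}) and (hasRange some {rng})")

# An illustrative triple, not necessarily one taken from the SN.
print(reify_relation("causes"))
print(reify_triple("Bacterium", "causes", "DiseaseOrSyndrome"))
```

Because each relation is now a class, further axioms can be attached to it like to any other class, which is the flexibility described above; the price is that relation instances become individuals rather than direct links between entities.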

Mapping was done fully manually in Protégé 4, with consistency checking by FaCT++ and Pellet, supported by the explanation plugin (Horridge, ISWC 2008); they spent most of their time fighting inconsistent TBoxes. It was an iterative process. Next came the assessment. Using the SN alone, there was very low agreement with the expert ratings. Using SN+BioTop there were very few rejections (only 3), but all of them agreed with the expert ratings. Possible explanations are the DL's open-world assumption and, for the false positives, the fact that the expert rating was done on NE while the system judgments were based on something else. There were inconsistent categorizations of UMLS SN objects, which exposed hidden ambiguities (e.g. Hospital was categorized as both a building and an organisation).
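The Hospital example shows how a consistency check surfaces an ambiguous categorization. The sketch below is a drastic simplification of what FaCT++ or Pellet do: it only looks at told subsumption and pairwise disjointness. The class names under which Building and Organisation sit, and the disjointness axiom between them, are hypothetical.

```python
def ancestors(cls, parents):
    """All superclasses of cls reachable via is_a edges."""
    seen, stack = set(), [cls]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def disjointness_clashes(cls, parents, disjoint_pairs):
    """Pairs of declared-disjoint classes that cls falls under (told subsumption only).
    Any such pair makes cls unsatisfiable: it can have no instances."""
    ancs = ancestors(cls, parents) | {cls}
    return {pair for pair in disjoint_pairs if pair <= ancs}

# Hypothetical categorization and disjointness axiom.
parents = {
    "Hospital": ["Building", "Organisation"],
    "Building": ["MaterialObject"],
    "Organisation": ["SocialEntity"],
}
disjoint_pairs = [frozenset({"MaterialObject", "SocialEntity"})]

clashes = disjointness_clashes("Hospital", parents, disjoint_pairs)
print([sorted(pair) for pair in clashes])  # [['MaterialObject', 'SocialEntity']]
```

A real reasoner would report Hospital as unsatisfiable, and the explanation plugin would point at the two categorizations plus the disjointness axiom as the justification.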

Allyson’s questions: Why decide to create BioTop rather than use BFO or DOLCE lite? I'm not necessarily suggesting that these should be used; I am just curious. Also, subsumption hierarchies are assumed to be is_a hierarchies, but is that a safe assumption in the UMLS SN? For instance, in older versions of GO this would have been a problem (some things marked as subsumption were not in fact is_a, though I am pretty sure GO has fixed all of this now).


Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!


2 thoughts on “PTO4: Alignment of the UMLS Semantic Network with BioTop Methodology and Assessment (ISMB 2009)”

  1. From Stefan Schulz (reproduced with permission – thanks Stefan for taking the time to reply!):

    “I saw your questions to my presentation:

    > Why decide to create BioTop and not use BFO or DOLCE lite?
    > It’s not that I would necessarily suggest that these be used,
    > I am just curious.

    Problems with BFO and DOLCE lite:

    – completely domain independent
    – do not provide useful categories for upper-level semantic annotations of biomedical content (such as the UMLS SN)
    – certain BFO subdivisions are quite useless (e.g. FiatObjectPart)

    > Also, subsumption hierarchies are assumed to be is_a hierarchies, but is that a safe assumption in UMLS SN?

    You are preaching to the choir… but my co-author Olivier Bodenreider from the NLM strongly maintains that UMLS SN subsumption hierarchies are to be read as is_a, and referred to a paper written by McCray (see paper).

    > For instance, in older versions of GO this would have been a problem
    > (some things marked as subsumption were not in fact is_a, though I am pretty sure GO has fixed all of this now).

    Yes, it used to be “cell component is-a Gene Ontology”

    In the UMLS SN you still find “Body System is-a Idea or Concept”, which is equally problematic (my nervous system is neither an idea nor a concept…)

    Thanks for your interest and the questions.
    Best regards,

    Stefan”
