Archive for the ‘Semantics and Ontologies’ Category

HL13: The Human Phenotype Ontology (ISMB 2009)

June 29, 2009

Peter Robinson

MIM started in 1966 and has been online (OMIM) for over a decade, but it has been extremely difficult to use computationally in a large-scale fashion. The hierarchical structure of OMIM does not reflect that two terms are more closely related to each other than to a third. In constructing the HPO, all descriptions used at least twice (~7000) were assigned to HPO terms. It now has about 9000 terms and annotations for 4813 diseases. They have a procedure which calculates the phenotypic similarity of two terms by finding their most-specific common ancestor.
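The most-specific-common-ancestor idea can be sketched as a Resnik-style similarity over a toy DAG. This is only an illustration of the technique, not the Phenomizer's actual code; the term names and annotation counts below are hypothetical.

```python
from math import log

# Toy HPO-like ontology: child -> list of parents (hypothetical terms).
PARENTS = {
    "abnormal_hand": ["abnormal_limb"],
    "abnormal_foot": ["abnormal_limb"],
    "abnormal_limb": ["phenotypic_abnormality"],
    "phenotypic_abnormality": [],
}

# Hypothetical annotation counts: number of diseases annotated at or
# below each term (a term's count includes its descendants).
COUNTS = {"abnormal_hand": 10, "abnormal_foot": 12,
          "abnormal_limb": 30, "phenotypic_abnormality": 100}

def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(term):
    # IC = -log(p), with p estimated from annotation frequency.
    return -log(COUNTS[term] / COUNTS["phenotypic_abnormality"])

def similarity(t1, t2):
    """IC of the most informative common ancestor of two terms:
    the deeper (more specific) the shared term, the higher the score."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(a) for a in common)
```

Two sibling terms score the IC of their shared parent, while comparing a term with itself scores its own (higher) IC, which is the behaviour the disease-ranking step relies on.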

You can visualize the human phenome using HPO. They also have a query system that allows physicians to query what’s in the ontology, as well as the Phenomizer, which is “next-generation diagnostics”: you get back a prioritized list of candidate diagnoses. To validate the approach, they took 44 syndromes, went to the literature to look at the frequency of their features, and then generated patients at random using the features of each disease. For each simulated patient, queries were generated using HPO terms, and the rank of the disease returned by the Phenomizer was compared to the original diagnosis. Comparisons were also performed with phenotypic noise added. In the ideal situation (no noise or imprecision), their approach has some advantage; when noise or imprecision is added, the p-value stays OK but the other measures drop. They also use this information to get disease-gene families.

HPO and PATO are talking to each other. HPO is being used as a link between cellular networks and HP. They also want you to annotate your data with HPO. If you’re interested, find out more about the HPO Consortium.

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

PTO6: Ontology Quality Assurance Through Analysis of Term Transformations (ISMB 2009)

June 29, 2009

Karin Verspoor

This work came out of a meeting on OBO quality assurance in GO, though the work described here is applicable to any controlled vocabulary. The key quality concern is univocality, a shared interpretation of the nature of reality; the term was originally coined by Spinoza in 1677. David Hill intended it to mean something slightly different: consistency of expression of concepts within an ontology. This regularity facilitates human usability and can be exploited by computational tools.

They try to identify violations of univocality: two semantically similar terms with different structure in their term labels. GO is generally of very high quality, so computational tools are needed to identify the inconsistencies. They chose a simplistic approach of term transformation and clustering, as it’s good to start with the simplest stuff first. The first step is abstraction: substitution of embedded GO and ChEBI terms with the variables GTERM and CTERM, respectively. Then there is stopword removal (high-frequency words like “the”, “of”, “via”). Next is alphabetic reordering (to deal with word-order variation in the terms). They tried all the different orderings of the transformations, to see how the results differed.
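The three transformations compose into a normalization key, and terms sharing a key fall into one cluster. A minimal sketch, assuming tiny hypothetical GO/ChEBI term lists and a toy stopword set (the real pipeline matches against the full GO and ChEBI label sets):

```python
from collections import defaultdict

STOPWORDS = {"the", "of", "via", "in", "to", "a"}

# Hypothetical embedded-term lists standing in for full GO / ChEBI labels.
GO_TERMS = ["cell division"]
CHEBI_TERMS = ["glucose"]

def abstract(label):
    """Step 1: replace embedded GO/ChEBI terms with variables."""
    for t in GO_TERMS:
        label = label.replace(t, "GTERM")
    for t in CHEBI_TERMS:
        label = label.replace(t, "CTERM")
    return label

def remove_stopwords(label):
    """Step 2: drop high-frequency function words."""
    return " ".join(w for w in label.split() if w not in STOPWORDS)

def reorder(label):
    """Step 3: sort words alphabetically to normalize word order."""
    return " ".join(sorted(label.split()))

def transform(label):
    return reorder(remove_stopwords(abstract(label)))

def cluster(labels):
    """Group labels that collapse to the same normalized key;
    multi-member clusters are candidate univocality violations."""
    groups = defaultdict(list)
    for label in labels:
        groups[transform(label)].append(label)
    return groups
```

For example, "regulation of cell division" and "cell division regulation" both normalize to the same key, so they land in one cluster and get flagged for manual review.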

20% of abstractions were due to CTERMs, and 30% due to GTERMs. The distribution of cluster sizes changed radically after transformation: the maximum cluster size before transformation was 29, and afterwards it was ~3000. They looked for terms that were in different clusters after abstraction but were merged together by one of the other transformations; this is how they arrived at 237 clusters that may contain a univocality violation. A further 190 clusters had to be manually assessed, so the approach reduced the number of things that had to be looked at by hand. They discovered 67 true-positive violations (35%) of univocality, and already have ideas for improvements to this step.

The 67 clusters constitute 317 GO terms. 45% of the true-positive inconsistencies were of the form {Y of X} | {Y in X}. A further 16% of true positives had determiners (e.g. “the”) in one version and not in the other. A smaller number of true positives dealt with inverses, etc. 50% of the false positives were due to the semantic import of a stopword: some of the stopwords actually carry meaning and shouldn’t have been removed, and removing them erased the real difference between the two terms.

FriendFeed Discussion

PTO4: Alignment of the UMLS Semantic Network with BioTop: Methodology and Assessment (ISMB 2009)

June 29, 2009

Stefan Schulz

Ontology alignment is the linking of two ontologies by detecting semantic correspondences between their representational units (RUs), e.g. classes. It is mainly done via equivalence and subsumption. BioTop is a recent development created to provide formal definitions of upper-level types and relations for the biomedical domain. It is compatible with both BFO and DOLCE-Lite, and it links to the OBO ontologies. The UMLS Semantic Network (SN) is an upper-level semantic categorization framework for all concepts of the UMLS Metathesaurus. It has remained largely unchanged over the last 20 years: a tree of 135 semantic types.

If you compare the two, the main difference is in the semantics: BioTop’s semantics are explicit and use Description Logics (DL), which means you’re also subscribing to the open-world assumption (OWA). The semantics of the UMLS SN are more implicit, frame-like, and may be closed-world. The SN also allows relation inheritance to be blocked, which isn’t possible in DL.

The methodology is first to provide DL semantics to the UMLS SN, and second to build the bridge between BioTop and the UMLS SN. How is the first step done? For semantic types: types extend to classes of individuals; subsumption hierarchies are assumed to be is_a hierarchies; and there are no explicit disjoint partitions. Semantic relations are reified as classes, NOT represented as OWL object properties. Triples are transformed into OWL classes with domain and range restrictions. Why convert relations to classes? They didn’t want to inflate the number of BioTop relations, and there are other structural reasons. If you reify a relation, you can place complex restrictions on it, and you can formally represent UMLS SN tags such as “defined not inherited” in a more rigorous way.
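The triple-to-class transformation can be sketched as a small generator of Manchester-syntax axioms. This is an illustration of the reification pattern only: the class naming scheme and the hasSubject/hasObject linking properties are hypothetical placeholders, not the paper's actual vocabulary.

```python
def reify_triple(subject_type, relation, object_type):
    """Render one SN triple, e.g. (AnatomicalStructure, part_of,
    Organism), as a Manchester-syntax class axiom. The relation
    becomes an OWL class, and the domain/range restrictions are
    expressed through (hypothetical) linking properties."""
    cls = "".join(word.capitalize() for word in relation.split("_"))
    return (f"Class: {cls}\n"
            f"    SubClassOf:\n"
            f"        hasSubject only {subject_type},\n"
            f"        hasObject only {object_type}")
```

Because the relation is now a class, further axioms can be layered onto it (e.g. to model tags like “defined not inherited”), which would not be possible for a plain OWL object property.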

Mapping was fully manual, using Protégé 4, with consistency checking by FaCT++ and Pellet supported by the explanation plugin (Horridge, ISWC 2008); they spent most of their time fighting inconsistent TBoxes. It was an iterative process. Next came assessment. Using the SN alone there was very low agreement with expert ratings; using SN+BioTop there were very few rejections (only 3), and it agreed with all expert ratings. Possible reasons could relate to DL’s open-world assumption, and, for the false positives, to the expert rating being done on NE while the system judgments were done on something else. There were inconsistent categorizations of UMLS SN objects, which exposed hidden ambiguities (e.g. that Hospital was both a building and an organisation).

Allyson’s questions: Why decide to create BioTop and not use BFO or DOLCE lite? It’s not that I would necessarily suggest that these be used, I am just curious. Also, subsumption hierarchies are assumed to be is_a hierarchies, but is that a safe assumption in UMLS SN? For instance, in older versions of GO this would have been a problem (some things marked as subsumption were not in fact is_a, though I am pretty sure GO has fixed all of this now).

FriendFeed Discussion

PT02: From Disease Ontology (DO) to Disease-Ontology Lite (DOLite)

June 29, 2009

Warren A. Kibbe

Allyson’s note: I missed the beginning of this talk due to me participating in the press conference. Apologies.

Integrating the clustering results is followed by final curation of DOLite terms, where a domain expert reviews the merged clusters. In summary, DOLite is a controlled vocabulary whereas the DO is an ontology. The purpose of this was to facilitate functional analysis based on a gene list. FunDO is a website for exploring genes using Functional Disease Ontology annotations: you take the complete DO, put in a typical gene list from a microarray study, and get a network view of clustered genes. The same query over a gene list gets better clustering with DOLite than with DO (where better == more distinct clusters and a greater number of clusters; in the example we were shown, 2 clusters rather than 1).

You can also use GeneRIFs as a source (1000 genes with GeneRIF annotations). You get slightly different answers depending on how you develop/annotate your gene list: poorly-annotated genes, or a large percentage of genes with little or no experimental literature, will have few GeneRIFs.

Grouping ontology terms based on the gene-to-ontology mapping provides an IC method for creating “Slims” from any type of ontology. They’ll do this with GO itself and see how their version compares to GOSlim. Functional analysis based on DOLite provides much more concise and biologically relevant results.
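The general “slimming” idea, collapsing fine-grained annotations onto a chosen set of high-level terms, can be sketched as below. This is only the generic mechanism, not DOLite's actual clustering-plus-curation procedure, and the disease terms are hypothetical.

```python
# Toy is_a chain: term -> parent (None at the root). Hypothetical terms.
PARENT = {"type_2_diabetes": "diabetes",
          "type_1_diabetes": "diabetes",
          "diabetes": "metabolic_disease",
          "metabolic_disease": None}

# The curated set of high-level terms that make up the slim.
SLIM = {"diabetes", "metabolic_disease"}

def slim_term(term):
    """Walk up the hierarchy to the nearest term in the slim."""
    while term is not None and term not in SLIM:
        term = PARENT[term]
    return term

def slim_annotations(gene_terms):
    """Collapse a gene's fine-grained annotations onto the slim,
    so downstream clustering works over fewer, broader categories."""
    return {slim_term(t) for t in gene_terms}
```

Running a gene list through `slim_annotations` is what makes the resulting clusters fewer and more distinct, which is the effect described for DOLite above.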

FriendFeed Discussion

BFO/DOLCE Primitive Relation Comparison (ISMB Bio-Ont SIG 2009)

June 28, 2009

A. Patrice Seyed

BFO is built for ontologies of the sciences; BFO and RO are used in the OBO Foundry. DOLCE was built by Guarino. BFO’s Continuant/Occurrent distinction corresponds to DOLCE’s Endurant/Perdurant (DOLCE also includes Quality and Abstract). For specific dependence, a dependent continuant ‘inheres in’ an independent continuant (a relationship between particular and type). Specialized dependence relations are ‘quality of’, ‘function of’, and ‘role of’. In DOLCE, a quality can be a ‘quality of’ another quality, an endurant, or a perdurant, whereas from a BFO perspective qualities only inhere in independent continuants. There are still some questions over when to use function versus role, as identified by a number of talks at today’s SIG.

The constitutes relation: X constitutes Y when there are properties of X which are accidental to X but essential to Y. BFO does not include constitution; the closest thing it has is ‘role of’. They want to find a way to continue merging, and to figure out how to integrate a conceptualist-centric upper-level ontology with a realist-centric one.

FriendFeed Discussion: http://ff.im/4xzrg

CiTO, the Citation Typing Ontology and its use for the annotation of reference lists and visualization of citation networks (ISMB Bio-Ont SIG 2009)

June 28, 2009

David Shotton

They’ve added characterization of citations present on websites using CiTO, and you can encode citation frequencies with CiTO too. Another purpose is to characterize the cited works themselves; in doing so, he has adopted the FRBR entity model. As an example, they made FRBR entities for Gone with the Wind. The movie, while based on the novel, is a new creative work, and the novel can have a variety of expressions. For these and other reasons it makes a good example.

SWAP also uses the FRBR classification, and CiTO has adopted terminology and definitions from SWAP. However, SWAP is concerned with the metadata describing a single work, while CiTO describes aspects of scholarly works out of scope for SWAP (e.g. relations between citing and cited works). Another similar ontology is BIBO, but that deals with legal works, and BIBO lacks terms essential to CiTO; BIBO is essentially orthogonal to CiTO, and furthermore it doesn’t use the FRBR classification. SWAN is another ontology, designed to characterize rhetorical statements within text. It is limited in scope and still under development (just a cygnet!) but clearly relevant to CiTO; they’re starting a collaboration with Tim Clark.

What is the proper home for this? It’s not a biological ontology, so maybe it doesn’t belong in OBO. They would also like a nice authoring tool.

FriendFeed Discussion: http://ff.im/4xwI9

Annotation of SBML Models Through Rule-Based Semantic Integration (ISMB Bio-Ont SIG 2009)

June 28, 2009

Allyson Lister et al.

I didn’t take any notes on this talk, as it was my own talk and I was giving it. However, I can link you out to the paper on Nature Precedings and the Bio-Ontologies programme on the ISMB website. Let me know if you have questions!

You can download the slides for this presentation from SlideShare.

FriendFeed Discussion: http://ff.im/4xtmz

Representing the Immune Epitope Database in OWL (ISMB Bio-Ont SIG 2009)

June 28, 2009

Jason Greenbaum et al. (Bjoern Peters presenting)

When a virus infects a mouse, pieces of the virus end up on the cell surface, where they are accessible to immune cells. Epitopes are the things that are recognized on the cell surface in this case. An epitope is a role of a material entity that is realized when it binds to an adaptive immune receptor. Here, context is key: which immune receptor for the epitope? Which host? What happened to the host previously? And remember, instances are not universals.

The goal of the IEDB is to catalogue and make accessible immune-epitope-related information. There are 10 full-time PhD-level curators and 50,000 epitopes. They’ve completed about 99% of infectious diseases and 90% of allergies; next are autoimmune responses. This leads to large amounts of complex data which they have to deal with.

The IEDB development cycle is ontology development -> db (re)design -> content curation, and back again. ONTIE is the Ontology of Immune Epitopes, at http://ontology.iedb.org. ONTIE is intended to be superseded as other ontologies take up the terms present there. Their database tables are aligned with the ontology, which relies very heavily on OBI. This is a method of “ontologic normalization” of the database. Data migration and consistency are enforced by a rule-based validation engine.

This alignment of the ontology to the db happened so they could have an easy db export to OWL.

FriendFeed Discussion: http://ff.im/4xqyz

Modelling biomedical experimental processes with OBI (ISMB Bio-Ont SIG 2009)

June 28, 2009

Larisa Soldatova et al.

OBI was created to meet the need for a standardised vocabulary for experiments that can be shared across many experiment types. OBI is community-driven, with over 19 communities participating. It is a candidate OBO Foundry ontology, is complementary to existing bio-ontologies, and reuses existing ontologies where possible. It uses various upper-level ontologies for interoperability: BFO, RO, and IAO. The material_entity class was introduced into BFO at the request of the OBI developers, for instance.

OBI uses relations from BFO, RO, and IAO, as well as creating relations specific to OBI; these could be merged with other relation ontologies in future. They try to have as few relations as possible. Two use cases were outlined in this paper. The first was an analyte measuring assay, where you draw blood from a mouse and determine the concentration of glucose in it. The second was a vaccine protection study, where you measure how efficiently a vaccine induces protection against virulent pathogen infection in vivo.

Allyson’s thoughts: Disclosure: I am involved in the development of OBI.

FriendFeed Discussion: http://ff.im/4xoIA

Simple, ontology-based representation of biomedical statements (ISMB Bio-Ont SIG 2009)

June 28, 2009

…through fine-granular entity tagging and new web standards

Matthias Samwald et al.

He’s trying to make sense of a very large number of complicated interactions and connections between molecular phenomena. He’s part of the W3C’s Semantic Web Health Care and Life Sciences Interest Group (HCLS IG). Example: huge queries in the Neurocommons knowledge base, spanning multiple data sources. But there are still very few tools suitable for end users. He came up with <a>Tags, or associative tags. Here, you tag statements, not documents, and you tag with entities, not strings. It’s implemented as a bookmarklet, but there is more to it than meets the eye: it is RDFa + SIOC + domain ontologies/terminologies. RDFa allows you to embed OWL and RDF snippets within HTML. Doing things this way means they don’t need to build everything from scratch, as they can use existing HTML tools, e.g. move to a WordPress blog. aTags can also be generated by NLP web services.
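The shape of such a tag, a short statement plus entity links carried as RDFa inside ordinary HTML, can be sketched as a small generator. This is illustrative only: the property names used here borrow SIOC-style terms but are placeholders, not the actual aTags vocabulary, and the URIs are made up.

```python
def atag(statement, entities):
    """Render a hypothetical <a>Tag: the statement text plus one
    RDFa-style link per tagged entity. Tags point at entity URIs,
    not strings, so machines can resolve what was meant."""
    links = "\n".join(
        f'  <a rel="sioc:topic" href="{uri}">{label}</a>'
        for label, uri in sorted(entities.items()))
    return (f'<div typeof="sioc:Item">\n'
            f'  <span property="sioc:content">{statement}</span>\n'
            f'{links}\n'
            f'</div>')
```

Because the result is plain HTML with embedded attributes, it can be pasted into any existing publishing tool (e.g. a blog post) and still be harvested as RDF.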

Linked-data paradigm: entities have URIs that can be resolved to yield further information. Developers need understandable and predictable data structures across distributed data sources. They also don’t want to reinvent the wheel, and want to develop GUIs simply. Balance semantics and pragmatics.

FriendFeed discussion: http://ff.im/4xj6c
