Categories
Meetings & Conferences Semantics and Ontologies

BFO/DOLCE Primitive Relation Comparison (ISMB Bio-Ont SIG 2009)

A. Patrice Seyed

BFO is built for ontologies of the sciences. BFO and RO are used in the OBO Foundry. DOLCE was built by Guarino. BFO’s Continuant/Occurrent distinction corresponds to DOLCE’s Endurant/Perdurant (DOLCE also includes Quality and Abstract). For specific dependence, a dependent continuant ‘inheres in’ an independent continuant (relationship between particular and type). Specialized dependence relations are ‘quality of’, ‘function of’, and ‘role of’. In DOLCE, a quality can be a ‘quality of’ another quality, endurant or perdurant. From a BFO perspective, by contrast, qualities only inhere in independent continuants. There are still some questions over when to use function or role, as identified by a number of talks at today’s SIG.

The constitutes relation. X constitutes Y when there are properties of X which are accidental to X but essential to Y. BFO does not include constitution, but it does have ‘role of’, which is the closest relation it has. They want to find a way to continue to merge the two, and to figure out how to integrate a conceptualist-centric ULO with a realist-centric ULO.

FriendFeed Discussion: http://ff.im/4xzrg

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences Semantics and Ontologies

CiTO, the Citation Typing Ontology and its use for the annotation of reference lists and visualization of citation networks (ISMB Bio-Ont SIG 2009)

David Shotton

They’ve added characterization to citations present on websites using CiTO. You can encode citation frequencies using CiTO, too. Another purpose is to characterize the cited works themselves. In doing so, he has adopted the FRBR entity model. As an example, they made FRBR entities for Gone with the Wind. The movie, while based on the novel, is a new creative work. The novel can have a variety of expressions. For these and other reasons it makes a good example.

SWAP also uses the FRBR classification, and CiTO has adopted terminology and definitions from SWAP. However, SWAP is concerned with the metadata describing a single work. CiTO describes aspects of scholarly works that are out of scope for SWAP (e.g. relations between citing and cited works). Another similar ontology is BIBO, but that deals with legal works, and BIBO lacks terms essential to CiTO. BIBO is essentially orthogonal to CiTO. Further, BIBO doesn’t use the FRBR classification. SWAN is another ontology, designed to characterize rhetorical statements within text. It is limited in scope and still under development (just a cygnet!) but clearly relevant to CiTO. They’re starting a collaboration with Tim Clark.

What is the proper home for this? It’s not a biological ontology, so maybe doesn’t belong in OBO? They also want a nice authoring tool.

FriendFeed Discussion: http://ff.im/4xwI9


Categories
Data Integration Housekeeping & Self References Meetings & Conferences Semantics and Ontologies

Annotation of SBML Models Through Rule-Based Semantic Integration (ISMB Bio-Ont SIG 2009)

Allyson Lister et al.

I didn’t take any notes on this talk, as it was my own talk and I was giving it. However, I can link you out to the paper on Nature Precedings and the Bio-Ontologies programme on the ISMB website. Let me know if you have questions!

You can download the slides for this presentation from SlideShare.

FriendFeed Discussion: http://ff.im/4xtmz

Categories
Meetings & Conferences Semantics and Ontologies

Representing the Immune Epitope Database in OWL (ISMB Bio-Ont SIG 2009)

Jason Greenbaum et al. (Bjoern Peters presenting)

When a virus infects a mouse, the pieces of the virus end up on the cell surface where they are accessible to the immune cells. Epitopes are the things that are recognized on the cell surface in this case. An epitope is a role of a material entity that is realized when it binds to an adaptive immune receptor. Here, context is key: Which immune receptor for the epitope? Which host? What happened to the host previously? And remember, instances are not universals.

The goal of the IEDB is to catalogue and make accessible immune-epitope-related information. There are 10 full-time PhD-level curators, with 50,000 epitopes. They’ve completed about 99% of infectious disease and 90% allergies – next are autoimmune responses. This leads to large amounts of complex data which we have to deal with.

The IEDB development cycle is ontology development -> db (re)design -> content curation and back again. ONTIE = Ontology of Immune Epitopes at http://ontology.iedb.org. ONTIE is intended to be superseded as other ontologies take up the terms present there. Their database tables are aligned with the ontology, which relies very heavily on OBI. This is a method of “ontologic normalization” of the database. Data migration and consistency are enforced by a rule-based validation engine.

This alignment of ontology to db happened so we could have an easy db export to OWL.
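To illustrate why aligned tables make export straightforward, here is a minimal sketch of turning one database row into N-Triples lines. Everything in it is an assumption made for illustration: the namespace, the column names, and the column-to-property mapping are hypothetical and are not taken from the IEDB schema or from ONTIE.

```python
# Hypothetical namespace, used only for this sketch (not an IEDB or ONTIE IRI).
BASE = "http://example.org/iedb-sketch/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from table columns to ontology properties.
# With tables aligned to the ontology, this mapping is one-to-one.
COLUMN_TO_PROPERTY = {
    "host": BASE + "has_host",
    "receptor": BASE + "binds_receptor",
}


def row_to_ntriples(row, class_iri):
    """Serialise one table row as a list of N-Triples lines."""
    subject = f"<{BASE}epitope/{row['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{class_iri}> ."]
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column)
        if value is None:
            continue  # skip empty cells rather than emit empty literals
        # Escape backslashes and double quotes for an N-Triples string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{prop}> "{escaped}" .')
    return lines


example_row = {"id": 1, "host": "mouse", "receptor": None}
for line in row_to_ntriples(example_row, BASE + "Epitope"):
    print(line)
```

A real export would more likely emit IRIs for ontology terms than plain string literals; literals are used here only to keep the sketch short.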

FriendFeed Discussion: http://ff.im/4xqyz


Categories
Meetings & Conferences Semantics and Ontologies Standards

Modelling biomedical experimental processes with OBI (ISMB Bio-Ont SIG 2009)

Larisa Soldatova et al.

OBI was created to meet the need for a standardised vocabulary for experiments that can be shared across many experiment types. OBI is community driven, with over 19 communities participating. It is a candidate OBO Foundry ontology, is complementary to existing bio-ontologies, and reuses existing ontologies where possible. It uses various ULOs for interoperability: BFO, RO, and IAO. The material_entity class was introduced into BFO at the request of the OBI developers, for instance.

OBI uses relations from BFO, RO, and IAO as well as creating relations specific to OBI. OBI relations could be merged with other relations ontologies in future. They try to have as few relations as possible. Two use cases were outlined in this paper. The first was an analyte measuring assay, where you draw blood from a mouse and determine the concentration of glucose in it. The second was a vaccine protection study, where you measure how efficiently a vaccine induces protection against virulent pathogen infection in vivo.

Allyson’s thoughts: Disclosure: I am involved in the development of OBI.

FriendFeed Discussion: http://ff.im/4xoIA


Categories
Meetings & Conferences Semantics and Ontologies

Simple, ontology-based representation of biomedical statements (ISMB Bio-Ont SIG 2009)

…through fine-granular entity tagging and new web standards

Matthias Samwald et al.

He’s trying to make sense of a very large number of complicated interactions and connections between molecular phenomena. He’s part of the W3C’s Semantic Web Health Care and Life Sciences Interest Group (HCLSIG). Example: huge queries in the neurocommons knowledgebase, where they span multiple data sources. But there are still very few tools suitable for end users. He came up with aTags, or associative tags. Here, you tag statements, not documents. You tag with entities, and not strings. It’s implemented with a bookmarklet. There is more to the bookmarklet than meets the eye: it is RDFa + SIOC + domain ontologies / terminologies. RDFa allows you to embed OWL and RDF snippets within HTML. Doing things this way means we don’t need to build everything from scratch, as we can use existing HTML tools, e.g. move the tags to a WordPress blog. aTags can also be generated by NLP web services.

Linked-data paradigm: entities have URIs that can be resolved to yield further information. Developers need understandable and predictable data structures across distributed data sources. They also don’t want to reinvent the wheel, and they want to develop GUIs simply. Balance semantics and pragmatics.

FriendFeed discussion: http://ff.im/4xj6c


Categories
Meetings & Conferences Semantics and Ontologies

RKB: a SW knowledge base for RNA (ISMB Bio-Ont SIG 2009)

Michael Dumontier et al.

Wants to capture the structural features and interactions of RNA. Capture types and their relations, represent dynamic / context-specific knowledge, populate the KB with PDB structural data and MC-Annotate interactions, and answer questions about RNA structure. Looked at textbooks, review articles, book chapters, expert knowledge. Their upper-level ontology (ULO) was NULO, based on BFO/RO.

Contextual modelling of nucleic acids: base stacking varies in different NMR etc models. Then he described the Leontis-Westhof Nomenclature, where you describe the edges of the base as participating in the reaction. So a more sophisticated nomenclature was developed that was based on this called LW+ Nomenclature, where they divvied up the edges into a set of faces.

They want to capture information about residues, edges/faces, cis/trans nucleotide orientations, and across parallel/antiparallel strands. Base stacking involves inter-nucleobase interactions that involve London forces. They wanted to capture both numbers and a description of what is going on. They’ve used two roles: FacingAwayRole and FacingTowardsRole. There is both an endo and an exo role for sugar puckering. Situational modeling ensures that objects are represented by a single entity throughout their lifetime.

RKB is populated with PDB and MC-Annotate data, and it is all represented in RDF. The population involved 3 steps: assigning names, asserting class membership, and ?. They can then ask the database things using DL queries. RKB is also accessible via SPARQL.

They’re now working with the RNA Ontology Consortium. They want to publish as part of the Bio2RDF network, and extend the structure description with backbone angles.

NULO: there is a logical mapping between NULO and BFO. They’ve relaxed restrictions where it is unclear what BFO’s stance is. It was unclear whether, if you made certain statements, you would still fit in with the ideas in BFO.

FriendFeed Discussion: http://ff.im/4xhmC


Categories
Meetings & Conferences Semantics and Ontologies

GOclasses: molecular function as viewed by proteins (ISMB Bio-Ont SIG 2009)

Daniel Faria et al.

If you look at GO classes, the distribution of protein functions follows a power law. GOclasses can help identify incomplete and inconsistent annotations. They devised some strategies to analyse things: use information content (IC) to identify GOclasses with generic terms and to identify the primary term of a GOclass; use conditional probability to identify potential implicit relations between terms; use semantic similarity to identify similar GOclasses.
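The conditional-probability strategy can be sketched as follows. This is not the authors’ implementation: the toy annotations are invented, and the idea that a value near 1 hints at an implicit relation is my reading of the talk.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy annotations: protein identifier -> set of GO term labels.
annotations = {
    "P1": {"kinase activity", "ATP binding"},
    "P2": {"kinase activity", "ATP binding"},
    "P3": {"kinase activity", "ATP binding", "DNA binding"},
    "P4": {"DNA binding"},
}


def conditional_probabilities(annotations):
    """Return {(a, b): P(b | a)} for every pair of co-annotated terms.

    P(b | a) is the fraction of proteins annotated with term a
    that are also annotated with term b.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        # Ordered pairs (a, b) with a != b that co-occur on this protein.
        pair_counts.update(permutations(sorted(terms), 2))
    return {(a, b): n / term_counts[a] for (a, b), n in pair_counts.items()}


probs = conditional_probabilities(annotations)
for (a, b), p in sorted(probs.items()):
    print(f"P({b!r} | {a!r}) = {p:.2f}")
```

In this toy data every protein with "kinase activity" also has "ATP binding", so that pair would be a candidate implicit relation, while the reverse direction from "DNA binding" is much weaker.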

The issue with IC based on annotation frequency is that it is biased by popularity in nature. Other methods also have problems. Most classes have a maximum IC of between 50-60%, but 87% of the GOclasses have at least one very specific primary term. Inconsistent GOclasses often correspond to cases of implicit relationships. These could be formalized in GO or set as annotation guidelines to improve the consistency of new annotations. Formalizing implicit relationships will mean fewer terms are required to describe a given function.
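For readers unfamiliar with frequency-based IC, the usual definition is IC(t) = -log2 p(t), where p(t) is the fraction of annotations that use term t. The sketch below assumes that definition, and it assumes the percentages quoted in the talk come from dividing by the largest possible IC; I did not note the exact normalisation used. The counts are invented, and a real GO analysis would also propagate counts up to ancestor terms, which is left out here.

```python
import math
from collections import Counter


def information_content(term_counts):
    """IC(t) = -log2(count(t) / total annotations)."""
    total = sum(term_counts.values())
    return {t: -math.log2(c / total) for t, c in term_counts.items()}


def normalized_ic(term_counts):
    """Scale IC to [0, 1] by the IC of a term annotated exactly once.

    Assumes more than one annotation in total, otherwise the maximum is 0.
    """
    total = sum(term_counts.values())
    max_ic = math.log2(total)
    return {t: ic / max_ic for t, ic in information_content(term_counts).items()}


# Hypothetical annotation counts per term.
counts = Counter({"binding": 8, "kinase activity": 6, "rare function": 2})

ic = information_content(counts)
nic = normalized_ic(counts)
for term in counts:
    print(f"{term}: IC = {ic[term]:.2f}, normalised = {nic[term]:.0%}")
```

The popularity bias is visible directly: a heavily studied function gets a low IC simply because it has many annotations, not because it is biologically generic.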

FriendFeed Discussion: http://ff.im/4xgmQ


Categories
Meetings & Conferences

Panel Session on the emergence, use and success of wikis for collaborative knowledge capture for biology (ISMB Bio-Ont SIG)

Panelists: Dawn Field, Andrew Su, Robert Stevens, Barend Mons

Question 1: Wikis work in general for widely accepted knowledge where there are many readers. Can crowd-sourcing work?

  • RS: Yes, already present. Crowd-sourcing is not new – a publication has a crowd (albeit a small one) of readers
  • AS: Yes. In small pockets, wikis in science do work
  • BM: Yes, especially where data are controversial, it will work (e.g. Jesus and Adolf Hitler were the most heavily-edited entries in Wikipedia). We need to federate all the wikis.
  • DF: Yes, as long as you’ve got a critical mass. Pick your targets well and you’ll succeed admirably.
  • Judy: Yes, it works in a general sense, and possibly can work in the specific science sense. I would like a common agreement as to what should be promoted in wikis/gene wikis, and what the source of the information is: evidence.
    • RS: This requirement Judy mentioned that we need to comply with scientific best-practice even within wikis is not a new idea. (Judy knows this).
  • Wikis are not machine readable, though the semantic wiki is (almost). To make them machine readable, they have to be annotated with controlled vocabularies.

Question 2: Wikis are great for reading, but are generally unstructured. Does the growth of the bio-ontology community not suggest this is the wrong way forward?

You need structured data, but wikis do not necessarily give you that

  • BM: It’s OK if people don’t structure it – you can use something like Peregrine and pull everything out and convert automatically into triples.
  • RS: All of this is predicated on people actually doing the work. We could cut all funding to human annotators 😉
  • How would you get this structured information at the top of the wiki page?
    • AS: You don’t ask your domain experts to structure your data: you value them for what they know, and not for their knowledge of structuring and ontologies.
  • Would each page then be a class or an instance? Wouldn’t that imply that they all have the same attributes (which they don’t)?
    • AS: This is why I believe we shouldn’t do away with human curators. Gene Wiki is not a replacement for traditional curation.
    • BM: I also agree we will always need human curation. This is why you need hypothetical/observational/curated triples.
  • Force people to submit grants in RDF 🙂
  • No-one expects people to submit structured data in RDF – biologists don’t submit information in raw HTML/XML. It’s all about the interface.
  • DF: What really caught her ear was the idea of nano-publication. Start getting people thinking about other ways to contribute – even the smallest addition to this comprehensive catalogue would be useful – and credited.

Question 3: The data is mine! Scientists will fiddle with Wikipedia because they don’t care. But they won’t provide valuable knowledge without coercion. So much for wikis!

If I can’t list my 1000 edits in this wiki as publications or for impact assessments, then I won’t do any edits in wikis.

  • RS: We could all give it up and live in a commune 🙂
  • Helen Parkinson: Sanger are deleting some data because they don’t have enough room. Knowledge is a different thing, and treated differently. People have to share 🙂
  • BM: We have to change the way grants are written. Just like the way they did it with OA: if you want to promote open access, why not give grant money to help them out. It is irresponsible to give money to research and not make people put the data out in an open, structured way (Allyson: I’m paraphrasing here, because I can’t remember exactly what was said).
  • AS: Wiki markup is not a requirement for contribution. One solution could be to enable semantic markup, but not require it in the same way as wiki markup.
  • There must be a strong ego incentive.

FriendFeed discussion: http://ff.im/4xcjq


Categories
Meetings & Conferences

Keynote: We-Key, or Professional Wikis? (ISMB Bio-Ont SIG 2009)

Barend Mons

Showed the graph from radarnetworks.com from 2007 about Web X.0. What is Web 4.0? wwww? Why Would We Wiki? The “egosystem”. The key to successful wikis is that we are prepared to share our knowledge and prepared to get attribution for it. How can we make wikis for professional use? Not like Wikipedia, but with community annotation and review. The very important thing is that we recognize authority. This is why they started the Concept Web Alliance.

The problem (also with wikis) is that no-one wants structured data entry – at least to write it: they want instead to just write free text. However, even if they don’t want structured data entry, they do want structured data! In the concept/token/object triangle: a concept should have a unique idea (e.g. malaria, transmission, mosquito).

Solving ambiguity with synonyms generally works nicely: use massive communities that we have in life sciences to do translation/addition of synonyms. It is therefore theoretically achievable to have a unique identifier for each concept. A triple to him is Concept1->Concept3-> Concept2, where Concept3 is the “relation”, e.g. (1: Barend Mons) (3: published) (2: this article).
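As a rough illustration of the idea, the sketch below maps synonyms onto one opaque identifier per concept and then builds a Concept1 -> Concept3 -> Concept2 triple from those identifiers. The class names, the identifier format and the synonym lists are all hypothetical; this is not how the Concept Web Alliance tools store anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Concept1 -> Concept3 (the relation) -> Concept2, all as concept IDs."""
    subject: str
    relation: str
    obj: str


class ConceptRegistry:
    """Maps every synonym of a concept onto one opaque, non-semantic ID."""

    def __init__(self):
        self._id_by_label = {}
        self._next_number = 1

    def register(self, *synonyms):
        concept_id = f"C{self._next_number:06d}"  # hypothetical ID format
        self._next_number += 1
        for label in synonyms:
            self._id_by_label[label.casefold()] = concept_id
        return concept_id

    def resolve(self, label):
        # Raises KeyError for unknown labels instead of guessing.
        return self._id_by_label[label.casefold()]


def make_triple(registry, subject, relation, obj):
    return Triple(
        registry.resolve(subject),
        registry.resolve(relation),
        registry.resolve(obj),
    )


registry = ConceptRegistry()
registry.register("malaria", "paludism")
registry.register("is transmitted by", "transmitted by")
registry.register("mosquito", "Culicidae")

t1 = make_triple(registry, "Malaria", "is transmitted by", "mosquito")
t2 = make_triple(registry, "paludism", "transmitted by", "Culicidae")
print(t1)
print(t1 == t2)
```

Because both statements resolve to the same three identifiers, they collapse into a single triple, which is the point of resolving synonyms before storing anything.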

Harmonizing data: they suggest doing this by going from data sources (in a daily feed) -> single MRS -> (via concept mapping) -> Peregrine, where the harmonized data will be. From Peregrine, you can link to a system that allows community annotation. The resulting triples are constructed in an unsupervised fashion. This will result in a massive triple store that is open for everyone to use however they like (e.g. as RDF or OWL or OBO etc).

There are different types of triples, such as curated triples, observational triples and hypothetical triples. (Allyson: not sure if these are actually stored any differently, or just an explanation of the types of triples you can get.) Donate hypothetical triples to the database, even if you’re not sure. Later, if someone has evidence, you may get a nano-credit. These triples aren’t ontologies – they’re a rough source for ontologies. Then he provided an example using ErasmusMC.

Then there was a nice screenshot of the prototype of Wiki Professional. (url might be protein.wikiprofessional.org). Also it seems it can show a regular page like PubMed and overlay the things it knows about the text on the page.

So, how is CWA different from SW (semantic web)? It has strictly non-semantic unique identifiers for concepts; a strict triple format; a layered structure (curated, observational, hypothetical); triple provenance; strict separation of authority and community; and it sees the triple as a nano-publication with nano-credits – see how innovative you are rather than how many papers you’ve written.

FriendFeed discussion: http://ff.im/4x30J
