
RKB: a SW knowledge base for RNA (ISMB Bio-Ont SIG 2009)

Michael Dumontier et al.

He wants to capture the structural features and interactions of RNA: capture types and their relations, represent dynamic / context-specific knowledge, populate the KB with PDB structural data and MC-Annotate interactions, and answer questions about RNA structure. To build it they looked at textbooks, review articles, book chapters and expert knowledge. Their upper-level ontology (ULO) was NULO, based on BFO/RO.

Contextual modelling of nucleic acids: base stacking varies between models (e.g. different NMR models). He then described the Leontis-Westhof Nomenclature, in which you describe the edges of the base as participating in the interaction. A more sophisticated nomenclature based on this, called the LW+ Nomenclature, was then developed, in which the edges are divided up into a set of faces.

They want to capture information about residues, edges/faces, cis/trans nucleotide orientations, and whether strands are parallel or antiparallel. Base stacking consists of inter-nucleobase interactions involving London forces. They wanted to capture both the numbers and a description of what is going on. They’ve used two roles: FacingAwayRole and FacingTowardsRole. There is both an endo and an exo role for sugar puckering. Situational modeling ensures that objects are represented by a single entity throughout their lifetime.

RKB is populated with PDB and MC-Annotate data, and it is all represented in RDF. The population involved 3 steps: assigning names, asserting class membership, and ?. They can then ask the knowledge base things using DL queries. RKB is also accessible via SPARQL.
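The two population steps recorded above, plus a simple class-membership query, can be sketched in plain Python with triples held as tuples. This is only an illustration: the `rkb:` prefix, the naming scheme, the class names `rkb:Guanosine` and `rkb:Cytidine`, and the PDB entry `1ABC` are all hypothetical, and the real RKB uses RDF with DL and SPARQL queries rather than this toy set. The third step is left out because these notes do not record it.

```python
# Toy triple store: each triple is a (subject, predicate, object) tuple.
RDF_TYPE = "rdf:type"


def assign_name(pdb_id: str, chain: str, residue_number: int) -> str:
    """Step 1 (assigning names): build a name for a residue.

    The naming scheme is hypothetical, not the one RKB actually uses.
    """
    return f"rkb:{pdb_id}_{chain}_{residue_number}"


def assert_class(kb: set, individual: str, cls: str) -> None:
    """Step 2 (asserting class membership): record an rdf:type triple."""
    kb.add((individual, RDF_TYPE, cls))


def instances_of(kb: set, cls: str) -> list:
    """Return the sorted names of all individuals asserted to belong to cls."""
    return sorted(s for (s, p, o) in kb if p == RDF_TYPE and o == cls)


kb = set()
g1 = assign_name("1ABC", "A", 1)  # "1ABC" is a placeholder PDB entry
c2 = assign_name("1ABC", "A", 2)
assert_class(kb, g1, "rkb:Guanosine")  # hypothetical class names
assert_class(kb, c2, "rkb:Cytidine")

print(instances_of(kb, "rkb:Guanosine"))
```

A real query would go through a DL reasoner or a SPARQL endpoint; the list comprehension in `instances_of` only stands in for the simplest "which individuals belong to this class" question.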

They’re now working with the RNA Ontology Consortium. They want to publish as part of the Bio2RDF network, and extend the structure description with backbone angles.

NULO: there is a logical mapping between NULO and BFO. They’ve relaxed restrictions where it is unclear what BFO’s stance is: for certain statements, it was unclear whether making them would still fit in with BFO.

FriendFeed Discussion: http://ff.im/4xhmC

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!


Increasingly accurate biochemical knowledge representation with precise, structure-based chemical identifiers (ISMB Bio-Ont SIG 2009)

Michael Dumontier et al.

Problem: an identifier is a name for some biochemical entity, while records offer a rich description of the named entity. When viewing data, it’s sometimes difficult to know which form of a chemical the site is referring to. People use identifiers when reporting experimental results, but it’s often unclear which species they’re referring to, and results can be reported erroneously or in an underspecified way. They’d like to generate stable identifiers based on explicit, machine-understandable descriptions which are unchanging and fully self-describing. With this style, different molecules must have different identifiers. InChI strings, for example, are good but need specialized software to parse.

Some formats that already exist are SDF and CML, while existing identifiers that contain chemical information are InChI and SMILES. So, what happens if you ask CML for the differences between 3 very similar chemical species that differ only in their stereochemistry? It isn’t really possible. He’d like to reason about relations and class membership, and to perform classification tasks.
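To make the stereochemistry point concrete for InChI, one of the identifiers mentioned above, here is a minimal sketch that splits an InChI string into its layers and reports which layers differ between two strings. The parser is a simplification written for this note (it assumes a formula layer is present and that each layer prefix occurs only once), not a substitute for the real InChI software. The two strings are, as far as I am aware, the standard InChIs for L- and D-alanine.

```python
def inchi_layers(inchi: str) -> dict:
    """Split an InChI string into a {layer prefix: layer content} dict.

    Simplified: the first two fields are taken as version and formula,
    and every later field is keyed by its one-letter prefix.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def differing_layers(a: str, b: str) -> dict:
    """Return {layer: (value in a, value in b)} for layers that differ."""
    la, lb = inchi_layers(a), inchi_layers(b)
    keys = sorted(set(la) | set(lb))
    return {k: (la.get(k), lb.get(k)) for k in keys if la.get(k) != lb.get(k)}


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

print(differing_layers(L_ALANINE, D_ALANINE))
```

The formula, connectivity and hydrogen layers are identical for the two enantiomers; only a stereo layer differs, and you only find that out by knowing how to take the string apart.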

In the vein of functional groups, they’d like to capture some form of generalisation: experimental conditions necessitate a certain level of structural (un)certainty. So the aim is a more flexible and accurate representation of biochemical knowledge, beyond the exact structure. Classes would include: specification, minimum, combination, possibilities/uncertainties, exclusion.

Ultimately, what we want to do is to generate useful identifiers that point to accurate and unchanging descriptions. So, take what was done with InChI and generate something that is self-explanatory. We need OWL description -> identifier. They have a prototype service that allows you to submit an OWL snippet and get back an identifier. This means that if the description changes, the identifier changes. They will add new knowledge into the linked data web through Bio2RDF.
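The property that a changed description yields a changed identifier can be sketched by deriving the identifier from a hash of the description. This is an assumption for illustration only: these notes do not say how the prototype service computes its identifiers, and `identifier_for`, the `desc:` prefix and the example snippets are all hypothetical.

```python
import hashlib


def normalise(description: str) -> str:
    """Collapse runs of whitespace so layout changes do not alter the identifier."""
    return " ".join(description.split())


def identifier_for(description: str, prefix: str = "desc:") -> str:
    """Derive an identifier from the description text itself (hypothetical scheme)."""
    digest = hashlib.sha256(normalise(description).encode("utf-8")).hexdigest()
    return prefix + digest[:16]


# Made-up snippets standing in for submitted OWL descriptions.
snippet = "Class: Ethanol SubClassOf: hasPart some HydroxylGroup"
same_snippet = "Class: Ethanol\n    SubClassOf: hasPart some HydroxylGroup"
changed_snippet = "Class: Ethanol SubClassOf: hasPart exactly 1 HydroxylGroup"

id_a = identifier_for(snippet)
id_same = identifier_for(same_snippet)
id_changed = identifier_for(changed_snippet)

print(id_a == id_same)      # layout-only change
print(id_a == id_changed)   # description changed
```

Whitespace normalisation is the crudest possible canonical form. Two OWL descriptions can mean the same thing while differing textually (axiom order, prefixes), so a real service would need a proper canonicalisation step before hashing.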

Benefits of this system include that no curation is required and that identifiers can be made for knowledge at various levels of granularity. Situational modeling enables the careful separation of what is known under particular circumstances.

FriendFeed discussion: http://ff.im/4wZax
