These are my notes for the sixth session of talks at the UK Ontology Network Meeting on 14 April, 2016.
What’s in a Name: identifiers in ontologies
Ontologies consist of lots of terms, and we need to be able to refer to them unambiguously. There are lots of identifier schemes, so what are the characteristics of a good one? Identifiers should be semantics-free, and they are often numeric and incremented. Is an incrementing numeric scheme a good one? No…
…because a collision is guaranteed if two authors work on the ontology at once: each will take the same next number. You could use URIgen or similar, but a better solution is to use a random number generator.
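As a rough illustration of the random approach, here is a minimal Python sketch. It is not taken from the talk or from any particular tool: the function names, the `EX` prefix and the seven-digit width are all made up for the example. Random identifiers make collisions unlikely rather than impossible, so the sketch also includes the standard birthday-problem approximation for estimating how likely a clash is.

```python
import math
import secrets


def new_identifier(prefix="EX", digits=7, existing=()):
    """Return a random, semantics-free identifier such as 'EX_0412345'.

    `prefix`, `digits` and `existing` are illustrative parameters only.
    """
    while True:
        number = secrets.randbelow(10 ** digits)
        candidate = f"{prefix}_{number:0{digits}d}"
        # Retry in the unlikely case that the identifier is already known locally.
        if candidate not in existing:
            return candidate


def collision_probability(n_ids, space):
    """Birthday-problem approximation of the chance that n_ids random
    draws from `space` possible values contain at least one duplicate."""
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    print(new_identifier())
    print(collision_probability(100, 10 ** 7))
```

Two authors working independently can each call `new_identifier()` without coordinating, which is the point of the random scheme; the local `existing` check only guards against clashes the author can already see.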
Numeric identifiers are also difficult to remember, and therefore very easy to get wrong. Worse, your wrong identifier might still be valid, just for a different class. You could fix this by using a check digit.
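To make the check-digit idea concrete, here is a small Python sketch using the Luhn algorithm. Luhn is chosen purely for illustration because it is widely known; the notes do not say which check-digit scheme the speaker had in mind, and the function names are my own.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:  # double every second digit, starting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def add_check_digit(payload: str) -> str:
    """Append the check digit to a numeric identifier."""
    return payload + str(luhn_check_digit(payload))


def is_valid(identifier: str) -> bool:
    """Return True if the last digit matches the check digit of the rest."""
    if not identifier.isdigit() or len(identifier) < 2:
        return False
    return luhn_check_digit(identifier[:-1]) == int(identifier[-1])


if __name__ == "__main__":
    full = add_check_digit("7992739871")
    print(full, is_valid(full))
    print(is_valid("7992739881" + full[-1]))  # one digit mistyped
```

With a scheme like this, any single mistyped digit produces an identifier that fails validation instead of silently pointing at a different class.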
Numeric identifiers are hard to remember and hard to pronounce. Although tools can hide the number, that doesn’t solve the problem. You could use Proquint, a bidirectional transformation from numbers to letters that alternates consonants and vowels, producing vaguely pronounceable words that help people remember.
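For reference, a minimal Python sketch of the Proquint transformation follows. It uses the published Proquint alphabet (16 consonants carrying 4 bits each, 4 vowels carrying 2 bits each, so that each 16-bit word becomes a five-letter syllable group). The function names are my own, and this is not the implementation used by the speaker's library.

```python
CONSONANTS = "bdfghjklmnprstvz"  # 16 symbols, 4 bits each
VOWELS = "aiou"                  # 4 symbols, 2 bits each


def encode_word(value: int) -> str:
    """Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return (
        CONSONANTS[(value >> 12) & 0xF]
        + VOWELS[(value >> 10) & 0x3]
        + CONSONANTS[(value >> 6) & 0xF]
        + VOWELS[(value >> 4) & 0x3]
        + CONSONANTS[value & 0xF]
    )


def decode_word(word: str) -> int:
    """Invert encode_word for a single five-letter group."""
    c1, v1, c2, v2, c3 = word
    return (
        (CONSONANTS.index(c1) << 12)
        | (VOWELS.index(v1) << 10)
        | (CONSONANTS.index(c2) << 6)
        | (VOWELS.index(v2) << 4)
        | CONSONANTS.index(c3)
    )


def encode(value: int) -> str:
    """Encode a 32-bit number as two hyphen-separated groups."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return f"{encode_word(value >> 16)}-{encode_word(value & 0xFFFF)}"


def decode(text: str) -> int:
    """Decode hyphen-separated groups back into the original number."""
    result = 0
    for group in text.split("-"):
        result = (result << 16) | decode_word(group)
    return result


if __name__ == "__main__":
    print(encode(0x7F000001))
    print(hex(decode("lusab-babad")))
```

Because the mapping is bidirectional, the pronounceable form can be shown to people while the underlying number remains the actual identifier.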
So, the solutions: random, checked and pronounceable identifiers: https://github.com/phillord/identitas
A New Ontology Lookup Service at EMBL-EBI
Simon Jupp, Tony Burdett, Catherine Leroy, Thomas Liener, Olga Vrousgou, Helen Parkinson
The original OLS has an old codebase (nearly 10 years old in places) and was built around the OBO format, hence its outdated parser. It was also built on the assumption that ontologies are available in a public VCS, and it assumes that a term exists in only one ontology. It uses an Oracle RDBMS and SQL for querying, which is suboptimal. The API was SOAP, whereas users want REST.
It has been rebuilt from scratch. Ontologies are now polled by URL, not just from VCSs. The new OLS has a RESTful API, makes use of the Java OWL API behind the scenes, and has multiple indexes for scalable querying. There are 147 ontologies and 4.5 million terms.
It can load any OWL or SKOS file. It is an open source project at http://github.com/EBISPOT/OLS
An ontology-supported approach to predict automatically the proteases involved in the generation of peptides
Mercedes Arguello Casteleiro, Julie Klein, Robert Stevens
Peptides are useful biomarkers. The PxO (Proteasix Ontology) reuses other ontologies, e.g. GO, NCBI Taxonomy and PRO. UniProtKB proteins are organized by taxon and annotated with GO. They are trying to model the cleavage site patterns. To use peptides as biomarkers, you need lots of data and data linkages. They are using SPARQL queries to query their data.
TopFIND2 and Proteasix can help to automatically predict modification of protease activity.
Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!