Meetings & Conferences Semantics and Ontologies

UKON 2018: Session 4

Distinct effects of ontology visualizations in entailment and consistency checking

Yuri Sato (Brighton), Gem Stapleton (Brighton), Mateja Jamnik (Cambridge) and Zohreh Shams (Cambridge)

When describing world knowledge, a choice must be made about its representation. They explore the manner in which ontological knowledge is expressed in ways accessible to humans in general. They compare novice users’ performance on logical tasks using two distinct notations.

SOVA is a full ontology visualization tool for OWL – you can build syntactic units which create a SOVA graph. Other graph tools (OWLViz, VOWL, Jambalaya) are insufficient to express all OWL constructs. Also, existing systems of higraphs, compound digraphs, and constraint diagrams are not expressive enough to deal with ontologies. Stapleton et al (2017, VL/HCC 2017) describe concept diagrams. So there are two methods to explore: topo-spatial and topological. In consistency checking tasks, topological representations were better, while in entailment judgement topo-spatial representations performed better. In summary, topological representations are suitable for most existing ontology representations, but there is a need to design new ontology visualizations.

Users and Document-centric workflow for ontology development

Aisha Blfgeh and Phillip Lord (Newcastle)

Ontology development is a collaborative process. Tawny OWL allows you to develop ontologies in the same way as you write programs. You can use it to build a document-centric workflow for ontology development. You start with users editing an Excel spreadsheet, which is then used as input to Tawny OWL to ultimately generate OWL. This will also generate a Word document in which users can see the changes.

But how successful would the ontological information in the form of a Word document actually be? It depends on the users – there are two types – the users of the ontology and the developers of the ontology. They started by classifying their users into newbies, students, experts and ontologists, and they worked on the pizza ontology.

The participants saw the ontology both as a Word document and in Protege. Errors were introduced and the participants were asked to find them. Reading text in Word helps explain the structure of the ontology, especially for newbies. However, the hierarchy is very useful in Protege. The ability to edit the text in the Word document is quite important for non-experts.

The generation of the Word document is currently not fully automated, and automating it is one of the things they plan to do. They also want to develop a Jupyter notebook for the work. Finally, they’d like to repeat this work with ontologists rather than just newbies.

DUO: the Data Use Ontology for compliant secondary use of human genomics data

Melanie Courtot (EBI) – on behalf of The Global Alliance For Genomics And Health Data Use Workstream

Codifying controlled data access consent. Data use restrictions originate from consent forms – and as a researcher, to get the data you have to go via data access committees. The current protocol for data access is: there are data depositors and data requestors. The data access committee sits between the data and the requestors and tries to align the requestors’ needs with the data use limitations. All of this is done manually, and is quite time consuming. Often there isn’t the human capacity to go through all requests. Therefore if we can encode consent codes into an ontology, perhaps the data access process could be more automated.

The use cases for this system would include data discovery, automation of data access, and standardization of data use restrictions and research purposes forms. DUO lives in a GitHub repo where they tag each release. They aim to keep DUO small and to provide clear textual definitions augmented with examples of usage. In addition, DUO provides automated machine-readable coding.
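As a rough illustration of what automating part of the data access process could look like (this is my own sketch, not something shown in the talk), the snippet below matches a data request against machine-readable consent categories. The category names, the `Dataset` class and the matching rules are all hypothetical simplifications and are not DUO's actual terms or semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    name: str
    category: str                  # hypothetical consent category, not a real DUO term
    disease: Optional[str] = None  # only used for disease-specific consent


def is_permitted(dataset: Dataset, purpose: str, disease: Optional[str] = None) -> Optional[bool]:
    """Return True/False when the toy rules can decide, None when a human must review."""
    if dataset.category == "general_research":
        return True
    if dataset.category == "health_medical":
        return purpose in ("health_medical", "disease_specific")
    if dataset.category == "disease_specific":
        return purpose == "disease_specific" and disease == dataset.disease
    return None  # unknown category: leave it to the data access committee


datasets = [
    Dataset("cohort-a", "general_research"),
    Dataset("cohort-b", "health_medical"),
    Dataset("cohort-c", "disease_specific", disease="diabetes"),
]

# A request for disease-specific research on asthma.
matches = [d.name for d in datasets if is_permitted(d, "disease_specific", "asthma")]
print(matches)
```

Anything the rules cannot decide comes back as `None`, so in this sketch the committee still handles the unclear cases by hand.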

W3C Data Exchange Working Group (DXWG) – an update

Peter Winstanley (The Scottish Government) and Alejandra Gonzalez-Beltran (Oxford)

Peter co-chairs this working group and is one of their Invited Experts. He shares the burden of chairing and ensures that the processes are adhered to. These processes involve making sure there is openness, adequate minutes, and sensible behaviour. The working group is a worldwide organization, which makes it difficult to organize the weekly meetings (time zones etc). There are also subgroups, which means two awkwardly-timed meetings. This is the context in which the work is done.

The DCAT (Data Catalog Vocabulary) has been around since 2014 as a W3C recommendation. Once people really started using it, issues became apparent. There were difficulties with describing versioning, API access, relationships between catalogs, relations between datasets, temporal aspects of datasets etc. Therefore people have used it by mixing it with other things as part of an “application profile”. Examples include DCAT-AP, GeoDCAT-AP, HCLS Dataset description, DATS. Different countries have also already started creating their own application profiles as part of a wider programme of AP development (e.g. Core Public Service Vocabulary (CPSV-AP)).

The mission of the DXWG is to revise the DCAT and then to define and publish guidance on the use of APs, and content negotiation when requesting and serving data. There have been a few areas where reduced axiomatisation is being proposed in the re-working of DCAT to increase the flexibility of the model.

You can engage with DXWG via GitHub, the W3C meetings and minutes, and the mailing lists, and you can provide feedback.

Panel Session

Robert Stevens introduced the panel. He stated that one of the reasons he likes this network is its diversity. Panellists: Helen Lippell, Allison Gardener, Melanie Courtot, and Peter Andras. The general area for discussion is: in the era of Big Data and AI, what type of knowledge representation do we need?

Melanie Courtot: It depends on what you’re calling KR… Ontologies take a lot of time to build, and they’re typically not funded. If we’re talking about KR other than ontologies, then you want to ensure that you keep any KR solution lightweight. She liked that a lot of the talks were very practically oriented.

Helen Lippell: She doesn’t work on funded projects at the moment, but instead goes into private sector companies. They have lots of projects on personalization and content filtering. You can’t really do these things without ontologies / domain models / terminologies, and without ensuring these are all referring to the same thing. She’d like to see more people in the private sector working with ontologies – it shouldn’t be just academics – go out and spread your knowledge!

Allison Gardener: From the POV of a biologist coming into Computer Science, she’s primarily concerned with high quality data rather than just lots of data. What features she chose and how she defined these features was really important. Further, how you define a person (and their personal data) would determine how they are treated in a medical environment. Ontologies are really important in the context of Big Data.

Peter Andras: If you look at how KR works in the context of Image Analysis – images are transformed and fed into a Neural Network – you get statistical irregularities in the data space. Your KR should look at these irregularities and structure those in a sensible way that you can use for reasoning. This works for images, but is more difficult / much less clear when you’re looking at text instead. However, if you can add semantics into the text data, perhaps you can more meaningfully derive what transformations make sense to get those high quality irregularities from your analysis. Sociologists have several million documents of transcribed text from interviews – how you analyse this, and get out a meaningful representation of the information contained therein, is difficult and ontologies could be helpful. How can you structure theories and sociological methodologies such that you add more semantics?

Q: Have ontologies over-promised? Did we think it could do more than it has turned out that it could do? Melanie: What are we trying to do here? Trying to make sense of a big bunch of data… As long as the tools work, it doesn’t really matter if we don’t use ontologies. Phil: “Perfection is the enemy of the good.” Peter: There hasn’t been really an over-hype problem. Perhaps you’ll see the development of fewer handcrafted ontologies and more automated ontologies via statistical patterns. But what kind of logic should we use? Alternative measures of logic might apply more – the weighting of logic changes.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!


UKON 2018: Session 3

A Community-based Framework for Ontology Evaluation

Marzieh Talebpour, Thomas Jackson and Martin Sykora (Loughborough)

There are many systems supporting ontology discovery and selection. She reviewed 40 systems in the literature and came up with a generic framework to describe them. They all have a collection of ontologies gained by various means and then they receive added curation. She wanted to evaluate the quality of ontologies and aid the selection process through metrics. There are three groups of such metrics – internal, metadata and social metrics. Although you can group them in this way, do knowledge and ontology engineers actually consider social metrics when evaluating the ontologies?

She interviewed ontologists to discover what they saw as important. After getting the initial list of metrics from the interviews, she did a survey of a larger group to rank the metrics.

Towards a harmonised subject and domain annotation of FAIRsharing standards, databases and policies

Allyson Lister, Peter Mcquilton, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Milo Thurston, Massimiliano Izzo and Susanna-Assunta Sansone (Oxford)

(This was my talk so I didn’t take any notes, so here’s a summary)

FAIRsharing is a manually-curated, cross-discipline, searchable portal of three linked registries covering standards, databases and data policies. Every record is designed to be interlinked, providing a detailed description not only of the resource itself, but also its relationship to other resources.

As FAIRsharing has grown, over 1000 domain tags across all areas of research have been added by users and curators. This tagging system, essentially a flat list, has become unwieldy and limited. To provide a hierarchical structure and richer semantics, two application ontologies drawn from multiple community ontologies were created to supplement these user tags. FAIRsharing domain tags are now divided into three separate fields:


  • Subject Resource Application Ontology (SRAO) – a hierarchy of academic disciplines that formalises the re3data subject list. Combined with subsets of six additional ontologies, SRAO provides over 350 classes.
  • Domain Resource Application Ontology (DRAO) – a hierarchy of specific research domains and descriptors. Fifty external ontologies are used to provide over 1000 classes.
  • Free-text user tags – a small number of FAIRsharing domain tags were not mappable to external ontologies and are retained as user tags. Existing and new user tags may be promoted to either application ontology as required.

From the initial user tags to the development of the new application ontologies, our work has been led by the FAIRsharing community and has drawn on publicly-available resources. The FAIRsharing application ontologies are:

  1. Community driven – our users have created the majority of the terms, providing the initial scope for DRAO and SRAO.
  2. Community derived – to describe the wide range of resources available in FAIRsharing, we imported subsets of over fifty publicly-available ontologies, many of which have been developed as part of the OBO Foundry.
  3. Community accessible – with over 1400 classes described, these cross-domain application ontologies are available from our GitHub repositories, and are covered by a CC BY-SA 4.0 licence.

Guidelines for the Minimum Information for the Reporting of an Ontology (MIRO)

Nicolas Matentzoglu (EMBL-EBI), James Malone (SciBite), Christopher Mungall (The Lawrence Berkeley National Laboratory) and Robert Stevens (Manchester)

Ontologies need metadata, and we need a minimal list of required metadata for ontologies. They started with a self-made list, and then created a survey that was widely distributed. The stats from that survey were then used to discover what was most important to the respondents. Reporting items include: basics, motivation, scope, knowledge acquisition, ontology content, managing change, and quality assurance.

What was surprising was the number of items that were considered very important and ended up with a MUST in MIRO. The ones with the highest score were URL, name, owner and license (clearly). The bottom three were less obvious: content selection, source knowledge location and development environment.

They then tested retrospective compliance by looking through publications – ended up with 15 papers. The scope and coverage, need, KR language, target audience, and axiom patterns were very well represented. Badly represented were ontology license, change management, testing, sustainability, and entity deprecation policy.

Testing was both not reported and not considered important. Allyson note: I think that this is self-fulfilling – there is no really good way to test other than running a reasoner, so something like Tawny OWL, which allows this, could create an interest in actually doing so.

Tawny-OWL: A Richer Ontology Development Environment

Phillip Lord (Newcastle)

Tawny OWL is a mature environment for Ontology development. It provides a very different model than other existing methods. It allows for literate ontology development. Most people use Protege, others use the OWL API. The driving use case was the development of an ontology of the human chromosomes – complex to describe, but regular. 23 chromosomes, 1000 bands, and the Protege UI can’t really handle the number of classes required.

Tawny OWL is an interactive environment built on Clojure, and you can use any IDE or editor that knows about Clojure / Leiningen. You can then replace a lot of the ontology-specific tools with more generic ones – versioning with git, unit testing with Clojure, dependency management with Maven, continuous integration with Travis-CI.

It allows for literate development because it allows for fully descriptive documentation / implementation comment (stuff you’d put in code that isn’t meant to be user facing) which wasn’t really possible in the past. Version 2.0 has regularization and reimplementation of the core, patternization support (gems and tiers), a 70 page manual, project templates with integrated web-based IDE, and is internationalizable.

Automating ontology releases with ROBOT

Simon Jupp (EBI), James Overton (Knocean), Helen Parkinson (EBI) and Christopher Mungall (The Lawrence Berkeley National Laboratory)

Why do we automate ontology releases? Because you have a regular release cycle which triggers the release of other services. You also have the creation of various versions of the ontology. What happens as part of the release? The desired sections of various ontologies are pulled in – the desired terms are kept in a TSV file.

ROBOT is an ontology release toolkit. It is both a library and a command-line tool. Commands can be chained together to create production workflows. Within EFO, the ROBOT commands are added to the EFO makefile, where the ontology release is treated as a compile step. This allows testing to happen prior to release.

ROBOT commands include merging, annotation, querying, reasoning, template (TSV -> OWL), and verification.
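To make the idea of chaining concrete, here is a minimal sketch (mine, not from the talk) that assembles a single chained ROBOT invocation, merge followed by reason, as an argument list. The file names are made up, and the helper `robot_release_command` is hypothetical; it only builds the command and does not run it.

```python
import shlex


def robot_release_command(inputs, output, reasoner="ELK"):
    """Build one chained ROBOT invocation: merge the inputs, then reason, then write the result."""
    args = ["robot", "merge"]
    for path in inputs:
        args += ["--input", path]
    # The second command in the chain consumes the merged ontology from the first.
    args += ["reason", "--reasoner", reasoner, "--output", output]
    return args


# Hypothetical file names, for illustration only.
cmd = robot_release_command(["efo-edit.owl", "imports.owl"], "efo-release.owl")
print(shlex.join(cmd))
```

Assuming ROBOT is installed, the list could be handed to `subprocess.run(cmd, check=True)`, or the printed line pasted into a makefile rule so that a failing step stops the release.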

Bioschemas Community: Developing profiles over Schema.org to make life sciences resources more findable

Alasdair Gray (Heriot-Watt) and The Bioschemas Community (Bioschemas)

They are asking developers to add 6 minimum properties. The Bioschemas specification is added on top of the schema.org specification. Over 200 people have been involved in a number of workshops. To create Bioschemas profiles, they identify use cases and then map to existing ontologies. Then a specification is created, tested and applied.

They’ve had to create a few new types which schema.org didn’t have (e.g. Lab Protocol, biological entity). 16 sites have deployed this, including FAIRsharing. Will other search engines respect this? The major 7 search engines are using schema markup.




UKON 2018: Session 2

Gender: It’s about more than just gonads.

Phillip Lord (Newcastle)

He begins with a story – what does LGBTQIA+ mean? How do you define this in an ontology? Perhaps start with something simpler… This is about social modelling. Modelling this is a challenge because it is important, and complicated, and sensitive.

First you need to consider gender versus sex. Newcastle has one of the 7 gender dysphoria clinics in the UK. ICD-10 has a classification of disease called “trans-sexual” which has been removed in ICD-11 because it is not a disease. You also have PATO, which describes gender – among other things. PATO’s male and female definitions have their own issues. These definitions are based on gametes, which is problematic – if you are an infertile man you are both female and male (and so on and so forth). Intact, Castrated and other aspects of the PATO definitions have problems. The definition of Castrated Male contradicts the definition of Male.

The beginning of Phil’s ontology is Birth Assigned Gender. Other terms include Affirmed, Man, Woman, pronouns, legal gender and biological gender (biological gender will be dropped). Man and Woman are defined based on your affirmed gender, not your assigned gender.

He’s also started modelling sexuality. The entire area is difficult to model, and is critical for many IT systems, and is very interesting.

SyBiOnt: The Synthetic Biology Ontology

Christian Atallah (Newcastle), James McLaughlin (Newcastle), James Skelton (Newcastle), Goksel Misirli (Keele) and Anil Wipat (Newcastle)

Synthetic Biology: the use of engineering principles for the development of novel biological applications. The classic build -> test -> learn -> design -> build. Synthetic biology is very modular and includes many different types of biological parts. SBOL is used to visualize and build synthetic biology designs. SyBiOntKB is an example of using the ontology. You can mine SyBiOntKB to get synthetic biology parts.

SBOL-OWL: The Synthetic Biology Open Language Ontology

Goksel Misirli (Keele), Angel Goni-Moreno (Newcastle), James McLaughlin (Newcastle), Anil Wipat (Newcastle) and Phillip Lord (Newcastle)

Reproducibility of biological system designs is very important. SBOL has been adopted by over 30 universities and 14 companies worldwide as well as ACS Synthetic Biology. The designs are hierarchical and can be grouped into modules. In order to understand SBOL you need to read the User Guide – it isn’t available computationally. Validation rules are in the appendix, and SBOL refers to external ontologies and CVs to provide definitions. So, how should you formally define this? Provide an ontological representation of SBOL data model – SBOL-OWL.

Example query – return a list of ComponentDefinitions that are promoters and of type DNARegion and that have some Components.

SBOL-OWL allows computational validation of verification rules, and allows automated comparison of incremental SBOL Specifications. It provides a machine-accessible description of SBOL entities. You can annotate genetic circuit designs with rich semantics, which would allow you to use logical axioms for reasoning over design information.



UKON 2018: Ivana Bartoletti on security versus privacy in data use

Session 2 began with a presentation by Ivana Bartoletti.

What is a good use of data? She works a lot with smart metering, smart homes and connected cities. To address population growth and climate change, we need new thinking. Big Data can help with this, but there are serious privacy concerns about it. Individuals need to be able to discover exactly what data concerns them.

People don’t want to give up Google Maps and Facebook. These free services have become part of our life and help with our daily tasks. Therefore transparency has become more important than ever. The new GDPR is relevant in this context. Privacy terms on websites are difficult to read and are often convoluted.

The new legislation describes the right of erasure. It states that all available technology needs to be used to delete data – this includes contacting any other companies that might have used that person’s data. Transparency is vital, especially as bias can easily creep in when analysing personal data / profiling. GDPR was created to support the single digital market. The free flow of data across the EU is an important discussion point as part of Brexit.

We must redefine the concept of personal data. We should not define personal data as something we own – it’s something we are. If we think of it as a car that we can sell, we are taking the wrong approach. Instead, it’s like our heart – it’s who we are. So, your Facebook account is part of your personality. Shifting this definition can drive and inform the debate about the transaction of data in return for free services.

This will result in a new ethical debate about how personal data is used. Corporations don’t always understand the data, and therefore struggle to govern it properly. You need an open information system with a high level of transparency and interoperability. Ontologies have a very big potential in this area.

Practically speaking, how would changing how we perceive our personal data (as who we are) change our day to day life?

It will be very challenging to remove someone’s personal data from publications / studies. Will papers have to be changed after they’ve been published? You need to consider if it is personal or anonymized data. You need to de-identify data much more carefully. This question shows how important it is to extract data.

There is a conflict between personal data and personal identity. We often use the data to establish the identity rather than just for the purpose of sharing data. Digital identities are an important part of this research.



UKON 2018: Morning Session

Session 1 Chair: Dr Jennifer Warrender

This session contains short 10-minute talks.

Organising Results of Deep Learning from PubMed using the Clinical Evidence-Based Ontology (CEBO)

M. Arguello Casteleiro (Manchester), D. Maseda-Fernandez (NHS), J. Des-Diz (Hospital do Salnes), C. Wroe (BMJ), M.J. Fernandez-Prieto (Salford), G. Demetriou (Manchester), J. Keane (Manchester), G. Nenadic (Manchester) and R. Stevens (Manchester)

They are combining three areas when studying semantic deep learning: natural language processing, deep learning and the semantic web. The purpose of CEBO is to filter the strings into the ones that are the most useful for clinicians. In other words, CEBO filters and organises Deep Learning outputs. It has 658 axioms (177 classes).

Using Card Sorting to Design Faceted Navigation Structures

Ed de Quincey (Keele)

Some of this work is a few years old, but the technique hasn’t been used much and therefore he’d like to present it to make it more visible again. Card sorting begins with topics of content + cards + people who sort them into categories = the creation of information architecture. It can be used to (re)design websites, for example. For the new CS website at Keele, they gave 150 students (in groups of 5) 100+ slides and asked them to categorize. Pictures as well as text can be used, e.g. products. You can also do card sorting with physical objects.

Repeated Single-Criterion Sorting: Rugg and McGeorge have discussed this in a paper. With this technique, because you’re asking them to sort multiple times, you instead use about 8-20 cards at a time. Also, you can get a huge amount of information just by doing this with about 6 people. An example is sorting mobile phones. You ask people to sort the objects into groups based on a particular criterion, e.g. color. Then after sorting, you ask them to sort again, and continue on with a large number of criteria. You ask them to keep sorting until they can’t think of any other ways to sort them. Then you pick a couple at random, and ask the people to describe the main difference between them, which usually gets you another few criteria to sort on.

Overall, this allows you to elicit facets from people. It allows you to create a user-centered version of faceted navigation. For his work, he looked at music genre, and investigated whether or not it is the best way to navigate music. He asked 51 people to sort based on their own criteria. He got 289 sorts/criteria during this work. This was then reduced to 78 after an independent judge grouped them into superordinate constructs. For a while, you find commonality on genre, speed and song, but after that it becomes a lot more personal, e.g. “songs I like to listen to in lectures” 😉

Then you can create a co-occurrence matrix for things like gender. There was no agreement with respect to genre, which was interesting. Spotify now supports more personal facets, which weren’t available 8 years ago when this work was first done. As such, this technique could be very useful for developing ontologies.
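For anyone who hasn’t built one, a co-occurrence matrix from card sorts can be sketched in a few lines (my own toy example, not Ed’s code or data). Each sort is a list of groups of cards, and the matrix counts how many sorts put a given pair of cards in the same group. The card names and sorts below are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sorts):
    """Count, for each pair of cards, how many sorts placed them in the same group."""
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # Sort the names so each pair has one canonical key.
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts


# Three invented sorts of the same three cards, each by a different criterion.
sorts = [
    [{"song A", "song B"}, {"song C"}],
    [{"song A", "song B", "song C"}],
    [{"song A"}, {"song B", "song C"}],
]
matrix = cooccurrence(sorts)
print(matrix[("song A", "song B")])
```

Pairs that are grouped together under many different criteria are candidates for sharing a facet value; pairs that never co-occur are not.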


Peter Murray-Rust, Charles Matthews and Thomas Arrow (ContentMine)

Peter feels that there is a critical need for Liberation Ontology, and to regain control from publishers. Wikidata has about 50 million entities and even more triples, and it’s democratic. He says it is our hope for digital freedom. WikiFactMine (his group) added 13 million new items (scientific articles) to it. There are loads of disparate categories, so if you want ontological content, Wikidata is the first (and only) place to go. A good example of a typical record is Douglas Adams (Q42 – look it up!). Scientific articles can be Wikidata items. They were funded by Wikimedia to set up WikiFactMine for mining anything, but particularly the scholarly literature.

You can create WikiFactMine dictionaries. They are constructed such that there is a hierarchy of types (e.g. the entire animal kingdom in the biology subset). They created a dictionary of drugs just by searching on “drug” and pulling out the information associated with it. There are issues with mining new publications, however. You can then combine dictionaries, e.g. gene, drug, country and virus. By looking at the co-occurrence of country + disease, you may be able to predict outbreaks.
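A minimal sketch of combining two dictionaries in this way (my illustration, not WikiFactMine’s implementation) is below. The dictionaries and sentences are invented, the function names are hypothetical, and the matching is naive case-insensitive substring search, which a real pipeline would replace with proper entity recognition.

```python
from collections import Counter
from itertools import product


def find_terms(text, dictionary):
    """Return the dictionary terms that appear in the text (naive substring match)."""
    lowered = text.lower()
    return {term for term in dictionary if term.lower() in lowered}


def cooccurrences(documents, dict_a, dict_b):
    """Count how many documents mention each (term from dict_a, term from dict_b) pair."""
    counts = Counter()
    for text in documents:
        found_a = sorted(find_terms(text, dict_a))
        found_b = sorted(find_terms(text, dict_b))
        for pair in product(found_a, found_b):
            counts[pair] += 1
    return counts


# Invented toy dictionaries and sentences, for illustration only.
countries = {"Brazil", "Colombia"}
diseases = {"Zika", "dengue"}
documents = [
    "Cases of Zika were reported in Brazil.",
    "Zika and dengue were both observed in Brazil and Colombia.",
    "A new treatment for dengue was trialled.",
]
pairs = cooccurrences(documents, countries, diseases)
print(pairs.most_common(1))
```

Tracking how these counts change over time, per country and disease, is the kind of signal the outbreak idea would rely on.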

The Right to Read is the Right to Mine.

Is there some kind of curation / moderation on Wikidata? There is curation on the properties (the community has to agree to this). With respect to data, if people think it’s too trivial, it can be marked as a candidate for deletion, and discussions can ensue.

A Malay Translated Qur’an Ontology using Indexing Approach for Information Retrieval

Nor Diana Ahmad, Eric Atwell and Brandon Bennett (Leeds)

Improving the query mechanism for retrieval from the Malay-translated Qur’an. Many Muslims, especially Malay readers, read the Qur’an but do not understand Arabic. Most of the Malay-translated applications only offer keyword search methods, which do not help with a deeper understanding. Further, morphological analysis is complicated in Malay, because it has a different structure. They are building a semantic search and an ontology. They wish to improve speed and performance for finding relevant documents in a search query. They also built a natural-language algorithm for the Malay language.

An ontology + relational database was used, covering ~150,000 words. With keyword search, precision was 50%, and with her new method it was ~80%.

Towards Models of Prospective Curation in the Drug Research Industry

Samiul Hasan (GlaxoSmithKline)

As we think about making precision medicine a reality, it is much more likely that we will fail because of the challenges of data sharing and data curation (Anthony Philippakis, the Broad Institute).

2 important attributes of scientific knowledge management: persistence and vigilance (without access to the right data and prior knowledge at the right time, we risk making very costly, avoidable business decisions). Persistence requires efficient organization, and vigilance requires effective organization. What’s getting in the way of these aspirations is the inconsistent use of language at the source, which creates serious downstream problems. What about implementing reward in data capture steps? How do we not miss vital data later on? Named entity recognition, document classification, reinforcement learning, trigger event detection. You need both vision-based and user-centric software development.

Posters and Demos: 1-minute intros

  • Bioschemas – exploiting schema markup to make biological sources more findable on the web.
  • Document-centric workflow for ontology development – read from an Excel spreadsheet using Tawny OWL and create an ontology which can be easily rebuilt
  • Tawny OWL – a highly programmatic environment for ontology development (use software engineering tools / IDEs to build up ontologies).
  • Hypernormalising the gene ontology – as ontologies get bigger, they get harder to maintain. Can you use hypernormalization to help this? It is an extension of the normalising methodology.
  • Bootstrapping Biomedical ontologies from literature – from PubMed to ontologies.
  • meta-ontology fault detection
  • Bioschemas – show the specification and how they’re reusing existing ontologies
  • Get the phenotype community to use logical definitions to increase cohesion within the community (Monarch Consortium)
