Session 1 Chair: Dr Jennifer Warrender
This session contains short 10-minute talks.
Organising Results of Deep Learning from PubMed using the Clinical Evidence-Based Ontology (CEBO)
M. Arguello Casteleiro (Manchester), D. Maseda-Fernandez (NHS), J. Des-Diz (Hospital do Salnes), C. Wroe (BMJ), M.J. Fernandez-Prieto (Salford), G. Demetriou (Manchester), J. Keane (Manchester), G. Nenadic (Manchester) and R. Stevens (Manchester)
They are combining three areas in their study of semantic deep learning: natural language processing, deep learning and the semantic web. The purpose of CEBO is to filter the strings down to the ones that are most useful for clinicians. In other words, CEBO filters and organises Deep Learning outputs. It has 658 axioms (177 classes).
Using Card Sorting to Design Faceted Navigation Structures
Ed de Quincey (Keele)
Some of this work is a few years old, but the technique hasn’t been used much and therefore he’d like to present it to make it more visible again. Card sorting begins with topics of content + cards + people who sort them into categories = the creation of information architecture. It can be used to (re)design websites, for example. For the new CS website at Keele, they gave 150 students (in groups of 5) 100+ slides and asked them to categorize. Pictures as well as text can be used, e.g. products. You can also do card sorting with physical objects.
Repeated Single-Criterion Sorting: Rugg and McGeorge have discussed this in a paper. With this technique, because you’re asking them to sort multiple times, you instead use about 8-20 cards at a time. Also, you can get a huge amount of information just by doing this with about 6 people. An example is sorting mobile phones. You ask people to sort the objects into groups based on a particular criterion, e.g. color. Then after sorting, you ask them to sort again, and continue on with a large number of criteria. You ask them to keep sorting until they can’t think of any other ways to sort them. Then you pick a couple at random, and ask the people to describe the main difference between them, which usually gets you another few criteria to sort on.
Overall, this allows you to elicit facets from people, and so to create a user-centered version of faceted navigation. For his work, he looked at music genre, and investigated whether or not it is the best way to navigate music. He asked 51 people to sort based on their own criteria, and got 289 sorts/criteria during this work. This was then reduced to 78 after an independent judge grouped them into superordinate constructs. At first you find commonality around genre, speed and song, but after that it becomes a lot more personal, e.g. “songs I like to listen to in lectures” 😉
Then you can create a co-occurrence matrix for things like gender. There was no agreement with respect to genre, which was interesting. Spotify now supports more personal facets, which weren’t available 8 years ago when this work was first done. As such, this technique could be very useful for developing ontologies.
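As a rough illustration of the co-occurrence idea (not the speaker's actual analysis), the sketch below counts how often each pair of cards ends up in the same group across a set of sorts. The card names, the example sorts and the function name `cooccurrence` are all made up for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sorts):
    """Count, for every pair of cards, how many sorts put them in the same group."""
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # sorted() gives each pair a stable (alphabetical) key
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts


# Hypothetical data: each entry is one sort (one person, one criterion),
# expressed as a list of groups of cards.
sorts = [
    [{"song_a", "song_b"}, {"song_c"}],   # e.g. sorted by genre
    [{"song_a", "song_b", "song_c"}],     # e.g. sorted by speed
    [{"song_a"}, {"song_b", "song_c"}],   # e.g. "songs I like in lectures"
]

pair_counts = cooccurrence(sorts)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```

Pairs with high counts are cards that people tend to group together regardless of criterion; pairs that only co-occur under one criterion hint at a facet specific to that criterion.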
Peter Murray-Rust, Charles Matthews and Thomas Arrow (ContentMine)
Peter feels that there is a critical need for Liberation Ontology, and a need to regain control from publishers. WikiData has about 50 million entities and even more triples, and it’s democratic. He says it is our hope for digital freedom. WikiFactMine (his group) added 13 million new items (scientific articles) to it. There are loads of disparate categories, so if you want ontological content, WikiData is the first (and only) place to go. A good example of a typical record is Douglas Adams (Q42 – look it up!). Scientific articles can be WikiData items. They were funded by WikiMedia to set up WikiFactMine for mining anything, but particularly the scholarly literature.
You can create WikiFactMine dictionaries. They are constructed such that there is a hierarchy of types (e.g. the entire animal kingdom in the biology subset). They created a dictionary of drugs just by searching on “drug” and pulling out the information associated with it. There are issues with mining new publications, however. You can then combine dictionaries, e.g. gene, drug, country and virus. By looking at the co-occurrence of country + disease, you may be able to predict outbreaks.
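To make the dictionary-combination idea concrete, here is a minimal sketch of counting country + disease co-occurrences across documents. It does not use WikiFactMine or ContentMine tooling: the two small dictionaries, the example sentences and the helper names `find_terms` and `count_cooccurrences` are invented for illustration, and matching is simple whole-word search.

```python
import re
from collections import Counter
from itertools import product


def find_terms(text, dictionary):
    """Return the dictionary terms that appear as whole words in the text."""
    lowered = text.lower()
    return {
        term for term in dictionary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def count_cooccurrences(documents, dict_a, dict_b):
    """Count documents in which a term from dict_a appears with a term from dict_b."""
    counts = Counter()
    for doc in documents:
        terms_a = sorted(find_terms(doc, dict_a))
        terms_b = sorted(find_terms(doc, dict_b))
        for pair in product(terms_a, terms_b):
            counts[pair] += 1
    return counts


# Hypothetical dictionaries and invented sentences, for illustration only.
countries = {"Brazil", "Guinea"}
diseases = {"Zika", "Ebola"}
documents = [
    "Zika cases reported in Brazil rose sharply.",
    "An Ebola treatment centre opened in Guinea.",
    "Researchers in Brazil sequenced a Zika isolate.",
]

country_disease = count_cooccurrences(documents, countries, diseases)
for pair, n in country_disease.most_common():
    print(pair, n)
```

In a real pipeline the dictionaries would come from WikiData queries and the documents from mined literature; tracking how these counts change over time is one way such a signal might be monitored.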
The Right to Read is the Right to Mine. http://contentmine.org
Is there some kind of curation / moderation on WikiData? There is curation on the properties (the community has to agree to this). WRT data, if people think it’s too trivial, it can be marked as a candidate for deletion, and discussions can ensue.
A Malay Translated Qur’an Ontology using Indexing Approach for Information Retrieval
Nor Diana Ahmad, Eric Atwell and Brandon Bennett (Leeds)
Improving the query mechanism for retrieval from the Malay-translated Qur’an. Many Muslims, especially Malay readers, read the Qur’an but do not understand Arabic. Most of the Malay-translated applications only offer keyword search methods, which do not help with a deeper understanding. Further, morphological analysis is complicated in Malay, because it has a different structure. They are building a semantic search and an ontology. They wish to improve speed and performance for finding relevant documents in a search query. They also built a natural-language algorithm for the Malay language.
Ontology + relational database was used. ~150,000 words. With keyword search, there was 50% precision, and with her new method, there was ~80% precision.
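For readers less familiar with the metric, precision is the fraction of retrieved results that are actually relevant. The sketch below shows the calculation on made-up verse identifiers; it is not the authors' evaluation code, and the data does not come from their experiments.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant (0.0 if nothing was retrieved)."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    hits = sum(1 for item in retrieved if item in relevant)
    return hits / len(retrieved)


# Hypothetical example: identifiers are placeholders, not real results.
relevant_verses = {"v1", "v2", "v3", "v4"}
keyword_results = ["v1", "v9", "v2", "v8"]   # 2 of 4 retrieved are relevant

keyword_precision = precision(keyword_results, relevant_verses)
print(keyword_precision)
```

Note that precision says nothing about relevant verses that were never retrieved; that is measured separately by recall.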
Towards Models of Prospective Curation in the Drug Research Industry
Samiul Hasan (GlaxoSmithKline)
As we think about making precision medicine a reality, it is much more likely that we will fail because of the challenges of data sharing and data curation (Anthony Philippakis, the Broad Institute).
2 important attributes of scientific knowledge management: persistence and vigilance (without access to the right data and prior knowledge at the right time, we risk making very costly, avoidable business decisions). Persistence requires efficient organization, and vigilance requires effective organization. What’s getting in the way of these aspirations is the inconsistent use of language at the source, which creates serious downstream problems. What about implementing reward in data capture steps? How do we not miss vital data later on? Named entity recognition, document classification, reinforcement learning, trigger event detection. You need both vision-based and user-centric software development.
Posters and Demos: 1-minute intros
- Bioschemas – exploiting schema markup to make biological sources more findable on the web.
- Document-centric workflow for ontology development – read from an Excel spreadsheet using Tawny OWL and create an ontology which can be easily rebuilt
- Tawny OWL – a highly programmatic environment for ontology development (use software engineering tools / IDEs to build up ontologies).
- Hypernormalising the gene ontology – as ontologies get bigger, they get harder to maintain. Can you use hypernormalization to help this? It is an extension of the normalising methodology.
- Bootstrapping Biomedical ontologies from literature – from PubMed to ontologies.
- meta-ontology fault detection
- Bioschemas – show the specification and how they’re reusing existing ontologies
- Get the phenotype community to use logical definitions to increase cohesion within the community (Monarch Consortium)
Please note that this post is merely my notes on the presentations. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!