These are my notes from the first day of talks at SWAT4HCLS.
Apologies for missing the first talk. I arrived just as the keynote by Denny Vrandečić titled “Wikidata and beyond – Knowledge for everyone by everyone” was finishing (the first train from the south only arrived in Edinburgh at 9am). (FAIRsharing Wikidata record.)
Enhancing the maintainability of the Bio2RDF project using declarative mappings
Ana Iglesias-Molina (speaker), David Chaves-Fraga, Freddy Priyatna and Oscar Corcho
The current Bio2RDF conversion scripts are ad hoc PHP scripts. The authors propose using OBDA technologies (via declarative mappings) instead. The OBDA workflow involves 1) a mapping file describing the relationships between the data source and the ontologies (e.g. in RML or R2RML), and 2) a mapping engine that uses it to transform the data sources into knowledge. Bio2RDF data source formats are mainly CSV/XLSX, followed by XML. They are developing a mapping engine for CSV/XLSX using a tool called Mapeathor. Mapeathor is used to generate the knowledge graph by mapping columns from the spreadsheets into appropriate triples.
They wish to increase the maintainability of the data transformation using the OBDA options, which 1) enables the use of different engines, and 2) creates a knowledge graph ready for data transformation and query translation. They would like to improve the process to deal with the heterogeneity of the data.
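To make the idea of mapping spreadsheet columns into triples a little more concrete, here is a minimal sketch in Python. It is not Mapeathor, RML or R2RML: the mapping dictionary, the example.org URIs and the sample rows are all hypothetical stand-ins, chosen only to show how a declarative description can drive the transformation instead of a hand-written script per source.

```python
import csv
import io

# Hypothetical declarative mapping (NOT RML/R2RML syntax): it names the
# template for the subject URI and the predicate each column maps to.
MAPPING = {
    "subject_template": "http://example.org/gene/{id}",
    "columns": {
        "symbol": "http://example.org/vocab/symbol",
        "organism": "http://example.org/vocab/organism",
    },
}


def rows_to_triples(csv_text, mapping):
    """Apply the mapping to every CSV row and return (s, p, o) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject_template"].format(**row)
        for column, predicate in mapping["columns"].items():
            value = row.get(column)
            if value:  # skip empty cells rather than emit empty literals
                triples.append((subject, predicate, value))
    return triples


# Made-up sample data; the second row has an empty "organism" cell.
SAMPLE = "id,symbol,organism\n1,BRCA1,human\n2,TP53,\n"
TRIPLES = rows_to_triples(SAMPLE, MAPPING)

for triple in TRIPLES:
    print(triple)
```

Because the mapping is data rather than code, changing a predicate or adding a column only touches the dictionary, which is roughly the maintainability argument made in the talk.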
Suggesting Reasonable Phenotypes to Clinicians
Laura Slaughter (speaker) and Dag Hovland
They support communication between pediatrician and geneticist, to provide a more complete picture of the patient, but not to replace expertise of the physicians. HPO is used to specify abnormalities.
Their workflow: a pediatrician in intensive care suspects that a newborn has a genetic disorder. Firstly, they need to get consent via a patient portal (DIBS – electronic health record). A requisition form has a subset of HPO codes that the pediatrician can select, and then the form and samples are sent off to the lab. Reporting the HPO codes to the lab helps the lab with their reporting and identification.
Phenotips is one HPO entry system (form-based ontology browse and search). Also extant is a natural language mapping from text. A third is Phenotero, which is a bit of a mixture of the two. When they started, the clinicians wanted to use Phenotips. Another related system is the Phenomizer, which takes a different perspective as it helps with the process of differential diagnosis. The authors thought they would just provide a service where they would suggest additional HPO codes to clinicians. But when they started to work on it, they had to make a new user interface after consultation with the clinicians. Additionally, they discovered that they would also need to provide a differential diagnosis feature.
There were a number of issues with the system that existed before they started. There was an overwhelming number of HPO codes for clinicians to sort through. There was no consistency checking or use of the HPO hierarchy. The NLP detection had a low accuracy and had to be supplemented with manual work. There was also no guidance for prompting for a complete picture of the patient or further work-up (available in Phenomizer).
They suggested that they provide a simple look-up mechanism using disease-HPO associations. Suggestions for clinicians come in the form of HPO codes that point to where further work-up might be needed. They also needed to implement ordering of HPO code candidates, and they did this by using disease information to inform priority settings, e.g. measuring the specificity of the disease given the phenotype entered by the clinician.
They order diseases in increasing order, by the ratio of unselected phenotypes. There is a balance to find between giving the clinician a bias too early, or alternatively only being able to provide feedback in very specific circumstances. They implement their work using a reasoner called Sequoia, input forms and annotation files.
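The ordering step can be sketched roughly as follows. This is my own reading of it, not the authors' implementation: I assume the ratio is the number of a disease's annotated phenotypes that the clinician has not selected, divided by the total number of phenotypes annotated to that disease. The disease names and phenotype labels are placeholders, and the sketch ignores the HPO hierarchy and the Sequoia reasoner entirely.

```python
def rank_diseases(selected, disease_phenotypes):
    """Order candidate diseases by increasing ratio of unselected phenotypes.

    Assumption: ratio = |annotated phenotypes not selected| / |annotated phenotypes|.
    """
    ranked = []
    for disease, phenotypes in disease_phenotypes.items():
        if not phenotypes & selected:
            continue  # no overlap with the clinician's entries: not a candidate
        unselected = phenotypes - selected
        ratio = len(unselected) / len(phenotypes)
        # The unselected phenotypes double as suggestions for further work-up.
        ranked.append((ratio, disease, sorted(unselected)))
    ranked.sort(key=lambda item: (item[0], item[1]))
    return ranked


# Placeholder disease-phenotype associations (not real HPO annotations).
ANNOTATIONS = {
    "disease_A": {"P1", "P2", "P3", "P4"},
    "disease_B": {"P1", "P2"},
    "disease_C": {"P5"},
}
RANKED = rank_diseases({"P1", "P2"}, ANNOTATIONS)

for ratio, disease, suggestions in RANKED:
    print(disease, ratio, suggestions)
```

In this toy data, disease_B comes first because every one of its phenotypes has already been selected, while disease_A follows with two phenotypes left to check. That is the trade-off mentioned in the talk: the top of the list is the most specific match, but it also offers the fewest new suggestions.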
They are working with a geneticist and clinicians to find the best method for generating suggestions and evaluate the UI. They’re also exploring the ORDO Ontological Module (HOOM), which qualifies the annotations between a clinical entity from ORDO and phenotypic abnormalities from HPO according to frequency and by integrating the notion of diagnostic criteria.
A FHIR-to-RDF converter
Gerhard Kober and Adrian Paschke
FHIR is an HL7 standard with more than 100 resources defined. A FHIR-Store is a storage container for different resources, and they would like to ask SPARQL queries over the result set. Because FHIR resources are meant to facilitate interoperability (but not semantic interoperability), storage in RDF is not possible. They are implementing a system architecture in which a FHIR-to-RDF converter sits between the client and the HL7 FHIR stores. This would allow the client to interact with RDF.
They have used the HAPI-FHIR and Apache-Jena libraries. The data is transformed from FHIR-JSON to an Apache-Jena-RDF-Model. Searches of FHIR resources are returned as JSON objects. Performance is critical, and there are two time-consuming steps that might become a bottleneck: the HTTP call to the FHIR store and the conversion from FHIR to RDF. To alleviate this, queries to the FHIR store should be specialized. They also need to check if the transformation to Apache-Jena is too expensive.
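As a rough illustration of the conversion step, here is a small Python sketch that flattens a FHIR-style JSON resource into triples. The speakers used the Java libraries HAPI-FHIR and Apache Jena, so this is not their code. The example.org namespace, the dotted-path predicates and the sample patient are all assumptions of mine, and the official FHIR RDF representation is considerably richer than this.

```python
import json

# Hypothetical namespace for illustration; not the official FHIR RDF namespace.
BASE = "http://example.org/fhir/"


def fhir_json_to_triples(resource_json):
    """Flatten one FHIR-style JSON resource into (subject, predicate, object) triples."""
    resource = json.loads(resource_json)
    subject = f"{BASE}{resource['resourceType']}/{resource['id']}"
    triples = [(subject, "rdf:type", BASE + resource["resourceType"])]

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            # Simplification: list order and grouping are lost in this sketch.
            for item in node:
                walk(item, path)
        else:
            triples.append((subject, BASE + path, node))

    for key, value in resource.items():
        if key not in ("resourceType", "id"):
            walk(value, key)
    return triples


# Made-up sample resource.
PATIENT = (
    '{"resourceType": "Patient", "id": "example", "gender": "female", '
    '"name": [{"family": "Smith", "given": ["Ann", "B."]}]}'
)
TRIPLES = fhir_json_to_triples(PATIENT)

for triple in TRIPLES:
    print(triple)
```

Even in this toy form, the converter has to walk every nested element of every resource returned by the store, which gives a feel for why the speakers flagged the conversion as one of the two expensive steps.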
A framework for representing clinical research in FHIR
Hugo Leroux, Christine Denney, Smita Hastak and Hugh Glove (speaker)
This covers work they’ve done together as part of HL7 over the past 6-8 months. They’ve had a FHIR meeting, “Blazing the Path Forward for Research”, where they agreed to establish a new Accelerator Project to get a set of use cases. FHIR has been widely adopted in clinical care, mainly because of its accessibility and how it is all presented through a website. If you look at a FHIR resource, you get a display containing an identifier and some attributes. For instance, for Research Subject you would get information on state, study link, patient link, consent… Research Study includes identifier, protocol, phase, condition, site / location, inclusion/exclusion.
FHIR tooling enforces quality standards at the time of publishing the data, has publicly-available servers for testing, and others. It also provides RESTful services for the master data objects that are stateless and non-object oriented.
Much of the work involved is keeping track of the investigators and other trial management. They are looking at using FHIR resources to help with the trial management as well as the more traditional data capture and storage.
People can build tools around FHIR – one example is ConMan, which allows you to graphically pull resource objects in and link them together. When resources are linked together, the resulting graph of objects looks a lot like a vocabulary/ontology relating ResearchStudies to Patients via ResearchSubject and other relationships.
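The way ResearchSubject ties studies and patients together can be sketched with plain dictionaries. This is only an illustration and has nothing to do with how ConMan works internally. I am assuming the FHIR R4 shape of ResearchSubject, where `study` and `individual` hold references; other FHIR versions name these elements differently, and the identifiers below are made up.

```python
from collections import defaultdict


def link_studies_to_patients(research_subjects):
    """Follow ResearchSubject references to map each study to its patients.

    Assumes R4-style elements: subject["study"]["reference"] and
    subject["individual"]["reference"].
    """
    studies = defaultdict(set)
    for subject in research_subjects:
        study_ref = subject.get("study", {}).get("reference")
        patient_ref = subject.get("individual", {}).get("reference")
        if study_ref and patient_ref:  # ignore incomplete resources
            studies[study_ref].add(patient_ref)
    return {study: sorted(patients) for study, patients in studies.items()}


# Made-up resources, trimmed to the elements this sketch needs.
SUBJECTS = [
    {"resourceType": "ResearchSubject", "id": "rs1",
     "study": {"reference": "ResearchStudy/s1"},
     "individual": {"reference": "Patient/p1"}},
    {"resourceType": "ResearchSubject", "id": "rs2",
     "study": {"reference": "ResearchStudy/s1"},
     "individual": {"reference": "Patient/p2"}},
    {"resourceType": "ResearchSubject", "id": "rs3",
     "study": {"reference": "ResearchStudy/s2"},
     "individual": {"reference": "Patient/p1"}},
]
LINKS = link_studies_to_patients(SUBJECTS)

for study, patients in LINKS.items():
    print(study, "->", patients)
```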
The object model is quite complex. BRIDG is a domain model for FHIR in clinical research. The objective of what they’re doing is to stimulate a discussion on how clinical research semantics and data exchange use cases can be represented in FHIR.
Reconciling author names in taxonomic and publication databases
LSIDs were used early on in the semantic web – I remember those! However, LSIDs didn’t really work out – data didn’t just “join together” magically, unfortunately. He’s working towards a Biodiversity Knowledge Graph, as there is a lack of identifiers and a lack of links. Taxonomists often feel underappreciated, and under attack from people who are generating loads of new species and aggregated biodiversity data. Taxonomists are much less likely to have ORCIDs than the general research population, so in order to identify people you need to match people using CrossRef and ORCID either using schema:Role, or matching people in IPNI (a taxonomic database that still uses LSIDs?) and ORCID.
Not all ORCID profiles are equal – he shows us an example of one called “Ian”…, though he did figure out who it (probably) is. In conclusion, the semantic web for taxonomic data failed because of the lack of links, and making retrospective links is hard. Additionally, there is the “Gary Larson” problem of people hearing “blah blah RDF blah” 🙂
On Bringing Bioimaging Data into the Open(-World)
Josh Moore (speaking), Norio Kobayashi, Susanne Kunis, Shuichi Onami and Jason R. Swedlow
In imaging, the diversity is visual and you can see how different things are. They are developing a 5D image representation / model: 3D movies in multiple colors. From there it gets more complicated with multilayer plates and tissues. They develop the Image Data Resource. They are interested in well-annotated image data, e.g. from the EBI, as well as controlled vocabularies. They are getting lots of CSV data coming in, which is horrible to process.
They translate over 150 file formats via BioFormats by reverse engineering the file formats – big job! They tried to get everyone using OME-TIFF but it wasn’t completely successful. However, it was a good model of how such things should be done: it’s a community-supported format, for example.
This community is still a bit “closed world”. In 2016 they started development of the IDR, and needed to formalize the key/value pairs. However, the community continues to want to extend it more. As a result, they want to leave the key/value pairs and move back to something more semantic. Use cases include extension of the entire model or conversion of the entire model – Norio Kobayashi converted the entire model into OWL (OME Core ontology <= OME Data Model (which itself is OME-TIFF + OME-XML)). Extension is the 4D Nucleome ontology.
He likes the Semantic Web solutions as they reduce the cost of more ad hoc XML extensions. Perhaps they could use JSON-LD, as it may end up being the “exciting” front end? Bio-(light) imaging is relatively new to this and lagging in investment for these sorts of things.
Please also see the full programme. Please note that this post is merely my notes on the presentations. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any issues you may spot – just let me know!