Categories
Uncategorized

UKON 2016: Making Sense of Description Logics

These are my notes for Paul Warren’s talk at the UK Ontology Network Meeting on 14 April, 2016.

Image source: http://slidewiki.org/upload/media/images/25/2472.PNG (14 April 2016)

What can we learn from theories of reasoning? The aim is to understand the difficulties people experience with DLs, and to try to mitigate those difficulties. There have been many psychological studies of how people reason. Historically, there have been two camps: rule-based and model-based reasoning. Rule-based (syntactic) reasoning proceeds by constructing logical steps akin to those a logician creates formally in a proof. Model-based (semantic) reasoning proceeds by constructing mental models which represent the situation. These two are complemented by a third method (missed that name).

 

English: John only has sons (implication: he does have children but no daughters)

Manchester OWL: John has_children only Male (implication: if he has children, they are sons, but he might not have any at all)

 

Mental models

English: John sons

Manchester OWL: John son(s), John has things which are sons

Based on the work he has done, it seems that people reason syntactically. The recommendations are therefore to use syntax, where possible, to emphasise the semantics, but to beware of false equivalences; to teach the duality laws as expressed in Manchester OWL syntax; and to let tools show alternative, equivalent statements.

In a study, only half of participants correctly recognised that the following are not disjoint: “has_child only MALE” and “has_child only (not MALE)”. Replacing the keyword only with only or none helped a lot.
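The key point (my own illustration, not from the talk) is that a universal (“only”) restriction is vacuously satisfied by an individual with no values for the property at all. A minimal Python sketch of this set-based reading, with hypothetical names:

```python
# Minimal sketch of the universal ("only") restriction, read set-theoretically:
# an individual satisfies "has_child only C" iff every asserted child is in C,
# which is vacuously true when there are no children at all.

def only(values, cls):
    """True iff every property value is an instance of cls (vacuously true if none)."""
    return all(v in cls for v in values)

male = {"Tom", "Dick"}
not_male = {"Harriet"}

johns_children = []  # John has no asserted children

print(only(johns_children, male))      # True - vacuously satisfied
print(only(johns_children, not_male))  # True - also vacuously satisfied
```

Any childless individual therefore satisfies both “has_child only MALE” and “has_child only (not MALE)”, which is why the two are not disjoint – and why an explicit “only or none” reading helps.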

People didn’t do very well when thinking about functional object properties. Why was this? He designed two equivalent pieces of reasoning, each requiring three reasoning steps to reach the appropriate conclusion. One used a functional property, the other a transitive property. People finished the reasoning faster, and more got it correct, when using transitivity.

Functionality is inherently difficult. They suggest a new keyword, solely, to emphasise that it is the object which is unique, not the subject. This showed a significant improvement in performance.
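To illustrate what makes functionality trip people up (my own sketch, not from the talk; the property and individual names are hypothetical): a functional object property allows each subject at most one object, and because OWL makes no unique name assumption, asserting two fillers for the same subject leads the reasoner to infer that the two fillers denote the same individual rather than reporting an error.

```python
# Hypothetical sketch of functional-property reasoning: if has_birth_mother is
# functional, two asserted fillers for the same subject are inferred to be the
# same individual (OWL has no unique name assumption).

from collections import defaultdict

assertions = [
    ("john", "has_birth_mother", "mary"),
    ("john", "has_birth_mother", "maria"),
]
functional_properties = {"has_birth_mother"}

fillers = defaultdict(set)
for subject, prop, obj in assertions:
    if prop in functional_properties:
        fillers[(subject, prop)].add(obj)

for (subject, prop), objects in fillers.items():
    if len(objects) > 1:
        # The "surprising" inference that participants found hard to follow.
        print(f"Inferred owl:sameAs among {sorted(objects)} via functional {prop} on {subject}")
```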

Theories of reasoning and language provide insight and lead to recommendations for modifications to syntax, tool support and training.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences, Semantics and Ontologies

UKON 2016 Short Talks III

These are my notes for the third session of talks at the UK Ontology Network Meeting on 14 April, 2016.


Ontology-driven Applications with MEKON and HOBO
Colin Puleston

Currently used in a sepsis prediction system and a clinical trials design / design-retrieval system (early prototypes). MEKON is based around a generic frames model, with a plugin framework which allows the incorporation of ontology-like knowledge sources and associated reasoning mechanisms. HOBO extends MEKON to enable the creation of domain-specific object models, bound to appropriately populated frames models. Instantiations of the object models operate in tandem with instantiations of the bound frames models.

They come with instance-store mechanisms, delivered via plugins. Current plugins are based on OWL DL, RDF + SPARQL and the XML database BaseX. They also come with a Model Explorer GUI which allows the model developer to browse the model, explore the dynamic behaviour of specific instantiations, and exercise the instance store (store instances and execute queries).

MEKON and HOBO provide a layered architecture (a structured means of combining generic knowledge sources and reasoners with domain-specific processing), handle much of the dirty work, and provide client access via appropriate APIs (the generic frames model and the domain-specific object models).

MEKON enables the creation of a skeleton application without having to write a single line of code.

The Synthetic Biology Open Language
Goksel Misrili, James Alastair McLaughlin, Matthew Pocock, Anil Wipat

Synthetic biology aims to engineer novel and predictable biological systems through existing engineering paradigms. Lots of information is required, including DNA sequence info, regulatory elements, molecular interactions and more. They are currently using SBOL 2.0. Modules allow hierarchical representation of computational and biological systems. A biological example would be a subtilin sensor.

SBOL utilises existing Semantic Web resources, e.g. BioPAX, SO, SBO, EDAM, Dublin Core and the Provenance Ontology. SBOL is serialized as XML. SBOL allows the unambiguous exchange of synthetic biology designs, and is developed by a large consortium.
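As a rough illustration of consuming an SBOL document (my own sketch, not from the talk; the file name is hypothetical and I am assuming the SBOL 2 namespace http://sbols.org/v2#), the RDF/XML serialisation can be read with a standard RDF library:

```python
# Minimal sketch: list the component definitions in an SBOL 2 document.
# "design.xml" is a hypothetical local file; the SBOL 2 namespace is assumed.

from rdflib import Graph, Namespace, RDF

SBOL = Namespace("http://sbols.org/v2#")

g = Graph()
g.parse("design.xml", format="xml")  # SBOL 2 documents are serialised as RDF/XML

for component in g.subjects(RDF.type, SBOL.ComponentDefinition):
    print(component, g.value(component, SBOL.displayId))
```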

ConSBOL: A terminology for biopolymer design
Matthew Pocock, Chris Taylor, James Alastair McLaughlin, Anil Wipat

The presenter was unable to make the workshop today.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences, Semantics and Ontologies

UKON 2016: Modelling Dispositions

These are my notes for Alexander Carruth’s talk at the UK Ontology Network Meeting on 14 April, 2016.

Some upper-level ontologies (ULOs), such as BFO, do try to model dispositions. What is a disposition? Fragility and solubility are canonical examples. Dispositions are capacities, tendencies, or causal powers (for example): the features in virtue of which things engage in particular causal interactions. Other examples are mass and charge.

Traditionally, the dominant account of dispositions is called the Conditional Analysis (CA). This basically says: if S occurs, then O will M, where S is a stimulus, O the object bearing the disposition, and M some manifestation. Example: if the vase is struck, it will break. This captures a disposition’s nature relationally, as D(s, m). There have been some challenges to the CA approach in recent years.

Image source: http://www.danielborup.com/wp-content/uploads/2013/01/Cracks1.jpg (14 April 2016)

There are two ongoing debates about the nature of dispositions. The first is the Tracking Debate (single-trackers versus multi-trackers), which concerns the number and variety of manifestations that can be associated with a single disposition. Within multi-tracking, there is quantitative and qualitative multi-tracking. For multi-trackers, being ball-shaped has a variety of manifestations (e.g. rolling, making a dent in some clay); therefore dispositions have multiple manifestations produced by multiple stimuli.

The second debate concerns how dispositions operate: CA assumes a stimulus-based account of how dispositions operate. The Mutual Manifestation view states that dispositions ‘work together’, with no distinction possible between the active disposition and the mere stimulus.

Therefore there are four accounts of disposition:

  • single-track, stimulus manifestation (CA): D(s, m)
  • multi-track, stimulus manifestation
  • single-track, mutual manifestation: D1(D2, m1)
  • multi-track, mutual manifestation

How should we react? Monism (choose which of the four accounts to go with); pluralism (greater complexity but could pick and choose); or pragmatism (different responses for different purposes)?

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences, Semantics and Ontologies

UKON 2016 Short Talks II

These are my notes for the second morning session of talks at the UK Ontology Network Meeting on 14 April, 2016.


Dialogues for finding correspondences between partially disclosed Ontologies
Terry Payne and Valentina Tamma

and

Dialogue based meaning negotiation
Gabrielle Santos, Valentina Tamma, Terry Payne, Floriana Grasso

Different systems have different ontologies, so you need to align them, and many approaches exist. There are three problems: different alignment systems produce different solutions; you may not want to align all of your ontology; and part of the ontology might be commercially sensitive, so you may not want to expose or disclose it.

You can use two agents to negotiate possible alignments. If the agents are knowledge-rich, they can selectively identify which mappings should be disclosed; if they are not knowledge-rich, they need to start exchanging segments of the ontology.

They have started to develop a formal inquiry dialogue that allows two agents to exchange knowledge about known mappings. If you’re looking at many different alignments, you may have many one-to-many mappings. Which mapping should be selected? Could it be resolved through objections within the dialogue? Through the dialogue, each agent extends its ontology by including correspondences.

They’ve used a cognitive approach to reaching consensus over possible correspondences. Agents identify concepts that may be ontologically equivalent in their respective ontologies, and each then seeks further evidence. One agent asks the other whether it can align an entity, starting with a lexical match. The second phase then asks for evidence to support the correspondence, e.g. what are the structural similarities? The CID dialogue has been empirically evaluated using OAEI datasets.
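As a rough sketch of the two-phase idea (my own illustration, not the authors’ CID dialogue; all names, labels and thresholds are hypothetical): a lexical match proposes a candidate correspondence, and overlap between the entities’ structural neighbourhoods is then used as supporting evidence.

```python
# Hypothetical two-phase correspondence check:
#   phase 1 - lexical similarity between entity labels,
#   phase 2 - structural evidence from overlapping neighbour labels.

from difflib import SequenceMatcher

def lexical_score(label_a: str, label_b: str) -> float:
    return SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()

def structural_score(neighbours_a: set, neighbours_b: set) -> float:
    union = neighbours_a | neighbours_b
    return len(neighbours_a & neighbours_b) / len(union) if union else 0.0

# Toy fragments of two partially disclosed ontologies.
entity_a = {"label": "Automobile", "neighbours": {"vehicle", "engine", "wheel"}}
entity_b = {"label": "Auto",       "neighbours": {"vehicle", "engine", "driver"}}

if lexical_score(entity_a["label"], entity_b["label"]) > 0.5:
    evidence = structural_score(entity_a["neighbours"], entity_b["neighbours"])
    print(f"Candidate correspondence; structural evidence = {evidence:.2f}")
```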

The Cardiovascular Disease Ontology (CVDO)
Mercedes Arguello Casteleiro, Julie Klein, Robert Stevens

CVDKB content: 34 publications, covering human and mouse. How can we connect the biological information, e.g. HGNC, UniProt, MGI, ChEBI, miRBase? From this they have 86,792 mouse proteins, 172,121 human proteins, and many metabolites and miRNAs. The CVDO reuses some ontologies, such as OBI, and parts of other ontologies including SO, PRO, GO, CL, UBERON, PATO…

There is a CVDO application which allows SPARQL queries over the OWL. They’re looking at including Solr and Elasticsearch to make searching fast, via the conversion of OWL to JSON-LD. Go to http://cvdkb.cs.man.ac.uk to try it out. The idea is to hide the ontological complexity from the end user.
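As an illustration of what querying such an ontology-backed knowledge base over SPARQL can look like (my own sketch, not from the talk; the endpoint URL and query terms are hypothetical):

```python
# Hypothetical sketch: query an ontology-backed knowledge base via SPARQL.
# The endpoint URL is illustrative, not the actual CVDKB service.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/cvdkb/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?protein ?label
    WHERE {
        ?protein rdfs:label ?label .
        FILTER(CONTAINS(LCASE(?label), "myosin"))
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["protein"]["value"], row["label"]["value"])
```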

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences, Semantics and Ontologies

UKON 2016: The Use of Reformation to Repair Faulty Analogical Blends

These are my notes for Alan Bundy’s and Ewen Maclean’s talk at the UK Ontology Network Meeting on 14 April, 2016.

This talk is divided into two parts: Merging Ontologies via Analogical Blending, and Repairing Faulty Ontologies using Reformation.

Can you merge ontologies successfully using analogical blending? It would be quite easy to get things wrong, and therefore they are using the reformation technique to repair any mistakes made in the merging process.

T1 and T2 are the parent theories, and B is the blend between them. Suppose T1 and T2 are two retailer ontologies. In T1 the relationship is owning and products have part numbers; product A and product B have the same part number because they are different instances of the same product. In T2 the relationship is sold_to, and there are serial numbers rather than part numbers. So things are similar but not identical, and it would be easy to align these concepts incorrectly automatically. When the ontologies are merged, the two products are incorrectly given the same serial number (when they only share the same part number). This makes the blended ontology inconsistent.

How can the reformation technique help you recover? Reformation works from reasoning failures; here, we’re looking at inconsistencies. Using the proof of inconsistency, reformation tries to break the proof so that it can no longer reach the inconsistency, thereby creating a suggested repair: in this case, renaming the two occurrences of the serial number. In the resulting blended ontology, the offending serial-number type is replaced by a part-number type, so that part number and serial number are two different types, correcting the ontology.
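A minimal sketch of the kind of clash involved (my own illustration, not the reformation algorithm; the data are hypothetical): part numbers may legitimately be shared by different instances of the same product, but serial numbers are meant to identify a single item, so a blend that maps part numbers onto serial numbers produces a detectable violation.

```python
# Hypothetical sketch of detecting the faulty blend: two distinct products
# end up sharing a "serial number" because the blend mapped part_number onto
# serial_number, which is supposed to identify a single item.

merged_facts = [
    ("product_a", "serial_number", "1234"),  # was a shared part number before blending
    ("product_b", "serial_number", "1234"),
]

owner_of_serial = {}
for product, prop, value in merged_facts:
    if prop == "serial_number":
        if value in owner_of_serial and owner_of_serial[value] != product:
            print(f"Inconsistency: {owner_of_serial[value]} and {product} share serial {value}")
            print("Suggested repair: rename these occurrences back to part_number")
        owner_of_serial[value] = product
```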

Ontologies can be merged by analogical blending, but some blends can be faulty. Faults can be revealed by reasoning failures. Reformation uses such failures to diagnose and repair faulty ontologies. This work is still in the early stages.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences, Semantics and Ontologies

UKON 2016 Short Talks I

These are my notes for the first morning session of talks at the UK Ontology Network Meeting on 14 April, 2016.


Integrating literature mining and curation for ontology-driven knowledge discovery
George Demetriou, Warren Read, Noel Ruddock, Martyn Fletcher, Goran Nenadic, Tom Jackson, Robert Stevens, Jerry Winter

It is hard to keep up with the volume and complexity of data throughout its life cycle (search for content, collect it, read and analyse it, convert it into formal representations, integrate knowledge into computational models, use it to produce explanations, predictions or innovations). Therefore they have BioHub, which stores information on feedstocks, chemicals, plants, organisms, chemical transformation, and properties. The task is to extract, organise and integrate knowledge into models of chemical engineering. An example question: “Which chemicals come from which feedstocks?”

Where does the human come in for the curation task, and where the machine? DARPA Big Mechanism is a big project comparing the two, based on text evidence from the literature. In the study, they found that humans are good at finding interactions and bad at grounding, while machines were bad at interactions and good at grounding. A hybrid method gave the best results.

In the BioHub Curation Pipeline, there are different types of annotation, with human and machine curation.

Integrating Concept Dynamism into Longitudinal Analysis of Electronic Health Records
Chris Smith and Alex Newsham

Policies that determine the data captured in EHRs are subject to change over time for a variety of reasons, including updated clinical practice, improved tests, and the introduction or cessation of public health initiatives. As a result, EHRs may capture different clinical concepts or use different representations. Longitudinal analysis of EHRs aims to identify patterns in health and healthcare over time to inform the design of interventions. The analysis is predicated on the ability to robustly identify specific clinical concepts.

A set of policies determines what clinicians record; these policies define a set of quality indicators. Updates are provided every three months, need to be taken into account, and the changes need to be recorded.

Dynamism in presence and representation of clinical concepts in policies needs to be integrated into the longitudinal analysis of EHRs. This will improve accuracy with which patients, interventions and outcomes can be characterised over time.

INSPIRE: An Ontological Approach to Augment Careers Guidance
Mirko Michele Dimartino, Vania Dimitrova, Alexandra Poulovassilis

The aim is to build an intelligent tool to inspire career paths, on top of Semantic Web technologies. A GUI tool would interface with the user and with a SPARQL endpoint; other SPARQL endpoints are attached to RDFS data from LinkedIn and L4All (and others). These are joined up and integrated, and the user then queries the system through federated querying. The data integration happens via ontology-based query rewriting.

They have one ontology to describe LinkedIn. The user is asked to create a profile, then can explore the next career step or explore a long-term career goal. The user can select time intervals, and the response is matched against those intervals.
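As an illustration of federated querying across several SPARQL endpoints (my own sketch, not the INSPIRE implementation; the endpoint URLs, prefixes and properties are hypothetical), SPARQL 1.1’s SERVICE keyword lets a single query combine data from multiple sources:

```python
# Hypothetical sketch of a federated SPARQL 1.1 query: SERVICE clauses pull
# data from two illustrative endpoints within a single query.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/inspire/sparql")
sparql.setQuery("""
    PREFIX ex: <http://example.org/career#>
    SELECT ?person ?currentRole ?nextRole
    WHERE {
        SERVICE <http://example.org/profiles/sparql> {
            ?person ex:hasRole ?currentRole .
        }
        SERVICE <http://example.org/pathways/sparql> {
            ?currentRole ex:typicalNextStep ?nextRole .
        }
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], "->", row["nextRole"]["value"])
```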

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences, Semantics and Ontologies

UKON 2016: Great North Museum

These are my notes for Dan Gordon’s talk at the UK Ontology Network Meeting on 14 April, 2016.

Dan Gordon is the Keeper of Biology at the Great North Museum. He has about one million objects in his collection, ranging from taxidermy to microscope slides. One problem they face at the museum is that there are 52 million records in the museum, and classification of those objects is very challenging.

The Great North Museum
Image source: https://twmuseums.org.uk/images/900/7ELR-4087-original.jpg (14 April 2016)

In the Biology Collection, in some ways he starts off better than with the other collections, as there are already many classification systems available (e.g. taxonomic classification). Taxonomy is easier for larger animals, and harder for insects and plants, which can be small, highly diverse, or both.

There are 40,000 plant specimens in his collection. When new research comes in, rather than re-classifying and moving all of the specimens, he leaves them in their existing system (there are loads of obsolete systems!). One example is a lichen, where you have two completely different organisms living symbiotically (algae and fungus) – here you have two completely different phylogenies, and the position in the system is constantly being revised. Therefore the best way for him to organize things is… Alphabetically!

Another example is dynamos. There are many in the collection; they often relate to different academic disciplines and therefore tend to get organised along those lines. In terms of trying to classify them, their historical use is very important. They store some un-ground lenses for lighthouse lamps, and these are very important for the history of lighthouses and of industry in general. There is a system for classifying this kind of industrial object, which isn’t universally used, called the SHIC system; it works a bit like the Dewey decimal system.

There are SHIC numbers for wedding dresses, marriages etc. However, they have a dress in store which was worn for a wedding during WWI, so its primary importance relates to WWI. In theory you could assign many different SHIC numbers to note its different roles; in practice, many people tend to choose what they believe is the most important role and assign a number just for that. This tends towards an “incomplete” number of axes of importance.

What about the art collection? There’s Flat Art, 3D Art, etc. In the physical store, everything that’s framed is on racks, and unframed things are in drawers – that’s the primary classification 🙂 . After that, title and artist are important classifications. But what about things like pots, where there are designers, makers, and manufacturers? There are several layers to the cataloguing which are not obvious when you initially start classifying.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Papers, Standards

BioSharing is Caring: Being FAIR

FAIR: Findable, Accessible, Interoperable, Reusable
Image source: Scientific Data, via http://www.isa-tools.org/join-the-funfair/ (March 16, 2016).

In my work for BioSharing, I get to see a lot of biological data standards. Although you might laugh at the odd multiplicity of standards (rather than One Standard to Rule Them All), there are reasons for it. Some of those reasons are historical, such as a lack of cross-community involvement during the inception of standards, and some are technical, such as vastly different requirements in different communities. The FAIR paper, published yesterday by Wilkinson et al. (including a number of my colleagues at BioSharing) in Scientific Data, helps guide researchers towards the right standards and databases by clarifying data stewardship and management requirements. Used correctly, it means a researcher can be assured that, as long as a resource is FAIR, it’s fine.

This article describes four foundational principles—Findability, Accessibility, Interoperability, and Reusability—that serve to guide data producers and publishers as they navigate around these obstacles, thereby helping to maximize the added-value gained by contemporary, formal scholarly digital publishing. Importantly, it is our intent that the principles apply not only to ‘data’ in the conventional sense, but also to the algorithms, tools, and workflows that led to that data. All scholarly digital research objects—from data to analytical pipelines—benefit from application of these principles, since all components of the research process must be available to ensure transparency, reproducibility, and reusability. (doi:10.1038/sdata.2016.18)

This isn’t the first time curators, bioinformaticians and other researchers have shouted out the importance of being able to find, understand, copy and use data. But any help in spreading the message is more than welcome.

Standards (xkcd)
Image source: https://xkcd.com/927/

Need more help finding the right standard or database for your work? Visit BioSharing!


Categories
Meetings & Conferences

Governance Open Discussion: Why and When?

Why is governance important?

These are notes on a discussion on 21 January, 2016, at Open Tools and Infrastructure for Biology 2016 held at Newcastle University.

  • Governance is for a common goal.
  • Provide structure and effective coordination – trying to reduce the “cost” of conflict resolution (conflict reduction). It is important that all know that there is a way things are supposed to work within the community.
  • It increases efficiency and decreases friction. Poor governance results in the opposite effect, and there does need to be iterative development of the governance methodology itself (the governance itself needs to be evaluated and modified).
  • It is an expression of the values within your community (e.g. how the community is represented and structured).
  • An open governance model means that feedback from within the community (3 people in a pub creating an add-on) can be quickly incorporated and made use of. It aligns well with modular design principles.
  • If the solution space is limited, then you don’t actually need much governance. If your goal is focused (and therefore so is the solution space), then a lack of governance might be best.
    • Governance provides a socio-political framework. If you don’t have many social or political requirements, you may not need governance.

As an example, the SBOL governance process has changed over time. One of the useful things it has provided is a framework of expectations and responsibilities for the various academics and companies. SBOL seems to be getting more open over time, and the changes in governance method have likely helped increase this openness.

If a community is too small, it could become closed because of “accidental obscurity”. You don’t want the governance structure to become an excuse for the standard not progressing.

Is it helpful to have a benevolent dictator?

Perhaps a better question than “Do we need governance” is “When do we need governance?”

In projects that are open and have more than one person, how do these people interact – what are the rules of interaction?

  • What does “open” mean, and who gets to choose its definition?
  • With “open”, you have to find the appropriate mixture (for you) of freedom and community consensus.
  • Does openness need to be imposed upon a newcomer? When someone wishes to participate in a group, you let them know what the rules/governance of the community are, and ask if they want to be a part of it. There is a distinction here in that people make the choice to be part of a community, and therefore the openness is not an imposition, as such.

For open science, we need governance. Openness implies some restrictions or rules, even if it’s just to state the (type of) openness itself. Governance doesn’t mean you are mandating behavior (necessarily). For instance, in SBOL you can extend it however you like (the freedom to extend) but you gain additional benefits if you follow the rules (a privilege if you follow the rules rather than using governance to mandate the use of the rules).

Example: governance states a list of recommended formats. Users create open data files; if the files are not in a format from the list, the users don’t get the benefits of the other tools in the community which follow the governance policies. If they do use a recommended format, they get immediate benefits with respect to reproducibility and community.

What happens when there are multiple governance frameworks? The way you set up your governance may be incompatible for some people who are under a contradictory governance framework.

Do you need metrics to figure out how well your governance model is working as well as how well your community is developing? Or should such metrics be closely aligned / identical?

Who are all the players that have a say in governance? What do we exclude? Governance is ultimately the definition of who is the “in” group and who is the “out” group. The “in” group are the community, defined by their agreement to be governed according to the governance framework. However, a community needs to remain open to external input, otherwise bad decisions may be propagated.

How does governance begin and how does it change over time?

  • How do decisions get made? You can organize your project to disperse your decision making to a greater or lesser extent. This results in either greater or lesser exclusivity of such decision making.
  • Various governance strategies are dependent upon time and money resources – so some forms of governance are only available to certain types of people.
  • How does governance pursue openness? There are very few ways in which communities devoted to openness actually get (monetary, social) credit for such commitment.
  • Different groups of people prefer different methods of communication, and this may have an effect on the governance method.
  • You need to have a singular goal or view, or it doesn’t matter what the governance method is, you will not be able to resolve anything. Deciding what the problem is, is a deeply social issue – without this, you can’t decide what is valuable and what your goal should be.

Please note that this post is merely my notes on the discussion. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speakers’. I’m happy to correct any errors you may spot – just let me know!

Categories
Meetings & Conferences

The OpenPlant Project

Jenny Molloy

This is a presentation given on 20 January, 2016, at the Open Tools and Infrastructure for Biology 2016 held at Newcastle University.

Plants provide proven, global, low-cost technology for gigatonne scale bioproduction. Synthetic biology offers breakout technologies. Plants are useful because they are faster but still simple multicellular systems for engineering form and metabolism. Plant biotechnology is often beset by restrictive IP practices that threaten to constrain innovation. Creating open tools and technologies for plant experimentation will be very useful for future research. For instance, many parts of a typical plant expression vector are under patent. And once you start engineering entire metabolic pathways, the worry is that such patents will be an increasingly problematic issue. In seed patents, a small number of large players hold most of the information.

In plant biotechnology, most applications are IP-protected. What would be useful is a shared toolset to promote innovation, which might also allow smaller players to enter the market. More importantly, there would be a social benefit with respect to agriculture, drug production and greater openness. OpenPlant would be completely free and available to everyone, but leaves some room for people to also go through the patent process if they choose. There are currently very few plant “parts” or biobricks available (compared with microbial parts).

There are a number of work programmes that the OpenPlant initiative is trying to build: open technology (EZ-MTA / Open MTA), a common syntax for DNA parts, low-cost automated assembly, Marchantia (a simple plant chassis), genome-scale DNA engineering, and shared libraries of biological parts and resources. There are also core laboratories in Cambridge and Norwich, and the OpenPlant Fund (funding specifically for small-scale interdisciplinary projects).

Foundational Technology: Chassis, DNA Assembly, Gene expression, Genome engineering, open source software and modelling. Trait engineering projects include photosynthesis, carbohydrate engineering, natural products, nitrogen fixation, and virus-based methods for bioproduction.

Please note that this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know!