Summer 2007 OBI Ontology Workshop, Day 2

A full version of the notes (which means both my notes and Helen Parkinson's, plus any changes made by the group), can be found on the OBI Wiki.

10 July 2007 – OBI Workshop Day 2

MORNING SESSION

The morning session was all about summarizing the work done in the branches since the last workshop.

Relations Branch

Speaking about the relationships we want to introduce. BS states that we should be careful not to add too many relations: the DL version of SnoMed has 108 relations currently. BB says that text definitions for classes and properties are just there to help the humans: restrictions should also be present in OWL, in a manner that matches the text definition as much as possible.

BS: Will OBI, like GO, have Class-level relationships, or will it only have individual-level relationships? AR and others: we would like to have no class-level relationships. GF: In OWL, you can define class-level relations, but it's not a good idea (brings us towards OWL Full). BS: There is a simple OBI statement "PA_Chromium_release has_participant radioactive_chromium" doesn't sound like a relation between instances, but instead between "any old" chromium etc. What it should say it "Every instance of chromium_release_assay has, on the instance level, some instance of radioactive_chromium." that means two relations for everything. CS: You're talking syntactic sugar: saying the same things, but want to have "your" way of saying it be the correct one. AR: The disagreement is that he thinks it's syntactic sugar and BS doesn't. BS actually agrees that it's syntactic sugar, but only if we really know what we're doing: i.e. we need to be VERY sure that, if we decide to only do instance-level relations, that is what we should be sure that we're always doing. General agreement. AR: you can think of these as extra axioms. We should audit all relations to make sure that they're all instance-level reactions. E.g. have axiom on chromium_release_assay that it must always have a reagent, and then make the appropriate notes in the relations used to express that. BS: there is no "problem" in OBO in that you can only use class-level relations. HP made a list of action items for this.

Data Transformation Branch (Presented by Tina Boussard)

A large number of their initial terms went into PA. AR says that some classes should be under linear and non-linear transformation, whereas now they are siblings. There's also the fact that mathematical functions aren't necessarily PAs (parent of Data Transformation). They are functions/methods which might be better elsewhere. E.g. functionX has_role linear transformation. BS: OBI shouldn't be "doing mathematics", as it isn't central to OBI's mission. We should find ===somebody=== who can take care of the central maths that we need. Also, don't use the same noun at the end of a long list of children: shows that should make a parent class of that group. Also, ensure all classes are singular! Also, we need to be very careful about child/parent relationships: a Transformation is NOT a Transformation Method: it is a transformation! LF: Methods aren't PA, they're plans/protocols. Also, she says errors are not PAs, they might be qualities instead. JF says developers have agreed that errors should be measurements.

BB: Peter Lister and Karen Skinner oversee NCBC started out as a resource ontology/cv – just stuff we need to use somewhere. As that comes in, what is being done in data transformation will have to be reconciled for it. BB will send around url to its current state. Daniel Rubin worked on it. Intended to use by certain NIH places as a core resource. Don't forget to constantly check for other people who are working on these child ontologies: the data transformation workshop has invited these people.

DS:  Can we indicate such helper classes / administrative classes in a way we immediately see that they 'live' on another level ? e.g. '_unclassified'….. using the underscore at the beginning….? BB: should be a new enumeration in the curation_status field. AL: we should add it to the Wiki now.
CS: Is normalization a role or a transformation? He thinks it is a transformation. It consists of various types of transformation that are constrained in particular ways, with parameters. There is a particular goal to be achieved. Most people who use normalization think of it as a data transformation. AR: The issue is how to define archsine_transformation from _function and _plotting. Will this be a problem? BS: I think it will be a problem, and we hope we can find someone who's solving it on the mathematical ontology side, as it only should be imported from another ontology elsewhere. AR: say archsine is imported from somewhere else. How do we make archsine transformation and normalization (two different things). BS: The classification of applications is something OBI could reasonably do. AR: This would look like "normalization" and "plotting", and then archsine_normalization would be a defined term. But then, where do normalization and plotting live? Processes? HP say yes, processes. BS: Are you saying normalization is a mathematical abstract, but for OBI's purposes it isn't that abstract thing, it's the process of getting the new data. AR: If you make them as defined classes that, when classified, would end up here. However, they shouldn't be explicitly asserted here, as then we would end up with multiple inheritance. AR will put all of the metadata annotation properties into its own OWL file. AL: we need an action item to add a couple of new phrases to the enumerated list for curation_status.

Instrument Branch (Presented by Ally Lister)

Notes from workshop: what are the boundaries on this term? There are instruments that fit this definition? What's a good use-case, or can we just toss it as it equals the instrument definition? BS: the granularity issue: the OBI domain encompasses several granularities, and we may deal with multiple granularities in the same annotation project, so there will be problems. We know the appropriate granularity for dealing with instruments. There are some things that are truly object aggregates. There are objects, parts of objects, and object aggregates in everything we do. It sounds odd but should be that way. Most people using obi will not be using the whole thing. AR: there should be a distinction between what you can buy as a single item, as opposed to something that takes up multiple rooms and has to be put up specially. CS: we don't call the latter a platform, but instead a laboratory, and therefore wouldn't be needed. We need two types of platform: one that you're not going to take apart, and another that allows you to adjust the parts. however, some things you build yourself is only because it isn't a mature technology yet – it would become a platform when it's mature. AR: should only be called a platform when it's mature. BB: some platforms are only software. PA: the difference between platform and group of instruments is that a platform is a group that has been put together for a specific purpose. Plate reader is an instrument that is used in the context of many different platforms. Included in the definition is some link to either Plan or PA.

General consensus that we should promote everything and remove artefact_object. Also, remove device and move up instrument and labware.

BS: Having many children is fine, just make sure you have no redundancy. Then after they're in, see if there's any way to bundle them. AR has seen people put defined classes into owl:THing but this isn't very good. You could make "bogus" classes under instrument. Not good answers – don't really want ugly pseudo classes. AR: someone will
find it useful to have aliquid handler term. RS: The instrument's function may change, but its structure might not, so we definitely shouldn't use their function.

Role and Digital Entities Branches  (Presented by Jen Fostel)

AFTERNOON SESSION

Ontology of Clinical Investigations (OCI)  (Presented by Jen Fostel)

Covers clinical trial and clinical research. Other groups in this area include CDISC, RCT, IFOMIS, Epoch (the only one that calls itself an ontology), BRIDG, and HL7 (an ANSI standard). The scope of CIO would encompass: the legal terms and minimal information (CONSORT) for clinical trials, clinical research, and administrative terms. Would like to align terms from other efforts with OBI. In OCI, all of the subjects are human. Other groups are doing non-human (e.g pre-clinical or non-clinical efforts). These efforts should be considered – we want them all to use "subject", for example, and mean the same thing. The ontology for clinical trials is in production by the UK cancergrid project. Their aim is to develop a useable ontology by 4Q07 that can be deposited in OBO. JF applauds the effort and hopes that they can all work together. As good as all these efforts all, there is definitely a feeling of "sporting competitiveness" too.

OCI will focus for now on translational research, e.g. clinical research as opposed to trials. The collected terms are from the CDISC glossary (35-page pdf file), STDM (standard tabulated data model: how you would share your files with the FDA), UTSW, MUSC, and RCT. They've organized them all and removed duplicates, and loosely categorized them within the OBI hierarchy. They're now in the process of refining definitions, and have shared terms with the roles and digital entities branches.

OCI would be part of OBI, but there would be a document which contains all of OCI for the benefit of that community. This matches as a 4th-tier owl file as discussed yesterday, or does it? Would OCI be a subset of OBI, or a superset of OBI? CC will show us their OWL file during the next talk. Plan a workshop next year to bring together all efforts for discussions. They have a google group, OCInv@googlegroups.com. CONSORT is already in MIBBI.

===Discussion of the Current OCI File (Presented by Christian Cocos)===
Should OCI be developed in the OBI namespace, or should it be developed separately? We can see from the OWL file, there is overlap with the branches currently in development. The idea is to eventually move *everything* to OBI. No end-user of OCI will even see the namespace, and will just be working with a UI. The OCI WG should be a working community in OBI, and there should not be two independent efforts. But, in the end, should OBI and OCI "appear" to be separate entities, e.g in papers?

Plan Branch (Presented by Phillippe Aldebert)

Protocols, Algorithms, and Study Design. However, Protocols were left out of their work due to the overlap with PA. We will need to come back to this later on, though.
What to do, in general, about adding terms quickly? Many of the terms are suffixed with "parent" words, like _design and _study. BS doesn't like this sort of naming, however some of these suffixes are very important. What should be done? Well, just ensure that the suffix clarifies and is not redundant. JF: If you have a design that included something that was part of the trial with the role of "placebo", then you don't need "placebo_design" as a term, as this could be inferred. You don't need to explicitly say it twice. AL: this is the same problem as we face with the Instrument branch and terms like "liquid_handler". BS: In the definitions, sometimes you use the word "trial" and somewhere you use "study", and this needs to be cleaned up. Offspring study and Parent study are the same study with two different subjects. Therefore instead, what there should be is a good classification of the subject time, and then just link a study to a particular subject type.

BS: Identify the 7 or 9 or 3 essential features that every study should have, e.g. subjects. Pick one that is central, and then assert a single-inheritance hierarchy on that basis. All other features should be put into their own single-inheritance hierarchy. Then use the reasoner to generate all of the appropriate associations and multiple inheritance on the fly. We may end up with the bottomless pit problem, though. We have to find a way of making it clear it isn't a bottomless pit. It should be clear that there is a principled way for finding places for these terms. JF: different people structure these things with different "primary" classifications. AR: use "faceted" browsers. Classic example is travel destination, where you may want to browse by either sport, or location, or family friendliness. Each of these facets are different relations whose target is these other single hierarchies BS mentioned. AL: Where do we start the defined classes, and where do we end the "standard" classes? AR: Should avoid "hardness" as long as possible. Could have no asserted isA until the last step, and the infer all isA's, and see how it plays out, and *then* choose the primary asserted hierarchy. A lot of this work will be integral to the Function Branch work, which BB will cover shortly.

PA: PATO will deal with biological qualities, but not non-biological ones like randomized or control, or qualities of instrumentation like "switched on". Such terms should go in OBI, at least for now.

Biomaterial Branch (Presented by Susanna Sansone)

72/315 terms were actually relevant to the biomaterial branch. There is a dispatcher sheet that is now on google docs. Have started refining definitions, adding examples, and making terms compliant to naming conventions (the latter is still to be done). Most of the information is in an excel spreadsheet. They don't know what to do about "quasi" material terms like lot number and serial number. What to do if a term is present in more than one external resource? How do we point at multiple sources of terms? Where should we put in genetic information like allele, diplotype. They are also thinking that they should probably also extend biomaterial to other types of material.

First division was between experimental and natural biomaterials, but after a little time it was clear this wasn't the right line to draw: for example, where to put transgenic organism? Also, just binned things like allele and haplotype into a single class, even though they don't really know where to put them. Many of the genotype specification are already in PATO. And currently, things like dominant and recessive are not yet in PATO. And, even though we can get PATO into OBI really easily, ontologies like SO are harder to fit in, as their structure doesn't mesh with OBI yet, so we can't just import them directly. AR: if we can get it in promptly, we do it. If we can't, then we put a specialist ontology term directly within OBI. Also, for links out to other ontologies, we don't use an OBI term ID but the ID from the other ontology. Note that this does NOT mean that we import the entirety of the hierarchy above or below that particular external term, just that (for now) we need to represent that particular, single, term.

AR: Population and cohort's current placement is not their final location. Diplotype is a quality of a sequence.
AR: If you have a defined class for whole_mount, you can say whole_mount organism is exactly those organisms that are the output of PA_XYZ. BS: A whole_mount organism is an organism playing the role of whole_mount. CS: But once it goes through PA XYZ, it is an entirely new entity. This entity can take on other roles, such as garbage, but it is a new entity. BS: It is simple to do what I propose, as you just have to add a phrase about the role to the defined class definition (necessary
& sufficient statement). BB: What we're trying to do is figure out where to put in that a biomaterial has been experimentally manipulated. BS: We need to have this role as, for example, you need the role to properly classify patients. You can't just say patients are people who have registered, as that would fit with people who once registered but are no longer true patients (e.g. they went home). RS: whole_mount is irreversible – once you are a whole_mount, you can't go back. For this reason, you should not use roles, as roles are for states that can be easily changed to one thing and then back again and then on to another thing. BS: You can have an organ section that does not play the role of the specimen (e.g. your dinner of part of a liver), therefore specimen is a role. As such, you need to define whole_mount with such role restrictions. CS: whole_mount cannot be a role, but in its definition I'm ok with it indicating that it plays the role / always plays the role of specimen. BB: to reinforce this, you could rename ExperimentalBiomaterial to BiomaterialSpecimen. JF: thinks biomaterial is actually a role, e.g. a fly before it is in an experiment is not a biomaterial. AL: A fly is always a biomaterial, irrespective of whether or not it is in an experiment.

Function Branch (Presented by Bill Bug)

Meant to provide a BFO-based definition of function for investigational artefacts, including instruments, reagents, and assays. They analyzed the BFO definition of the related realizable dependent continuants function, disposition and role, and defined how they are going to work with closely-related branches like role. They have a few examples/use-cases for function on the wiki page. They use as a primary example how to create the function of an HPLC system (high pressure liquid chromatography). He then used a well-written slide to show what minumum relations would be required to get the function correct for HPLC. See his slides on the OBI Wiki for more information.

BS: How are you distiguishing functions and roles from this particular example slide? BB: I don't think there is a clear distinction – it's not clear that function and role are distinct. BS: This is a BFO responsibility. BB: Well, the separation process is distinct. BS: whenever you have function, you should describe the process that is the functioning. A crucial feature of function is that you can have a function without realizing it (this statement applies equally to role). One test of a function is to see if it is possible to still be itself when it isn't functioning. When we say an algorithm has a function, we're not using it in the BFO sense. Functions have to involve realizations as a possibility, which means the thing that has the function has to be such that it can engage in causal processes. BB: I don't think that will work. BS: A laptop has a function, a heart has a function. The pumping of your heart is the exercise of the functioning of your heart. You cannot realize the pumping because the pumping is itself a realization. I think there are assays, but assays themselves do not have functions in the BFO sense. (Assessment is also a problem for him). Generally speaking, occurrents do not have functions. (And then BS had to leave the workshop.)

CS: I think of algorithm as a plan. AR: But you can't have a plan that has a function as they are both realizable entities. But still, perhaps algorithm is a plan, but you couldn't give it a function.

BB's slides include various cardinal parts of instrument, which CS describes as not necessary for the "molecular separation function", just required for the HPLC instruments.

BB: Should look at CC's proposal to use Systems Theory to provide a framework for defining function. It would be a relatively superficial application of ST.

CC says that in order to specify functions you don't need context, but for roles you do need context. BB: Functions imply some primacy, so perhaps each thing only has a single function?

CC: drafted by BS last year to write the "RO 2" paper. Some paradigm examples include:

  • kidney UNDERGOES excretion process. There is disagreement on this term, as a kidney does not undergo that process. Blood undergoes filtration, and urine undergoes creation process. In the sense CC has written it is "participates", or has_function.
  • excretion process HAS_PARTICIPANT nephron
  • excretion function IMPLEMENTED_BY kidney/kidney IMPLEMENTS excretion function.
  • kidney HAS_OUTPUT urine / urine OUTPUT_OF kidney. RS: We've defined processes as having inputs and outputs, and CC has a continuant with inputs and outputs. Urine is the output of a process, in the same way as chocolate bar is the output of the chocolate production process, whereas others would say it is the output of the factory.

BB: we want your help implementing functions and relations in OWL. RS: There is confusion in this example, as many organs don't have outputs. CC: Actually, I think all organs have outputs.

BB: As people in other branches hit functions (especially the Instrument branch) please go to the function wiki page and add the example to that page.

Protocol Application Branch (Presented by Alan Ruttenberg)

The group started out trying to figure out what relations should be used. HP: the clinical_diagnosis definition should not have "determination", but instead "assessment", as you may not always get your diagnosis right. RS: Doesn't think tumor grading is an assay, as there is an interpretive step that's not being captured. AR: The output is not material, but instead is information. RS: In tumor grading, the input and output are both data. Perhaps should be moved to Data transformation.

CS: Could delineate material combinations based on whether or not it is a pooling of samples. RS: You should structure according to pooling, partitioning, and transformation. BB: Perhaps shouldn't use transformation in material_transformation – should use a word that more precisely meets the definition.

Read and post comments |
Send to a friend

original

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s