Today it isn't worth posting my own notes as well as directing you to the wiki, as the wiki notes are far more complete. So please just visit the OBI Wiki to find out all about the work we did today.
A full version of the notes (which means both my notes and Helen Parkinson's, plus any changes made by the group), can be found on the OBI Wiki.
10 July 2007 – OBI Workshop Day 2
The morning session was all about summarizing the work done in the branches since the last workshop.
Speaking about the relationships we want to introduce. BS states that we should be careful not to add too many relations: the DL version of SnoMed has 108 relations currently. BB says that text definitions for classes and properties are just there to help the humans: restrictions should also be present in OWL, in a manner that matches the text definition as much as possible.
BS: Will OBI, like GO, have class-level relationships, or will it only have individual-level relationships? AR and others: we would like to have no class-level relationships. GF: In OWL, you can define class-level relations, but it's not a good idea (brings us towards OWL Full). BS: The simple OBI statement "PA_Chromium_release has_participant radioactive_chromium" doesn't sound like a relation between instances, but instead between "any old" chromium etc. What it should say is "Every instance of chromium_release_assay has, on the instance level, some instance of radioactive_chromium." That means two relations for everything. CS: You're talking syntactic sugar: saying the same things, but want to have "your" way of saying it be the correct one. AR: The disagreement is that he thinks it's syntactic sugar and BS doesn't. BS actually agrees that it's syntactic sugar, but only if we really know what we're doing: i.e. we need to be VERY sure that, if we decide to only do instance-level relations, that is what we are always doing. General agreement. AR: you can think of these as extra axioms. We should audit all relations to make sure that they're all instance-level relations. E.g. have an axiom on chromium_release_assay that it must always have a reagent, and then make the appropriate notes in the relations used to express that. BS: there is no "problem" in OBO in that you can only use class-level relations. HP made a list of action items for this.
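To make the "every instance ... some instance" reading concrete, here is a minimal Python sketch. It is not anything OBI ships: the individuals (assay_001, cr51_batch_a and so on), the dictionaries and the function name are all made up for illustration. It treats the class-level shorthand as a check over instance-level relations, which is roughly what the audit AR describes would have to verify.

```python
# Hypothetical instance data: each individual is typed by one class,
# and the relation links individuals to individuals (instance level).
instance_of = {
    "assay_001": "chromium_release_assay",
    "assay_002": "chromium_release_assay",
    "cr51_batch_a": "radioactive_chromium",
    "buffer_x": "buffer",
}

has_participant = {
    ("assay_001", "cr51_batch_a"),
    ("assay_002", "buffer_x"),
}


def violations(subject_class, relation, object_class):
    """Return instances of subject_class that lack a related object_class instance.

    Reads the shorthand 'subject_class relation object_class' as:
    every instance of subject_class is related to SOME instance of object_class.
    """
    missing = []
    for individual, cls in instance_of.items():
        if cls != subject_class:
            continue
        satisfied = any(
            subj == individual and instance_of.get(obj) == object_class
            for subj, obj in relation
        )
        if not satisfied:
            missing.append(individual)
    return sorted(missing)


result = violations("chromium_release_assay", has_participant, "radioactive_chromium")
print(result)
```

In this toy data, assay_002 only has a buffer as participant, so it is the one instance reported as breaking the "every ... some" statement.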
Data Transformation Branch (Presented by Tina Boussard)
A large number of their initial terms went into PA. AR says that some classes should be under linear and non-linear transformation, whereas now they are siblings. There's also the fact that mathematical functions aren't necessarily PAs (parent of Data Transformation). They are functions/methods which might be better elsewhere. E.g. functionX has_role linear transformation. BS: OBI shouldn't be "doing mathematics", as it isn't central to OBI's mission. We should find *somebody* who can take care of the central maths that we need. Also, don't use the same noun at the end of a long list of children: this shows that you should make a parent class of that group. Also, ensure all classes are singular! Also, we need to be very careful about child/parent relationships: a Transformation is NOT a Transformation Method: it is a transformation! LF: Methods aren't PA, they're plans/protocols. Also, she says errors are not PAs, they might be qualities instead. JF says developers have agreed that errors should be measurements.
BB: Peter Lister and Karen Skinner oversee NCBC started out as a resource ontology/cv – just stuff we need to use somewhere. As that comes in, what is being done in data transformation will have to be reconciled for it. BB will send around url to its current state. Daniel Rubin worked on it. Intended to use by certain NIH places as a core resource. Don't forget to constantly check for other people who are working on these child ontologies: the data transformation workshop has invited these people.
DS: Can we indicate such helper classes / administrative classes in a way we immediately see that they 'live' on another level ? e.g. '_unclassified'….. using the underscore at the beginning….? BB: should be a new enumeration in the curation_status field. AL: we should add it to the Wiki now.
CS: Is normalization a role or a transformation? He thinks it is a transformation. It consists of various types of transformation that are constrained in particular ways, with parameters. There is a particular goal to be achieved. Most people who use normalization think of it as a data transformation. AR: The issue is how to define archsine_transformation from _function and _plotting. Will this be a problem? BS: I think it will be a problem, and we hope we can find someone who's solving it on the mathematical ontology side, as it should only be imported from another ontology elsewhere. AR: say archsine is imported from somewhere else. How do we make archsine transformation and normalization (two different things)? BS: The classification of applications is something OBI could reasonably do. AR: This would look like "normalization" and "plotting", and then archsine_normalization would be a defined term. But then, where do normalization and plotting live? Processes? HP says yes, processes. BS: Are you saying normalization is a mathematical abstract, but for OBI's purposes it isn't that abstract thing, it's the process of getting the new data? AR: You could make them defined classes that, when classified, would end up here. However, they shouldn't be explicitly asserted here, as then we would end up with multiple inheritance. AR will put all of the metadata annotation properties into their own OWL file. AL: we need an action item to add a couple of new phrases to the enumerated list for curation_status.
Instrument Branch (Presented by Ally Lister)
Notes from workshop: what are the boundaries on this term? Are there instruments that fit this definition? What's a good use-case, or can we just toss it as it equals the instrument definition? BS: the granularity issue: the OBI domain encompasses several granularities, and we may deal with multiple granularities in the same annotation project, so there will be problems. We know the appropriate granularity for dealing with instruments. There are some things that are truly object aggregates. There are objects, parts of objects, and object aggregates in everything we do. It sounds odd but should be that way. Most people using OBI will not be using the whole thing. AR: there should be a distinction between what you can buy as a single item, as opposed to something that takes up multiple rooms and has to be put up specially. CS: we don't call the latter a platform, but instead a laboratory, and therefore it wouldn't be needed. We need two types of platform: one that you're not going to take apart, and another that allows you to adjust the parts. However, some things you build yourself only because it isn't a mature technology yet – it would become a platform when it's mature. AR: should only be called a platform when it's mature. BB: some platforms are only software. PA: the difference between platform and group of instruments is that a platform is a group that has been put together for a specific purpose. Plate reader is an instrument that is used in the context of many different platforms. Included in the definition is some link to either Plan or PA.
General consensus that we should promote everything and remove artefact_object. Also, remove device and move up instrument and labware.
BS: Having many children is fine, just make sure you have no redundancy. Then after they're in, see if there's any way to bundle them. AR has seen people put defined classes into owl:Thing but this isn't very good. You could make "bogus" classes under instrument. Not good answers – don't really want ugly pseudo classes. AR: someone will find it useful to have a liquid handler term. RS: The instrument's function may change, but its structure might not, so we definitely shouldn't use their function.
Role and Digital Entities Branches (Presented by Jen Fostel)
Ontology of Clinical Investigations (OCI) (Presented by Jen Fostel)
Covers clinical trial and clinical research. Other groups in this area include CDISC, RCT, IFOMIS, Epoch (the only one that calls itself an ontology), BRIDG, and HL7 (an ANSI standard). The scope of OCI would encompass: the legal terms and minimal information (CONSORT) for clinical trials, clinical research, and administrative terms. Would like to align terms from other efforts with OBI. In OCI, all of the subjects are human. Other groups are doing non-human (e.g. pre-clinical or non-clinical efforts). These efforts should be considered – we want them all to use "subject", for example, and mean the same thing. The ontology for clinical trials is in production by the UK cancergrid project. Their aim is to develop a useable ontology by 4Q07 that can be deposited in OBO. JF applauds the effort and hopes that they can all work together. As good as all these efforts are, there is definitely a feeling of "sporting competitiveness" too.
OCI will focus for now on translational research, e.g. clinical research as opposed to trials. The collected terms are from the CDISC glossary (35-page pdf file), STDM (standard tabulated data model: how you would share your files with the FDA), UTSW, MUSC, and RCT. They've organized them all and removed duplicates, and loosely categorized them within the OBI hierarchy. They're now in the process of refining definitions, and have shared terms with the roles and digital entities branches.
OCI would be part of OBI, but there would be a document which contains all of OCI for the benefit of that community. This matches as a 4th-tier owl file as discussed yesterday, or does it? Would OCI be a subset of OBI, or a superset of OBI? CC will show us their OWL file during the next talk. Plan a workshop next year to bring together all efforts for discussions. They have a google group, OCInv@googlegroups.com. CONSORT is already in MIBBI.
Discussion of the Current OCI File (Presented by Christian Cocos)
Should OCI be developed in the OBI namespace, or should it be developed separately? We can see from the OWL file that there is overlap with the branches currently in development. The idea is to eventually move *everything* to OBI. No end-user of OCI will even see the namespace, and will just be working with a UI. The OCI WG should be a working community in OBI, and there should not be two independent efforts. But, in the end, should OBI and OCI "appear" to be separate entities, e.g. in papers?
Plan Branch (Presented by Phillippe Aldebert)
Protocols, Algorithms, and Study Design. However, Protocols were left out of their work due to the overlap with PA. We will need to come back to this later on, though.
What to do, in general, about adding terms quickly? Many of the terms are suffixed with "parent" words, like _design and _study. BS doesn't like this sort of naming, however some of these suffixes are very important. What should be done? Well, just ensure that the suffix clarifies and is not redundant. JF: If you have a design that included something that was part of the trial with the role of "placebo", then you don't need "placebo_design" as a term, as this could be inferred. You don't need to explicitly say it twice. AL: this is the same problem as we face with the Instrument branch and terms like "liquid_handler". BS: In the definitions, sometimes you use the word "trial" and sometimes you use "study", and this needs to be cleaned up. Offspring study and Parent study are the same study with two different subjects. Therefore, what there should be instead is a good classification of the subject type, and then just link a study to a particular subject type.
BS: Identify the 7 or 9 or 3 essential features that every study should have, e.g. subjects. Pick one that is central, and then assert a single-inheritance hierarchy on that basis. All other features should be put into their own single-inheritance hierarchy. Then use the reasoner to generate all of the appropriate associations and multiple inheritance on the fly. We may end up with the bottomless pit problem, though. We have to find a way of making it clear it isn't a bottomless pit. It should be clear that there is a principled way for finding places for these terms. JF: different people structure these things with different "primary" classifications. AR: use "faceted" browsers. Classic example is travel destination, where you may want to browse by either sport, or location, or family friendliness. Each of these facets is a different relation whose target is one of these other single hierarchies BS mentioned. AL: Where do we start the defined classes, and where do we end the "standard" classes? AR: Should avoid "hardness" as long as possible. Could have no asserted isA until the last step, and then infer all isA's, and see how it plays out, and *then* choose the primary asserted hierarchy. A lot of this work will be integral to the Function Branch work, which BB will cover shortly.
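The approach BS and AR describe (one asserted single-inheritance hierarchy per feature, with multiple inheritance only ever inferred) can be illustrated with a toy Python sketch. This is a stand-in for what an OWL reasoner would do, not a reasoner itself, and every facet name, term and class name in it is a made-up assumption rather than an OBI term.

```python
# Hypothetical asserted single-inheritance hierarchies, one per facet.
# Each maps a term to its single asserted parent (None marks the root).
parents = {
    "subject_type": {"human": "organism", "mouse": "organism", "organism": None},
    "allocation": {"randomized": "allocation", "non_randomized": "allocation", "allocation": None},
}


def ancestors(facet, term):
    """Return the term plus all of its ancestors within one facet hierarchy."""
    chain = []
    while term is not None:
        chain.append(term)
        term = parents[facet][term]
    return set(chain)


# Defined classes: necessary-and-sufficient conditions as facet -> required value.
defined_classes = {
    "human_study": {"subject_type": "human"},
    "organism_study": {"subject_type": "organism"},
    "randomized_study": {"allocation": "randomized"},
    "randomized_human_study": {"subject_type": "human", "allocation": "randomized"},
}


def satisfies(description, definition):
    """True if every condition in definition is implied by description."""
    return all(
        facet in description and required in ancestors(facet, description[facet])
        for facet, required in definition.items()
    )


def inferred_superclasses(name):
    """Defined classes whose conditions follow from the named class's definition."""
    own = defined_classes[name]
    return sorted(
        other
        for other, definition in defined_classes.items()
        if other != name and satisfies(own, definition)
    )


print(inferred_superclasses("randomized_human_study"))
```

Nothing here asserts that randomized_human_study has several parents. The multiple parents fall out of the facet definitions, which is the point of keeping each asserted hierarchy single-inheritance.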
PA: PATO will deal with biological qualities, but not non-biological ones like randomized or control, or qualities of instrumentation like "switched on". Such terms should go in OBI, at least for now.
Biomaterial Branch (Presented by Susanna Sansone)
72/315 terms were actually relevant to the biomaterial branch. There is a dispatcher sheet that is now on google docs. Have started refining definitions, adding examples, and making terms compliant to naming conventions (the latter is still to be done). Most of the information is in an excel spreadsheet. They don't know what to do about "quasi" material terms like lot number and serial number. What to do if a term is present in more than one external resource? How do we point at multiple sources of terms? Where should we put in genetic information like allele, diplotype. They are also thinking that they should probably also extend biomaterial to other types of material.
First division was between experimental and natural biomaterials, but after a little time it was clear this wasn't the right line to draw: for example, where to put transgenic organism? Also, just binned things like allele and haplotype into a single class, even though they don't really know where to put them. Many of the genotype specification are already in PATO. And currently, things like dominant and recessive are not yet in PATO. And, even though we can get PATO into OBI really easily, ontologies like SO are harder to fit in, as their structure doesn't mesh with OBI yet, so we can't just import them directly. AR: if we can get it in promptly, we do it. If we can't, then we put a specialist ontology term directly within OBI. Also, for links out to other ontologies, we don't use an OBI term ID but the ID from the other ontology. Note that this does NOT mean that we import the entirety of the hierarchy above or below that particular external term, just that (for now) we need to represent that particular, single, term.
AR: Population and cohort's current placement is not their final location. Diplotype is a quality of a sequence.
AR: If you have a defined class for whole_mount, you can say whole_mount organism is exactly those organisms that are the output of PA_XYZ. BS: A whole_mount organism is an organism playing the role of whole_mount. CS: But once it goes through PA XYZ, it is an entirely new entity. This entity can take on other roles, such as garbage, but it is a new entity. BS: It is simple to do what I propose, as you just have to add a phrase about the role to the defined class definition (necessary & sufficient statement). BB: What we're trying to do is figure out where to put in that a biomaterial has been experimentally manipulated. BS: We need to have this role as, for example, you need the role to properly classify patients. You can't just say patients are people who have registered, as that would fit with people who once registered but are no longer true patients (e.g. they went home). RS: whole_mount is irreversible – once you are a whole_mount, you can't go back. For this reason, you should not use roles, as roles are for states that can be easily changed to one thing and then back again and then on to another thing. BS: You can have an organ section that does not play the role of the specimen (e.g. your dinner of part of a liver), therefore specimen is a role. As such, you need to define whole_mount with such role restrictions. CS: whole_mount cannot be a role, but in its definition I'm ok with it indicating that it plays the role / always plays the role of specimen. BB: to reinforce this, you could rename ExperimentalBiomaterial to BiomaterialSpecimen. JF: thinks biomaterial is actually a role, e.g. a fly before it is in an experiment is not a biomaterial. AL: A fly is always a biomaterial, irrespective of whether or not it is in an experiment.
Function Branch (Presented by Bill Bug)
Meant to provide a BFO-based definition of function for investigational artefacts, including instruments, reagents, and assays. They analyzed the BFO definition of the related realizable dependent continuants function, disposition and role, and defined how they are going to work with closely-related branches like role. They have a few examples/use-cases for function on the wiki page. They use as a primary example how to create the function of an HPLC system (high pressure liquid chromatography). He then used a well-written slide to show what minimum relations would be required to get the function correct for HPLC. See his slides on the OBI Wiki for more information.
BS: How are you distinguishing functions and roles from this particular example slide? BB: I don't think there is a clear distinction – it's not clear that function and role are distinct. BS: This is a BFO responsibility. BB: Well, the separation process is distinct. BS: whenever you have function, you should describe the process that is the functioning. A crucial feature of function is that you can have a function without realizing it (this statement applies equally to role). One test of a function is to see if it is possible to still be itself when it isn't functioning. When we say an algorithm has a function, we're not using it in the BFO sense. Functions have to involve realizations as a possibility, which means the thing that has the function has to be such that it can engage in causal processes. BB: I don't think that will work. BS: A laptop has a function, a heart has a function. The pumping of your heart is the exercise of the functioning of your heart. You cannot realize the pumping because the pumping is itself a realization. I think there are assays, but assays themselves do not have functions in the BFO sense. (Assessment is also a problem for him). Generally speaking, occurrents do not have functions. (And then BS had to leave the workshop.)
CS: I think of algorithm as a plan. AR: But you can't have a plan that has a function as they are both realizable entities. But still, perhaps algorithm is a plan, but you couldn't give it a function.
BB's slides include various cardinal parts of instrument, which CS describes as not necessary for the "molecular separation function", just required for the HPLC instruments.
BB: Should look at CC's proposal to use Systems Theory to provide a framework for defining function. It would be a relatively superficial application of ST.
CC says that in order to specify functions you don't need context, but for roles you do need context. BB: Functions imply some primacy, so perhaps each thing only has a single function?
CC: drafted by BS last year to write the "RO 2" paper. Some paradigm examples include:
- kidney UNDERGOES excretion process. There is disagreement on this term, as a kidney does not undergo that process. Blood undergoes filtration, and urine undergoes creation process. In the sense CC has written it is "participates", or has_function.
- excretion process HAS_PARTICIPANT nephron
- excretion function IMPLEMENTED_BY kidney/kidney IMPLEMENTS excretion function.
- kidney HAS_OUTPUT urine / urine OUTPUT_OF kidney. RS: We've defined processes as having inputs and outputs, and CC has a continuant with inputs and outputs. Urine is the output of a process, in the same way as chocolate bar is the output of the chocolate production process, whereas others would say it is the output of the factory.
BB: we want your help implementing functions and relations in OWL. RS: There is confusion in this example, as many organs don't have outputs. CC: Actually, I think all organs have outputs.
BB: As people in other branches hit functions (especially the Instrument branch) please go to the function wiki page and add the example to that page.
Protocol Application Branch (Presented by Alan Ruttenberg)
The group started out trying to figure out what relations should be used. HP: the clinical_diagnosis definition should not have "determination", but instead "assessment", as you may not always get your diagnosis right. RS: Doesn't think tumor grading is an assay, as there is an interpretive step that's not being captured. AR: The output is not material, but instead is information. RS: In tumor grading, the input and output are both data. Perhaps should be moved to Data transformation.
CS: Could delineate material combinations based on whether or not it is a pooling of samples. RS: You should structure according to pooling, partitioning, and transformation. BB: Perhaps shouldn't use transformation in material_transformation – should use a word that more precisely meets the definition.
Today was the first day of the OBI workshop. Here are my personal notes on the day. The official notes can be found on the OBI Wiki.
First talk was by Bill, and described OBI itself.
The main questions raised by CS in this discussion section were: Who's missing from OBI that should be involved? Any criteria to decide who to target? What incentives should we be trying to provide to join us?
The audience wondered what OBI means by "genomics" community, as it's a very broad topic. Further, many of the communities described overlap. CS replied with the following examples: the eventual replacement MGED Ontology and BIRNLex with OBI, and the RADLex project for the MRI community, e.g. Daniel Rubin.
It is difficult to get money for funding this work, as grant people won't generally give money for ontology curators. AR mentioned that money should be provided to develop the *skill* of ontology creation and curation. He wants to establish a teaching program.
Someone in the audience later made the statement that they (or a subset of users) might want to only use a minimal set of the ontology. BB mentions the MIA* efforts, and using them in the context of the ontologies. Also, members of the audience suggested that OBI could be used in a number of efforts, including the CaBIG (Cancer Bioinformatics) community, and the NCI Thesaurus. AR also says you could try to invest part of people's times, rather than getting specific funding for the entirety of a FTE (full-time equivalent). Another topic brought up was the Clinical Trials community – what can we show management? Does OBI have any good examples for them? This was brought up again later in the day, when the OBI developers thought of a number of good OBI use-cases (see below).
Next Talk: Chris and the "ecosystem" of biomedical standards.
Next Talk: Susanna and MIBBI.
The main questions raised by CS in this discussion section were: How should efforts such as OBI be funded? Encourage communities to make it a budget item? Put it in an OBI-focused resource grant? development, infrastructure, and training are three separate funding areas. Role of the NCBO? Currently serves as advisors, and provides tools and methodologies, not support for building.
Audience asked what is the real point of OBI? How to use it? Plenty of examples, like science commons and neuro commons, journals (e.g. tagging articles or sections of articles), Alan's work, and CISBAN DPI. ArrayExpress to GEO mapping will be a lot easier with the core of OBI developed. Suggestion of AHA, etc. for funding, as long as we can give good examples of the usefulness of OBI to these communities.
How will the world be different when OBI is complete? Provides method for data exchange and for correct analysis and searching over a large corpus of investigations. People will use MIBBI to discover if there is already a minimum information checklist. If there isn't anything there, they have to make their own MIA*. But how will they know how to do this? Look at MIBBI and get started: this sort of thing needs to be written up. There is a need for guidelines on how to do this sort of work. Publication costs money, but if you treat putting data into FuGE format as a publication of your data in electronic format, it could be a useful way of adding such work into the grant proposals.
AR: Changing incentives requires pushing from either journals or funding agencies to say this must be done. Secondly, a workforce that is able to do this sort of encoding does not yet exist. OBI is the start of this training. OBI promises (with a common language for describing results etc) a situation where integration and searching of genomic-scale datasets will be quicker than before. The interest isn't in the individual investigators, but in the people who fund the investigators, knowing they will get more for their money by using OBI.
Do we want to model raw data or "final research data"? Makes a big difference to the cost of using something like OBI. Everything should be included in the long term, including LIMS.
Reiteration of importance of use-cases (how to use OBI) from the point of view of the people who would use OBI. Inevitably, the response to "Our institute should use OBI" is, "What is the benefit to us"?
Formalizing how the advisors get credit for OBI. Have it "offline" as a little subgroup and present the results.
Other discussion topics for this week: SOP and reasoning, svn and branching "clinic" (Alan), how to organize OBI when mature, to make it easier for users to use it.
GOING THROUGH PREVIOUS MILESTONES:
Then we went through the milestones that we had created at the last workshop. Most milestones have been completed, but a few are not ready yet. The April 1st milestone of community submission of terms is complete (even though individual branch editors are adding terms during the development process as they see fit, which is important). An April 14th deadline was to review preliminary community OBI versions, but this is dependent upon the submission of terms being completed, which in this case will probably happen with the first release of the OBI core.
Then we talked about our policy in terms of multiple inheritance. Alan said one possibility would be to make a defined (necessary & sufficient) class that is not necessarily in the "real" hierarchy. Inference would place it in the right location. Example is Diploid Cell, which could go into multiple locations in the single hierarchy. Further, it may go into an external ontology (diploid could be a quality, which equals PATO). This will be discussed in its own session later in the week.
May 1st had a couple of milestones. The first is to present the proposal for environmental/medical/other history. Jen reported. Barry mentions Geo.obo (thought up by Michael Ashburner), which is an obo foundry. There is also EnvO (Environment Ontology), which Dawn Field, for instance, plans to use within the GSC framework. They are both OBO Foundry, and are primarily devoted to children of the BFO class Site. If they are both OBO, then how will we keep them orthogonal? Geo is for annotating *real* geographic locations (already begun, large-scale things like "Poland"), and EnvO (planned with funding, but not started) for terms like habitat and oral cavity (small-scale things like those kinds of entities where organisms live). Geo has a workshop at the end of August. Michael's Geo sort of popped up suddenly. A lot of Jen's terms have been subsumed into the EnvO. Laboratory or clinical artefact may go back to OBI. However, ultimately, those laboratory terms are still environments. We may develop them initially, but then submit them to the EnvO. Further discussion of this has been added to the agenda for this week.
The second May 1 milestone is the proposal for process – how to link to ontologies / terms / free text entries apart from canonical OBI links. The main point here is that we should/must reuse other ontologies where available. Will probably have a breakout session about this. Perhaps it is best a task to give to the Relations Branch.
The June 1 milestone of review of placement of community terms will be covered with the branch updates given tomorrow.
July 1 was the finalizing of terms into branches, which hasn't quite been reached as we are still working on the branches – it took a while to get subversion sorted. The July 9th milestone of re-merging branches will no longer be necessary as we'll be keeping the branches for a while.
Another 9 July milestone was to have the deprecation policy finalized. Alan had a proposal about where to put deprecated terms – into a separate import file – so that "normally" you wouldn't see them, but could import them if you want to see them. Will talk about the deprecation policy this week too.
This led into a longer discussion of versioning, history, and deprecation. Versioning is a lot more complex than deprecation, but Alan argues that you can't have deprecation without history. GO has a versioning policy. Should be documenting ANY change – spelling, add annotation etc. Both what the change was and why it was made. Barry suggests that each time any change is made, you should create a new ID. Alan says that this imposes a larger burden on the user. Bill and Ally agree that only semantic changes should make ID changes – syntactic changes shouldn't. Alan points out what happens if you have a closure axiom over a group of terms, and then you need to add a term, or remove the closure axiom. Is that a semantic change? Alan suggested that we not worry about it until we have a stable core. Perhaps a subgroup should set up a proposed policy and send it around. Bill suggests an intermediate milestone of 3 weeks where *everyone* would submit any use-cases / examples they want considered when building the requirements list for this policy. The policy should be ready for the next workshop.
Phillippe made the point that the first OBI core will be a beta, and should be announced as such. However, we should also present use-cases of how to use OBI, as this was a major point made by the guests this morning. Should definitely be added as a new milestone. Bill mentioned BrainMap.org, which uses CVs to try to get info from neuroimaging studies.
Examples of use-cases: data annotation, text mining, data acquisition, querying/searching (SPARQL?). Alan has a triple-store in science commons. We can load up OBI and data that has been annotated with OBI into it, and then Alan can write queries against it – another good example use-case.
Barry: equipment/instrument branches should be made in tandem with the vendors. This is already happening now via the PSI community, which already has links to vendors. Alan already has info on plasmids that's "waiting for an OBI makeover".
Over the next few days, we will do agenda/discussion items if we hit walls after working on the ontology. Items which have time constraints on them (based on when specific people arrive and leave) have been placed in the agenda at appropriate times. The updated agenda, as well as the combined minutes of Helen and myself, are up on the OBI Wiki (https://wiki.cbil.upenn.edu/obiwiki/index.php/Meeting_notes_and_report).
In the remaining 20 minutes, we talked a little about Matt Pocock's proposal for the final organization of the owl files for obi. His email can be read here: http://sourceforge.net/mailarchive/message.php?msg_name=200707091513.52789.matthew.pocock%40ncl.ac.uk
This leads to larger questions of what we want OBI to do, and what users we're aiming at. Programmers? People who want to reason and assert? Biologists who only want to browse? OLS (ontology lookup service) only works with OBO, and the NCBO's portal (an ontology browser) only works with single OWL files. Bill will send around the email he used when he contacted the NCBO to get BIRNLex to work with their browser (BIRNLex also has multiple OWL files), and we can send a similar one that is specific to OBI, to let NCBO know that we would really appreciate being able to use their service for OBI.
It would be good to have multiple versions of the "Tier 3". It would also be good to have simple text files that just hold tab-delimited class and definition pairs. This means it would be nice to run a set of scripts that would make any of the "simple" files we want, perhaps at every svn commit.
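As a rough illustration of the kind of script meant here, the following sketch pulls class labels and definitions out of an RDF/XML OWL string and writes them as tab-delimited pairs. It is only a sketch under assumptions: the `ex:definition` annotation property, the `http://example.org/obi-sketch#` namespace and the sample classes are all hypothetical stand-ins, and the real OBI files may use different annotation properties.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
# Hypothetical namespace for the definition annotation (an assumption, not OBI's own).
EX = "http://example.org/obi-sketch#"

# Tiny made-up OWL fragment used only to exercise the functions below.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:ex="http://example.org/obi-sketch#">
  <owl:Class rdf:about="http://example.org/obi-sketch#investigation">
    <rdfs:label>investigation</rdfs:label>
    <ex:definition>A process in which data are gathered.</ex:definition>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/obi-sketch#assay">
    <rdfs:label>assay</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def class_definition_pairs(owl_xml):
    """Return sorted (label, definition) pairs for each labelled owl:Class."""
    root = ET.fromstring(owl_xml)
    pairs = []
    for cls in root.iter("{%s}Class" % OWL):
        label = cls.findtext("{%s}label" % RDFS)
        if label is None:
            continue  # skip anonymous or unlabelled classes
        definition = cls.findtext("{%s}definition" % EX, default="")
        pairs.append((label.strip(), definition.strip()))
    return sorted(pairs)


def to_tsv(pairs):
    """Render the pairs as tab-delimited text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["class", "definition"])
    writer.writerows(pairs)
    return out.getvalue()


if __name__ == "__main__":
    print(to_tsv(class_definition_pairs(SAMPLE)), end="")
```

A script like this could in principle be called from a commit hook so that the "simple" files are regenerated whenever the OWL files change; how that hook would be wired up is left open here.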
We should have 15 minutes that covers what Protege 4's status is, and what it looks like.
Day 4 consisted mainly of discussions regarding the nature of a study versus investigation. These terms themselves are loaded, and what one person thinks of and calls a study may be exactly what others call investigation. In the end, after about 3/4 of the day devoted to the subject, the answer was…. ah, but that would be telling! Read through the notes of the day, and then at the end, all will be revealed.
- those on the instrument branch (and more generally, all OBI development branches where appropriate) should remember the following:
- don't ignore what the vendors are doing – they'll have term lists
- don't ignore what the communities have done (i.e. metabolomics and proteomics)
- The full specification for who is assigned to which branches, and how we will physically create the branches, is available on the branch OBI Wiki page.
Study Design Use Case – Drawing terms from the Community – Phillippe Rocca-Serra
- Very interesting, but I wasn't able to take many notes due to updating the OBI wiki on the Branch implementation. If anyone has any good notes, either let me know and I'll include them here, or add them yourself as a comment!
- The take-home message is that trying to reconcile many groups' different meanings for the same words is difficult, but drawing a large set of terms from literature and communities is very important.
- By showing us what he has done, it gives all branch groups an idea of the work involved in working on their branches.
- If a relation is missing while developing a branch, should it be referred to the relation ontology immediately?
- Try to define it as a group before submitting to RO, and ask for help if required. The definition includes restrictions on where these relations can connect to and from.
- Far too many ontology projects collapse because they have far too many relations.
- Should put a use-case of OBI on the wiki, for example, post use-cases of composition of study design
- When branches create terms, at the same time as they make the terms, useful community alternative terms should be added. When you get an OBI term, you will immediately get many other associated terms.
- Not only would you have a special interface for the biologists, but you would also be able to have computer people have a special structured interface to the literature via these alternative terms
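One way to picture the alternative-term idea is as a small index that maps any community term to the preferred OBI term and back. The sketch below is hypothetical: the `TermIndex` class and its methods are invented for illustration, and the only pairing taken from these notes is "study" as an alternative term for sample_preparation_for_assay.

```python
from collections import defaultdict


class TermIndex:
    """Hypothetical lookup between preferred terms and community alternatives."""

    def __init__(self):
        self._alternatives = defaultdict(set)  # preferred term -> alternative terms
        self._preferred = {}  # lower-cased term (preferred or alternative) -> preferred term

    def add(self, preferred, alternatives=()):
        """Register a preferred term together with its alternative terms."""
        self._preferred[preferred.lower()] = preferred
        for alt in alternatives:
            self._alternatives[preferred].add(alt)
            self._preferred[alt.lower()] = preferred

    def resolve(self, text):
        """Return the preferred term for any known term, or None if unknown."""
        return self._preferred.get(text.lower())

    def alternatives(self, preferred):
        """Return the alternative terms recorded for a preferred term."""
        return sorted(self._alternatives.get(preferred, ()))


index = TermIndex()
index.add("sample_preparation_for_assay", ["study"])

if __name__ == "__main__":
    print(index.resolve("Study"))
    print(index.alternatives("sample_preparation_for_assay"))
```

The same index could serve both audiences mentioned above: a browsing interface resolves a biologist's word to the OBI term, while a text-mining tool expands an OBI term into the words to search for in the literature.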
Finding a real home for Information_Entities
- At the moment BFO has qualities, information entities (generically dependent) and realizable entities (specifically dependent) as children of dependent continuant (DC). But generically dependent is not the same as the current definition of DC, and therefore needs to be made a sibling called generically_dependent_continuant.
- Digital objects can exist in many different places and be identical. No other DC objects can be identical in this way.
- DC and GDC will be disjoint.
Solving Study versus Investigation
- If we are doing OBI, then we need an investigation design
- Consensus was quickly reached on what an Investigation is:
- Investigation is a process in which data are gathered from one or more protocol applications with the goal of reaching conclusions. It is guided by the realization of an investigation_design.
- However, whether or not a study was necessary at all was a real debate.
- One group thought a study was just a smaller version of an Investigation, and could therefore be dealt with by having investigations within investigations.
- The other defined a study as everything done to materials prior to actually doing the assays
- Therefore, a child of protocol_application was created for this purpose.
- A sample_preparation_for_assay is a protocol_application including material_enrollments and biomaterial_transformations. There is an alternative_term for this called "study".
Today was a highly informative combination of talks and further improvement of OBI. Hopefully, you'll find these musings on the day's work helpful at either jogging your own memory of the events, or in giving you an idea what went on in our heads.
Ontologies – How do we integrate and/or make use of them?
Can we, at the moment or in future, place parent classes for all OBO ontologies in OBI? Definitely not now, as they don't share the same ULO (Upper Level Ontology). Some work is being done by the OBO-UBO group on mapping OBO ontologies to ULOs like BFO. (See the OBO-UBO web page for more information.)
In a related question, should all OBO ontologies use BFO? It would make integration a much more straightforward process. In my opinion, this would be a great idea in the long term; however, practicalities may prevent it. 🙂
Should things like BioTop (http://www.ifomis.uni-saarland.de/biotop/) be integrated into OBO, under BFO but before OBI? In my opinion (though today was the first time I have read about BioTop so it isn't the most informed one), in our case probably not, as resolving the three may be problematic. However, some terms or ideas might be useful to share.
Formal OWL, aka making OBI Formally correct
Should be assigned to someone/some people for later, after more classes have been created. There is simply too much flux in the file at the moment. Get the graphs in place first, perhaps working on some complex relations as you go. Further, the definitions must explicitly hold information on creating these relations, irrespective of whether you make the relationships as you go or at the end.
BFO and OBI use different metadata tags, and there should be a shared set of tags.
The metadata tags used in BFO are part of snap/span, I think. We would need to bring up the idea of metadata resolution (if possible, and if we all agree it should be pursued) with that group too.
Barry Smith will bring OBI's information object and plan terms to the BFO group.
A milestone has been added (see the OBI Wiki) to hammer out exact implementation of the metadata list, and to work with other communities as appropriate (e.g. BFO, OBO Foundry = Barry, M Ashburner, Suzie, Chris M.).
Ontology – Simona Carini & Barry Smith
Rctbank is a clinical trial db: information on all published clinical trials (from journal articles).
Its purpose is to provide enough information to allow evaluation of these trials.
RCT = randomized
Epoch and Clinical Trial Ontology (CTO) are the other two that are being developed. Barry Smith is involved in CTO, and it is therefore being built with OBI in mind, but it is still very small.
RCT and Epoch aren’t close to being OBO/OBI compliant.
Their choices are in conflict with the choices we’ve made
- that does NOT mean that they aren't eminently useful (which they are), just that merging would be problematic
There has been agreement between Epoch and RCT that all should work towards a CTO that will work within the OBI framework
reconciliation is one of the goals of the CTO workshop in May.
There are people claiming to develop a CTO but it is actually a CT database ontology (I missed the name of the people being referred to here). It isn’t the same beast. Understanding the data is not equivalent to understanding the processes in a trial.
RCT Schema – Barry
independently of OWL or Protégé, and is more correctly a database schema, though it is called an ontology.
Not the right way to do it – it is unbalanced: no place for a study, though there is a place for a 2o study.
2o study seems to be at the wrong level in the hierarchy
It is unclear what "trial details" means
When the same term (or portion of a term) is repeated over and over, it is often a sign of a mistake, of redundancy
One of the children of population concept is population.
An ontology is important for reasoning using the is_a hierarchy, which can be reasoned over: Population is NOT a population concept and is NOT a concept
blocked here “from both directions”
Further, a recruitment flowchart is not a population concept
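The objection can be made concrete with a small sketch of what reasoning over is_a entails. Because is_a is transitive, placing population under population concept commits the ontology to everything further up the chain. The fragment below is hypothetical: the edge from population concept to concept, and the placement of recruitment flowchart, are assumptions made only to mirror the structure being criticised.

```python
def ancestors(term, is_a):
    """Return every class reachable from term by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical fragment: child -> list of asserted is_a parents (assumed layout).
IS_A = {
    "population": ["population concept"],
    "recruitment flowchart": ["population concept"],
    "population concept": ["concept"],
}

if __name__ == "__main__":
    # A reasoner following is_a would conclude that a population is a concept.
    print(sorted(ancestors("population", IS_A)))
```

Under these assumed edges the closure classifies a population as both a population concept and a concept, which is exactly the inference being rejected here.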
These things, like population concept, are headers/labels/conveniences, but they are not ontological forms. Some options for restructuring could be the following two things:
is_a continuant is_a entity
occurrent is_a entity
Not all RCT terms have
Epoch Ontology (Dave Parrish in charge of it) – Barry Smith
There are parts of this ontology that don’t belong in the CTO, but do belong in OBI
developed to support the Immune Tolerance Network (ITN), a big clinical trial resource: they fund, implement, monitor and assess clinical trials, and provide data services.
of ITN perform operations (generation and collection) -> data management -> analysis
They have an ontology of the kind of analytical steps their software needs to perform, and it helps them configure the software application.
For example, elements are claimed to be nouns, and represent the physical objects of the system. Classes of elements are domain types, containers, relationships. These are not always physical objects – they’re sometimes processes. Also, they are not always nouns.
Fits in with the community milestones, i.e. we could get many terms from the clinical trials community.
Branches have been assigned. See the OBI Branches Wiki Page for up to date information.
current terms in various OBO ontologies to BFO
biological process is_a span:process
already developed an environmental ontology in a plant context, which we should remember and hopefully incorporate useful terms in the first round of community term dates.
Have moved all terms that would fall under PATO out of the ontology, e.g. state and anything under quality.
Do we really need "in vitro state" as well as "in vitro"? Terms such as these are always tied to objects like cells – these are not design as much as the state of the cells.
Is in vivo a location or a state? You can take in vitro cells and put them into “vivo”, and they are still in vitro cells, which means in vitro is a BFO quality.
The interior of your gut is the site for your gut bacteria. The interior of gut (IG) is also a type/node in the FMA (as a location). IG has qualities (shape, etc). In addition to these qualities it has others that determine its roles (having certain pressure, pH value). How to distinguish what FMA means from what an environment ontology means?
If we remove in-vivo_state, we run into problems with multiple inheritance. We needed to separate out the state of a biomaterial from the biomaterial itself, i.e. don’t have in-vivo_material as a child of
What terms do we need to use to describe diseases?
Disease (hook for disease ontology), disease_symptoms, disease_stages,
We ended up going through the entire ontology, resolving many problems. There is a new OWL file, but it is not yet ready for public consumption; therefore it won't be posted here until it is available from the official OBI pages.
There is general consensus among the workshop attendees that a very large amount of work is getting done, and there is a lot of positive feeling that the Milestones developed this week are giving us hard dates for inclusion of many more terms. The addition of terms can only truly start once the high-level structure has been decided, and this workshop has moved in great leaps and bounds towards a final structure of the higher levels of OBI. The "higher levels" have been generally defined at this meeting as the top two levels of OBI below BFO. This is what was completed today: the two levels directly below BFO have been studied by the group and cleaned.