Data Integration

Background: What are ontologies? (Thesis 1.5)


What are ontologies?

1 Introduction

Controlled vocabularies, often just a list of agreed-upon terms, have been used to describe data since the early releases of the first biological databases such as EMBL [1] and Swiss-Prot [2]. As data handling in the life sciences matured, more complex methods of categorising terms and their relationships developed. Ontologies are one such method, and their use has become pervasive within the life sciences. For example, 3137 papers have cited the main GO [3] publication as of December 2011 (Note: accessed 3 August 2011). As their production is the result of scientific research and activity, ontologies have become accepted as “first-class citizens” of science, and their evaluation and maintenance are considered part of that activity [4]. Ontologies are mainly used in systems biology as a method of adding extra levels of information to core data. This extra information is called metadata, and describes the context of the data. For instance, author information, related publications and links to biological databases are all useful context for a systems biology model, but are not required for its simulation. While helpful even when used as a controlled vocabulary, an ontology can also be a primary structure for the data and, when created with the appropriate semantics, can be used in automated integration tasks [5]. The semantic data integration methodology described in this thesis uses ontologies, rules and reasoning to convert heterogeneous data into homogeneous knowledge.


Ontologies:

  • define a clear, logically consistent structure for information that can be shared among people and is understandable to computers;
  • enable reuse of knowledge both within a single domain and across multiple domains;
  • explicitly describe information about a research domain;
  • separate domain knowledge from the operational knowledge;
  • can be reasoned over for the purposes of analysis and drawing inferences from domain knowledge that would otherwise remain hidden [6].

Within the life sciences, ontologists create ontologies both as reference works for a particular domain of interest and as a specific tool for a specific application. Reference ontologies such as GO provide terms commonly used for generic tagging of database entries, whereas application ontologies are designed for a specific purpose such as part of a bioinformatics tool. Section 2 provides an explanation of ontologies with respect to other classification schemes, while commonly used definitions of an ontology and strategies for modelling are examined in Section 3. Ontology languages used in biology are described in Section 4. Section 5 describes the structural components of ontologies. Naming conventions used in this thesis are listed in Section 6, and the choice of ontology language and modelling strategy is explained in Section 7.

2 From tag clouds to ontologies

The simplest way to categorise objects is via social tagging, where people add whatever tag or tags they feel are best suited to describe an object, thus creating folksonomies [7]. There are no definitions for these tags and there is no way to relate tags to each other. Frequencies of words can be analysed, and tag clouds are often created to display relative frequencies. The benefit of folksonomies is that they are free-form, with no constraints on the words people use to describe their objects, making them highly accessible to the general public. However, free tagging can lead to a large amount of near-identical tags for the same concept, making searches difficult. Folksonomies and ontologies are different, rather than opposing, technologies; where ontologies are an enabling technology for sharing information, folksonomies are emergent properties of shared information [8].
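The frequency analysis behind a tag cloud, and the near-identical tag problem described above, can be sketched in a few lines of Python. The tag list is invented for illustration, and lower-casing is only one crude assumption about how variants might be merged; it does not resolve plurals or true synonyms.

```python
from collections import Counter

# Hypothetical free-form tags added by different users to the same set of objects.
tags = ["labrador", "Labrador", "black-lab", "dog", "labrador", "dogs", "dog"]


def tag_frequencies(raw_tags):
    """Count how often each tag occurs, exactly as typed."""
    return Counter(raw_tags)


def normalised_frequencies(raw_tags):
    """Count tags after lower-casing, which merges only trivial case variants."""
    return Counter(tag.lower() for tag in raw_tags)


raw_counts = tag_frequencies(tags)
merged_counts = normalised_frequencies(tags)

# The relative counts are what a tag cloud would display as font sizes.
print(raw_counts.most_common())
print(merged_counts.most_common())
```

Even after normalisation, "dog", "dogs" and "black-lab" remain separate tags, which is the kind of search difficulty that controlled vocabularies are intended to reduce.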

Controlled vocabularies are lists of terms that have been enumerated explicitly, of which there are four main types: simple lists, synonym rings, taxonomies and thesauri [9]. A list is a set of terms whose membership is limited, while synonym rings are used to broaden search queries according to predetermined lists of synonyms. Taxonomies are hierarchical controlled vocabularies where the hierarchy is determined by a single named relationship [9]. Thesauri are taxonomies that have additional standardised relationship indicators such as “equal to” or “related to”. Ontologies provide an extra level of expressivity and complexity by allowing specialised relationships beyond those available with any controlled vocabulary, such as one entity being a part of another. The W3C perceives ontologies as critical for searching and merging information [10].
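As a minimal sketch of two of these vocabulary types, the following Python fragment models a synonym ring used to broaden a search query and a taxonomy built from a single named relationship. All terms and the function names expand_query and ancestors are invented for illustration and are not drawn from any published vocabulary.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent for search.
SYNONYM_RINGS = [
    {"dog", "canine", "domestic dog"},
    {"cat", "feline"},
]

# Hypothetical taxonomy: child -> parent under one named relationship ("is a").
TAXONOMY = {
    "black labrador": "labrador",
    "labrador": "dog",
    "dog": "mammal",
}


def expand_query(term):
    """Return the term plus all members of any synonym ring containing it."""
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded


def ancestors(term):
    """Walk up the taxonomy and return the broader terms, nearest first."""
    chain = []
    while term in TAXONOMY:
        term = TAXONOMY[term]
        chain.append(term)
    return chain


print(expand_query("canine"))
print(ancestors("black labrador"))
```

Neither structure can state that one entity is a part of another, or constrain how terms may be combined; that additional expressivity is what distinguishes an ontology.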

3 Definitions

Ontologies are models of domains of interest, typically defining concepts using formal logic-based representations as well as relationships between, and constraints on, those concepts. In one of the most widely cited computer science definitions of an ontology, Thomas Gruber states that an ontology is “an explicit specification of a conceptualization” [11]. In other words, an ontology explicitly specifies, or defines, a concept of interest. Within the same paper, Gruber provides an extension to his original definition: “a specification of a representational vocabulary for a shared domain of discourse” [11]. This definition extends the first, stating that a particular vocabulary should be the method by which the shared domain of interest is described.

The definitions above are abstract, restricting the concepts used to describe a domain without specifying the syntax of that description. Concrete definitions of ontologies are also common. Such definitions make use of the structural aspects of an ontology. Gruber’s concrete definition of an ontology is “a set of representational primitives with which to model a domain of knowledge or discourse” [12]. In this definition, ontologies describe a topic, or domain, of interest in a formalised way using representational primitives. These primitives, or ontology entities, should include information about their meaning as well as any constraints on their use [12].

Bijan Parsia, one of the developers of the Pellet reasoner [13], states that a good first approximation of a concrete definition of an ontology is simply a set of axioms. Very generally, axioms are closed, well-formed statements declaring what is true about the domain of interest [14]. The set of axioms is the main component of an ontology, and is built using ontological entities such as classes, properties, and instances (see Figure 1). Parsia’s definition is only a first approximation because an ontology is not defined by its set of axioms alone; it is wholly characterised by the union of its

  • name/identifier;
  • set of annotations;
  • set of imported ontologies; and
  • set of axioms [14].
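The four-part characterisation above can be illustrated with a small Python data structure. This is only a sketch: the class name Ontology, its field names, the example identifiers and the axiom strings are all assumptions made for illustration, and axioms are held as plain strings rather than parsed logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    """An ontology as the union of identifier, annotations, imports and axioms."""
    identifier: str
    annotations: frozenset = frozenset()
    imports: frozenset = frozenset()
    axioms: frozenset = frozenset()


# Hypothetical axioms, written informally as strings.
shared_axioms = frozenset({
    "BlackLabrador SubClassOf Labrador",
    "Labrador SubClassOf Dog",
})

draft = Ontology("http://example.org/dogs-draft", axioms=shared_axioms)
release = Ontology(
    "http://example.org/dogs",
    annotations=frozenset({("label", "Dog ontology")}),
    axioms=shared_axioms,
)

# Same axioms, but different ontologies once all four parts are compared.
print(draft.axioms == release.axioms, draft == release)
```

Under the set-of-axioms approximation the two objects would be indistinguishable; comparing all four parts separates them.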

Figure 1: In each case where there is a single term followed by a set of terms in parentheses, those in parentheses are synonyms or near-synonyms of the first term. Further, the first term is the choice for this thesis. Casing conventions are indicated. The circles in the centre class show that while all universals are classes, not all classes are universals. Classes are abstract conceptualisations which describe common sets of properties. Concept is commonly used as a synonym for class. Concrete instantiations of these classes are called instances, and are constrained by the properties declared on that class. Commonly used synonyms for instances include individual and particular. Collectively, classes, properties, rules and instances are called entities in this thesis. The properties used to define membership in a class can define the relationship between two instances, or can link an instance to a data property such as a string or integer value. Synonyms and near-synonyms for property include slot, axiom, relation or relationship. See Section 6 for more information.

Philosophical versus computer science ontologies

In addition to the definitions described above, there are two categories of modelling strategies for building ontologies. Ontologists with a computer science background tend to perceive ontology development very differently from those with a philosophical background [15]. The role of an ontology to a philosopher is to accurately model reality, while computer scientists perceive an ontology’s role as shared communication and computational readability. The contrasting points of view of computer scientists and philosophical ontologists produce two strategies for ontology development which can be distinguished based on the intent, or purpose, of the ontologist [15].

Philosophical ontologies are created with the intent of representing reality as closely as possible [11]. The description of the nature of reality hinges upon how universals are defined [16]. Differences in the nature of universals result in three main strategies: realism, conceptualism and nominalism. Universals are the key to understanding the differences in these strategies.

Universals qualitatively identify and describe similarities among instances according to their common properties [17]. Universals are similar, but not identical, to ontology classes: all universals are generally modelled as classes, but not all classes are universals. Universals describe concepts and patterns which are repeatable in nature, such as black labradors. Other types of classes include arbitrary patterns that are useful to group together, but which are not repeatable in nature. One example of such an arbitrary, non-universal class is the union of (Cheeseburger Monkey Planet). While there may be some reason to represent the union of these classes as a single class, its members do not share properties other than being present in the set itself.

In realism, universals are considered both to exist, just as the particular instances which belong to a universal exist, and to be independent from any one person’s beliefs or language [18]. Conceptualists believe universals exist only as abstract concepts, not as real entities [17]. Nominalism does not make use of universals at all, not even as abstract concepts. Instead, nominalists believe that only instances exist and that the problems surrounding the nature of universals can be resolved through careful thinking about instances alone [17].

The “mathematics” perspective of Rzhetsky and Evans is broadly synonymous with this type of ontology; here, computationally-accessible realist ontologies could ultimately merge into a single ontology for all of the life sciences [19]. The main deviation from the definition of philosophical ontologies presented in this section is their statement that this perspective is a view prevalent among computer scientists, which is the exact opposite of what the other articles referenced in this section describe for computer science ontologies.

In computer science, an ontology only becomes useful to scientists if it successfully models a domain of interest. In other words, the purpose of an ontology is to fulfil a role and accomplish a task. In a very general sense, ontologies in computer science provide a representation of common areas of interest which can be useful both for humans and computers [15]. While a computer science ontology may be viewed in the context of a philosophical ontology, such an alignment is not the driving force in its creation. Computer science ontologies are intended to explicitly describe the shared background knowledge of a community in a way that facilitates communication and, as a consequence, integration of that knowledge [5]. Rzhetsky and Evans describe an “Esperanto” perspective which closely aligns with this definition of computer science ontologies by presenting on the one hand a social, collaborative view of ontology development and on the other hand a practical acceptance that multiple ontologies will persist within the life science community and that integrative measures will be required to manage the situation [19].

In computer science strategies, data is defined as a collection of facts, and knowledge is defined as the union of data and an interpretation of the data’s meaning [20]. As there can be many interpretations of a dataset, there can be many ontologies which validly describe that dataset. This is in contrast with the purpose of a philosophical ontology, which is to closely model existence. A philosophical ontologist might argue that there is only one true ontology to which a particular dataset or format can be applied, and that all others are not true interpretations of reality. As Gruber states, while philosophy is concerned with existence, those building computer science ontologies define that which exists as exactly that which is represented in their ontology [11]. Computer science ontologies focus on communicating a shared understanding of the domain of interest in a way accessible both to computers and humans; how closely reality is modelled is irrelevant in such a strategy.

In order for a computer science ontology to fulfil its role, certain structural constraints are often imposed. For instance, while Noy and McGuinness agree with Gruber that an ontology defines a shared domain of interest, they also require that ontologies be interpretable by machines [6]. Barkmeyer and Obrst go one step further, stating that written knowledge cannot be an ontology unless it is both understandable to computers and suitable for automated reasoning [21]. Barkmeyer and Obrst believe that an ontology without access to automated reasoning is incomplete and not fit for purpose. This requirement of a format being suitable for automated reasoning is at odds with many philosophical definitions of ontologies.

Rzhetsky and Evans add a third type of ontology to the two already mentioned: their “CS” perspective (Note: This ontological perspective is called “computer science” in the paper by Rzhetsky and Evans; however, to avoid confusion with the definition of a computer science ontology used earlier in this section, the acronym CS is used.), which describes a one-ontology-per-tool method of ontology development intended to be in direct competition with the mathematics perspective [19]. While on the surface this perspective shares some aspects with the already-described computer science ontologies, particularly the fit-for-purpose practicality of ontology development, in fact this perspective requires that each ontology is made and kept in isolation, with no attempt at reconciliation. The definition of computer science ontologies used in this section has as its primary requirement the facilitation of communication of a domain of interest both for humans and computers, which is why it most closely matches the Esperanto perspective.

4 Ontological formalisms, languages and formats

There are three initial considerations to take into account before construction of an ontology can begin: the formalism, or logical framework available to software systems; the knowledge representation language, which is a human-readable interpretation of a formalism; and finally, the syntax used for storing the ontology, or its format. Ontology languages may have multiple formal interpretations, or formalisms. However, there needs to be a tight link between a language and its formalism, as a human-readable definition in a language needs to be accurately represented in its formal definition. The choices of ontology language, and even of the format, are independent from the choice of expressivity level and therefore from the choice of formalism. Ontology languages and formalisms are abstract concepts and, as such, require formats with which to create, share, and analyse the ontology itself. A single ontology language may also be implemented in more than one format.

The two most commonly used ontology languages in the life sciences community are OWL [14] and OBO [22], and this section describes the languages themselves as well as their formalisms and formats.


Description Logics

DLs are formalisms for knowledge representation characterised by various levels of expressivity. Knowledge-based systems created through the use of DLs are capable of finding implicit consequences of explicitly represented knowledge [23, p. 2]. Historically, DLs have been used in many domains: software engineering, configuration, medicine, digital libraries, Web-based information systems (e.g. the Semantic Web), data mining, natural language processing and data integration [23, pp. 23-29]. The more expressive a DL is, the less tractable it is for reasoning purposes [23, p. 9]. Therefore, a language must be chosen that has an appropriate balance between expressivity and tractability. DLs are widely used in the biomedical community via the OWL language, where OWL-DL is one of a number of OWL 2 profiles [24]. The DL formalisms accessible through OWL have the ability to represent complex logic constructs and constraints such as number restrictions and property hierarchies [23, p. 8]. Editors such as Protégé and libraries such as the OWL API [25] can determine in which profile a particular ontology is written.

The OWL Language

Semantic Web technologies such as RDF and OWL have been espoused as a unified framework for integrating and retrieving life science data [26, 27, 28], and are core components of the Semantic Web. OWL improves upon RDF by providing the ability to explicitly describe objects and make assertions about them [28]. Constraints on class membership such as disjointness (e.g. stating that BlackLabrador cannot also be a Person) cannot be expressed in RDF [29]. Decidability and complexity go hand-in-hand; the more complex the logic is, the more challenging the reasoning [23, p. 44]. Therefore, a variety of tractable subsets of OWL, called profiles, have been developed. Each profile has optimisations for particular reasoning tasks. OWL-EL is a profile optimised for large numbers of properties and classes, while OWL-QL is aimed primarily at ontologies with large numbers of instances, where querying is the main concern [24]. Finally, OWL-RL is a profile which combines expressive power with scalable reasoning times and which is suitable for use with rule languages and rule-based reasoning [24]. However, OWL profiles are not limited to these three; for instance, OWL-DL and OWL-Lite are both valid OWL profiles.

OWL formats

OWL can be expressed in a number of formats, including the Manchester OWL Syntax [30] and RDF/XML. The Manchester OWL Syntax is highly understandable to a human, while the triples of an RDF-based format are easily handled by standard RDF libraries available to many programmers. The OWL RDF/XML format is created by layering OWL on top of RDF, which can then be serialised into XML [26]. Presenting biological data in RDF-based formats allows unambiguous naming of entities via URIs, simple addition of data via graph structures, the use of the open world assumption and the addition of new data without invalidating or changing existing data [26].
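The additive, graph-based nature of RDF described above can be sketched without any RDF library by treating a graph as a set of (subject, predicate, object) triples. The example.org URIs and the helper names add_triples and objects are assumptions made for illustration; only the rdf:type URI is a real W3C identifier.

```python
# The real rdf:type predicate URI; everything under EX is hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"

# A graph is simply a set of (subject, predicate, object) triples.
graph = {
    (EX + "rex", RDF_TYPE, EX + "BlackLabrador"),
    (EX + "rex", EX + "hasOwner", EX + "alice"),
}


def add_triples(existing, new_triples):
    """Return an extended graph; existing triples are never changed or removed."""
    return existing | set(new_triples)


def objects(triples, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# A second data source contributes further statements about the same URI.
extended = add_triples(graph, [(EX + "rex", EX + "hasOwner", EX + "bob")])

print(objects(graph, EX + "rex", EX + "hasOwner"))
print(objects(extended, EX + "rex", EX + "hasOwner"))
```

Because entities are named by URIs, statements from a second source attach to the same node, and the original graph remains a subset of the extended one.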


OBO Formalism and Language

The OBO Foundry [22], a consortium of ontology developers, does not explicitly state which formalism is required of its ontologies. However, as the Foundry recommends usage of their upper-level ontologies by all of its domain ontologies, these upper-level ontologies provide direction for Foundry domain ontologies. BFO, an OBO upper-level ontology for scientific research, was developed with interpretations in OWL, first-order logic and the native OBO. The RO [31], an upper-level ontology concerned with high-level biology-specific relationships, is native to OBO but also automatically converted to OWL.

While commonly used in the life sciences, the OBO language is not well suited to semantic reasoning, inference and querying. Indeed, when Golbreich and colleagues created a mapping from OBO to OWL, thus making reasoning and strict semantics available to OBO ontologies, modelling errors that had previously gone unnoticed were discovered [32]. While DL languages such as OWL were developed to unambiguously describe ontological concepts, OBO contains ambiguous and informal descriptions of its concepts, and as such its reasoner can miss important inferences [32]. In contrast to the formal approach provided by OWL, where properties link instances of classes and therefore must be quantified, OBO is a terminology-based language whose properties link classes directly [33]. As such, quantifications such as some, exactly or only are not possible within OBO [33]. Additionally, OWL has the ability to represent more complex logic constructs, such as number restrictions and property hierarchies, and OWL reasoners are more powerful than their OBO counterparts [32]. Updates to bring OBO closer to OWL and provide lossless conversion from OBO to OWL address many of these limitations [34].

The OBO Format

The OBO language currently only has a single format and therefore OBO can be considered both a language and a format. Like the Manchester OWL Syntax, OBO is highly readable to humans and is composed of stanzas describing entities and their locations within the hierarchy. Figure 2 shows a GO term and its stanza.

id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte

Figure 2: The GO term oocyte growth and its stanza. This figure illustrates the native OBO format.
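Because an OBO stanza is a sequence of tag-value lines, it can be read with very little code. The following Python sketch parses the stanza from Figure 2 into a dictionary; the function name parse_stanza is hypothetical, and treating everything after " ! " as a droppable comment is a simplifying assumption that ignores other features of the format such as escaping and trailing modifiers.

```python
# The stanza from Figure 2, reproduced as a string.
STANZA = """\
id: GO:0001555
name: oocyte growth
is_a: GO:0016049 ! cell growth
relationship: part_of GO:0048601 ! oocyte morphogenesis
intersection_of: GO:0040007 ! growth
intersection_of: has_central_participant CL:0000023 ! oocyte
"""


def parse_stanza(text):
    """Parse tag-value lines into a dict mapping each tag to a list of values.

    Trailing '! comment' text is dropped and repeated tags accumulate.
    """
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and stanza headers
        tag, _, value = line.partition(":")
        value = value.split(" ! ", 1)[0].strip()
        record.setdefault(tag.strip(), []).append(value)
    return record


term = parse_stanza(STANZA)
print(term["id"], term["name"])
print(term["relationship"])
```

Note that the relationship value links one class identifier directly to another with no quantifier, which illustrates the terminology-based character of the language.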

The OBO Foundry has created a set of naming conventions and other principles that, while not part of the format, are important restrictions on how the ontologies are constructed. If utilised by the OBO Foundry ontologies, Foundry principles such as naming conventions aid social collaboration as well as the semantic alignment of multiple ontologies [35].

5 Structural components of an ontology

An ontology can be broadly defined by two types of components: entities and descriptors [36]. Entities are the main body of the ontology and include classes, properties, instances and rules. Figure 1 shows each of these entities and how they are related. Each class is an abstract concept representing a group of instances [36]. In DL, only one definition for each class is allowed, and that definition must not be cyclic [23, p. 13]. Properties describe the links between instances [23, p. 46]; they can also be expressed between classes, in which case they apply to all instances of those classes [36]. Rules can be written in OWL or OWL-based rule languages such as SWRL [37], which provides the capability to write if-then statements about an ontology. Descriptors are vital for human understanding of an ontology, and include documentation (such as English definitions of entities) and annotation (such as notes and other metadata) [36]. Imports of other ontologies are not entities, but neither are they strictly descriptors. Import statements are vital for completeness of the ontology and for its successful reasoning, as they provide a method of including additional ontological resources.

The components of an ontology are commonly grouped into the TBox and ABox; classes reside in the TBox and instances in the ABox. The TBox is the component of an ontology describing intensional knowledge, or the properties of an entity required to identify it, and is considered to be unchanging knowledge [23, p. 12]. The ABox is similar to extensional knowledge, and describes knowledge specific to the domain of interest, or to a particular problem [23, p. 13]. ABox knowledge is considered to be dependent upon a set of circumstances, and therefore changeable [23, p. 13]. Sometimes a third grouping, the RBox, is defined: some ontologies contain a set of rule axioms, and such a set is called the RBox [38]. More information about rules in ontologies is available in Section 1.6.
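As a rough illustration of the TBox and ABox grouping, the sketch below sorts a handful of axioms by their kind. The axiom kinds are named after OWL 2 functional-syntax axiom types, but representing axioms as tuples, the example entities and the function name split_boxes are assumptions made for this example only.

```python
# Hypothetical axioms as tuples whose first element names the axiom kind.
AXIOMS = [
    ("SubClassOf", "BlackLabrador", "Labrador"),
    ("SubClassOf", "Labrador", "Dog"),
    ("DisjointClasses", "Dog", "Person"),
    ("ClassAssertion", "BlackLabrador", "rex"),
    ("ObjectPropertyAssertion", "hasOwner", "rex", "alice"),
]

# Class-level (terminological) axioms versus instance-level (assertional) axioms.
TBOX_KINDS = {"SubClassOf", "EquivalentClasses", "DisjointClasses"}
ABOX_KINDS = {"ClassAssertion", "ObjectPropertyAssertion", "DataPropertyAssertion"}


def split_boxes(axioms):
    """Partition axioms into (tbox, abox) lists according to their kind."""
    tbox = [axiom for axiom in axioms if axiom[0] in TBOX_KINDS]
    abox = [axiom for axiom in axioms if axiom[0] in ABOX_KINDS]
    return tbox, abox


tbox, abox = split_boxes(AXIOMS)
print(len(tbox), "TBox axioms;", len(abox), "ABox axioms")
```

In this toy example the statements about rex and alice could change with circumstances, while the class hierarchy is the comparatively stable part.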

There are a number of commonly used metrics available for summarising the components of an ontology (see Table in Chapter for examples). Class metrics include a count of the classes themselves as well as various metrics describing their axioms. If an instance fulfils the necessary and sufficient conditions of a class, then a reasoner will infer that instance to be a member of that class. These conditions are also known as equivalent class axioms. Similarly, necessary conditions (also called subclass axioms) must hold true if an instance is asserted to be a member of the class, but such axioms are not sufficient to infer the placement of an instance under that class. If an instance is asserted to be a member of two disjoint classes, then the reasoner is able to deduce an inconsistency [39].
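The difference between necessary and sufficient conditions, and the effect of disjointness, can be sketched with a toy check in Python. This is not how a DL reasoner is implemented: instances are plain dictionaries, the property names breed and coatColour are invented, and the defining condition for BlackLabrador is an assumption made purely for illustration.

```python
def is_black_labrador(instance):
    """Hypothetical necessary and sufficient condition for BlackLabrador."""
    return (instance.get("breed") == "labrador"
            and instance.get("coatColour") == "black")


# Hypothetical disjointness axiom: nothing can be both of these.
DISJOINT = [("BlackLabrador", "Person")]


def infer_types(instance, asserted):
    """Add BlackLabrador to the asserted types when the defining condition holds."""
    inferred = set(asserted)
    if is_black_labrador(instance):
        inferred.add("BlackLabrador")
    return inferred


def is_consistent(types):
    """An instance typed with both members of a disjoint pair is inconsistent."""
    return not any(a in types and b in types for a, b in DISJOINT)


rex = {"breed": "labrador", "coatColour": "black"}
rex_types = infer_types(rex, {"Dog"})
print(rex_types, is_consistent(rex_types))
print(is_consistent(rex_types | {"Person"}))
```

An instance that is merely asserted to be a Dog gains no further type, because a necessary condition alone is not sufficient for inferring class membership.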

There are also a number of useful property-level metrics. Total numbers of object and data properties give a general idea of the possible connections between classes. Functional properties are those which have either zero or one value. In other words, no individual in the domain of a functional property may have more than one value in the range of that property.
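A functional property constraint can be illustrated with a simple check over asserted values. The sketch below takes a closed-world shortcut and reports any instance with more than one distinct value; an OWL reasoner would instead infer that two values of a functional object property denote the same individual unless they are declared different. The property names and the function name functional_violations are hypothetical.

```python
def functional_violations(assertions, functional_properties):
    """Return sorted (subject, property) pairs with more than one distinct value."""
    values = {}
    for subject, prop, value in assertions:
        if prop in functional_properties:
            values.setdefault((subject, prop), set()).add(value)
    return sorted(key for key, vals in values.items() if len(vals) > 1)


# Hypothetical assertions as (subject, property, value) triples.
ASSERTIONS = [
    ("rex", "hasBirthYear", 2005),
    ("rex", "hasBirthYear", 2006),
    ("rex", "hasOwner", "alice"),
    ("rex", "hasOwner", "bob"),
    ("fido", "hasBirthYear", 2010),
]

# Only hasBirthYear is declared functional, so two owners are acceptable.
violations = functional_violations(ASSERTIONS, {"hasBirthYear"})
print(violations)
```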

6 Conventions

Because many components of an ontology have several nearly synonymous names, a single term has been chosen for each component and used throughout. Figure 1 graphically illustrates the conventions described here, and includes synonymous and near-synonymous terminology and how each component interrelates. Figure 3 provides a simple example of a partial ontology which matches the structure described in Figure 1. The design decisions made with respect to the available range of terminology are listed below.

Figure 3: Using the structure described in Figure 1, this concrete example of a portion of an ontology shows how classes, properties and instances are used together to generate a model of a domain of interest. Here, the domain of interest is black labradors. See Section 6 for more information.

  • Class and concept are often used interchangeably as a name for the entity which describes and constrains a group of instances. Because concept is often used as a more general term similar to entity, class is used to describe this ontology component within this thesis. A third term, universal, is similar but not equivalent to class. While all universals should be written as classes when creating an ontology, not all classes are universals (see Section ).
  • Instance, individual and particular are all used to refer to the members of a class which are specific instantiations of the class type. Instance is the term used throughout this thesis.
  • Property, slot, relation and relationship are all used to describe the links between two instances as well as links from an instance to a data type (e.g. a plain string or integer). Property has been chosen for this thesis.
  • Entity is used in this thesis as a general term to define any object (e.g. class, property, instance, rule) within an ontology.
  • The definitions of computer science ontologies and philosophical ontologies match the definitions provided by Stevens and colleagues [15].

The following typographical conventions were used to distinguish specific object types from standard text:

  • ClassName: the name of a class is shown in an italicised font. Further, classes created in this research begin with a capital letter, although some third-party ontologies do not follow this convention.
  • propertyName: the name of a property is shown in an italicised font and begins with a lower-case letter unless otherwise specified. In some third-party ontologies, underscores are used to separate words in the property name, while in this thesis CamelCase is preferred.
  • instanceName: instances are shown in a fixed-width font and begin with a lower-case letter unless otherwise specified.
  • XML element and attribute names are used when describing some XML-based data sources. Such names are shown in a sans serif font.

7 Ontological design decisions

The Semantic Web allows computers to go beyond performing numerical computations and provides a common structure for easily sharing, reusing and integrating information. OWL, a Semantic Web standard for representing knowledge, enjoys strong tool support and is often used for capturing biological and medical knowledge. OWL ontologies in the life sciences include, amongst others, OBI, BioPAX [40], EXPO [41], FMA [42] and GALEN [43]. Once the information about the domain has been modelled in OWL, a software application called a reasoner (such as Pellet [13], FaCT++ [44] or HermiT [45]) can automatically infer all other facts that must logically follow as well as find inconsistencies between asserted facts. OWL reasoners perform four main tasks:

  • Consistency checking. These checks ensure that all of the logical statements asserted in an ontology do not contradict each other. Inconsistent ontologies have no models which do not contradict the statements made in the ontology, and therefore no meaningful conclusions can be made [46].
  • Class satisfiability. Satisfiability is whether or not a class can have any instances without making that ontology inconsistent. If a class is unsatisfiable, it is equivalent to the empty set, or owl:Nothing, and therefore no useful analysis can be done with that class. Such an error is generally due to a fundamental modelling error [47].
  • Classification. During classification, the placement of every class is checked and rearrangements to the hierarchy (based on the asserted logic) are made, creating an inferred hierarchy. Such inferred hierarchies are useful, especially when an ontology is normalised such that each class has only one asserted superclass, or parent [48]. Once classification has occurred, a class may be inferred to be a subclass of more than one parent class. Normalisation has a number of benefits, including aiding the creation of modular ontologies [49].
  • Realisation. Realisation is performed on instances in an ontology. During realisation, each instance is placed under its most specific defining class, or classes if it belongs in multiple locations.
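The outcome of realisation can be illustrated with a toy hierarchy in Python. This sketch assumes the class hierarchy has already been classified and only shows the final step of keeping an instance's most specific classes; the class names and the functions superclasses and most_specific are invented, and a real reasoner derives both the hierarchy and the instance types from the logical axioms.

```python
# Hypothetical inferred hierarchy: class -> set of direct superclasses.
SUBCLASS_OF = {
    "BlackLabrador": {"Labrador"},
    "Labrador": {"Dog"},
    "Dog": {"Animal"},
    "Pet": {"Animal"},
}


def superclasses(cls):
    """Return all direct and indirect superclasses of cls."""
    seen = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen


def most_specific(types):
    """Drop every type that is a superclass of another type in the set."""
    redundant = set()
    for cls in types:
        redundant |= superclasses(cls)
    return set(types) - redundant


# rex is known to belong to all four classes; two of them are redundant.
rex_classes = most_specific({"Dog", "BlackLabrador", "Pet", "Animal"})
print(rex_classes)
```

Here the instance is placed in two locations, BlackLabrador and Pet, because neither class subsumes the other.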

Due to the more formal nature of the language, the use of existential quantification and the reasoning benefits listed above, OWL was chosen for this research. Specifically, OWL-DL was chosen because the DL profile ensures that ontologies are decidable and can be reasoned upon. While research domains are successfully modelled in OBO, this higher level of expressivity together with the logical inferences available when reasoning over a DL-based ontology make a DL format such as OWL the best choice. In the thesis, OWL is used as a shorthand for OWL 2 DL. With DL, the implicit knowledge that is present within an ontology—and which is not immediately obvious to a human—can be made explicit through inference and reasoning [23, p. 61]. Irrespective of the level of expressivity of an OWL ontology, OWL is more useful than other ontology languages such as OBO when reduction of the ambiguity of interpretation is paramount [50]. While any of the OWL syntaxes could be used, to ensure compatibility with the broadest range of third-party software, the RDF/XML format was chosen, although examples presented in this thesis make use of the more human readable Manchester OWL syntax [51].

Although the various strategies such as realism and nominalism are hotly debated in the philosophical ontology community, to a computer scientist, such differences are, to all intents and purposes, irrelevant. Irrespective of whether or not a BlackLabrador is a universal which exists in reality, it is a concept which a researcher may be interested in studying. Therefore, from a computer science perspective, it does not matter if BlackLabrador truly exists: it is enough that it is a concept which needs to be modelled. Further, the outcome of using BlackLabrador in an ontology will be the same, irrespective of whether it was added from a realist or nominalist point of view. Ultimately, philosophical strategies require that a concept must exist in reality for it to be modelled. Robert Stevens describes scientific concepts such as the Higgs boson and Newtonian mechanics as “unicorns” because they are imaginary, conjecture or simply convenience entities [52]. Even though these entities are not “real” according to a philosophical modelling strategy, they are relevant to science and need to be modelled. Because these unicorns need to be modelled, and because computer science ontologies are based on a shared understanding of the domain as well as computational accessibility, a computer science modelling strategy was chosen for the work described in this thesis.


Tamara Kulikova, Ruth Akhtar, Philippe Aldebert, Nicola Althorpe, Mikael Andersson, Alastair Baldwin, Kirsty Bates, Sumit Bhattacharyya, Lawrence Bower, Paul Browne, Matias Castro, Guy Cochrane, Karyn Duggan, Ruth Eberhardt, Nadeem Faruque, Gemma Hoad, Carola Kanz, Charles Lee, Rasko Leinonen, Quan Lin, Vincent Lombard, Rodrigo Lopez, Dariusz Lorenc, Hamish McWilliam, Gaurab Mukherjee, Francesco Nardone, Maria P. Pastor, Sheila Plaister, Siamak Sobhany, Peter Stoehr, Robert Vaughan, Dan Wu, Weimin Zhu, and Rolf Apweiler. EMBL Nucleotide Sequence Database in 2006. Nucleic Acids Research, 35(suppl 1):D16–D20, January 2007.
The UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Research, 36(suppl 1):D190–D195, January 2008.
Michael Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. Michael Cherry, Allan P. Davis, Kara Dolinski, Selina S. Dwight, Janan T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna Lewis, John C. Matese, Joel E. Richardson, Martin Ringwald, Gerald M. Rubin, and Gavin Sherlock. Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29, May 2000.
Carola Eschenbach and Michael Grüninger, editors. Ontology (Science), volume 183 of Frontiers in Artificial Intelligence and Applications. IOS Press, 2008.
Robert Hoehndorf, Michel Dumontier, Anika Oellrich, Dietrich Rebholz-Schuhmann, Paul N. Schofield, and Georgios V. Gkoutos. Interoperability between Biomedical Ontologies through Relation Expansion, Upper-Level Ontologies and Automatic Reasoning. PLoS ONE, 6(7):e22006+, July 2011.
Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology.
Thomas Vander Wal. Folksonomy, February 2007.
Thomas Gruber. Ontology of Folksonomy: A Mash-up of Apples and Oranges. International Journal on Semantic Web & Information Systems, 3(2):1–11, 2007.
National Information Standards Organization. ANSI/NISO Z39.19 – Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. National Information Standards Organization, Bethesda, Maryland, U.S.A.
The W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview, October 2009.
Thomas R. Gruber. A translation approach to portable ontology specifications. Knowl. Acquis., 5(2):199–220, June 1993.
Tom Gruber. Ontology. Springer-Verlag, 2009.
E. Sirin, B. Parsia, B. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2):51–53, June 2007.
Conrad Bock, Achille Fokoue, Peter Haase, Rinke Hoekstra, Ian Horrocks, Alan Ruttenberg, Uli Sattler, and Mike Smith. OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax, June 2009.
Robert Stevens, Alan Rector, and Duncan Hull. What is an ontology? Ontogenesis, January 2010.
Nino B. Cocchiarella. Formal Ontology and Conceptual Realism, volume 339 of Synthese Library. Springer, 2007.
Mary C. MacLeod and Eric M. Rubenstein. Universals. In Internet Encyclopedia of Philosophy, February 2010.
Alexander Miller. Realism. In The Stanford Encyclopedia of Philosophy, Fall 2008 edition, 2008.
Andrey Rzhetsky and James A. Evans. War of Ontology Worlds: Mathematics, Computer Code, or Esperanto? PLoS Comput Biol, 7(9):e1002191+, September 2011.
Erick Antezana, Martin Kuiper, and Vladimir Mironov. Biological knowledge management: the emerging role of the Semantic Web technologies. Briefings in Bioinformatics, 10(4):392–407, July 2009.
Ed J. Barkmeyer and Leo Obrst. Re: [ontolog-forum] Just What Is an Ontology, Anyway? (archive of ontolog-forum mailing list), October 2009.
Barry Smith, Michael Ashburner, Cornelius Rosse, Jonathan Bard, William Bug, Werner Ceusters, Louis J. Goldberg, Karen Eilbeck, Amelia Ireland, Christopher J. Mungall, Neocles Leontis, Philippe Rocca-Serra, Alan Ruttenberg, Susanna-Assunta Sansone, Richard H. Scheuermann, Nigam Shah, Patricia L. Whetzel, and Suzanna Lewis. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25(11):1251–1255, November 2007.
Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter Patel-Schneider, editors. The Description Logic Handbook. Cambridge University Press, first edition, January 2003.
Diego Calvanese, Jeremy Carroll, Giuseppe De Giacomo, Jim Hendler, Ivan Herman, Bijan Parsia, Peter F. Patel-Schneider, Alan Ruttenberg, Uli Sattler, and Michael Schneider. OWL 2 Web Ontology Language Profiles, October 2009.
Matthew Horridge, Sean Bechhofer, and Olaf Noppens. Igniting the OWL 1.1 Touch Paper: The OWL API. In Proceedings of OWLEd 2007: Third International Workshop on OWL Experiences and Directions, 2007.
Xiaoshu Wang, Robert Gorlitsky, and Jonas S. Almeida. From XML to RDF: how semantic web technologies will change the design of ’omic’ standards. Nature Biotechnology, 23(9):1099–1103, September 2005.
D. Quan. Improving life sciences information retrieval using semantic web technology. Brief Bioinform, 8(3):172–182, May 2007.
Kei-Hoi Cheung, Andrew Smith, Kevin Yip, Christopher Baker, and Mark Gerstein. Semantic Web Approach to Database Integration in the Life Sciences, pages 11–30. 2007.
Joanne S. Luciano and Robert D. Stevens. e-Science and biological pathway semantics. BMC Bioinformatics, 8(Suppl 3):S3+, 2007.
Matthew Horridge, Nick Drummond, John Goodwin, Alan L. Rector, Robert Stevens, and Hai Wang. The Manchester OWL Syntax. In Bernardo C. Grau, Pascal Hitzler, Conor Shankey, and Evan Wallace, editors, OWLED, volume 216 of CEUR Workshop Proceedings, 2006.
Barry Smith, Werner Ceusters, Bert Klagges, Jacob Kohler, Anand Kumar, Jane Lomax, Chris Mungall, Fabian Neuhaus, Alan Rector, and Cornelius Rosse. Relations in biomedical ontologies. Genome Biology, 6(5):R46+, 2005.
Christine Golbreich, Matthew Horridge, Ian Horrocks, Boris Motik, and Rob Shearer. OBO and OWL: Leveraging Semantic Web Technologies for the Life Sciences. ISWC 2007, 4825:169–182, 2007.
Martin Boeker, Ilinca Tudose, Janna Hastings, Daniel Schober, and Stefan Schulz. Unintended consequences of existential quantifications in biomedical ontologies. BMC Bioinformatics, 12(1):456+, 2011.
Syed Tirmizi, Stuart Aitken, Dilvan Moreira, Chris Mungall, Juan Sequeda, Nigam Shah, and Daniel Miranker. Mapping between the OBO and OWL ontology languages. Journal of Biomedical Semantics, 2(Suppl 1):S3+, 2011.
Daniel Schober, Barry Smith, Suzanna Lewis, Waclaw Kusnierczyk, Jane Lomax, Chris Mungall, Chris Taylor, Philippe Rocca-Serra, and Susanna-Assunta Sansone. Survey-based naming conventions for use in OBO Foundry ontology development. BMC Bioinformatics, 10(1):125+, 2009.
Phillip Lord. Components of an Ontology. Ontogenesis, January 2010.
Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean. SWRL: A Semantic Web Rule Language Combining OWL and RuleML, May 2004.
Adila Krisnadhi, Frederick Maier, and Pascal Hitzler. OWL and Rules. In Reasoning Web 2011. Springer, to appear.
The W3C Consortium. OWL Web Ontology Language Overview, February 2004.
Emek Demir, Michael P. Cary, Suzanne Paley, Ken Fukuda, Christian Lemer, Imre Vastrik, Guanming Wu, Peter D’Eustachio, Carl Schaefer, Joanne Luciano, Frank Schacherer, Irma Martinez-Flores, Zhenjun Hu, Veronica Jimenez-Jacinto, Geeta Joshi-Tope, Kumaran Kandasamy, Alejandra C. Lopez-Fuentes, Huaiyu Mi, Elgar Pichler, Igor Rodchenkov, Andrea Splendiani, Sasha Tkachev, Jeremy Zucker, Gopal Gopinath, Harsha Rajasimha, Ranjani Ramakrishnan, Imran Shah, Mustafa Syed, Nadia Anwar, Ozgün Babur, Michael Blinov, Erik Brauner, Dan Corwin, Sylva Donaldson, Frank Gibbons, Robert Goldberg, Peter Hornbeck, Augustin Luna, Peter Murray-Rust, Eric Neumann, Oliver Reubenacker, Matthias Samwald, Martijn van Iersel, Sarala Wimalaratne, Keith Allen, Burk Braun, Michelle Whirl-Carrillo, Kei-Hoi H. Cheung, Kam Dahlquist, Andrew Finney, Marc Gillespie, Elizabeth Glass, Li Gong, Robin Haw, Michael Honig, Olivier Hubaut, David Kane, Shiva Krupa, Martina Kutmon, Julie Leonard, Debbie Marks, David Merberg, Victoria Petri, Alex Pico, Dean Ravenscroft, Liya Ren, Nigam Shah, Margot Sunshine, Rebecca Tang, Ryan Whaley, Stan Letovksy, Kenneth H. Buetow, Andrey Rzhetsky, Vincent Schachter, Bruno S. Sobral, Ugur Dogrusoz, Shannon McWeeney, Mirit Aladjem, Ewan Birney, Julio Collado-Vides, Susumu Goto, Michael Hucka, Nicolas Le Novère, Natalia Maltsev, Akhilesh Pandey, Paul Thomas, Edgar Wingender, Peter D. Karp, Chris Sander, and Gary D. Bader. The BioPAX community standard for pathway data sharing. Nature biotechnology, 28(9):935–942, September 2010.
Larisa N. Soldatova and Ross D. King. An ontology of scientific experiments. Journal of The Royal Society Interface, 3(11):795–803, December 2006.
Gergely Héja, Péter Varga, Péter Pallinger, and György Surján. Restructuring the foundational model of anatomy. Studies in health technology and informatics, 124:755–760, 2006.
Gergely Héja, György Surján, Gergely Lukácsy, Péter Pallinger, and Miklós Gergely. GALEN based formal representation of ICD10. International Journal of Medical Informatics, 76(2-3):118–123, February 2007.
Dmitry Tsarkov and Ian Horrocks. FaCT++ Description Logic Reasoner: System Description. In Ulrich Furbach and Natarajan Shankar, editors, Automated Reasoning, volume 4130 of Lecture Notes in Computer Science, chapter 26, pages 292–297. Springer Berlin / Heidelberg, Berlin, Heidelberg, 2006.
Boris Motik, Rob Shearer, and Ian Horrocks. Hypertableau Reasoning for Description Logics. Journal of Artificial Intelligence Research, 36:165–228, 2009.
Matthew Horridge, Bijan Parsia, and Ulrike Sattler. Explaining Inconsistencies in OWL Ontologies. In Proceedings of the 3rd International Conference on Scalable Uncertainty Management, SUM ’09, pages 124–137, Berlin, Heidelberg, 2009. Springer-Verlag.
Aditya Kalyanpur, Bijan Parsia, Evren Sirin, and James Hendler. Debugging unsatisfiable classes in OWL ontologies. Web Semantics: Science, Services and Agents on the World Wide Web, 3(4):268–293, 2005.
Alan L. Rector. Modularisation of domain ontologies implemented in description logics and related formalisms including OWL. In K-CAP ’03: Proceedings of the 2nd international conference on Knowledge capture, pages 121–128, New York, NY, USA, 2003. ACM.
Alan Rector, Matthew Horridge, and Nick Drummond. Building Modular Ontologies and Specifying Ontology Joining, Binding, Localizing and Programming Interfaces in Ontologies Implemented in OWL. In Derek Sleeman and Mark Musen, editors, AAAI Spring Symposium on Symbiotic Relationships between Semantic Web and Knowledge Engineering, volume Technical Report SS-08-07, pages 69+, Menlo Park, California, 2008. AAAI, The AAAI Press.
Mikel Aranguren, Sean Bechhofer, Phillip Lord, Ulrike Sattler, and Robert Stevens. Understanding and using the meaning of statements in a bio-ontology: recasting the Gene Ontology in OWL. BMC Bioinformatics, 8(1):57+, February 2007.
Matthew Horridge and Peter F. Patel-Schneider. OWL 2 Web Ontology Language Manchester Syntax, October 2009.
Robert Stevens. Unicorns in my Ontology, May 2011.

By Allyson Lister
