Attribution vs Citation: Do you know the difference?

This is a cross-posted, two-author item available on both my blog and Frank Gibson’s (his post).

Often the words “attribution” and “citation” are used interchangeably. However, in the context of ensuring your work gets the referencing it deserves when others make use of it, it is important that the differences between these two concepts are clear. This article outlines the differences between attribution and citation, and suggests that what most scientists are interested in is not attribution, which can be ensured via licensing restrictions, but instead citation, which is a much tougher nut to crack.

From xkcd, at

At ISMB last week, there were a number of conversations about the difference between attribution and citation. This topic was brought up again yesterday in a conversation between the authors of this post. It is an important distinction which is explored in this post.

First, some definitions for attribution and citation. These are not the only definitions possible, but for the purposes of this discussion, please keep these in mind.

Attribution: Acknowledgement of the use of someone else’s information, data, or other work. Crucially, while Wikipedia has a fairly straightforward definition of citation, it does NOT describe even the common ways in which attribution should be implemented (see Wikipedia attribution page).

Citation: When you publish a paper that makes use of someone else’s information (data, ontology, etc.), you include in that paper a reference to the work of that other person or group. Wikipedia states that it is a “reference to a published or unpublished source” whose prime purpose is of “intellectual honesty”.

Distinguishing between attribution and citation.
You can imagine that citation is a specific type of attribution, but attribution itself can be performed in any number of ways. For scientists, citation is much more useful to their careers as a result of the publish or perish environment.

So, what could attribution consist of? First, let’s take as an example the re-use of someone else’s ontology, or of specific sub-parts or classes of that ontology. Each class in an ontology is identified by a URI. Is importing the URI therefore enough? Does the URI alone make it clear where you got the class from? If it’s not enough, where do you put the reference or statement that you are re-using other people’s classes: within the overall metadata of your own ontology? Alternatively, when attributing data, is a reference to the originating paper, or to the URL from which you downloaded the data, enough? Where do you put that reference: within the metadata of your own document? As a citation? How much is enough attribution?

These questions cannot easily be answered.

A common-sense answer to the question of properly fulfilling attribution requirements is, at a minimum, first to cite the other person’s information in your paper, and second to include the URL(s)/URI(s) in your metadata. But here we get to the crux of the matter: we’ve now stated that a useful way to ensure attribution is to cite the other person. But, if you think carefully, what’s more important for your impact assessments, and your work? It’s actually the citation itself. Sure, acknowledgement via extra referencing in the metadata of the person using your information is great, but what you really need is a citation in their work. If we aren’t careful, we will all make the easy mistake of conflating citation in papers with the question of how to mark the inclusion of an imported, licensed piece of information: the former is what we are often scored on and what we would really like, while the latter is the only thing a license enforces. Licensing with attribution requirements is not citation; you can make use of a licensed ontology, but this does not require you to cite it in a paper.
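To make the metadata half of that common-sense answer concrete, here is a minimal sketch, not a recommended mechanism. It assumes that each imported class URI can be split into a namespace and a local name at the last '#' or '/'. The example.org URIs and the function names are made up purely for illustration.

```python
from collections import defaultdict


def split_class_uri(uri):
    """Split a class URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    if cut == -1:
        raise ValueError(f"cannot find a namespace in {uri!r}")
    return uri[:cut + 1], uri[cut + 1:]


def attribution_metadata(imported_uris):
    """Group imported class URIs by the namespace they came from."""
    by_source = defaultdict(list)
    for uri in imported_uris:
        namespace, local_name = split_class_uri(uri)
        by_source[namespace].append(local_name)
    return {ns: sorted(names) for ns, names in by_source.items()}


# Hypothetical class URIs, used only to exercise the sketch.
imported = [
    "http://example.org/ontoA#Assay",
    "http://example.org/ontoA#Sample",
    "http://example.org/ontoB/Protein",
]

metadata = attribution_metadata(imported)
for namespace, names in sorted(metadata.items()):
    print(f"Classes re-used from {namespace}: {', '.join(names)}")
```

Note what this does and does not give you: the namespace says where a class came from, but it says nothing about how the original authors want to be acknowledged, and it is certainly not a citation.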

Attribution: the legal entity.

Important point: It’s easy to use a license such as the CC-BY, thinking that you’ll ensure citation, when in fact all you’re doing is ensuring attribution.

What are the implications of attribution? It can quickly get out of control and become difficult to manage.
When an ontology or data file requires attribution, anyone who imports information from it (such as a class from an ontology) into their own document must attribute the original in the new one. Continuing the ontology analogy, if there are 20-30 ontologies being used for a single project (which is not inconceivable in the coming years), there could be great difficulty in maintaining attribution for them all.

Important point: While licenses such as the CC-BY allow the attribution to be performed “in the manner specified by the author or licensor”, this could lead to 30 different licensors requiring potentially 30 different methods of attribution, and attribution stacking isn’t pretty.
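As a rough illustration of why stacking gets unwieldy, the sketch below models each source with its own licensor-specified manner of attribution. Everything in it is hypothetical: the source names, the `Source` class, and the template strings are assumptions made for the example, not anything a real license text prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    license_id: str            # e.g. "CC-BY" or "CC0"
    attribution_template: str  # hypothetical "manner specified by the licensor"


def attribution_statements(sources):
    """Return one statement per source whose license requires attribution."""
    return [
        s.attribution_template.format(name=s.name)
        for s in sources
        if s.license_id == "CC-BY"
    ]


# Hypothetical sources; a real project might have 20-30 of these.
sources = [
    Source("OntoA", "CC-BY", "This work re-uses classes from {name}."),
    Source("OntoB", "CC-BY", "{name} must be named in the file header."),
    Source("OntoC", "CC0", ""),
]

statements = attribution_statements(sources)
distinct_manners = {
    s.attribution_template for s in sources if s.license_id == "CC-BY"
}

for line in statements:
    print(line)
print(f"{len(statements)} statements, {len(distinct_manners)} different manners")
```

Even in this toy case the two attribution-requiring sources want two different things; with 30 licensors the list of statements, and of manners to keep track of, grows accordingly, while the CC0 source adds nothing to the burden.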

Citation: the gentlemen’s club.

Can citation be assured? No. Well, maybe.
You can imagine citation as a gentlemen’s club, as propriety dictates that you should cite another’s work that you use, but there is no legal requirement to do so. Indeed, many believe that citation should not be enforced anyway. In contrast, attribution as required by licenses is a legal statement. However, let’s revisit the clause in CC-BY that states the author/licensor can specify the manner in which the attribution is given.

Important point: Could you use a license such as CC-BY, and state that the attribution must come in the form of, at a minimum, citation in the paper which describes the work being performed by the licensee?

Bottom line: which one is more important to you, as a scientist? Depends on the context.
This is difficult to answer. There aren’t very many guidelines available for us to analyse. The OBO Foundry does have a set of principles, the first of which states that “their [the ontology(ies) and their classes] original source is always credited and that after any external alterations, they must never be redistributed under the same name or with the same identifiers”. However, how this credit should be given is unclear, as described in various blog posts (Allyson, Frank, Melanie). As a result, the following conclusions came out of the OBO Foundry workshop this summer (Monday outcomes): it is “unclear if each ontology should develop their own bespoke license or use develop ‘CC-by’; how to give attribution? Generally use own judgment, here MIREOT mechanism can help when importing external terms into an ontology, giving class level attribution” (MIREOT web page, see also OWLED 2008 paper). Therefore, while they are aware of the problem, they don’t offer a consensus solution.

The flipside of this is that in order to use an ontology, you would first have to write a paper and cite the classes you wish to import, and only then get on with the work. If you never get a paper and therefore a citation, is your ontology/data illegal? If you take the example of OBI, which imports several other ontologies and is an open community of developers, would a license restriction requiring citation actually prevent the work from starting? This is probably a bit of a chicken-and-egg scenario, if it were ever to become a reality. In short, while there are some tempting possibilities, there doesn’t yet seem to be a useful solution.

In summary, it’s generally not attribution that people want (which can be licensed, even if you don’t like the layers of attribution that it will require once you’re using multiple sources) but citation, which isn’t so easily licensed – yet. When deciding what sort of license to use (e.g. an open one like CC0 or an attribution-based one like CC-BY), you need to take into account expected usage. In some cases, for a leaf ontology, perhaps CC-BY is appropriate, as it isn’t intended to be imported by others, but you never know when your leaf will turn into something others import. Science Commons also believes that attribution is a very different beast, and shouldn’t be required when licensing data. They recently provided me with an answer on how to license ontologies that favored CC0.

So, if you really want citation and not attribution, consider an open license such as CC0 and make a gentlemanly (gentle-science-person-ly) request that if someone uses it AND publishes a paper on it, please cite it in the way you suggest. Alternatively, I’d be interested to hear if it would be possible to use an attribution-based license such as CC-BY and then require the attribution method be citation in a paper. Would this method work, and would it be polite? Your comments, please.

FriendFeed Discussion

Choosing a license for your ontology

Over on FriendFeed this week, I started a discussion (both in The Life Scientists room and in the Science 2.0 room) about ontologies and licensing them. I am creating a couple, and was trying to determine whether I should use some flavor of CC license or perhaps an LGPL license or similar. CC people say that their licenses shouldn’t be used for software. But is an ontology software, a document, data, or something else entirely? I feel that it is a model or representation of knowledge, and a way to conceptualize what you need to describe. That doesn’t really provide an answer, however. As Egon pointed out in the FF discussion, it has no real inputs or outputs, and as such isn’t software. However, reasoners can present logical inferences as outputs when the ontology is given as an input… The situation is tricky, and I suggest that you head over to FF to get an idea of what people are saying about it.

I also asked some Science Commons people (thanks to Frank for the idea) what they had decided to do for this situation. Here is their reply, and based on their thoughts, I think a CC license is definitely OK for ontologies, and I will choose among them according to the policies of my boss and my university! Thanks to SC for the help, and for their permission to reproduce their thoughts:

Whether an ontology qualifies for copyright protection under U.S. Copyright law depends on whether it contains a sufficient degree of creative expression. For example, an ontology that draws entirely on facts or ideas in the public domain would not qualify for copyright protection. While there do not appear to be any legal cases that directly address the issue of copyright protection for ontologies, there have been some cases in medical ontologies (particularly in medical procedure coding) that have upheld copyright claims in classification schemes that might resemble ontologies.

Thus, the determination of whether an ontology qualifies for copyright protection may require case-by-case analysis. For sharing ontologies in a community or publicly, it would be prudent to think about copyright and licensing. For example, the ontology creator could say that “to the extent I may have copyright in my ontology, I license it in the following way.” In that way, she can reassure the community that even in the event copyright is later found to exist, they may rely upon her offer of a license. This provides an important “safety net” for the community of users, given the uncertainty about whether a given ontology may be copyrightable.

There are several reasonable ways to license ontologies. But it must be kept in mind that an important goal of publicly shared ontologies is to foster community involvement (which necessitates granting rights to modify and extend the ontology) and interoperability (we want to avoid license conflicts in the future if ontologies have to be combined or made to interoperate). The best way to avoid license conflicts is to place an ontology in the public domain, that is, to release it without restrictions. This can be done using CC0. This gives users maximum freedom and ensures maximum compatibility with all other licenses.

However, some creators may want to retain rights of attribution. In that case, they may make use of licenses that require attribution only, such as the Creative Commons Attribution license. The drawback of this license is that since attribution is mandated, it may over time become more of a burden than a benefit (because as the list of contributors grows very large, the attribution requirements result in “attribution stacking”, where the number of people who need attribution becomes so large that it becomes not only meaningless but also a significant administrative and legal burden on future users).

For creators of ontologies who are concerned about protecting the quality of their distribution, trademark may offer an alternative form of protection. Unlike copyright, trademark does not protect the work itself, but it protects the “branding” of the work. An analogy would be while everyone can offer a different distribution of Linux, only Red Hat, Inc. can claim to offer “Red Hat” Linux. Thus, branding is used to protect the quality and integrity of the product, rather than copyright control.

Update: There is additional talk on this post at FF, including a post by Pierre about the UMLS Metathesaurus. Basically what you need to do is follow the 3 FF links I’ve provided to keep up-to-date, as comments on this post will most likely happen there rather than here! 🙂