Over on Friendfeed this week, I started a discussion (both in The Life Scientists room and in the Science 2.0 room) about ontologies and licensing them. I am creating a couple, and was trying to determine whether I should use some flavor of CC license or perhaps an LGPL license or similar. CC people say that their licenses shouldn’t be used for software. But is an ontology software, a document, data, or something else entirely? I feel that it is a model or representation of knowledge, and a way to conceptualize what you need to describe. That doesn’t really provide an answer, however. As Egon pointed out in the FF discussion, it has no real inputs or outputs, and as such isn’t software. However, reasoners can present logical inferences as outputs when the ontology is given as an input… The situation is tricky, and I suggest that you head over to FF to get an idea of what people are saying about it.
I also asked some Science Commons people (thanks to Frank for the idea) what they had decided to do for this situation. Here is their reply, and based on their thoughts, I think a CC license is definitely OK for ontologies, and I will choose among them according to the policies of my boss and my university! Thanks to SC for the help, and for their permission to reproduce their thoughts:
Whether an ontology qualifies for copyright protection under U.S. copyright law depends on whether it contains a sufficient degree of creative expression. For example, an ontology that draws entirely on facts or ideas in the public domain would not qualify for copyright protection. While there do not appear to be any legal cases that directly address the issue of copyright protection for ontologies, there have been some cases in medical ontologies (particularly in medical procedure coding) that have upheld copyright claims in classification schemes that might resemble ontologies.
Thus, the determination of whether an ontology qualifies for copyright protection may require case-by-case analysis. For sharing ontologies in a community or publicly, it would be prudent to think about copyright and licensing. For example, the ontology creator could say that “to the extent I may have copyright in my ontology, I license it in the following way.” In that way, she can reassure the community that even in the event copyright is later found to exist, they may rely upon her offer of a license. This provides an important “safety net” for the community of users, given the uncertainty about whether a given ontology may be copyrightable.
There are several reasonable ways to license ontologies. But it must be kept in mind that an important goal of publicly shared ontologies is to foster community involvement (which necessitates granting rights to modify and extend the ontology) and interoperability (we want to avoid license conflicts in the future if ontologies have to be combined or made to interoperate). The best way to avoid license conflicts is to place an ontology in the public domain—that is, to release it without restrictions. This can be done using CC0 (http://creativecommons.org/license/zero). This gives users maximum freedom and ensures maximum compatibility with all other licenses.
However, some creators may want to retain rights of attribution. In that case, they may make use of licenses that require attribution only, such as the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/). The drawback of this license is that since attribution is mandated, it may over time become more of a burden than a benefit (because as the list of contributors grows very large, the attribution requirement results in “attribution stacking”, where the number of people who need attribution becomes so large that attribution becomes not only meaningless but also a significant administrative and legal burden on future users).
For creators of ontologies who are concerned about protecting the quality of their distribution, trademark may offer an alternative form of protection. Unlike copyright, trademark does not protect the work itself, but it protects the “branding” of the work. As an analogy, while anyone can offer a different distribution of Linux, only Red Hat, Inc. can claim to offer “Red Hat” Linux. Thus, branding, rather than copyright control, is used to protect the quality and integrity of the product.
Update: There is additional talk on this post at FF, including a post by Pierre about the UMLS Metathesaurus. Basically what you need to do is follow the 3 FF links I’ve provided to keep up-to-date, as comments on this post will most likely happen there rather than here! 🙂