Daniel Faria et al.
If you look at GOclasses, the distribution of protein functions follows a power law. GOclasses can help identify incomplete and inconsistent annotations. They devised three strategies for analysing them: use information content (IC) to identify GOclasses with generic terms and to identify the primary term of a GOclass; use conditional probability to identify potential implicit relations between terms; and use semantic similarity to identify similar GOclasses.
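To make the first two strategies concrete for myself, here is a minimal sketch in Python. It is not the authors' implementation: the toy annotations, the term labels and the function names are all made up for illustration, IC is normalised by the largest possible value (a term annotated to a single protein), and annotations are not propagated up the ontology as a real analysis would probably do.

```python
import math
from collections import Counter

# Hypothetical toy data: protein -> set of GO term labels (not real annotations).
annotations = {
    "P1": {"binding", "ATP binding"},
    "P2": {"binding", "ATP binding", "kinase activity"},
    "P3": {"binding"},
    "P4": {"ATP binding", "kinase activity"},
}


def term_counts(annotations):
    """Count how many proteins are annotated with each term."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(terms)
    return counts


def normalized_ic(annotations):
    """Annotation-frequency IC, scaled to [0, 1].

    IC(t) = -log(p(t)), divided by log(N), which is the IC of a term
    annotated to exactly one of the N proteins. Assumes N > 1.
    """
    n = len(annotations)
    max_ic = math.log(n)
    return {t: -math.log(c / n) / max_ic for t, c in term_counts(annotations).items()}


def conditional_probability(annotations, given, target):
    """P(target | given): fraction of proteins with `given` that also have `target`."""
    with_given = [terms for terms in annotations.values() if given in terms]
    if not with_given:
        return 0.0
    return sum(target in terms for terms in with_given) / len(with_given)


def candidate_implicit_relations(annotations, threshold=0.9):
    """List (a, b, P(b|a)) for term pairs where a (almost) always comes with b."""
    terms = sorted(term_counts(annotations))
    pairs = []
    for a in terms:
        for b in terms:
            if a != b:
                p = conditional_probability(annotations, a, b)
                if p >= threshold:
                    pairs.append((a, b, p))
    return pairs


ic = normalized_ic(annotations)
relations = candidate_implicit_relations(annotations)
```

On this toy data the most specific term ("kinase activity") gets the highest IC, and the only pair above the threshold is "kinase activity" implying "ATP binding", which is the kind of candidate implicit relation the talk described. The 0.9 threshold is my own arbitrary choice.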
The issue with IC based on annotation frequency is that it is biased by popularity in nature. Other methods also have problems. Most classes have a maximum IC of between 50 and 60%, but 87% of the GOclasses have at least one very specific primary term. Inconsistent GOclasses often correspond to cases of implicit relationships. These relationships could be formalized in GO or set as annotation guidelines to improve the consistency of new annotations. Formalizing implicit relationships will mean fewer terms are required to describe a given function.
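The last point, that formalized implicit relationships reduce the number of terms needed, can be sketched as follows. Again this is only my illustration: the `implies` mapping, the term labels and the function name are hypothetical, and the sketch assumes the mapping has no cycles and already lists every term a given term implies.

```python
def minimal_annotation(terms, implies):
    """Drop every term that is implied by another term in the same set.

    `implies` maps a term to the set of terms it implies. Assumed to be
    acyclic and already transitively closed (an assumption of this sketch).
    """
    implied = set()
    for term in terms:
        # A term never removes itself, even if the mapping lists it.
        implied |= implies.get(term, set()) - {term}
    return set(terms) - implied


# Hypothetical formalized relation: "kinase activity" implies "ATP binding".
implies = {"kinase activity": {"ATP binding"}}

full = {"kinase activity", "ATP binding", "binding"}
reduced = minimal_annotation(full, implies)
```

Here `reduced` keeps "kinase activity" and "binding" and drops "ATP binding", because that term can be recovered from the formalized relation.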
FriendFeed Discussion: http://ff.im/4xgmQ
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!