GOclasses: molecular function as viewed by proteins (ISMB Bio-Ont SIG 2009)

Daniel Faria et al.

If you look at GOclasses, the distribution of protein functions follows a power law. GOclasses can help identify incomplete and inconsistent annotations. The authors devised several strategies for the analysis: using information content (IC) to identify GOclasses with generic terms and to identify the primary term of a GOclass; using conditional probability to identify potential implicit relations between terms; and using semantic similarity to identify similar GOclasses.
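
To make the conditional probability idea concrete, here is a minimal sketch of my own, not the authors' implementation. It assumes each protein is represented simply by the set of GO terms annotated to it, and estimates P(b | a) as the fraction of proteins annotated with term a that are also annotated with term b. The function name, the protein IDs and the term IDs (GO:A, GO:B, GO:C) are made up for illustration.

```python
from collections import Counter
from itertools import permutations


def conditional_probabilities(annotations):
    """Estimate P(b | a) for every ordered pair of co-annotated terms.

    `annotations` maps a protein ID to the set of terms annotated to it
    (an assumed, simplified input format).
    """
    term_counts = Counter()  # number of proteins annotated with each term
    pair_counts = Counter()  # number of proteins annotated with both a and b
    for terms in annotations.values():
        unique = set(terms)
        term_counts.update(unique)
        pair_counts.update(permutations(sorted(unique), 2))
    return {(a, b): n / term_counts[a] for (a, b), n in pair_counts.items()}


# Toy data with placeholder term IDs (not real GO terms).
annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A", "GO:B"},
    "P3": {"GO:B"},
    "P4": {"GO:A", "GO:B", "GO:C"},
}
probs = conditional_probabilities(annotations)

# Pairs where P(b | a) is high are candidates for an implicit relation a -> b.
for (a, b), p in sorted(probs.items()):
    print(f"P({b} | {a}) = {p:.2f}")
```

In this toy data every protein with GO:A also has GO:B, so P(GO:B | GO:A) is 1, while the reverse is only 0.75. That asymmetry is the kind of signal that might suggest an implicit relation from one term to the other.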

The issue with IC based on annotation frequency is that it is biased by popularity in nature. Other methods also have problems. Most classes have a maximum IC of between 50% and 60%, but 87% of the GOclasses have at least one very specific primary term. Inconsistent GOclasses often correspond to cases of implicit relationships. These could be formalized in GO or set as annotation guidelines to improve the consistency of new annotations. Formalizing implicit relationships will mean fewer terms are required to describe a given function.
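
For reference, the usual frequency-based definition is IC(t) = -log p(t), where p(t) is the fraction of proteins annotated with term t. The sketch below is my own illustration and not the measure used in the talk. It normalises by the IC of a term annotated to a single protein so that values fall between 0 and 1 (which is how I read the percentages above, though that is an assumption), and it skips the propagation of annotations to ancestor terms that a real analysis would need. All names and term IDs are placeholders.

```python
import math
from collections import Counter


def information_content(annotations):
    """Normalised frequency-based IC for each term, in the range [0, 1].

    `annotations` maps a protein ID to the set of terms annotated to it.
    Ancestor propagation through the GO graph is deliberately left out.
    """
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms))
    total = len(annotations)
    max_ic = math.log(total)  # IC of a term annotated to exactly one protein
    if max_ic == 0:
        return {term: 0.0 for term in counts}
    # log(total / n) equals -log(n / total), the usual IC definition.
    return {term: math.log(total / n) / max_ic for term, n in counts.items()}


# Toy data with placeholder term IDs (not real GO terms).
annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A", "GO:B"},
    "P3": {"GO:B"},
    "P4": {"GO:A", "GO:B", "GO:C"},
}
ic = information_content(annotations)

# The most informative term of each protein, a rough stand-in for a
# "primary term" under this simplified measure.
primary = {p: max(terms, key=ic.get) for p, terms in annotations.items()}
```

The bias is easy to see here: GO:B scores 0 simply because every protein carries it, regardless of how specific the function it describes actually is.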

FriendFeed Discussion: http://ff.im/4xgmQ

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

