Open source data warehouse (ISMB DAM SIG 2009)

Gos Micklem, Cambridge University

InterMine is a query-optimised data warehouse system that the group has been working on since 2002. It’s the main app for the modENCODE project, FlyMine, and others. Performance optimisation is kept separate from the schema design: a query optimiser intercepts queries and essentially precomputes results using views, so you can still have your normalized data underneath. You can add precomputed tables at any time and adapt performance to actual use. The speaker describes himself as an auto-generation fundamentalist: the object model is defined in an XML file, and that data model is used to generate the schema and the Java classes. In other words, it uses a model-driven architecture.
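To make the "intercept and precompute" idea concrete, here is a minimal sketch in Python with SQLite. It is not InterMine's optimiser: the class `PrecomputingOptimiser`, the tables `gene` and `protein`, and the exact-match lookup on query text are all assumptions for illustration (a real optimiser would presumably match queries structurally rather than by string).

```python
import sqlite3


def normalise(sql):
    """Collapse whitespace and case so equivalent query strings compare equal."""
    return " ".join(sql.lower().split())


class PrecomputingOptimiser:
    """Toy query interceptor: serves registered queries from precomputed tables."""

    def __init__(self, conn):
        self.conn = conn
        self.precomputed = {}  # normalised SQL -> precomputed table name

    def precompute(self, sql, table_name):
        # Materialise the result once; the normalised tables stay untouched.
        self.conn.execute(f"CREATE TABLE {table_name} AS {sql}")
        self.precomputed[normalise(sql)] = table_name

    def rewrite(self, sql):
        # Swap in the precomputed table only if this exact query was registered.
        table = self.precomputed.get(normalise(sql))
        return f"SELECT * FROM {table}" if table else sql

    def execute(self, sql):
        return self.conn.execute(self.rewrite(sql)).fetchall()


# Hypothetical normalised schema with a couple of rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE protein (id INTEGER PRIMARY KEY, gene_id INTEGER, name TEXT);
    INSERT INTO gene VALUES (1, 'zen'), (2, 'eve');
    INSERT INTO protein VALUES (10, 1, 'ZEN_DROME'), (11, 2, 'EVE_DROME');
""")

QUERY = """
    SELECT gene.symbol, protein.name
    FROM gene JOIN protein ON protein.gene_id = gene.id
    ORDER BY gene.symbol
"""

optimiser = PrecomputingOptimiser(conn)
before = optimiser.execute(QUERY)  # answered by running the join
optimiser.precompute(QUERY, "precomp_gene_protein")
after = optimiser.execute(QUERY)   # answered from the precomputed table
```

The caller issues the same query before and after `precompute`, which is the point: performance tuning happens behind the interceptor, and the schema and the queries do not change.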

They have a QueryBuilder app to help the user build queries for the system. It is composed of a number of sections: a model browser, a query summary, a constraint editor, output column ordering, and sort order settings. They also have template queries available, which automatically create RESTful services. There are graphical widgets for chromosome distribution, tissue expression distribution, and expression by developmental stage, as well as enrichment widgets for GO terms and others. While they have a Java client API now, an additional Perl API is coming soon.
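As a rough illustration of how a template query could map onto a RESTful URL, here is a small Python sketch. Everything in it is hypothetical: the `TEMPLATES` registry, the `template_url` helper, the `/template/results` path and the `example.org` host are invented for this example and are not taken from the talk or from the real web service.

```python
from urllib.parse import urlencode

# Hypothetical registry: each template names the parameters a user must fill in.
TEMPLATES = {
    "gene_to_proteins": {
        "description": "Proteins for a given gene symbol",
        "parameters": ["symbol"],
    }
}


def template_url(base, name, **values):
    """Build a GET URL that runs the named template with the given values."""
    template = TEMPLATES[name]
    missing = [p for p in template["parameters"] if p not in values]
    if missing:
        raise ValueError(f"missing template parameters: {missing}")
    params = {"name": name}
    params.update({p: values[p] for p in template["parameters"]})
    return f"{base}/template/results?{urlencode(params)}"


url = template_url("https://example.org/service", "gene_to_proteins", symbol="zen")
```

Because a template is just a saved query with a few open constraints, publishing it as a URL needs no per-template code, which is presumably why the services can be created automatically.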

InterMine itself generates the Java classes, database schema and so on, rather than making use of a third-party app like AndroMDA. They have also written their own object-relational mapper rather than using Hibernate. The data model itself is either UML or is converted to UML in some way; the speaker was more a biologist than a programmer, so this detail wasn’t spelled out, which is fair enough!
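The general shape of model-driven generation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML layout in `MODEL_XML`, the type mappings, and the `generate_ddl` and `generate_java` helpers are made up here and do not reproduce InterMine's actual model format or generator.

```python
import xml.etree.ElementTree as ET

# Hypothetical model file: one class with two typed attributes.
MODEL_XML = """
<model name="demo">
  <class name="Gene">
    <attribute name="symbol" type="String"/>
    <attribute name="length" type="Integer"/>
  </class>
</model>
"""

# Assumed mappings from model types to SQL column types and Java field types.
SQL_TYPES = {"String": "TEXT", "Integer": "INTEGER"}
JAVA_TYPES = {"String": "String", "Integer": "Integer"}


def generate_ddl(cls):
    """Emit a CREATE TABLE statement for one model class."""
    cols = ["id INTEGER PRIMARY KEY"]
    cols += [
        f"{a.get('name')} {SQL_TYPES[a.get('type')]}"
        for a in cls.findall("attribute")
    ]
    return f"CREATE TABLE {cls.get('name').lower()} ({', '.join(cols)});"


def generate_java(cls):
    """Emit Java source text declaring one private field per attribute."""
    fields = [
        f"    private {JAVA_TYPES[a.get('type')]} {a.get('name')};"
        for a in cls.findall("attribute")
    ]
    return "\n".join([f"public class {cls.get('name')} {{", *fields, "}"])


model = ET.fromstring(MODEL_XML)
ddl = [generate_ddl(c) for c in model.findall("class")]
java = [generate_java(c) for c in model.findall("class")]
```

Both outputs come from the same model, so adding an attribute to the XML changes the table and the class together and the two cannot drift apart.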

This talk was mainly annotated screenshots, which are quite useful for describing a system like this.

FriendFeed discussion: http://ff.im/4uShd

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!
