Research reproducibility through GenePattern (ISMB DAM SIG 2009)

Jill Mesirov, Broad Institute

Jill starts with the example of classifying leukemias, which matters because different types of leukemia require very different therapeutic treatments. They built a weighted-learning predictor to help with the classification, and it was 100% successful. What was the methodology for this leukemia analysis? Gene expression dataset -> pre-processing -> gene selection (using marker genes) -> build a predictor (using many different cross-validation tests) -> model -> test set prediction results.
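The notes only outline that pipeline, so here is a minimal Python sketch of how the steps might fit together. It assumes signal-to-noise marker selection and a weighted-voting classifier in the spirit of the original 1999 leukemia work, evaluated with leave-one-out cross-validation. The function names, the choice of the k genes with the largest absolute score, and the tiny synthetic dataset are my own illustrative assumptions, not the actual GenePattern modules or data.

```python
from statistics import mean, pstdev


def gene_stats(col, labels, pos, neg):
    """Signal-to-noise weight (mu_pos - mu_neg) / (sd_pos + sd_neg) and class midpoint for one gene."""
    a = [v for v, y in zip(col, labels) if y == pos]
    b = [v for v, y in zip(col, labels) if y == neg]
    # Small floor on the denominator avoids dividing by zero for constant genes.
    weight = (mean(a) - mean(b)) / max(pstdev(a) + pstdev(b), 1e-6)
    midpoint = (mean(a) + mean(b)) / 2
    return weight, midpoint


def train(samples, labels, pos, neg, k):
    """Gene selection + model building: keep the k genes with the largest |weight|."""
    model = []
    for g in range(len(samples[0])):
        col = [s[g] for s in samples]
        weight, midpoint = gene_stats(col, labels, pos, neg)
        model.append((g, weight, midpoint))
    model.sort(key=lambda t: abs(t[1]), reverse=True)
    return model[:k]


def predict(model, sample, pos, neg):
    """Weighted vote: each marker gene votes weight * (expression - midpoint)."""
    votes = sum(w * (sample[g] - m) for g, w, m in model)
    return pos if votes > 0 else neg


def leave_one_out(samples, labels, pos, neg, k):
    """Hold out each sample in turn, retrain (including gene selection), and score."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y, pos, neg, k)
        correct += predict(model, samples[i], pos, neg) == labels[i]
    return correct / len(samples)


# Synthetic toy data (hypothetical): rows are samples, columns are 4 genes.
# Gene 0 is high in ALL, gene 1 is high in AML, genes 2 and 3 are noise.
samples = [
    [9.0, 1.0, 5.0, 3.0],
    [8.5, 1.5, 4.0, 6.0],
    [9.5, 0.5, 6.0, 2.0],
    [1.0, 9.0, 5.5, 4.0],
    [1.5, 8.0, 4.5, 5.0],
    [0.5, 9.5, 5.0, 3.5],
]
labels = ["ALL"] * 3 + ["AML"] * 3

if __name__ == "__main__":
    print(leave_one_out(samples, labels, "ALL", "AML", k=2))
```

Gene selection is redone inside every cross-validation fold, so the held-out sample never influences which markers are chosen; selecting genes on the full dataset first would make the cross-validation estimate optimistic.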

The computational foundation for genomic medicine: massive data sets, sophisticated computational algorithms and methods including visualization, and software infrastructure for interoperable informatics. Just as a web browser gives everyone access to the web, we need tools that give everyone access to bioinformatics data. You don’t need to be a mechanic to drive a car anymore, and you shouldn’t need to be a bioinformatician to use the data (Allyson’s minor comment at this point: you just need smart bioinformaticians to prepare things ahead of time!).

Infrastructure is needed to support both sophisticated and naive users, and this is where GenePattern comes in. The client-server architecture includes both a web browser UI and a programming environment on the client side, and a variety of components on the server side, including SOAP and databases. Features of GenePattern include: LSIDs; a module repository of 120+ modules for analysis, pipelines, and visualisation; analytic reproducibility; an automatic module integrator; grid enablement; local and distributed computing; and training workshops as well as user documentation. With GenePattern, they created a reproducible-research version of the method used back in 1999 for the leukemia classification project.
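To make the reproducibility point a little more concrete, here is a hypothetical Python sketch (not GenePattern's actual implementation) of how versioned LSIDs can pin each module in a pipeline. It parses LSIDs of the form urn:lsid:authority:namespace:object:revision, refuses modules whose revision is not pinned, and hashes module versions, parameters and input data into one fingerprint, so rerunning with the same modules, settings and input yields the same fingerprint while any change yields a different one. The example.org LSIDs and parameter names are made up for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Parse urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def pipeline_fingerprint(steps, input_bytes: bytes) -> str:
    """Hypothetical provenance hash over pinned module LSIDs, parameters and input data."""
    record = []
    for lsid_text, params in steps:
        if parse_lsid(lsid_text).revision is None:
            # An unversioned module could silently change between runs.
            raise ValueError(f"module version not pinned: {lsid_text}")
        record.append({"module": lsid_text, "params": params})
    payload = json.dumps(
        {"steps": record, "input_sha256": hashlib.sha256(input_bytes).hexdigest()},
        sort_keys=True,  # canonical key order so equal inputs hash identically
    )
    return hashlib.sha256(payload.encode()).hexdigest()


# Hypothetical two-step pipeline: pre-processing, then marker selection.
steps = [
    ("urn:lsid:example.org:genepattern.module:00001:2", {"threshold": 20}),
    ("urn:lsid:example.org:genepattern.module:00002:1", {"num_markers": 50}),
]

if __name__ == "__main__":
    print(pipeline_fingerprint(steps, b"expression data"))
```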

GenomeSpace is a Web 2.0 community for integrative genomics analysis. It’s a newly funded NHGRI project, and partners include Cytoscape, the UCSC Genome Browser, Galaxy and others. Associated research projects include regulatory networks in cancer stem cells and the characterization of large non-coding RNAs in mammals. The GenomeSpace UI offers a number of panels. They want others to join the GenomeSpace community.

FriendFeed discussion:

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!
