Standards depend heavily on their respective communities for requirements gathering, development, testing, and uptake by stakeholders. In modeling the biosciences there are a few generic features, such as the description of source material and experimental design components. Then there are biologically-delineated and technologically-delineated views of the world. These views are still common across many different areas of the life sciences. Much of it can fall under an ISA (Investigation-Study-Assay) structure.
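To make the ISA idea concrete for myself, here is a minimal sketch of the Investigation-Study-Assay nesting as plain Python data classes. This is not the ISA-TAB format or any real ISA tooling API: the class and field names are my own assumptions, chosen to mirror the generic features mentioned above (source material, experimental design, and the biological versus technological views).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # Hypothetical fields: one per "view of the world" from the talk.
    measurement: str  # biologically-delineated view, e.g. "protein expression"
    technology: str   # technologically-delineated view, e.g. "mass spectrometry"


@dataclass
class Study:
    title: str
    source_material: str  # generic feature: what the samples came from
    design: str           # generic feature: experimental design
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def technologies(self) -> List[str]:
        """Return the distinct technologies used anywhere in the investigation, sorted."""
        return sorted({a.technology for s in self.studies for a in s.assays})


# Illustrative data only; none of this comes from the presentation.
inv = Investigation(
    title="Example investigation",
    studies=[
        Study(
            title="Example study",
            source_material="mouse liver",
            design="treated vs. control",
            assays=[
                Assay("protein expression", "mass spectrometry"),
                Assay("gene expression", "microarray"),
            ],
        )
    ],
)
print(inv.technologies())
```

The point of the nesting is that the generic parts (source material, design) sit at the study level, while each assay carries the domain-specific biological and technological detail.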
You should then use three types of standards: syntax (images of FuGE, ISA-TAB etc.), semantics, and scope. MIBBI is all about scope. How well are things working? Well, there is still separation, but things are getting better. There aren’t many carrots for using these standards, though there are some sticks. Why do we care about standards? Data exchange, comprehensibility, and scope for reuse. Many funders (especially public funders) now require data sharing, or the ability to store and exchange data.
“Metaprojects” (FuGE, OBI, ISA-TAB) draw together many different domains and present them in a structure and semantics useful across all. Many of the “MI” (Minimum Information) guidelines are developed independently, and some are defunct. It’s also hard to track what’s going on in these projects, they can be redundant, and it is difficult to obtain an overview of the full range of checklists. When the MI projects overlap, arbitrary decisions on wording and substructuring make integration difficult. This makes it hard to combine parts of different guidelines: they are not very modular. Enter MIBBI. It has two distinct goals: a portal (registry of guidelines) and a foundry (integration and modularization).
There’s lots of enthusiasm for the project (drafters, users, funders, journals). MIBBI raises awareness of various checklists and promotes gradual integration of checklists. See Nature Biotechnology 26, 889 – 896 (2008), doi:10.1038/nbt0808-889, for the paper. He’s performed clustering and analysis of the different guidelines, displaying the MIs in Cytoscape and in a fake phylogenetic tree. By the end of the year they’ll have a shopping-basket-based tool, MICheckout, which lets you gather all the concepts you need and outputs your own specialized checklist. You can make use of ISAcreator and its configuration to set mandatory parameters etc.
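The shopping-basket idea is easy to picture in code. The sketch below is only my guess at the general shape: MICheckout's actual design was not described in the talk, and the module names and concepts here are invented for illustration, not taken from MIBBI. It treats each checklist module as an ordered list of concepts and merges the chosen modules, dropping concepts that have already appeared.

```python
from typing import Dict, List

# Hypothetical checklist modules (name -> ordered concepts). Not real MIBBI content.
CHECKLISTS: Dict[str, List[str]] = {
    "sample_basics": ["organism", "source material", "experimental design"],
    "proteomics_module": ["source material", "instrument", "search engine"],
    "imaging_module": ["organism", "microscope", "stain"],
}


def checkout(basket: List[str], registry: Dict[str, List[str]] = CHECKLISTS) -> List[str]:
    """Merge the chosen modules into one checklist.

    Concepts keep their first-seen order; a concept already contributed by an
    earlier module is skipped (compared case-insensitively, ignoring outer spaces).
    Raises KeyError if the basket names a module that is not in the registry.
    """
    merged: List[str] = []
    seen = set()
    for name in basket:
        for concept in registry[name]:
            key = concept.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(concept)
    return merged


my_checklist = checkout(["sample_basics", "proteomics_module"])
print(my_checklist)
```

Note that this naive merge only works if overlapping modules use identical wording for shared concepts, which is exactly the integration problem the foundry side of MIBBI is meant to address.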
The objections to fuller reporting. Why should I share? Funders and publishers are starting to require a bare minimum of metadata, though researchers will then just do that bare minimum. Some people think that this is just a ‘make work’ scheme for bioinformaticians, or that bioinformaticians are parasitic. Some people don’t trust what others have done, but that’s what the reporting guidelines are for in the first place: so you can figure out whether you should trust it. Problems of quality are justified to an extent, but what of people lacking the resources for large-scale work, or people who want to refer to proteomics data but don’t do proteomics? How should they follow these guidelines? The perception is that there is no money for this and no mature free tools, and there are worries about vendor support. Vendors will support what researchers say they need.
Credit: data sharing is more or less a given now, and we need central registries of data sets that can record reuse (also OpenIDs, DOIs for data). Side benefits and challenges include clearing up problems with paper authorship with respect to reporting who’s done which bit. It would also enable other kinds of credit, and may have to be self-policing. Finally, there is the problem of micro data sets and legacy data. An example of the former is EMBL entries: when searching against EMBL, you’re using the data in some way, even if you don’t pull it out for later analysis.
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!