I live blogged Cameron Neylon‘s talk today at Newcastle University, and I did it in a Wave. There were a few pluses, and a number of minuses. Still, it’s early days yet and I’m willing to take a few hits and see if things get better (perhaps by trying to write my own robots, who knows?). In effect, today was just an exercise, and what I wrote in the Wave could have equally well been written directly in this blog.
(You’ll get the context of this post if you read my previous post on trying to play around with Google Wave. Others, since, have had a similar experience to mine. Even so, I’m still smiling – most of the time 🙂 )
Pluses: The Wave was easy to write in, and easy to create. It was a very similar experience to my normal WordPress blogging experience.
Minuses: I wanted to make the Wave public from the start, but have yet to succeed in this. Adding firstname.lastname@example.org or email@example.com just didn’t work: nothing I tried was effective. Also, the copying and pasting simply failed to work when copying the content of the Wave from Iron into my WordPress post in Firefox: while I could copy into other windows and editors, I simply couldn’t copy into WordPress. When I logged into Wave via Firefox, the copy-and-paste worked, but automatically included the highlighting that occurred due to my selecting the text, and then I couldn’t un-highlight the wave! What follows is a very colorful copy of my notes. I’ve removed the highlighting now, to make it more readable.
I’d like to embed the Wave here directly. In theory, I can do this with the following command:
Unfortunately, it seems this Wavr plugin is not available via the wordpress.com setup. So, I’ll just post the content of the Wave below, so you can all read about Cameron Neylon’s fantastic presentation today, even if my first experiment in Wave wasn’t quite what I expected. Use the Wave id above to add this Wave to your inbox, if you’d like to discuss his presentation or fix any mistakes of mine. It should be public, but I’m having some issues with that, too!
Cameron Neylon’s talk on Capturing Process and Science Online. Newcastle University, 15 October 2009.
Please note that all the mistakes are mine, and no-one else’s. I’m happy to fix anything people spot!
We’re either on top of a dam about to burst, or under it about to get flooded. He showed a graph of data entering GenBank. Interestingly, the graph is no longer exponential, and this is because most of the sequence data isn’t goinginto GenBank, but is being put elsehwere.
The human scientist does not scale. But the web does scale! The scientist needs help with their data, with their analysis etc. They’ll go to a computer scientist to help them out. The CS person gives them a load of technological mumbo jumbo that they are suspicious of. What they need is someone to interpolate the computer stuff and the biologist. They may try an ontologist, however, that also isn’t always too productive: the message they’re getting is that they’re being told how to do stuff, which doesn’t go down very well. People are shouting, but not communicating. This is because all the people might want different things (scientists want to record what’s happening in the lab, the ontologist wants to ensure that communication works, and the CS person wants to be able to take the data and do cool stuff with it).
Scientists are worried that other people might want to use their work. Let’s just assume they think that sharing data is exciting. Science wants to capture first and communicate second, ontologists want to communicate, and CS wants to process. There are lots of ways to publish on the web, in an appropriate way. However, useful sharing is harder than publishing. We need the agreed structure to do the communication, because machines need structure. However, that’s not the way humans work: humans tell stories. We’ve created a disconnect between these two things. The journal article is the story, but isn’t necessarily providing access to all the science.
So, we need to capture research objects, publish those objects, and capture the structure through the storytelling. Use the MyTea project as a example/story: a fully semantic (RDF-backed) laboratory record for synthetic chemistry. This is a structured discipline which has very consistent workflows. This system was tablet-based. It is effective and is still being used. However, what it didn’t work for was molecular biology / bioengineering etc — a much wider range of things than just chemistry. So Cameron and others got some money to modify the system: take MyTea (highly structured and specific system) and extend it into molecular biology. Could they make it more general, more unstructured? One thing that immediately stands out for unstructured/flexible is blogs. So, they thought that they could make a blog into a lab notebook. Blogs already have time stamps and authors, but there isn’t much revision history therefore that got built into the new system.
However, was this unstructured system a recipe for disaster? Well, yes it is — to start with. What warrants a post, for example? Should a day be one post? An experiment? There was little in the way of context or links. People who also kept a physical lab book ended up having huge lists of lab book references. So, even though there was a decent amount of good things (google indexing etc) it was still too messy. However, as more information was added, help came from an unexpected source: post metadata. They found that pull-down menus for templates were being populated by the titles of the posts. They used the metadata from the posts and used that to generate the pull-down menu. In the act of choosing that post, a link is created from that post to the new page made by the template. The templates depend on the metadata, and because the templates are labor saving, users will put in metadata! Templates feed on metadata, which feed the templates, and so on: a reinforcing system.
An ontology was “self-assembled” out of this research work and the metadata used for the templates. Their terms were compared to the Sequence Ontology and found some exact matches and some places where they identified some possible errors in the sequence ontology (e.g. conflation of purpose into one term). They’re capturing first, and then the structure gets added afterwards. They can then map their process and ontologies onto agreed vocabularies for the purpose of a particular story. They do this because we want to communicate to other communities and researchers that are interested in their work.
So, you need tools to do this. Luckily, there are tools available that exploit structure where it already exists (like they’ve done in their templates, aka workflows). You can imagine instruments as bloggers (take the human out of the loop). However, we also need tools to tell stories: to wire up the research objects into particular stories / journal articles. This allows people who are telling different stories to connect to the same objects. You could aggregate a set of web objects into one feed, and link them together with specific predicates such as vocabs, relationships, etc. This isn’t very narrative, though. So, we need tools that interact with people while they’re doing things – hence Google Wave.
An example is Igor, the Google Wave citation robot. You’re having a “conversation” with this Robot: it’s offering you links, choices, etc while having it look and feel like you’re writing a document. Also is the ChemSpider Robot, written by Cameron. Here, you can create linked data without knowing you’ve done it. The Robots will automatically link your story to the research objects behind it. Robots can work off of each other, even if they aren’t intended to work together. Example: Janey-robot plus Graphy. If you pull the result from a series of robots into a new Wave, the entire provenance from the original wave is retained, and is retained over time. Workflows, data, or workflows+data can be shared.
Where does this take us? Let’s say we type “the new rt-pcr sample”. The system could check for previous rt-pcr samples, and choose the most recent one to link to in the text (after asking them if they’re sure). As a result of typing this (and agreeing with the robot), another robot will talk to a MIBBI standard to get the required minimum information checklist and create a table based on that checklist. And always, adding links as you type. Capture the structure – it’s coming from the knowledge that you’re talking about a rt-pcr reaction. This is easier than writing out by hand. As you get a primer, you drop it into your database of primers (which is also a Wave), and then it can be automatically linked in your text. Allows you to tell a structured story.
We’re not getting any smarter. Experimentalists need help, and many recognize this and are hoping to use these new technologies. To provide help, we need structure so machines can understand things. However, we need to recognize and leverage the fact that humans tell stories. We need to have structure, but we need to use that structure in a narrative. Try to remember that capturing and communication are two different things.