How can socks facilitate scientific outreach?

OK, it seems odd, at least on the surface. How can socks help science generally, and science outreach specifically? I asked myself the same question a few months ago when I found an email lurking in my inbox, hidden since just before my maternity leave started. It seems a sock company called Sock It To Me features a different Cool Girl each month, and they wished to feature me. I had a lot of questions. Was this for real? Was it an appropriate platform for talking about myself, and about what I do? Would it seem as if I was trading my online work persona for socks?

Well, the other Cool Girls' profiles seemed eclectic and interesting: dancers, astrophysicists, mathematicians and many others. So, not bad company to be in, and it was a genuine request. And, before you ask, I will be getting two pairs of socks for my efforts – ah, temptation. But ultimately I have to take care of my public online persona, and I had to decide whether this was a good addition to it. Then I realized I could talk about ontologies to people who may never even have heard of bioinformatics and, for me, that was too exciting an opportunity to miss. True, it was limited to 600 words, and a writer used the information I gave her to write the final piece, but I think it was all worth it. She did a great job, and within the confines of the article format, I'm happy with how my field of research is portrayed. I feel strongly about science outreach, and I do think that novel methods of information dissemination shouldn't necessarily be ruled out.

So, here it is: Ms Cool Girl of the Month, July 2011. What do you think? Did I benefit science or just myself (well, maybe not just myself – I namechecked my high school biology teacher, and mentioned Cameron too)?

Social Networking and Guidelines for Life Science Conferences

I had a great time in Sweden this past summer, at ISMB 2009 (ISMB/ECCB 2009 FriendFeed room). I listened to a lot of interesting talks, reconnected with old friends and met new ones. I went to an ice bar, explored a 17th-century ship that had been dragged from the bottom of the sea, and visited the building where the Nobel Prizes are handed out.

While there, many of us took notes and provided commentary through live blogging, either on our own blogs or via FriendFeed and Twitter. The ISCB were very helpful, announcing and advertising the live blogging possibilities prior to the event. Once at the conference, they provided internet access, and even extension cords where necessary so that we could keep blogging on mains power.

Those of us who spent a large proportion of our time live blogging were asked to write a paper about our experiences. This quickly became two papers, as there were two clear subjects on our minds: firstly, how the live blogging went in the context of ISMB 2009 specifically; and secondly, how our experiences (and those of the organisers) might form the basis of a set of guidelines for conference organisers trying to create live blogging policies. The first paper became the conference report, a Message from ISCB published today in PLoS Computational Biology. It was published in conjunction with the second paper, a Perspective appearing jointly today in PLoS Computational Biology, which aims to help organisers create policies of their own. In particular, it provides "top ten"(-ish) lists for organisers, bloggers and presenters.

So, thanks again to my co-authors:
Ruchira S. Datta: Blog FriendFeed
Oliver Hofmann: Blog FriendFeed Twitter
Roland Krause: Blog FriendFeed Twitter
Michael Kuhn: Blog FriendFeed Twitter
Bettina Roth
Reinhard Schneider: Blog FriendFeed
(you can find links to my social networking accounts on the About page on this blog)

If you have any questions or comments about either of these articles, please comment on the PLoS articles themselves, so there can be a record of the discussion.

Lister, A., Datta, R., Hofmann, O., Krause, R., Kuhn, M., Roth, B., & Schneider, R. (2010). Live Coverage of Scientific Conferences Using Web Technologies. PLoS Computational Biology, 6(1). DOI: 10.1371/journal.pcbi.1000563

Lister, A., Datta, R., Hofmann, O., Krause, R., Kuhn, M., Roth, B., & Schneider, R. (2010). Live Coverage of Intelligent Systems for Molecular Biology/European Conference on Computational Biology (ISMB/ECCB) 2009. PLoS Computational Biology, 6(1). DOI: 10.1371/journal.pcbi.1000640

Ontogenesis: rapid reviewing and publishing of articles on semantics and ontologies

What happens when an ontologist or two gets frustrated with the drawn-out publication process that is the norm for scientific books? You get Ontogenesis: a quick-turnaround, low-maintenance solution using WordPress blog software. Next, a bunch of other ontologists are invited to a two-day, intensive writing and peer-reviewing workshop, and the initial content is created. The result? Well, my favourite result was Kendall Clark tweeting this: "#Ontogenesis is awesome: http://ontogenesis.knowledgeblog.org/".

What is Ontogenesis?

Phil Lord had the idea, and together with Robert Stevens and others organised the 2-day Ontogenesis workshop that occurred last week, 21-22 January 2010. Why look for an alternative to traditional publishing methods? When writing a book, accepting the invitation might take 5 minutes, but getting around to doing it can take 6 months or more. You may only spend a couple of days writing the article, but then need to wait months for reviews (and do reviews of the other authors' articles). Then there is the formatting and camera-ready copy. Then you may wait many months for proofs, and then only get a few days to make corrections. Then you can wait a year or so for actual publication, by which time the work is possibly out of date. Not ideal, but still necessary for some forms of publishing.

There are a number of benefits to using blog software, and to the Ontogenesis model in general:

  • stable, permanent URLs for articles and peer reviews. DOIs have also been discussed and are being considered.
  • automatic linking of peer reviews and related online articles. The WordPress software automatically adds trackbacks, pingbacks, etc. as comments on the relevant articles, making it easy for interested readers to visit the peer reviews written for that article.
  • completely open review system. Unlike many peer-review systems in use today, the reviewer (publicly) publishes his/her review in Ontogenesis.
  • less work and a quicker turnaround for the editors, reviewers, and authors. Once you have written your article (in whatever format you like, subject to a few broad suggestions about licensing and intent), you publish it as "Uncategorised" in the system; once reviewers have agreed to look at it, you move it to "Under Review". Once reviews are complete, and the editors have checked everything, it is moved to "Reviewed". Pretty simple – see the sketch just below this list.
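
For the programmers reading: purely as an illustration (this is not Ontogenesis code – the real site just uses stock WordPress categories), the editorial workflow above can be modelled as a tiny state machine in Python:

```python
# Illustrative only: the Ontogenesis review workflow as a state machine.
# The category names are the real ones described above; the code is hypothetical.
VALID_TRANSITIONS = {
    'Uncategorised': ['Under Review'],  # reviewers have agreed to look at it
    'Under Review': ['Reviewed'],       # reviews done, editors have checked
}

def move_article(article, new_category):
    """Re-categorise an article, enforcing the editorial workflow."""
    current = article['category']
    if new_category not in VALID_TRANSITIONS.get(current, []):
        raise ValueError(f"Cannot move from {current!r} to {new_category!r}")
    article['category'] = new_category
    return article

draft = {'title': 'Semantic Integration in the Life Sciences',
         'category': 'Uncategorised'}
move_article(draft, 'Under Review')  # reviewers found
move_article(draft, 'Reviewed')      # reviews and editorial checks complete
```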

A blog that isn’t a blog

But is Ontogenesis a blog? Not really. Is it a book? Not in the traditional sense. While it seems correct to call it a blog, the blog software isn't being used the way most people use it. And, though Duncan has called it "blogging a book", this isn't quite right either: while content, once completed, will not be changed, new content will be continually added. Phil discussed this point in his introduction to the workshop. He stated that wikis are best suited to certain styles of articles, but not to this sort of infrequently-updated information. Further, crediting in wikis is generally poor. Google Knol is a nice idea, but not many people are using it. And if it's just a plain website, then there is no real way to have (and, more importantly, to show) peer review.

To me, and by the general agreement of the people at the workshop, Ontogenesis can be viewed as a title/proper noun, in the same way that Nature is the title of a journal. Ontogenesis is the first of a class of websites called Knowledge Blogs. It has more in common with the high-quality, article-style blogging of ScienceBlogs or Research Blogging than with the short, informal blogging style used by most bloggers. Each article stands on its own, is of high quality, and describes a topic of interest to both ontologists and novices in the ontology world. Each article is aimed at a general life science readership, ensuring accessibility of knowledge and broad appeal.

My experiences as a contributor

I was lucky enough to be invited to the workshop last week, and had a great time. After an introductory set of presentations, we all got started writing our articles. The idea was that, once written, each article would be peer reviewed by at least 2 others at the workshop. Once the peer reviews were complete, the article would be re-categorised from "Under Review" to "Reviewed". As Phil said in a recent blog post, we wrote a large number of articles, though the number that have gone through the full review process is not as high. We expect that over the next few days, the number of completed articles will rise.

My article on Semantic Integration in the Life Sciences was the first to come out of peer review. Thanks are very much due to Frank Gibson, Michel Dumontier, and David Shotton for their peer reviewing and constructive criticism: it is a much better article for their input. I also reviewed a couple of articles (1,2) by Helen Parkinson and James Malone, which should be moved to a Reviewed status soon.

OK, but what's the downside?

Well, it is new, and there are some kinks to work out. This workshop highlighted a number of them, such as the difficulty people unfamiliar with WordPress had using its UI. Sean has posted a useful summary of his thoughts on the pluses and minuses, which I encourage you to have a read of and comment on. Here are a few thoughts on how to improve the experience in future, as mentioned during the meeting:

  • Enable the plugin for autogenerating related articles to improve cross-links.
  • The Table of Contents has been started, but different “pathways” for different intended readerships to help guide them through the articles would be helpful.
  • Reviewers should be able to change Categories in any article so they can mark when it is Under Review, rather than waiting for the Authors to do this.
  • The article-specific table of contents is very helpful, but it might be better to move it to a different location in the post (e.g. the top rather than the bottom).
  • Have a way to mark yourself as willing to accept papers to review, for instance if you have some time in your schedule that week: authors could then preferentially choose you.
  • The ability for your name in the byline of an article to link to your profile on Ontogenesis. Currently, the profiles are private and some authors have put their profiles into the article text as a temporary alternative.
  • Add the WordPress Stats plugin.
  • Comments do not show their author's name within them; e.g. pingbacks to reviews have to be clicked through to find out who wrote the review.
  • Dealing with references/citations will be done better in future, when an appropriate plugin is found. Currently, basic HTML links to DOIs are the standard way to go.

Conclusions? Be an author yourself, and try it out!

This method of publishing is new, interesting, and quick. If you have a topic you’d like to write about, are interested in peer reviewing, or are just interested in reading the articles then please visit Ontogenesis and have a go, and then let us know what you think!

Please note: as mentioned in the main text, I am one of the authors of articles and peer reviews in Ontogenesis.

Social filtering of scientific information – a view beyond Twitter


It’s not information overload, it’s filter failure. (Clay Shirky)

Bonetta (2009) gave an excellent introduction to the micro-blogging service Twitter and its uses and limitations for scientific communication. We believe that other social networking tools merit a similar introduction, especially those that provide more effective filtering of scientifically relevant information than Twitter. We find that FriendFeed (already mentioned in the first online comment on the article, by Jo Badge) shares all of the features of Twitter but few of its limitations and provides many additional features valuable for scientists. Bonetta quotes Jonathan Weissman, a Howard Hughes Medical Institute investigator at the University of California, San Francisco: “I could see something similar to Twitter might be useful as a way for a group of scientists to share information. To ask questions like ‘Does anyone have a good antibody?’ ‘How much does everyone pay for oligos?’ ‘Does anyone have experience with this technique?'” It is precisely for such and many more purposes that scientists use FriendFeed, which allows the collection of many kinds of contributions, not just short text messages.

Also in contrast to Twitter, comments on each contribution are archived in that context (and without a time limit), providing a solid base for fruitful, threaded discussions. In your user profile, you can choose to aggregate any number of individual RSS or Atom feeds, including scientific publications you bookmark in your online reference manager (e.g. CiteULike or Connotea), your blog entries, social bookmarks (Google Reader, del.icio.us, etc.), and Tweets, as well as any other items you wish to post directly to your feed. You then look for other users whose profile is relevant to your work and subscribe to them. Every individual item posted in your subscriptions will then appear on your personalized FriendFeed homepage, plus, optionally, a configurable subset of the feeds you have subscribed to. You can choose to bookmark ('like') any of these items (Facebook copied this 'like' functionality just before it bought FriendFeed), comment on them, and share discussion threads in various ways.
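
To make the aggregation concrete, here is a minimal sketch of a FriendFeed-style merge of several feeds, written with the Python feedparser library; the feed URLs are placeholders rather than real accounts:

```python
# A toy FriendFeed-style aggregator: merge several RSS/Atom feeds, newest first.
import feedparser

FEEDS = [
    'https://example.org/blog/feed/',     # your blog (placeholder URL)
    'https://example.org/citeulike.rss',  # bookmarked papers (placeholder)
    'https://example.org/tweets.atom',    # tweets (placeholder)
]

def aggregate(feed_urls):
    """Collect entries from every feed and sort them by publication date."""
    entries = []
    for url in feed_urls:
        entries.extend(feedparser.parse(url).entries)
    # feedparser normalises dates into a sortable time tuple when it can
    entries.sort(key=lambda e: e.get('published_parsed') or (0,), reverse=True)
    return entries

for entry in aggregate(FEEDS)[:20]:
    print(entry.get('title', '(untitled)'), '-', entry.get('link', ''))
```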

At first, this aggregation of information and threaded discussions might seem daunting. However, the stream of information can be channeled by organizing it into separate sub-channels ('lists', similar to but more versatile than 'folders' in email), according to your personal preferences (e.g. one for search alerts). In addition to individual users, you can also subscribe to rooms that revolve around particular topics. For example, the "The Life Scientists" room currently has 1,267 members and imports one feed.

The feature that makes FriendFeed truly useful is its social filtering system. Active discussions move to the top of your FriendFeed homepage with each new addition, which automatically brings them to the attention of you and everyone else who reads those feeds. In a sense, the most current and the most popular entries compete for attention at the top, making notifications unnecessary. This means that your choice of both rooms and subscriptions affects and filters the content you see. In that way, for instance, you could set your preferences such that you would only see papers with a certain minimum number of ‘likes’ among your colleagues. Alternatively, you can opt to hide items with zero likes or comments, ensuring that only those that someone found interesting will reach you. Thanks to a very fine-grained search functionality, threads also remain easily retrievable.
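
This is not FriendFeed's actual code, of course, but the filtering rule just described – hide anything nobody has liked or commented on, and float active threads to the top – fits in a few lines of Python; the items and threshold below are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    title: str
    likes: int = 0
    comments: list = field(default_factory=list)
    last_activity: float = 0.0  # timestamp of the most recent like/comment

def filter_homepage(items, min_likes=1):
    """Keep only items someone found interesting, most recently active first."""
    visible = [i for i in items if i.likes >= min_likes or i.comments]
    return sorted(visible, key=lambda i: i.last_activity, reverse=True)

stream = [Item('New ontology paper', likes=3, last_activity=1002.0),
          Item('What I had for lunch', likes=0, last_activity=1001.0),
          Item('Does anyone have a good antibody?', likes=0,
               comments=['try clone DO-1'], last_activity=1003.0)]

for item in filter_homepage(stream):
    print(item.title)  # the lunch post is filtered out
```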

Some of the synergistic effects of the many scientists interacting on FriendFeed are already apparent at this early stage of adoption. FriendFeed provides a convenient way to microblog from conferences by means of dedicated threads or discussion rooms created for the event, allowing participants to share comments within and across sessions, or even with people not physically present at the meeting. Such conference coverage has even received direct (e.g. ISMB09, BioSysBio09) or indirect (e.g. ISMB08) support from the conference organizers.

Above and beyond conference coverage, scientists use FriendFeed to share papers, experiences with laboratory equipment, resources for teaching, and anything else commonly asked about on mailing lists. A number of real-world scientific collaborations have already been sparked by such interactions. Collaborative grant proposals have been initiated, submitted, and in some cases approved after the idea was passed around and discussed on FriendFeed. Several bioinformatics problems have been solved by code-sharing and advice. Articles in scientific journals have been published by FriendFeed users after meeting and discussing on the platform [1-5].

Of course, since FriendFeed was not designed for scientists, there is room for improvement in terms of usability for scientific purposes. For instance, files can only be uploaded when starting a thread, not while commenting on it, and there is currently no functionality that assigns a measure of reputation to a user based on his/her contributions (though the widespread use of real names means reputation can, to some extent, be imported). As with all online contributions, citability and long-term archiving are unresolved issues, as is the permanence of services whose source code is not public. Fortunately, the development of social networks tailored to the needs of scientists is being actively pursued from various angles. The Polymath projects, in which researchers collaborate online to solve mathematical problems, provide a number of examples. The recent award of two NIH grants of over US$10M each for exactly such purposes is another. Ultimately, the continued enthusiastic adoption of these sophisticated variants of social filtering tools by a broad community of researchers interested in sharing their science will only increase their usefulness for, and thus the capabilities of, the online scientific community.

References:

Bonetta, L. (2009). Should You Be Tweeting? Cell, 139(3), 452-453. DOI: 10.1016/j.cell.2009.10.017


1. Lister, A., Charoensawan, V., De, S., James, K., Janga, S. C. C., Huppert, J., et al. (2009). Interfacing systems biology and synthetic biology. Genome Biology, 10(6), 309. http://genomebiology.com/2009/10/6/309
2. Saunders, N., Beltrão, P., Jensen, L., Jurczak, D., Krause, R., et al. (2009). Microblogging the ISMB: A New Approach to Conference Reporting. PLoS Comput Biol, 5(1), e1000263. http://dx.doi.org/10.1371/journal.pcbi.1000263
3. Neylon, C., & Wu, S. (2009). Article-Level Metrics and the Evolution of Scientific Impact. PLoS Biol, 7(11), e1000242. http://dx.doi.org/10.1371/journal.pbio.1000242
4. Daub, J., Gardner, P. P., Tate, J., Ramsköld, D., Manske, M., Scott, W. G., Weinberg, Z., Griffiths-Jones, S., & Bateman, A. (2008). The RNA WikiProject: community annotation of RNA families. RNA, 14(12), 2462-4. http://dx.doi.org/10.1261/rna.1200508
5. Huss, J. W., et al. The Gene Wiki: community intelligence applied to human gene annotation. Nucleic Acids Research. http://dx.doi.org/10.1093/nar/gkp760

Acknowledgment: This comment has received input from a number of FriendFeed users, as detailed in this thread, and was jointly blogged today by Björn Brembs (FriendFeed; blog post), Allyson Lister (FriendFeed; this blog post) and Daniel Mietchen (FriendFeed; blog post).

Live blogging with Wave: not so live when you can’t make the Wave public

I live blogged Cameron Neylon‘s talk today at Newcastle University, and I did it in a Wave. There were a few pluses, and a number of minuses. Still, it’s early days yet and I’m willing to take a few hits and see if things get better (perhaps by trying to write my own robots, who knows?). In effect, today was just an exercise, and what I wrote in the Wave could have equally well been written directly in this blog.

(You’ll get the context of this post if you read my previous post on trying to play around with Google Wave. Others, since, have had a similar experience to mine. Even so, I’m still smiling – most of the time 🙂 )

Pluses: The Wave was easy to write in, and easy to create. It was a very similar experience to my normal WordPress blogging experience.

Minuses: I wanted to make the Wave public from the start, but have yet to succeed. Adding public@a.googlewave.com or public@a.gwave.com just didn't work: nothing I tried was effective. Also, copying and pasting simply failed when I tried to copy the content of the Wave from Iron into my WordPress post in Firefox: while I could copy into other windows and editors, I simply couldn't copy into WordPress. When I logged into Wave via Firefox, the copy-and-paste worked, but it automatically included the highlighting that came from my selecting the text, and then I couldn't un-highlight the wave! What followed was a very colorful copy of my notes. I've since removed the highlighting, to make it more readable.

I’d like to embed the Wave here directly. In theory, I can do this with the following command:

[wave id=”googlewave.com!w%252BtZ-uDfrYA.2″]

Unfortunately, it seems this Wavr plugin is not available via the wordpress.com setup. So, I’ll just post the content of the Wave below, so you can all read about Cameron Neylon’s fantastic presentation today, even if my first experiment in Wave wasn’t quite what I expected. Use the Wave id above to add this Wave to your inbox, if you’d like to discuss his presentation or fix any mistakes of mine. It should be public, but I’m having some issues with that, too!

Cameron Neylon’s talk on Capturing Process and Science Online. Newcastle University, 15 October 2009.

Please note that all the mistakes are mine, and no-one else’s. I’m happy to fix anything people spot!

We're either on top of a dam about to burst, or under it about to get flooded. He showed a graph of data entering GenBank. Interestingly, the graph is no longer exponential, and this is because most of the sequence data isn't going into GenBank, but is being put elsewhere.

The human scientist does not scale. But the web does scale! The scientist needs help with their data, their analysis, etc. They'll go to a computer scientist to help them out. The CS person gives them a load of technological mumbo jumbo that they are suspicious of. What they need is someone to mediate between the computing side and the biologist. They may try an ontologist; however, that also isn't always very productive: the message they're getting is that they're being told how to do stuff, which doesn't go down very well. People are shouting, but not communicating. This is because all the people might want different things (scientists want to record what's happening in the lab, the ontologist wants to ensure that communication works, and the CS person wants to be able to take the data and do cool stuff with it).

Scientists are worried that other people might want to use their work. Let's just assume they think that sharing data is exciting. Science wants to capture first and communicate second, ontologists want to communicate, and CS wants to process. There are lots of ways to publish on the web in an appropriate way. However, useful sharing is harder than publishing. We need agreed structure to do the communication, because machines need structure. However, that's not the way humans work: humans tell stories. We've created a disconnect between these two things. The journal article is the story, but isn't necessarily providing access to all the science.

So, we need to capture research objects, publish those objects, and capture the structure through the storytelling. He used the MyTea project as an example/story: a fully semantic (RDF-backed) laboratory record for synthetic chemistry. This is a structured discipline with very consistent workflows. The system was tablet-based. It is effective and is still being used. However, it didn't work for molecular biology, bioengineering, etc. – a much wider range of things than just chemistry. So Cameron and others got some money to modify the system: take MyTea (a highly structured and specific system) and extend it into molecular biology. Could they make it more general, more unstructured? One thing that immediately stands out as unstructured and flexible is blogs. So, they thought they could make a blog into a lab notebook. Blogs already have time stamps and authors, but there isn't much revision history, so that got built into the new system.

However, was this unstructured system a recipe for disaster? Well, yes it is – to start with. What warrants a post, for example? Should a day be one post? An experiment? There was little in the way of context or links. People who also kept a physical lab book ended up with huge lists of lab book references. So, even though there were a decent number of good things (Google indexing, etc.), it was still too messy. However, as more information was added, help came from an unexpected source: post metadata. Pull-down menus for templates were populated from the titles and metadata of existing posts. In the act of choosing a post from that menu, a link is created from that post to the new page made by the template. The templates depend on the metadata, and because the templates are labor saving, users will put in metadata! Templates feed on metadata, which feed the templates, and so on: a reinforcing system.
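
The feedback loop is easier to see in code than in prose. This is purely illustrative Python – not the actual LaBLog implementation – with made-up post titles:

```python
# Illustrative sketch of the template/metadata loop described above.
posts = [
    {'id': 12, 'title': 'PCR buffer stock', 'template': 'material'},
    {'id': 15, 'title': 'Ligation protocol', 'template': 'procedure'},
]

def pulldown_options(posts, template):
    """The template's pull-down menu, populated from existing post metadata."""
    return [(p['id'], p['title']) for p in posts if p['template'] == template]

def instantiate_template(posts, chosen_id, new_title, template):
    """Create a new page from a template, linking it to the chosen post."""
    new_post = {'id': max(p['id'] for p in posts) + 1,
                'title': new_title,
                'template': template,
                'links_to': [chosen_id]}  # the link created by the act of choosing
    posts.append(new_post)
    return new_post

print(pulldown_options(posts, 'material'))  # [(12, 'PCR buffer stock')]
instantiate_template(posts, 12, 'Fresh PCR buffer', 'material')
# The new post carries metadata too, so it will appear in future menus:
print(pulldown_options(posts, 'material'))
```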

An ontology was "self-assembled" out of this research work and the metadata used for the templates. They compared their terms to the Sequence Ontology and found some exact matches, as well as some places where they identified possible errors in the Sequence Ontology (e.g. conflation of purpose into one term). They capture first, and the structure gets added afterwards. They can then map their process and ontologies onto agreed vocabularies for the purpose of a particular story. They do this because they want to communicate with other communities and researchers that are interested in their work.

So, you need tools to do this. Luckily, there are tools available that exploit structure where it already exists (as they've done in their templates, aka workflows). You can imagine instruments as bloggers (taking the human out of the loop). However, we also need tools to tell stories: to wire up the research objects into particular stories / journal articles. This allows people who are telling different stories to connect to the same objects. You could aggregate a set of web objects into one feed, and link them together with specific predicates such as vocabs, relationships, etc. This isn't very narrative, though. So, we need tools that interact with people while they're doing things – hence Google Wave.

An example is Igor, the Google Wave citation robot. You're having a "conversation" with this robot: it offers you links, choices, etc. while keeping the look and feel of writing a document. There is also the ChemSpider robot, written by Cameron. Here, you can create linked data without knowing you've done it. The robots will automatically link your story to the research objects behind it. Robots can work off each other, even if they weren't intended to work together (example: the Janey robot plus Graphy). If you pull the result from a series of robots into a new Wave, the entire provenance from the original wave is retained, and is retained over time. Workflows, data, or workflows+data can be shared.
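
For the curious: a robot like Igor boils down to surprisingly little code. The sketch below follows the 2009-era Python Robots API (the waveapi library for App Engine); 'citebot' and its URLs are hypothetical names, and the real robots obviously do far more than post a canned reply:

```python
# A minimal Google Wave robot, in the spirit of Igor or the ChemSpider robot.
# Based on the waveapi library from the 2009 Robots API; names are hypothetical.
from waveapi import events
from waveapi import robot

def OnBlipSubmitted(properties, context):
    """Watch each submitted blip for a trigger word and reply in a child blip."""
    blip = context.GetBlipById(properties['blipId'])
    if '?cite' in blip.GetDocument().GetText():
        reply = blip.CreateChild().GetDocument()
        reply.SetText('Looking up that citation for you...')

if __name__ == '__main__':
    citebot = robot.Robot('citebot',
                          image_url='http://citebot.appspot.com/icon.png',
                          version='1',
                          profile_url='http://citebot.appspot.com/')
    citebot.RegisterHandler(events.BLIP_SUBMITTED, OnBlipSubmitted)
    citebot.Run()
```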

Where does this take us? Let's say we type "the new RT-PCR sample". The system could check for previous RT-PCR samples, and choose the most recent one to link to in the text (after asking us if we're sure). As a result of typing this (and agreeing with the robot), another robot will talk to a MIBBI standard to get the required minimum information checklist and create a table based on that checklist. And always, adding links as you type. Capture the structure – it comes from the knowledge that you're talking about an RT-PCR reaction. This is easier than writing it all out by hand. As you get a primer, you drop it into your database of primers (which is also a Wave), and then it can be automatically linked in your text. This allows you to tell a structured story.

Natural user interaction: easy user interaction with web services and databases. You have to be careful: you don't want to be going back to the chemical database every time you type He, As, etc. In the Wave, you could somehow state that you're NOT doing arsenic chemistry (the robot could learn and save your preferences on a per-user, per-wave basis). There are problems with Wave: one is the client interface, another is user understanding. In the client, some strange decisions have been made – it seems to have been made the way that people in Google think. However, the client is just a client. Specialized clients, or just better clients, will be some of the first useful tools. In terms of user understanding, none of us quite understands yet what Wave is.

We’re not getting any smarter. Experimentalists need help, and many recognize this and are hoping to use these new technologies. To provide help, we need structure so machines can understand things. However, we need to recognize and leverage the fact that humans tell stories. We need to have structure, but we need to use that structure in a narrative. Try to remember that capturing and communication are two different things.

The sound of two hands Waving

The Life Scientists Wave in Iron

I got a Google Wave account (grin) via Cameron Neylon on Monday morning (thanks, Cameron!). I'm trying not to get caught up in all the hype, but I can't help grinning when I'm using it, even though I don't really know what I'm doing, and even after seeing the Science Online demo and a couple of Google videos.

But where and how will we get the benefit of the Wave?

I've read a few articles, played around a little, and chatted with people, but I'm still a complete novice. So, I'm not going to talk about the technical aspects of waving here. However, even now I can see that the power of Wave will not be in what's available by default (as was the case with Gmail – you got an account, started using it, and that was pretty much it). The most value will be in the new applications, interfaces, and most especially the Robots that will be riding the Wave with us. OK, so I've only had an account for one day, but even as a beginner I can see that what we create for ourselves and our communities to use will make or break this new thing. And, as 'we' are so much a requirement for this to work, my next point becomes pretty important.

What will it really take to get the best out of Wave for us researchers and scientists?

It will take many, many scientists participating. Social networking needs to become a lot more important to people who currently may just make use of e-mail and web browsing. This is exciting, but we'll need their help. A very good slideshow by Sacha Chua about this can be found on Slideshare. Use it to convince your friends!

First steps.

As for me, I'll be waving with both hands this Thursday at 2pm, when Cameron Neylon comes to talk about open science, Google Wave, and more. Unless Cameron is a fantastic multitasker, I may be the only one at the presentation with an account. Not sure how interesting it will be if I am the only one waving. I'll keep you updated, and post my experience of live blogging with Wave here.

I’m also hoping that I can get some of my research out there into the wider world via Wave robots. I have an interest in structured information (ontologies, data standards etc) and think this may lead to some interesting things.

So, the sound of two hands waving? Pretty quiet, I think. But add another few hundred pairs of hands, and things may get a lot louder.

Science Online London 09: Thoughts, not Transcript

First off, I’d like to thank the many people who re-tweeted my blog posts throughout Science Online London this past Saturday. With your help, Saturday was my best day ever for visits to the site. I hope people enjoyed my posts, and perhaps stayed long enough to find out what I blog about when I’m not at conferences (those I’m most proud of include a day I spent at a primary school last year, and a co-authored post with Frank Gibson on attribution versus citation).

The Royal Institution

Those solo09 posts I wrote on Saturday were intended mainly as notes, as a transcript of what went on. It helps me concentrate to take notes, and due to my fabulous parents talking me into taking typing classes in high school, I am able to (mostly) keep up with presentations! But I wasn’t the only one blogging, and many people since Saturday have been writing up and posting their thoughts: Martin Fenner has been keeping track of what seem to be all blog posts about solo09, so please visit his post to find out what everyone thought of the day.

My blog posts on the day were a record of the day’s presentations, from my point of view. Today’s post is more personal – it was my first time at a Science Online conference, and this is a record of my impressions.

The day started very early for me, though I was not alone in this. I was on a 6am train, and managed to find my way to the Royal Institution (my first visit) before 8:30am. Luckily, they had already laid out the name badges of people whose first name began with "A", so I grabbed my badge and went to see how many people were around. After geeking out way too much when I met Cameron Neylon for the first time in the physical world (when discussing online avatars with him, I tried a bad pun referencing the recent Guild music video about avatars, which fell a bit flat), I went for a wander around the building. In one of the libraries I found this book, which amused me:

A book in one of the libraries at the Royal Institution. Memoirs of Libraries, in a library!

Then I wandered upstairs and had a look at the Faraday Theatre, with its surprisingly uncomfortable seating but beautiful fittings and fantastic ambience. Just a tip though – watch out for the Ambulatory Displays up there on the first floor. The British Library had a table set up in a prime position opposite the Faraday Theatre, and at that table I met some BL people as well as Stewart Wills, an Editor for Science. I had never spoken with a Science editor before, and I had a really enjoyable conversation with him and the BL people about wildflowers and ontologies for 20 minutes or so, until it was time for the conference to start.

I won't go heavily into the presentations, as I have already covered them. Suffice to say I thought they were all very interesting, often entertaining, and definitely educational. While I would have loved much more time for open discussion at the end of each presentation, that didn't spoil my enjoyment. I had my first experience with Second Life, and watching the odd behaviors of the avatars in it was almost hypnotic. One seemed to be playing the spoons, or typing on an invisible keyboard, or something. Many others seemed to be hanging from an invisible wire attached to their backs, and others flounced, tilted alarmingly, or even looked attentive.

I will choose a favorite presentation, though: I loved the theatrics and the content of John Gilbey's talk. He presented a number of speculations about the far future, and said that we could all vote for our favorite by emailing him in the next week. Then he'll do his best to write about it in the context of the University of Rural England and get it into print 🙂 Fun! You can email him at gilbey@bcs.org.uk.

I had a number of good conversations with Sara Fletcher of Diamond Light Source about power cables, last year’s Science Online, and meeting people in the real world who you’ve gotten to know only through the (unreal?) world of the Internet. We were the ones sitting near the annoying ringing iPhone during the metrics/statistics talk by Richard Grant and others. No, it was NOT our phone, and yes, we tried to find it to turn it off but were unsuccessful.

It was great seeing bloggers made flesh: Petra Boynton, Jack of Kent, Cameron Neylon and Peter Murray-Rust were just a few of the people I either listened to or spoke with for the first time. Peter, Phil Lord and I had a great conversation about ontologies and OWL – well, about semantics.

I left London that evening on a full train of tired people wanting to get home, in stark contrast to the quiet, empty train and the beautiful sunrise that began the day. I had a great experience, and my thanks go out to all the organizers and people who helped make Science Online London work. I am now more interested in Google Wave, still want a single unifying identifier for me and my online personas (one identifier per persona, or one per person?), and am more aware of the legal implications of blogging. I feel like I've increased not just my knowledge of all things science and online, but also the size of my online science community, a community that has enriched my research environment and work life more in the past year than I ever thought possible. The Life Scientists, Science 2.0, Twitter and my good friend Google Reader keep me in touch with the blogs of science friends and colleagues, and I'm following many more after Science Online. I am a better scientist and researcher because of my connections to this community – thank you all!

Breakout 3: Author identity – Creating a new kind of reputation online (Science Online London 2009)

Duncan Hull, Geoffrey Bilder, Michael Habib, Reynold Guida

ResearcherID, Contributor ID, Scopus Author ID, etc. help to connect your scientific record. How do these tools connect to your online identity, and how can OpenID and other tools be integrated? How can we build an online reputation and when should we worry about our privacy?

Geoff Bilder:

Almost every aspect of a person can change without the person themselves changing. So, you want to have an identifier that is a hook to you, and which is better than a name (which is changeable). What about retinal scans? Fingerprints? OpenID? Where does your profile come in? A profile is a collection of attributes that you use to describe who you are. With author identity, what we want is the ability to get at the profile of a person in an unambiguous manner. Until we have such a thing, how do you tell people what your canonical profile is? To complicate matters even more, each user will want multiple personas, each with their own profiles.

When talking about identity, two issues are often conflated: identity authentication and knowledge-discovery identity systems. That is, you must be more rigorous in determining who someone is (logging into your identity) than in figuring out who wrote a paper. Further complications occur in the lossy conversion of authors' names between languages.

Whatever is done has to be done on an international scale, must be interdisciplinary, and must be interinstitutional. The oldest content cited thus far in CrossRef (with a DOI) is from the 1600s. What happens to your identifier when you die? A final issue is scale: there are about 200K new DOIs per month, and even if we guess at 5 authors per DOI, there could be between 5-21K failures of identification per month if you estimate a 96-97% success rate for author identification.

Duncan Hull:

He spoke about OpenID in science, among other things. Currently, authentication of people is very different across most online applications, and is generally done with only a simple username and password combination. Simon Willison (The Guardian) estimates that the average online user has at least 18 user accounts and 3.49 passwords. OpenID is trying to end up with a situation where there are fewer usernames AND passwords.

OpenID works by redirecting you to your OpenID provider to log in, then sending you back to the location you started at. However, having a URL as a username is not very intuitive. Further, logging in via redirection can be confusing. Therefore, while adoption of OpenID is growing, it may not properly take off until browsers and other vendors support it better. He mentioned myExperiment as something which accepts OpenID.
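
To make the redirect dance concrete, here is a consumer-side sketch using the python-openid library; the URLs and the bare session dict are placeholders, and a real site would wire this into its web framework:

```python
# Sketch of an OpenID relying party (consumer); URLs are placeholders.
from openid.consumer import consumer

session = {}  # would normally be the user's server-side session
oid = consumer.Consumer(session, store=None)  # store=None: stateless mode

# Step 1: the user types their OpenID URL; we discover their provider.
auth_request = oid.begin('https://username.example-provider.com/')

# Step 2: redirect the user to the provider to log in.
redirect_url = auth_request.redirectURL(
    realm='https://myexperiment.example/',
    return_to='https://myexperiment.example/openid/return')
# ...issue an HTTP 302 to redirect_url here...

# Step 3: the provider redirects back; verify the signed response.
# response = oid.complete(query_args, 'https://myexperiment.example/openid/return')
# if response.status == consumer.SUCCESS:
#     print('Logged in as', response.getDisplayIdentifier())
```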

Michael Habib:

Michael presented a nice diagram: a square divided into 4 parts, with “about me” and “not about me” across the top, and “by me” and “not by me” down the side. It is the “not” category for both where the disambiguation of people is the most important. He used the example of Einstein and the LC Authority Files to figure out what all of the different versions of his name are.

Completely different from the LC Authority Files, which are manually and carefully checked by only certain people, is ClaimID. ClaimID is a way to collect all aspects of your identity in one place. However, it is dependent upon each individual being truthful about what they claim ownership over.

Another approach is the Scopus Author ID, which is completely machine-aggregated. It is validated by publications, and scales well. It has 99% precision and 95% recall. The cons are that it is impersonal, and that those precision and recall values really aren't very good when you consider that this is about ownership of articles, and that there are a very large number of people.
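
A quick back-of-the-envelope calculation, with a made-up corpus size, shows why those numbers bite at this scale:

```python
# Hypothetical corpus: 5 million true author-article links.
true_links = 5_000_000
precision, recall = 0.99, 0.95

found = true_links * recall                      # links the system recovers
missed = true_links - found                      # real links never attributed
spurious = found * (1 - precision) / precision   # wrong links asserted anyway

print(f"missed: {missed:,.0f}")      # missed: 250,000
print(f"spurious: {spurious:,.0f}")  # spurious: 47,980
```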

There is also 2collab, where you can combine author ids (that you know about) into one identity. Then, you can add any other item on the web that is about you.

Reynold Guida (from Thomson Reuters):

They've built software to try to address author identity and attribution. If you look at the literature since 2000, communication and scientific collaboration have really changed. What we notice is that the number of multi-author papers has started to increase, while single-author papers have decreased. A Google search for common surnames really highlights the problems associated with identity. Name ambiguity is a real problem, as is the connection between the researcher, the institution, and the community. Two of the most important questions in this discussion are: who do I know, and who do I want to know? The connections a person makes affect all aspects of their career.

Therefore they have created ResearcherID (free, secure, open). Privacy options are controlled by the user, even if the institution created the record. There is integration with EndNote, Web of Knowledge, and other systems to help build publication lists. You can link to / visualize your ResearcherID profile really easily from your own websites.

Discussion:

Question: Has anyone thought through the security implications of these single-ID systems: one slip-up and your entire identity has been hacked? GB: Multiple identities encourage poor behaviour, as the thought of changing your password everywhere is so overwhelming that people don't do it. But yes, these problems exist. However, the tradeoffs make it worthwhile, to their minds. You should NOT conflate knowledge issues with security issues, because information for your scholarly profile is, by definition, public anyway.

Question: Do different openId providers and author id and researcher id know about each other in the computational sense? Not really yet.

Question: What about just making the markup of the web more semantically friendly? DH: The Google approach is a good one. RG: It’s all about getting the information into the workflow.

Question (Phil Lord): What worries me is that there has been a big land grab for the author identity space: for example, you cannot log into Yahoo with any OpenID other than a Yahoo OpenID. There's a lot of value in being in control of someone's ID, and therefore a big potential danger. GB: For every distributed system, you need a centralized indexing component to get it to work correctly. Therefore we need to make sure that if a centralized system appears, there is accountability.

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Real-time statistics in science (Science Online London 2009)

Victor Henning, Richard Grant, Virginia Barbour

Academic prestige, setting research trends, getting jobs and tenure, grant funding – these are largely based on publishing in high-impact-factor journals and getting citations. Not only are these measures flawed and widely criticized: "You could write the entire history of science in the last 50 years in terms of papers rejected by Science or Nature", said Nobel laureate Paul Lauterbur. Citation measures are also subject to a considerable time lag. If you write a paper today, it takes a year to get it published, and another year passes before citations of it appear. What if there were alternative measures of scientific impact? What if these measures were available in real time, letting you track the trends in your discipline as they develop? That's what we'll discuss in this session.

Richard Grant:

Employers like metrics to discover whether they're spending money in the right places. Researchers want to see that what they're doing is relevant. This is why we want metrics. But what can metrics do, and what can't they do? The impact factor doesn't actually tell you how good the research in a given journal is. He is involved in the qualitative assessment of articles – more like a FriendFeed method of assessment. Corporate bit: http://f1000.com. The crucial thing they want to have is quality. What they do at f1000 is pretty slow, by necessity. There is also, though, a tying-in with the community.

Virginia Barbour:

She'd like to reclaim the word "impact" from "impact factor". How do you assess quality: usage, media coverage, blog coverage, expert ratings, discussion thread activity, who is reading it, who is citing it, where the research was done, effect on public policy? No single one of these should be relied on. Traditional measures are often not the most important. Many feel that the way papers are evaluated is actually detrimental to the research process. Most users of journal sites are not coming via the home page – they're coming via Google and other routes: people just don't start at the first page of a journal and read through.

NEJM is changing the way their front pages look and the Journal of Vision is changing the way the metrics are displayed. At PLoS, in Phase 1 they want to have data that isn’t owned by someone else – that we can actually use and verify. In Phase 2, they also want to have the number of downloads of the article. This data will be broken down by the type of views. They also want to make the metrics more sophisticated, with more sources for each data type, more sophisticated web usage data, provide tools for analysis, and more.

Victor Henning:

He used last.fm as an analogy for article metrics, and as an introduction to Mendeley. In this way, you can track article pervasiveness in reference manager libraries, track article reading time in PDF viewers, and track user tags and ratings. One key difference between Mendeley and last.fm is privacy: they believe that some scientists don't want others to know what literature they find interesting.

They have synchronization with CiteULike, and will shortly have synchronization with Zotero. The goal of all this is to aggregate statistics for their users. All of the information is available by academic discipline, geographic region, and more. Once we're at the point where there are true article metrics, these can be the basis for individualized recommendations.

Discussion:

Question: It seems we're replacing a single impact factor with a large number of new metrics. How do you foresee people managing and understanding all of those metrics? RG: We're not in the business of replacing the impact factor – just providing more information to the researcher.

VB: I can imagine that people will be able to go to grant funding agencies and tell them how much coverage in all sorts of media your paper received.

Question (Phil Lord): I worry about reading times as a measure of quality. In music the listeners and musicians are largely disjoint. In science this is definitely not true. Many of the metrics mentioned are very much open to fiddling and self-citation. What do you say about this? VH: We’re not advocating replacing the impact factor. However, it is always better to have more data, more metrics.

Question: I print out my articles. How will that affect things?

Missed most of the rest of the discussion because of a phone that wouldn’t stop ringing – see the Twitter hashtag #solo09 for all the gory details.

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Google Wave: Just another ripple or science communication tsunami? (Science Online London 2009)

Cameron Neylon, Chris Thorpe, Ian Mulvany

Google Wave is a new tool for communication and collaboration on the web that will be released later this year. For this session we plan a live demo of the prerelease version of Google Wave to show off the potential for scientists.

What can you do with a wave? Make robots, embed it into a blog, build gadgets. Robots (server side) can inspect data within a wave, then go and do something about it and change the content within the wave. For the geeks: it's powered by webhooks. You can put waves anywhere, into any HTML file. Changes are immediately propagated to every embedded wave. Therefore, if you make a comment on a waved blog, that comment appears wherever people have requested it. It makes flame wars almost immediate 🙂

Gadgets (client side) extend the functionality of waves; they are XML-based and store their data within a wave. Changes can be replayed, and are stored on a per-user/wavelet basis.

Cameron then live-demoed a wave by writing something "like an email" and showed how it propagated to other users. (Ian said "o noes! i iz in ur wave editing ur text". Highly amusing. But they're just showing versioned instant messaging right now. Cool, but I would like to see more.) He can invoke the Guardian robot with "?guardian" and the search results are put right back into the wave. There's also a robot for ChemSpider, and another for producing LaTeX figures (Watexy).

They also showed Igor, a robot which helps retrieve citations, and Graphy which, as the name suggests, produces basic graphs from text – graphs that look suspiciously like what you might want an SBML pathway to look like!

The entire Google Wave system is going to be open-sourced. Most of the client architecture is HTML 5 and Javascript. Google had a robot (not public) that would translate into another language as you typed – supposedly quite resource hungry?

What would make people who aren't geeks use it? At the moment, it is difficult to get used to the interface. Also, it doesn't yet integrate with email as we know it. However, Cameron Neylon says that it's easier to use than it looks, so once they sort out the interface it should become popular.

IM: If Google wave is as easy to install by institutions as a wiki setup, then it might work and really help collaborations and sharing. Even more so if Wave successfully integrates email.

More short notes about the demo and discussion:

  • CN: I have the feeling it will be very very good at taking collaborative note taking during talks.
  • People can edit each other’s comments, and there is versioning so you can see how things have changed.
  • Wave is much more efficient in terms of resources – not a whole series of gets, but instead a few puts (if I understand this correctly).
  • One problem: Google Wave can’t be used offline. Is there any way to get some limited functionality offline?

Phil Lord suggested that google wave might be good for collaborative ontology development. (I agree!)

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!