Housekeeping & Self References

A Case of Stolen Professional Identity

…or when a bogus review with my name on it was submitted to a journal.

Update: Please read how the same people tried it again, with another journal, in this post.

At the beginning of last month, just as I was starting to work part-time on my PhD again after maternity leave, a curious and worrying thing happened. My name – and, as such, my professional identity – was stolen and used in a bogus review for a journal submission.

How it Happened

The story begins with an email I received from an Editor asking me to confirm that I had recently submitted a review to his Journal. He was suspicious, as the email address provided for me was a gmail account rather than my institutional address. While my name and affiliation in the review were correct, the gmail address was not mine. A little while later, the Editor let me know that other reviewer email addresses were equally dubious, and at least one other person had confirmed that their professional identity had also been stolen and used to create one of the other reviews for the same submission.

The review itself was badly written and very short, and I am indebted to the Editor for catching the oddness of the email address and for delving into this situation so deeply. Despite all the help provided by author and reviewer databases, a little personal attention by editors goes a long way. This Journal’s rules for reviewing are pretty standard, and as with many journals, they allow authors to submit reviewer suggestions. I don’t think this practice should be stopped, as many research communities are relatively specialised or small, and you are more likely to get suitable reviewers if the authors are able to suggest options. However, abuse of this system is possible, and I would be very surprised if nothing like this scam has happened before.

The Outcome

It was caught early here, just after the reviews were submitted. The culprits were banned. Though I’m not privy to whether any further legal action can or will be taken, at least there was a positive result for the Journal. The only way it could have been caught earlier would have been to notice the odd email addresses at the point the reviewer names were suggested, rather than once the reviews came in. I sincerely hope there aren’t other bogus reviews out there in other journals using anyone else’s name.

Personally Speaking…

I’d like to compliment the Editor and his Journal for discovering this unprofessional behaviour early on and for taking action. While it is a kind of dubious honour to be selected for such a scam (the scammers must think I’m a good reviewer choice?), it has been an uncomfortable experience for me personally. I expend a reasonable amount of effort on maintaining my professional online appearance. A search on my name retrieves mainly work-related hits, and this is a useful aid for both sharing work and finding other like-minded researchers. I assume this is how the scammers came up with my name, and the names of the others whose professional personae were misused in the same way. Such sub-standard reviews could harm the perception of the real researcher in the eyes of the journals concerned, and this is a worry.

Catching the Crooks

This isn’t a post on the purpose or usefulness of peer review. Whatever your views (and some are quite negative), the process is firmly entrenched in our community, at least for now. But how should we be working to prevent such scams in future?

Should journals require institutional email addresses? Should journals not accept email addresses from authors at all, and search for reviewers’ addresses independently? Certainly there are few reasons why honest reviewers would be using a non-institutional address, but is it a little too much to force such a constraint?
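For what it’s worth, the simplest technical safeguard here is easy to sketch. Purely as an illustration (the domain list and function names below are my own invention, not any journal’s real system), an editorial tool could flag suggested reviewer addresses on free-mail domains for manual checking:

```python
# Hypothetical sketch: flag suggested reviewer addresses that use free-mail
# providers instead of an institutional domain. Both the domain list and the
# functions are illustrative only.

FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}

def is_suspicious(email: str) -> bool:
    """Return True if the address uses a known free-mail domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain in FREEMAIL_DOMAINS

def flag_reviewers(suggested: dict) -> list:
    """Given {name: email}, return names whose addresses look non-institutional."""
    return [name for name, email in suggested.items() if is_suspicious(email)]

flagged = flag_reviewers({
    "A. Reviewer": "a.reviewer@university.ac.uk",
    "B. Reviewer": "b.reviewer.official@gmail.com",
})
# flagged == ["B. Reviewer"]
```

Of course, a determined scammer could register a plausible-looking institutional-style domain, so a flag like this can only prompt a human check, never replace one.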

Additionally, there are many proponents of getting rid of anonymity in the refereeing process. Indeed, PLoS journals encourage reviewers to name themselves. Would it be more difficult to perform this kind of scam if the name of the reviewer were visible? And what if the scammers succeeded, and the wronged party never noticed their name on that review, visible for all to see? It could be a real blow to a professional reputation.

A Final Note

I’m happy that the wrongdoers were caught, and that the Journal and Editor were open enough about what happened to encourage me to write about it: they hope that this openness will make it harder for people to perform the same stunt again. Bad reviews lead to substandard papers being accepted, which lowers the standing of whatever journal publishes them: a bad outcome for the whole community.

Hopefully this will be a timely warning to others, as I’ve never heard of it happening before. Please let me know if you’ve ever had a similar experience, as I’d be interested to hear about it.

One final thought: having written a review in my name, do you think these scammers could write my PhD thesis for me too? Hmmm, perhaps not such a good idea after all….

What are your ideas? How could such a scam be prevented in future? Let me know about your suggestions on this topic, or your own experiences. Is this more common than we think? You can contact me via the comments on this post or via the various social networking methods I use. Further information is available from my About page.

Housekeeping & Self References

Pause in Posts: New Arrival

Just to let you know that over the coming months my posts will be infrequent, as my husband and I have a new addition to the family:

He’s great, but he’s definitely a little generator of time warps – haven’t had much time for anything else! I’ll be back in the blogging game as I have the time for it – promise!

Housekeeping & Self References Meetings & Conferences Papers Science Online

Social Networking and Guidelines for Life Science Conferences
I had a great time in Sweden this past summer, at ISMB 2009 (ISMB/ECCB 2009 FriendFeed room). I listened to a lot of interesting talks, reconnected with old friends and met new ones. I went to an ice bar, explored a 17th-century ship that had been dragged from the bottom of the sea, and visited the building where the Nobel Prizes are handed out.

While there, many of us took notes and provided commentary through live blogging either on our own blogs or via FriendFeed and Twitter. The ISCB were very helpful, having announced and advertised the live blogging possibilities prior to the event. Once at the conference, they provided internet access, and even provided extension cords where necessary so that we could continue blogging on mains power.

Those of us who spent a large proportion of our time live blogging were asked to write a paper about our experiences. This quickly became two papers, as there were two clear subjects on our minds: firstly, how the live blogging went in the context of ISMB 2009 specifically; and secondly, how our experiences (and that of the organisers) might form the basis of a set of guidelines to conference organisers trying to create live blogging policies. The first paper became the conference report, a Message from ISCB published today in PLoS Computational Biology. This was published in conjunction with the second paper, a Perspective published jointly today in PLoS Computational Biology, that aims to help organisers create policies of their own. Particularly, it provides “top ten”(-ish) lists for organisers, bloggers and presenters.

So, thanks again to my co-authors:
Ruchira S. Datta: Blog FriendFeed
Oliver Hofmann: Blog FriendFeed Twitter
Roland Krause: Blog FriendFeed Twitter
Michael Kuhn: Blog FriendFeed Twitter
Bettina Roth
Reinhard Schneider: Blog FriendFeed
(you can find links to my social networking accounts on the About page on this blog)

If you have any questions or comments about either of these articles, please comment on the PLoS articles themselves, so there can be a record of the discussion.

Lister, A., Datta, R., Hofmann, O., Krause, R., Kuhn, M., Roth, B., & Schneider, R. (2010). Live Coverage of Scientific Conferences Using Web Technologies PLoS Computational Biology, 6 (1) DOI: 10.1371/journal.pcbi.1000563

Lister, A., Datta, R., Hofmann, O., Krause, R., Kuhn, M., Roth, B., & Schneider, R. (2010). Live Coverage of Intelligent Systems for Molecular Biology/European Conference on Computational Biology (ISMB/ECCB) 2009 PLoS Computational Biology, 6 (1) DOI: 10.1371/journal.pcbi.1000640

Housekeeping & Self References Meetings & Conferences Science Online Semantics and Ontologies

Ontogenesis: rapid reviewing and publishing of articles on semantics and ontologies

What happens when an ontologist or two gets frustrated at the drawn-out publication process that is the norm when publishing scientific books? You get Ontogenesis: a quick-turnaround, low-maintenance solution using WordPress blog software. Next, a bunch of other ontologists are invited to a two-day, intensive writing and peer-reviewing workshop, and the initial content is created. The result? Well, my favourite result was Kendall Clark tweeting this: “#Ontogenesis is awesome”.

What is Ontogenesis?

Phil Lord had the idea, and together with Robert Stevens and others organised the 2-day Ontogenesis workshop that took place last week, 21-22 January 2010. Why look around for an alternative to traditional publishing methods? When writing a book, accepting the invitation might take 5 minutes, but getting around to the writing can take 6 months or more. You may only spend a couple of days writing the article, but then need to wait months for reviews (and do reviews for the other authors’ articles). Then there is the formatting and camera-ready copy. Then you may wait many months for proofs, and get only a few days to make corrections. Then you can wait a year or so for actual publication, by which time the work is possibly out of date. Not ideal, but still necessary for some forms of publishing.

There are a number of benefits to using blog software, and to the Ontogenesis model in general:

  • stable, permanent URLs for articles and peer reviews. DOIs have also been discussed, and are being considered.
  • automatic linking of peer reviews and related online articles. The WordPress software automatically adds trackbacks, pingbacks, etc. as comments on the relevant articles, making it easy for interested readers to visit the peer reviews written for that article.
  • completely open review system. Unlike many peer-review systems in use today, the reviewer publicly publishes his/her review in Ontogenesis.
  • less work and quick turnaround time for the editors, reviewers, and authors. Once you have written your article (in whatever format you like, other than a few broad suggestions about licensing and intention), you publish it as “Uncategorised” in the system, and then once reviewers have agreed to look at it, move it to “Under Review”. Once reviews are complete, and the editors have checked everything, it is moved to “Reviewed”. Pretty simple.
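The category workflow just described is simple enough to sketch as a tiny state machine (a purely illustrative sketch; the function and table names are mine, not part of the Ontogenesis site):

```python
# Illustrative sketch of the Ontogenesis category workflow described above.
# The transition table mirrors the post categories; nothing here is real
# WordPress code.

TRANSITIONS = {
    "Uncategorised": "Under Review",  # reviewers have agreed to look at it
    "Under Review": "Reviewed",       # reviews complete, editors have checked
}

def advance(category: str) -> str:
    """Move an article to the next stage, or raise if it is already final."""
    if category not in TRANSITIONS:
        raise ValueError(f"No transition from {category!r}")
    return TRANSITIONS[category]

stage = "Uncategorised"
stage = advance(stage)  # "Under Review"
stage = advance(stage)  # "Reviewed"
```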

A blog that isn’t a blog

But is Ontogenesis a blog? Not really. Is it a book? Not in the traditional sense. While it may be technically correct to call it a blog, the software isn’t being used the way most people use a blog. And, though Duncan has called it “blogging a book”, this isn’t quite right either: while content, once completed, will not be changed, new content will be continually added. Phil discussed this point in his introduction to the workshop. He stated that wikis are best suited to certain styles of article, but not to this sort of infrequently-updated information. Further, crediting in wikis is generally poor. Google Knol is a nice idea, but not many people are using it. And if it’s just a plain website, then there is no real way to have (and, more importantly, to show) peer review.

To me, and by the general agreement of the people at the workshop, Ontogenesis can be viewed as a title/proper noun, in the same way as Nature is the title of a journal. Ontogenesis is the first of a class of websites called Knowledge Blogs. It has more in common with the high-quality, article-style blogging of ScienceBlogs or Research Blogging than with the short, informal blogging style used by most bloggers. Each article stands on its own, is of a high quality, and describes a topic of interest to both ontologists and novices in the ontology world. Each article is aimed at a general life science readership, ensuring accessibility of knowledge and broad appeal.

My experiences as a contributor

I was lucky enough to be invited to the workshop last week, and had a great time. After an introductory set of presentations, we all got started writing our articles. The idea was that, once written, each article would be peer reviewed by at least 2 others at the workshop. Once the peer reviews were complete, the article would be re-categorised from “Under Review” to “Reviewed”. As Phil said in a recent blog post, we wrote a large number of articles, though the number that have gone through the full review process was not as high. We expect that over the next few days, the number of completed articles will rise.

My article on Semantic Integration in the Life Sciences was the first to come out of peer review. Thanks are very much due to Frank Gibson, Michel Dumontier, and David Shotton for their peer reviewing and constructive criticism: it is a much better article for their input. I also reviewed a couple of articles (1,2) by Helen Parkinson and James Malone, which should be moved to a Reviewed status soon.

Ok, but what’s the downside?

Well, it is new, and there are some kinks to work out. This workshop highlighted a number of them, such as the difficulty people unfamiliar with WordPress had using its UI. Sean has posted a useful summary of his thoughts on the pluses and minuses, which I encourage you to have a read of and comment on. Here are a few thoughts on how to improve the experience in future, as mentioned during the meeting:

  • Enable the plugin for autogenerating related articles to improve cross-links.
  • The Table of Contents has been started, but different “pathways” for different intended readerships to help guide them through the articles would be helpful.
  • Reviewers should be able to change Categories in any article so they can mark when it is Under Review, rather than waiting for the Authors to do this.
  • The article-specific Tables of Contents are very helpful, but it might be better to move them to a different location in the post (e.g. the top rather than the bottom).
  • Have a way to mark yourself as willing to accept papers to review, for instance if you have some time in your schedule that week: authors could then preferentially choose you.
  • The ability for your name in the byline of an article to link to your profile on Ontogenesis. Currently, the profiles are private and some authors have put their profiles into the article text as a temporary alternative.
  • Add the Stats wordpress plugin.
  • Comments do not show the name of their author, e.g. pingbacks to reviews have to be clicked through to find out who wrote the review.
  • Dealing with references/citations will be done better in future, when an appropriate plugin is found. Currently, basic HTML links to DOIs are the standard way to go.

Conclusions? Be an author yourself, and try it out!

This method of publishing is new, interesting, and quick. If you have a topic you’d like to write about, are interested in peer reviewing, or are just interested in reading the articles then please visit Ontogenesis and have a go, and then let us know what you think!

Please note: as mentioned in the main text, I am one of the authors of articles and peer reviews in Ontogenesis.

Housekeeping & Self References Science Online Software and Tools

Live blogging with Wave: not so live when you can’t make the Wave public

I live blogged Cameron Neylon’s talk today at Newcastle University, and I did it in a Wave. There were a few pluses, and a number of minuses. Still, it’s early days yet and I’m willing to take a few hits and see if things get better (perhaps by trying to write my own robots, who knows?). In effect, today was just an exercise, and what I wrote in the Wave could have equally well been written directly in this blog.

(You’ll get the context of this post if you read my previous post on trying to play around with Google Wave. Others, since, have had a similar experience to mine. Even so, I’m still smiling – most of the time 🙂 )

Pluses: The Wave was easy to write in, and easy to create. It was a very similar experience to my normal WordPress blogging experience.

Minuses: I wanted to make the Wave public from the start, but have yet to succeed in this. Whatever I tried adding just didn’t work: nothing was effective. Also, copying and pasting simply failed when copying the content of the Wave from Iron into my WordPress post: while I could copy into other windows and editors, I simply couldn’t copy into WordPress. When I logged into Wave via Firefox instead, the copy-and-paste worked, but it automatically included the highlighting from my selecting the text, and then I couldn’t un-highlight the wave! What followed was a very colorful copy of my notes. I’ve removed the highlighting now, to make it more readable.

I’d like to embed the Wave here directly. In theory, I can do this with the following command:

[wave id=”!w%252BtZ-uDfrYA.2″]

Unfortunately, it seems this Wavr plugin is not available via the setup. So, I’ll just post the content of the Wave below, so you can all read about Cameron Neylon’s fantastic presentation today, even if my first experiment in Wave wasn’t quite what I expected. Use the Wave id above to add this Wave to your inbox, if you’d like to discuss his presentation or fix any mistakes of mine. It should be public, but I’m having some issues with that, too!

Cameron Neylon’s talk on Capturing Process and Science Online. Newcastle University, 15 October 2009.

Please note that all the mistakes are mine, and no-one else’s. I’m happy to fix anything people spot!

We’re either on top of a dam about to burst, or under it about to get flooded. He showed a graph of data entering GenBank. Interestingly, the graph is no longer exponential, and this is because most of the sequence data isn’t going into GenBank, but is being put elsewhere.

The human scientist does not scale. But the web does scale! The scientist needs help with their data, their analysis, and so on. They’ll go to a computer scientist to help them out. The CS person gives them a load of technological mumbo jumbo that they are suspicious of. What they need is someone to mediate between the computer stuff and the biologist. They may try an ontologist; however, that also isn’t always very productive: the message they’re getting is that they’re being told how to do stuff, which doesn’t go down very well. People are shouting, but not communicating. This is because all the people might want different things (scientists want to record what’s happening in the lab, the ontologist wants to ensure that communication works, and the CS person wants to be able to take the data and do cool stuff with it).

Scientists are worried that other people might want to use their work. Let’s just assume they think that sharing data is exciting. Science wants to capture first and communicate second, ontologists want to communicate, and CS wants to process. There are lots of ways to publish on the web, in an appropriate way. However, useful sharing is harder than publishing. We need the agreed structure to do the communication, because machines need structure. However, that’s not the way humans work: humans tell stories. We’ve created a disconnect between these two things. The journal article is the story, but isn’t necessarily providing access to all the science.

So, we need to capture research objects, publish those objects, and capture the structure through the storytelling. Take the MyTea project as an example/story: a fully semantic (RDF-backed) laboratory record for synthetic chemistry. This is a structured discipline with very consistent workflows. The system was tablet-based, is effective, and is still being used. However, it didn’t work for molecular biology, bioengineering, and the like: a much wider range of things than just chemistry. So Cameron and others got some money to modify the system: take MyTea (a highly structured and specific system) and extend it into molecular biology. Could they make it more general, more unstructured? One thing that immediately stands out as unstructured/flexible is blogs. So, they thought they could make a blog into a lab notebook. Blogs already have time stamps and authors, but don’t normally keep much revision history, so that got built into the new system.

However, was this unstructured system a recipe for disaster? Well, yes, to start with it was. What warrants a post, for example? Should a day be one post? An experiment? There was little in the way of context or links. People who also kept a physical lab book ended up with huge lists of lab book references. So, even though there were a decent number of good things (Google indexing, etc.), it was still too messy. However, as more information was added, help came from an unexpected source: post metadata. The pull-down menus for templates were populated from the titles of existing posts. In the act of choosing a post from that menu, a link is created from that post to the new page made by the template. The templates depend on the metadata, and because the templates are labor saving, users will put in metadata! Templates feed on metadata, which feed the templates, and so on: a reinforcing system.
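To make that feedback loop concrete, here is my own rough sketch of the idea (the data structures are invented for illustration; the real system was a blog-based notebook, not this code): post titles populate a template’s pull-down menu, and choosing an entry links the old post to the newly created page.

```python
# Sketch of the metadata/template feedback loop: post titles populate a
# pull-down menu, and choosing an entry links the old post to the new one.
# All names and structures here are invented for illustration.

posts = [
    {"id": 1, "title": "PCR master mix", "links": []},
    {"id": 2, "title": "Gel electrophoresis", "links": []},
]

def pulldown_options(posts):
    """Template pull-downs are populated from existing post titles."""
    return [p["title"] for p in posts]

def create_from_template(posts, chosen_title, new_title):
    """Choosing a post in the pull-down links it to the newly created page."""
    new_post = {"id": len(posts) + 1, "title": new_title, "links": []}
    for p in posts:
        if p["title"] == chosen_title:
            p["links"].append(new_post["id"])  # link created by the template
    posts.append(new_post)
    return new_post

options = pulldown_options(posts)
new = create_from_template(posts, "PCR master mix", "PCR run 2009-10-15")
```

Even in this toy version, the design choice is visible: the links and the metadata each make the other more useful, so users have a real incentive to provide both.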

An ontology was “self-assembled” out of this research work and the metadata used for the templates. They compared their terms to the Sequence Ontology and found some exact matches, as well as some places where they identified possible errors in the Sequence Ontology (e.g. conflation of purpose into one term). They’re capturing first, and then the structure gets added afterwards. They can then map their process and ontologies onto agreed vocabularies for the purpose of a particular story. They do this because they want to communicate with other communities and researchers that are interested in their work.

So, you need tools to do this. Luckily, there are tools available that exploit structure where it already exists (like they’ve done in their templates, aka workflows). You can imagine instruments as bloggers (take the human out of the loop). However, we also need tools to tell stories: to wire up the research objects into particular stories / journal articles. This allows people who are telling different stories to connect to the same objects. You could aggregate a set of web objects into one feed, and link them together with specific predicates such as vocabs, relationships, etc. This isn’t very narrative, though. So, we need tools that interact with people while they’re doing things – hence Google Wave.

An example is Igor, the Google Wave citation robot. You’re having a “conversation” with this robot: it offers you links, choices, etc., while keeping the look and feel of writing a document. There is also the ChemSpider robot, written by Cameron. Here, you can create linked data without knowing you’ve done it. The robots will automatically link your story to the research objects behind it. Robots can work off each other, even if they weren’t intended to work together (for example, Janey-robot plus Graphy). If you pull the result from a series of robots into a new Wave, the entire provenance from the original wave is retained, and is retained over time. Workflows, data, or workflows+data can be shared.

Where does this take us? Let’s say we type “the new RT-PCR sample”. The system could check for previous RT-PCR samples and choose the most recent one to link to in the text (after asking if we’re sure). As a result of typing this (and agreeing with the robot), another robot could talk to a MIBBI standard to get the required minimum information checklist and create a table based on that checklist. And always, adding links as you type. Capture the structure: it comes from the knowledge that you’re talking about an RT-PCR reaction. This is easier than writing it all out by hand. As you get a primer, you drop it into your database of primers (which is also a Wave), and then it can be automatically linked in your text. This allows you to tell a structured story.

Natural user interaction means easy user interaction with web services and databases. You have to be careful, though: you don’t want to be going back to the chemical database every time you type “He”, “is”, etc. In the Wave, you could somehow state that you’re NOT doing arsenic chemistry (the robot could learn and save your preferences on a per-user, per-wave basis). There are problems with Wave: one is the client interface, another is user understanding. In the client, some strange decisions have been made – it seems to have been made the way that people in Google think. However, the client is just a client. Specialized clients, or just better clients, will be some of the first useful tools. In terms of user understanding, none of us quite understand yet what Wave is.

We’re not getting any smarter. Experimentalists need help, and many recognize this and are hoping to use these new technologies. To provide help, we need structure so machines can understand things. However, we need to recognize and leverage the fact that humans tell stories. We need to have structure, but we need to use that structure in a narrative. Try to remember that capturing and communication are two different things.

Housekeeping & Self References In The News Outreach

Inspiring Science Autumn Newsletter

I recently attended an open day at the Science Learning Centre North-East (SLCNE) in my role as half of a Teacher Scientist Network (TSN) partnership. There Louise, my partnered teacher, and I gave a short presentation on how the TSN works, and more specifically about our efforts last year. I enjoyed talking about what a positive experience it was, and also enjoyed seeing the other initiatives (such as Science in the Spotlight and Scientists@Work) that the SLCNE manages.

As an extra bonus, the newsletter for this Centre for Autumn had an article on my TSN partnership with Louise (hence the categorization of this post into the “Self Reference” section). Not only can you read the interview with me and Louise, but you can also read about:

  • ‘Liquid Science’ in March 2010 at Newcastle’s Liquid and Diva Nightclub
  • How you can get funding from the Royal Society (up to £3000!) for “teachers and scientists or engineers to work together on creative investigations involving 5–16 year olds”. The funding goes straight to the school, and the closing date is November 6th. More information:
  • Details on the 2009 SLCNE Christmas Lecture from Dr. Laura Grant. She’ll be giving a ‘Cool Science’ presentation “which looks at some of the strange things that happen at low temperatures. The lectures will be performed at four venues across the North East during the first week of December and are suitable for Year 6/7 pupils.” More information:

I strongly encourage you all to join in with your local SLC or branch of TSN, and to have a look at this season’s newsletter!

Housekeeping & Self References Meetings & Conferences

Highest-Viewed Blog Posts and Personal Thoughts on ISMB 2009

ISMB 2009 has come to a close, and with its end I’d like to chat a little about three topics: which ISMB 2009 blog posts readers clicked on the most, which presentations I (personally) found the best, and what I thought about the parts of the conference where no slides were involved (the social aspects).

If you want to check out all of my ISMB 2009 posts, remember you can always search on ‘ismb 2009‘. And don’t forget to check out the other bloggers: Oliver Hofmann, Cass Johnston, and Mikhail Spivakov. If there are more of you out there, let me know and I’ll include you here.

Most Highly-Viewed Blog Posts

Below you’ll find a top-ten list of my blog posts about the talks I attended at ISMB. The top ten is based on the number of views according to the stats pages WordPress provides. Of course, this ranking is not very scientific; it’s just a little bit of fun and doesn’t represent any kind of relative merit of these talks. 🙂 I just wanted a snapshot of what the immediate interest was, both from attendees and from non-attendees who followed the conference via FriendFeed or similar, and from there found my blog. Some more thoughts about this list:

  • It could be said to either positively or negatively relate to the quality of the FriendFeed comments. People liking the FF comments may have wanted to learn more, and thus clicked through to my posts. Conversely, people not getting enough information from the FF comments may have clicked through to learn more.
  • It could definitely also be said that the simple viewing of one of my posts doesn’t mean the user received any benefits, or indeed liked my post at all!
  • This may be obvious, but I only blogged those talks I attended. Therefore this list isn’t a representation of the popularity of all presentations, just of the number of views of the blog posts about presentations that I actually attended.
  • If I ever want to do a further ranking, this post will probably influence the numbers 🙂
  • It’s just a ranking of the most-viewed pages over the past 7 days, which pretty much covers the SIGs and the main conference. These numbers can and will change over the coming days and weeks. In fact, the positions shifted slightly while I was writing this, but I kept to the original list from this morning.

I hope nobody takes this little bit of fun too seriously, and enjoy!

The top posts, listed with the most-visited one first (as of the morning of July 3, 2009):

  1. TT:23 Utopia Documents: The Digital Meta-Library, Steve Pettifer
  2. Keynote: New Challenges and Opportunities in Network Biology, Trey Ideker
  3. Research reproducibility through GenePattern, Jill Mesirov, from the DAM SIG
  4. Keynote: Information and Biology, Pierre-Henri Gouyon
  5. TT40: BioCatalogue: A Curated Web Service Registry for the Life Science Community, Franck Tanoh
  6. Keynote: Computational Neuroscience: Models of the Visual System, Tomaso A. Poggio
  7. Special Session 4: Adam Arkin on Synthetic Biology, part of the Special Session on Advances and Challenges in Computational Biology, hosted by PLoS Computational Biology
  8. Annotation of SBML Models Through Rule-Based Semantic Integration, Allyson Lister, from the Bio-Ontologies SIG
  9. HL53: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project, Chris Taylor
  10. Workflow development and reuse in Taverna, Carole Goble, from the DAM SIG

It’s nice to see a standards talk in the top ten (the MIBBI talk, at number 9). And yes, that is my presentation at number 8, but I promise it was really there in the list, and that it wasn’t me: WordPress doesn’t count my own visits to my blog!

And now onto the talks I liked the most…

This section has two parts: talks that I liked the most, and my favorite talk. Please note that these are in addition to the top ten already mentioned above: those won’t be getting a double mention here. Also, I’m deliberately not mentioning any of the papers I was involved in!

Firstly, presentations I heard that I enjoyed, in no particular order:

And, my absolute favorite? I’ll have to choose a keynote for that: Pierre-Henri Gouyon’s talk on Information and Biology. He was the most engaging of all of the speakers, and had the best style of speech. His talk was funny, intelligent and well constructed. A great way to start the conference.

The ISMB 2009 social scene (no, you’ll find no dirt here!)

The FriendFeed group was mainly sober, serious, and related directly to the presentations. Over 137 people subscribed, though fewer contributed. That’s no bad thing, though – it’s more important to encourage readers to discover the FF group and make use of it than it is to get loads of people writing in it. Getting readers for the group is the hard part: once people know the group exists, it’s a lot easier for them to start contributing to the dialogue when they’re comfortable. It was on FF that I learnt that people had items stolen from the Light Factory party, which was one of the very few downsides to this conference.

However, it wasn’t all serious. Ruchira Datta started an open thread that was lively from the beginning. There was a Twitterer who was worried about the quality of the music in the rooms prior to the talks (here’s just one example of his thoughts on the matter), more than one mention of where power sockets could be found (in the open thread and here), and Lars (who wasn’t at the conference but followed it on FF) provided a number of wordles covering both the content and authorship of FF comments. Neil (one of the main bloggers from last year) still eagerly awaits photos (I promise I’ll put some up this weekend, and am myself looking forward to Ruchira’s pics of us FFers at the Thai place!).

It wasn’t all online: many attendees managed to actually meet and talk in person 🙂 . I felt the Vasa Museum was a fabulous place to have the dinner on the Wednesday, and having the initial drinks receptions at the City Hall impressed both me and everyone I spoke with. With alcoholic drinks roughly twice the price of their UK counterparts, I didn’t do much drinking, but then I didn’t miss it either. I was kindly invited to the press conference (an experience I may write up separately later), which was a fantastic first for me. I met people with whom I had previously only interacted online.

While I have been to other ISMBs before, I think in terms of my work and research, this was the best one. (The Brisbane ISMB was my favorite for non-work reasons, as there I got to cuddle a koala and take a 2-week break in Oz afterwards!)

Finally, I’d like to thank the organisers (especially Reinhard Schneider and the people who embedded all the FF sections into the ISMB pages – well done!), the people who toughed it out through my talk on the Sunday, the other FFers attending both remotely and physically, and the bosses (Tom Kirkwood and Neil Wipat from CISBAN here at Newcastle Uni) who let me attend.

Data Integration Housekeeping & Self References Meetings & Conferences Semantics and Ontologies

Annotation of SBML Models Through Rule-Based Semantic Integration (ISMB Bio-Ont SIG 2009)

Allyson Lister et al.

I didn’t take any notes on this talk, as it was my own talk and I was giving it. However, I can link you out to the paper on Nature Precedings and the Bio-Ontologies programme on the ISMB website. Let me know if you have questions!

You can download the slides for this presentation from SlideShare.

FriendFeed Discussion:

Housekeeping & Self References Outreach

Slides and Notes available on “Working with Genes” (presentation for kids)

Those of you who have been following my posts for a while might have read this one from Fall 2008: Scientist Meets Small Children, and doesn’t stop talking (and listening) all day!

The slides are now available from SlideShare, and embedded below:

The only problem I’m having is that the slides are mainly pictures. I have extensive notes to guide the speaker in the notes section of the Open Office document, but they don’t seem to have been saved to SlideShare. So, until I can figure out something better, here are the notes for each slide. Any comments, suggestions, modifications, etc. are very much welcome. I hope it helps people. Enjoy!

Notes for slides:

Slide 1 (Title Slide)

KS1 and KS2:
These are Maine Coons, a particular breed of cat.
Has anyone heard of “genes” before?
Genes store the information that makes each one of us different. Eye color, shoe size, hair color…
Sometimes, there can be a change in a gene that is “good”: that allows a cat to run faster, or a dog to smell better
Sometimes that change can cause problems: some diseases are caused by mistakes in genes
Cats have been around for thousands of years. They were domesticated by us.
How do we get domestic animals? What does domestic mean?
We can breed animals we like the most together
New ways of doing this are around now, which I’ll talk about later
How do you know it is the right thing to do? (Irish setters – epilepsy; labrador retrievers – hip problems; “mutts” – can be healthier)
In short: remember to think for yourself, and learn before reaching a decision.

Slide 2

KS1 and KS2:
Charlie is a normal domestic cat. She is 6 years old and lives with me. Do you know what that pattern is? She’s a brown tabby with some orange spots.
This other cat looks the same, and acts the same. But there is one big difference. He wouldn’t make my neighbour sneeze!
How many of you know people who sneeze when they are around cats or dogs?
The domestic cat was selectively bred from wild cats at least 9,500 years ago, and has been around since at least ancient Egypt (and probably longer; see a recent SciAm article).
The company that breeds cats like the guy at the bottom here found a few cats that didn’t cause allergies, and bred more of them.
What traits do you like most in cats? What would you like to see?
Allerca bred cats whose version of the Fel d 1 glycoprotein does not cause allergic reactions. The process uses gene sequencing to detect rare, naturally occurring genetic divergences in cats.

Slide 3

What kind of domestic animal is this?
KS1 and KS2:
Humans breed horses to look and act specific ways
What do you think are the most important things that make up a good horse?
Strong muscles?
Good eyesight?

What might you want to breed out of horses? Do they have any problems that should be fixed?

Slide 4

KS1 and KS2
Would those things you suggested in the previous slide be good all the time?
A large horse would have trouble finding food on a small island
A black horse would stand out in the desert.
Having lots of different types of horses makes sure that some of them will always survive changes in the environment

Slide 5

KS1 and KS2
What kind of animal is this?
This is a zebrafish. You can often find it in home aquaria. It’s pretty small – only a few centimetres long
Why do you think it is called a zebrafish?

Slide 6

KS1 and KS2
What animals are these?
They’re jellyfish
Under the right light, some jellyfish are fluorescent, and you can get both yellow and green colours.
You can get red fluorescence from a sea coral

Slide 7

KS1 and KS2
What is different about these zebrafish?
They are not striped, and they are different colours.
Instead, they’re called glofish.
The colours are not normally found in zebrafish.
The genes for these colours are taken from the coral and the jellyfish, and added to the zebrafish

Slide 8

KS1 and KS2
What do you think a fishberry is? Can you tell from the name? Do you know what antifreeze is? (It gets put into cars in the winter.)
Some scientists tried to make tomatoes resistant to frost by putting a fish antifreeze gene into them. It never worked, but the media picked up on it anyway. “Fishberries” – tomatoes and/or strawberries with the flounder antifreeze gene – were researched, but never worked properly. A bit of an urban legend.
What are some other ideas for plants that might help them survive bad weather, diseases, or insects?
Scientists have lots of ideas, but they don’t always work. Also, scientists are very careful and try to ensure that the combinations they make are good ones. Lots of testing!

Slide 9

KS1 and KS2
We use germs to make medicine!
The germs in the picture live in our guts, and help us out in digesting our food.
You might have drunk some if you have had a probiotic drink.
Some examples of the good choices discussed in the previous slide: using “germs” to make medicine.
Insulin (Diabetes)
We can put the human gene for insulin into these guys, and they will make the medicine for us
Insulin originally came from cow, horse, pig, or fish pancreases, but as of 2002, 70% of the insulin sold is recombinant.
With cells dividing rapidly (every 20 minutes), a bacterium containing human cDNA (encoding insulin, for example) will shortly produce many millions of identical cells (clones) containing the same human gene. (Human insulin itself is only 51 amino acids long.)
Adding vaccines for humans into food crops or animals (into tomatoes, for example)

Slide 10

KS1 and KS2
We can try to fix mistakes in our own bodies!
We can target a specific area, such as diseases of the eye.
Cystic fibrosis:
It is a single gene defect.
The lung is most affected.
Most heterozygote carriers have approximately 50% CFTR function and are completely asymptomatic.
(others include Haemophilia)

Slide 11

KS1 and KS2

We change things, and have done so for thousands of years.
The tool is not the issue: it used to be just selective breeding; now there are new tools.
Each individual change has to be thought about, to determine whether it is a good idea or not.

Slide 12

KS1 and KS2

Pigs with less saturated fat (hasn’t happened yet, but people are talking about it)
Spider silk from goats’ milk (in the original study, 5x stronger than steel by weight, and very flexible – bulletproof vests!)
And now, in 2008, from alfalfa, as it would otherwise take 600 lbs of goats’ milk to make one bulletproof vest!
Caffeine-free coffee plants
No-tears onions

Slide 13

KS1 and KS2

Just a nice picture of different-coloured bacteria on a plate.
Shimomura, Chalfie, and Tsien shared this year’s Nobel Prize in Chemistry for their work on green fluorescent protein, originally isolated from a jellyfish. Science is interesting and beautiful!
This is the last true slide. The one after this links to the licenses for the photos, and the ones after that are just in case we want to show them.

Slide 14 – no notes

Slide 15 – extra

This slide is too advanced for them, but put it at the end in case there is a specific question from a precocious kid or an adult.
A weakened strain of the common bacterium Escherichia coli (E. coli), an inhabitant of the human digestive tract, is the ‘factory’ used in the genetic engineering of insulin.

Slide 16 – extra

Fluorescent mice. However, the kids might be scared of this pic, or not like it, so include it at the back and only use it if it seems appropriate.

Housekeeping & Self References Papers Research Blogging Software and Tools Standards

Modeling and Managing Experimental Data Using FuGE

Want to share your umpteen multi-omics data sets and experimental protocols with one common format? Encourage collaboration! Speak a common language! Share your work! How, you might ask? With FuGE, and this latest paper (citation at the end of the post) tells you how.

In 2007, FuGE version 1 was released (website, Nature Biotechnology paper). FuGE allows biologists and bioinformaticians to describe any life science experiment using a single format, making collaboration and repeatability of experiments easier and more efficient. However, until now it has been difficult to know where to start with FuGE. Do you use it as it stands? Do you create an extension of FuGE that specifically meets your needs? What do the developers of FuGE suggest for your first steps? This paper focuses on best practices for using FuGE to model and manage your experimental data. Read it, and you’ll be taking those first steps with confidence!

[Aside: Please note that I am one of the authors of this paper.]

What is FuGE? I’ll leave it to the authors to define:

The approach of the Functional Genomics Experiment (FuGE) model is different, in that it attempts to generalize the modeling constructs that are shared across many omics techniques. The model is designed for three purposes: (1) to represent basic laboratory workflows, (2) to supplement existing data formats with metadata to give them context within larger workflows, and (3) to facilitate the development of new technology-specific formats. To support (3), FuGE provides extension points where developers wishing to create a data format for a specific technique can add constraints or additional properties.

A number of groups have started using FuGE, including MGED, PSI (for GelML and AnalysisXML), MSI, flow cytometry, RNA interference and e-Neuroscience (full details in the paper). This paper helps you get a handle on how to use FuGE by presenting two running examples of capturing experimental metadata: flow cytometry and gel electrophoresis. Part of Figure 2 from the paper is shown on the right, and describes one section of the flow cytometry FuGE extension from FICCS.

The flow cytometry equipment created as subclasses of the FuGE equipment class.

FuGE covers many areas of experimental metadata, including the investigations, the protocols, the materials, and the data. The paper starts by describing how protocols are designed in FuGE and how those protocols are applied. In doing so, it describes not just the protocols but also parameterization, materials, data, conceptual molecules, and ontology usage.

Examples of each of these FuGE packages are provided in the form of either the flow cytometry or the GelML extensions. Further, clear scenarios are provided to help the user determine when it is best to extend FuGE and when it is best to re-use existing FuGE classes. For instance, it is best to extend the Protocol class with an application-specific subclass when all of the following are true: you wish to describe a complex Protocol that references specific sub-protocols; the Protocol must be linked to specific classes of Equipment or Software; and specific types of Parameter must be captured. I refer you to the paper for the scenarios for each of the other FuGE packages, such as Material and Protocol Application.
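To make that subclassing scenario concrete, here is a hypothetical Python sketch. The class names Protocol, Equipment, and Parameter come from the FuGE object model, but this rendering, the attribute names, and the flow cytometry details are my own illustrative assumptions, not the real FuGE toolkit or schema:

```python
# Illustrative sketch only -- not the actual FuGE class library.
# Class names (Protocol, Equipment, Parameter) follow the FuGE UML;
# everything else here is a made-up example of the extension pattern.

class Equipment:
    """Generic FuGE-style equipment: a named piece of lab hardware."""
    def __init__(self, name):
        self.name = name

class Parameter:
    """Generic FuGE-style parameter: a named, typed setting."""
    def __init__(self, name, value):
        self.name = name
        self.value = value

class Protocol:
    """Generic FuGE-style protocol: a named procedure description."""
    def __init__(self, name):
        self.name = name

class FlowCytometryProtocol(Protocol):
    """Application-specific subclass: warranted (per the paper's scenarios)
    because it must reference specific Equipment (a cytometer) and capture
    specific Parameter types (here, a laser wavelength)."""
    def __init__(self, name, cytometer_name, laser_wavelength_nm):
        super().__init__(name)
        self.cytometer = Equipment(cytometer_name)
        self.parameters = [Parameter("laser_wavelength_nm", laser_wavelength_nm)]

# The subclass is still a Protocol, so generic FuGE-aware tools can handle it.
proto = FlowCytometryProtocol("CD4 count", "FACSCalibur", 488)
```

The point of the pattern is the last comment: the specific extension remains usable wherever the generic class is expected, which is exactly how FuGE extensions stay interoperable.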

The paper makes liberal use of UML diagrams to help you understand the relationship between the generic FuGE classes and the specific sub-classes generated by extensions. A large part of the paper is concerned expressly with helping the user understand how to model an experiment type using FuGE, and also to understand when FuGE on its own is enough. But it also does more than that: it discusses the current tools that are already available for developers wishing to use FuGE, and it discusses the applicability of other implementations of FuGE that might be useful but do not yet exist. Validation of FuGE-ML and the storage of version information within the format are also described. Implementations of FuGE, including SyMBA and sysFusion for the XML format and ISA-TAB for compatibility with a spreadsheet (tab-delimited) format, are also summarised.

I strongly believe that the best way to solve the challenges in data integration faced by the biological community is to constantly strive to simply use the same (or compatible) formats for data and for metadata. FuGE succeeds in providing a common format for experimental metadata that can be used in many different ways, and with many different levels of uptake. You don’t have to use one of the provided STKs in order to make use of FuGE: you can simply offer your data as a FuGE export in addition to any other omics formats you might use. You could also choose to accept FuGE files as input. No changes need to be made to the underlying infrastructure of a project in order to become FuGE compatible. Hopefully this paper will flatten the learning curve for developers, and get them on the road to a common format. Just one thing to remember: formats are not something that the end user should see. We developers do all this hard work, but if it works correctly, the biologist won’t know about all the underpinnings! Don’t sell your biologists on a common format by describing the intricacies of FuGE to them (unless they want to know!), just remind them of the benefits of a common metadata standard: cooperation, collaboration, and sharing.
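As a rough sketch of what “offer your data as a FuGE export” might mean in practice: serialise the metadata you already hold in memory as XML, alongside whatever formats you already produce. The element and attribute names below are illustrative only – consult the actual FuGE-ML schema for the real structure:

```python
# Hypothetical sketch of a FuGE-style export function. Element names
# (ProtocolApplication, InputMaterial, OutputData) echo FuGE package
# concepts but are NOT the real FuGE-ML schema -- they just illustrate
# adding an XML export without touching your existing infrastructure.
import xml.etree.ElementTree as ET

def export_protocol_application(protocol_name, inputs, outputs):
    """Wrap one experimental step's metadata in a FuGE-like XML element."""
    root = ET.Element("ProtocolApplication", {"protocolRef": protocol_name})
    for material in inputs:
        ET.SubElement(root, "InputMaterial", {"name": material})
    for data in outputs:
        ET.SubElement(root, "OutputData", {"name": data})
    return ET.tostring(root, encoding="unicode")

# e.g. describe a gel run that consumed one sample and produced one image
xml_out = export_protocol_application("2D gel run", ["sample-42"], ["gel-image.tiff"])
```

Because the export is generated on the side, nothing in the project’s existing storage or pipeline has to change – which is the low-cost adoption route the paragraph above describes.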

Jones, A., Lister, A.L., Hermida, L., Wilkinson, P., Eisenacher, M., Belhajjame, K., Gibson, F., Lord, P., Pocock, M., Rosenfelder, H., Santoyo-Lopez, J., Wipat, A., & Paton, N. (2009). Modeling and Managing Experimental Data Using FuGE. OMICS: A Journal of Integrative Biology, 13. DOI: 10.1089/omi.2008.0080