Science Online London 09: Thoughts, not Transcript

First off, I’d like to thank the many people who re-tweeted my blog posts throughout Science Online London this past Saturday. With your help, Saturday was my best day ever for visits to the site. I hope people enjoyed my posts, and perhaps stayed long enough to find out what I blog about when I’m not at conferences (those I’m most proud of include a day I spent at a primary school last year, and a co-authored post with Frank Gibson on attribution versus citation).

The Royal Institution

Those solo09 posts I wrote on Saturday were intended mainly as notes, as a transcript of what went on. It helps me concentrate to take notes, and due to my fabulous parents talking me into taking typing classes in high school, I am able to (mostly) keep up with presentations! But I wasn’t the only one blogging, and many people since Saturday have been writing up and posting their thoughts: Martin Fenner has been keeping track of what seem to be all blog posts about solo09, so please visit his post to find out what everyone thought of the day.

My blog posts on the day were a record of the day’s presentations, from my point of view. Today’s post is more personal – it was my first time at a Science Online conference, and this is a record of my impressions.

The day started very early for me, though I was not alone in this. I was on a 6am train, and managed to find my way to the Royal Institution (my first visit) before 8:30am. Luckily, they had already laid out the name badges of people whose first name began with “A”, and I grabbed my badge and went to see how many people were around. After geeking out way too much when I met Cameron Neylon for the first time in the physical world (when discussing online avatars with him I tried a bad pun referencing the recent Guild music video about avatars, which fell a bit flat), I went for a wander around the building. In one of the libraries I found this book, which amused me:

A book in one of the libraries at the Royal Institution. Memoirs of Libraries, in a library!

Then I wandered upstairs and had a look at the Faraday Theatre, with its surprisingly uncomfortable seating but beautiful fittings and fantastic ambience. Just a tip though – watch out for the Ambulatory Displays up there on the first floor. The British Library had a table set up in a prime position opposite the Faraday Theatre, and at that table I met some BL people as well as Stewart Wills, an Editor for Science. I had never spoken with a Science editor before, and I had a really enjoyable conversation with him and the BL people about wildflowers and ontologies for 20 minutes or so, until it was time for the conference to start.

I won’t go heavily into the presentations, as I have already covered them. Suffice to say I thought they were all very interesting, often entertaining, and definitely educational. While I would have loved to have much more time for open discussion at the end of each presentation, that didn’t spoil my enjoyment. I had my first experience with Second Life, and watching the odd behaviors of the avatars in it was almost hypnotic. One seemed to be playing the spoons or typing on an invisible keyboard or something. Many others seemed to be hanging off an invisible wire in their back, and others flounced, tilted alarmingly, or even looked attentive.

I will choose a favorite presentation though: I loved the theatrics and the content of John Gilbey. He presented a number of speculations about the far future, and said that we could all vote for our favorite by emailing him in the next week. Then, he’ll do his best to write about it in the context of the University of Rural England and get it into print 🙂 Fun! You can email him at gilbey@bcs.org.uk.

I had a number of good conversations with Sara Fletcher of Diamond Light Source about power cables, last year’s Science Online, and meeting people in the real world who you’ve gotten to know only through the (unreal?) world of the Internet. We were the ones sitting near the annoying ringing iPhone during the metrics/statistics talk by Richard Grant and others. No, it was NOT our phone, and yes, we tried to find it to turn it off but were unsuccessful.

It was great seeing bloggers made flesh: Petra Boynton, Jack of Kent, Cameron Neylon and Peter Murray-Rust were just a few of the people I either listened to or spoke with for the first time. Peter, Phil Lord and I had a great conversation about OWL ontologies – well, about semantics.

I left London that evening, this time on a full train of tired people wanting to get home, in stark contrast to the quiet, empty train and the beautiful sunrise that began the day. I had a great experience, and my thanks go out to all the organizers and people who helped make Science Online London work. I am now more interested in Google Wave, still want a single unifying identifier for me and my online personas (one identifier per persona, or one per person?), and am more aware of the legal implications of blogging. I feel like I’ve increased not just my knowledge of all things science and online, but also the size of my online science community, a community that has enriched my research environment and work life more in the past year than I ever thought possible. The Life Scientists, Science 2.0, Twitter and my good friend Google Reader keep me in touch with the science blogs of friends and colleagues, and I’m following many more after Science Online. I am a better scientist and researcher because of my connections to this community – thank you all!

Breakout 3: Author identity – Creating a new kind of reputation online (Science Online London 2009)

Duncan Hull, Geoffrey Bilder, Michael Habib, Reynold Guida

ResearcherID, Contributor ID, Scopus Author ID, etc. help to connect your scientific record. How do these tools connect to your online identity, and how can OpenID and other tools be integrated? How can we build an online reputation and when should we worry about our privacy?

Geoff Bilder:

Almost every aspect of a person can change without the person themselves changing. So, you want to have an identifier that is a hook to you, and which is better than a name (which is changeable). What about retinal scans? Fingerprints? OpenID? Where does your profile come in? A profile is a collection of attributes that you use to describe who you are. With author identity, what we want is the ability to get at the profile of a person in an unambiguous manner. Until we have such a thing, how do you tell people what your canonical profile is? To complicate matters even more, each user will want multiple personas, each with their own profiles.
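The distinction Geoff drew between a stable identifier, multiple personas, and per-persona profiles can be sketched as a tiny data model. This is purely illustrative (the class names, identifier scheme, and attribute values are all invented, not anything he proposed):

```python
from dataclasses import dataclass, field

@dataclass
class Profile:
    """A profile: a collection of attributes you use to describe who you are."""
    attributes: dict = field(default_factory=dict)

@dataclass
class Persona:
    """One public face of a person, carrying its own profile."""
    label: str
    profile: Profile

@dataclass
class Person:
    """The stable hook: an opaque identifier rather than a mutable name."""
    identifier: str                      # this should never change
    personas: list = field(default_factory=list)

# One person, several personas, each described differently.
me = Person(identifier="urn:person:0001")            # scheme is made up
me.personas.append(Persona("work blog", Profile({"name": "Alice Example"})))
me.personas.append(Persona("reviewer", Profile({"name": "A. Example"})))
```

The point of the sketch is that everything hanging off `personas` may change or multiply, while `identifier` stays fixed – which is exactly why a name, which lives in the profile, is a poor identifier.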

When talking about identity, two issues are often conflated: identity for authentication and identity for knowledge discovery. That is, you must be more rigorous in determining who someone is (logging into your identity) than in figuring out who wrote a paper. Further complications arise in the lossy conversion of authors’ names between languages.

Whatever is done has to be done on an international scale, must be interdisciplinary, and must be interinstitutional. The oldest content cited thus far in CrossRef (with a DOI) is from the 1600s. What happens to your identifier when you die? A final issue is scale: there are about 200K new DOIs per month, and even if we guess at 5 authors per DOI, there could be between 5K and 21K failures of identification per month if you estimate a 96–97% success rate for author identification.

Duncan Hull:

He spoke about OpenID in science, among other things. Currently, authentication of people is handled separately by most online applications, and is generally done with a simple username and password combination. Simon Willison (The Guardian) estimates that the average online user has at least 18 user accounts and 3.49 passwords. OpenID is trying to get to a situation where there are fewer usernames AND passwords.

OpenID works by redirecting you to your OpenID provider to log in, then sending you back to the location you started at. However, having a URL as a username is not very intuitive, and logging in via redirection can be confusing. Therefore, while adoption of OpenID is growing, it may not properly take off until browsers and other vendors support it better. He mentioned myExperiment as one site that accepts OpenID.
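The redirect step can be sketched as building a `checkid_setup` request against the user’s provider. The parameter names below come from the OpenID 2.0 specification; the URLs themselves are made up for illustration:

```python
from urllib.parse import urlencode

def openid_login_url(provider_endpoint, claimed_id, return_to):
    """Build the URL that sends the user off to their OpenID provider.

    After the user authenticates there, the provider redirects the
    browser back to `return_to` with a signed response.
    """
    params = {
        "openid.ns": "http://specs.openid.net/auth/2.0",
        "openid.mode": "checkid_setup",
        "openid.claimed_id": claimed_id,
        "openid.identity": claimed_id,
        "openid.return_to": return_to,
    }
    return provider_endpoint + "?" + urlencode(params)

url = openid_login_url(
    "https://openid.example.org/auth",      # provider endpoint (invented)
    "https://alice.example.org/",           # the URL-as-username
    "https://myexperiment.example.org/cb",  # where the user resumes
)
```

The `claimed_id` argument is exactly the “URL as a username” the talk mentioned, and the bounce out to `provider_endpoint` and back to `return_to` is the redirection that confuses users.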

Michael Habib:

Michael presented a nice diagram: a square divided into four quadrants, with “about me” and “not about me” across the top, and “by me” and “not by me” down the side. Disambiguation of people matters most in the two “not” categories. He used the example of Einstein and the LC Authority Files to figure out what all of the different versions of his name are.

Completely different from the LC Authority files, which is manually and carefully checked by only certain people, is ClaimID. ClaimID is a way to collect all aspects of your identity in one place. However, it is dependent upon each individual being truthful about what they have ownership over.

Another approach is the Scopus Author ID, which is completely machine-aggregated. It is validated by publications, and scales well. It has 99% precision and 95% recall. The cons are that it is impersonal, and that those precision and recall values really aren’t very good when you consider that this is about ownership of articles, and that there are a very large number of people.
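To see why 99% precision and 95% recall are less comfortable than they sound at this scale, here is a back-of-the-envelope calculation. The corpus size is invented; only the precision and recall figures come from the talk:

```python
# Suppose one million true article-author links (an invented number).
true_links = 1_000_000
precision, recall = 0.99, 0.95

found = true_links * recall        # links the system correctly assigns
missed = true_links - found        # false negatives: papers missing from profiles
predicted = found / precision      # total assignments the system makes
wrong = predicted - found          # false positives: papers given to the wrong person

print(f"missed attributions: {missed:,.0f}")   # 50,000
print(f"wrong attributions:  {wrong:,.0f}")    # 9,596
```

So even at those rates, tens of thousands of author-paper links end up missing or misassigned – which is the point being made about ownership across very many people.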

There is also 2collab, where you can combine author IDs (that you know about) into one identity. Then, you can add any other item on the web that is about you.

Reynold Guida (from Thomson Reuters):

They’ve built software to try to address author identity and attribution. If you look at the literature since 2000, communication and scientific collaboration have really changed. What we notice is that the number of multi-author papers has started to increase, while single-author papers have decreased. A Google search for common surnames really highlights the problems associated with identity. Name ambiguity is a real problem, as is the connection between the researcher, the institution and the community. Two of the most important questions in this discussion are: who do I know, and who do I want to know? The connections a person makes affect all aspects of their career.

Therefore they have created ResearcherID (free, secure, open). Privacy options are controlled by the user, even if the institution created the record. There is integration with EndNote, Web of Knowledge, and other systems to help build publication lists. You can link to and visualize your ResearcherID profile really easily from your own websites.

Discussion:

Question: Has anyone thought through the security implications of these single-ID systems: one slip-up and your entire identity has been hacked? GB: Multiple identities encourage poor behaviour, as the thought of changing your password everywhere is so overwhelming that people don’t do it. But yes, these problems exist; however, in their minds the tradeoffs make it worthwhile. You should NOT conflate knowledge issues with security issues, because information for your scholarly profile is, by definition, public anyway.

Question: Do the different OpenID providers, Author ID and ResearcherID know about each other in the computational sense? Not really, yet.

Question: What about just making the markup of the web more semantically friendly? DH: The Google approach is a good one. RG: It’s all about getting the information into the workflow.

Question (Phil Lord): What worries me is that there has been a big land grab for the author identity space: for example, you cannot log into Yahoo with any OpenID other than a Yahoo OpenID. There’s a lot of value in being in control of someone’s ID, and therefore a big potential danger. GB: For every distributed system, you need a centralized indexing mechanism to get it to work correctly. So we need to make sure that if a centralized system appears, there is accountability.

FriendFeed Discussion

Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Real-time statistics in science (Science Online London 2009)

Victor Henning, Richard Grant, Virginia Barbour

Academic prestige, setting research trends, getting jobs and tenure, grant funding – they are largely based on publishing in high-impact-factor journals and getting citations. These measures are flawed and widely criticized: “You could write the entire history of science in the last 50 years in terms of papers rejected by Science or Nature,” said Nobel laureate Paul Lauterbur. Citation measures are also subject to a considerable time-lag. If you write a paper today, it takes a year to get it published, and another year passes by until citations of it appear. What if there were alternative measures of scientific impact? What if these measures were available in real-time, letting you track the trends in your discipline as they develop? That’s what we’ll discuss in this session.

Richard Grant:

Employers like metrics to discover if they’re spending money in the right places. Researchers want to see that what they’re doing is relevant. This is why we want metrics. But what can metrics do, and what can’t they do? The impact factor doesn’t actually tell you how good the research in a given journal is. He is involved in the qualitative assessment of articles – more like a FriendFeed method of assessment. Corporate bit: http://f1000.com. The crucial thing they want to have is quality. What they do at f1000 is pretty slow, by necessity. There is also, though, a tying-in with the community.

Virginia Barbour:

She’d like to reclaim the word “impact” from “impact factor”. How do you assess quality: usage, media coverage, blog coverage, expert ratings, discussion thread activity, who is reading it, who is citing it, where the research was done, effect on public policy? No single one of these should be relied on by itself. Traditional measures are often not the most important. Many feel that the way papers are being evaluated is actually detrimental to the research process. Most users of journal sites are not coming via the home page – they’re coming via Google and other methods: people just don’t start at the first page of a journal and read through.

NEJM is changing the way their front pages look and the Journal of Vision is changing the way the metrics are displayed. At PLoS, in Phase 1 they want to have data that isn’t owned by someone else – that we can actually use and verify. In Phase 2, they also want to have the number of downloads of the article. This data will be broken down by the type of views. They also want to make the metrics more sophisticated, with more sources for each data type, more sophisticated web usage data, provide tools for analysis, and more.
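Breaking downloads down by type of view, as described for PLoS Phase 2, amounts to rolling a raw usage log up into per-article counts. A minimal sketch of that aggregation (the article identifiers, metric names and sources are all invented):

```python
from collections import defaultdict

# Hypothetical per-event usage log: (article id, metric type, source).
events = [
    ("doi:10.9999/example.1", "html_view", "journal site"),
    ("doi:10.9999/example.1", "pdf_download", "journal site"),
    ("doi:10.9999/example.1", "bookmark", "citeulike"),
    ("doi:10.9999/example.2", "html_view", "aggregator"),
]

def article_metrics(events):
    """Roll raw usage events up into per-article counts by view type."""
    table = defaultdict(lambda: defaultdict(int))
    for article, metric, _source in events:
        table[article][metric] += 1
    return {article: dict(counts) for article, counts in table.items()}

metrics = article_metrics(events)
```

Keeping the source alongside each event is what would let a publisher add “more sources for each data type” later without changing the aggregation itself.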

Victor Henning:

He used last.fm as an analogy for article metrics, and as an introduction to Mendeley. In this way, you can track article pervasiveness in reference manager libraries, track article reading time in PDF viewers, and track user tags and ratings. One key difference between Mendeley and last.fm is privacy: they believe that some scientists don’t want others to know what literature they find interesting.

They have synchronization with citeulike, and will shortly have synchronization with Zotero. The goal of all this is to aggregate statistics for their users. All of the information is available by academic discipline, geographic region, and more. Once we’re at the point where there are true article metrics, this can be the basis for individualized recommendations.

Discussion:

Question: It seems we’re replacing a single impact factor with a large number of new ones. How do you foresee people managing and understanding all of those metrics? RG: We’re not in the business of replacing the impact factor – just providing more information to the researcher.

VB: I can imagine that people will be able to go to grant funding agencies and tell them how much coverage in all sorts of media your paper received.

Question (Phil Lord): I worry about reading times as a measure of quality. In music the listeners and musicians are largely disjoint. In science this is definitely not true. Many of the metrics mentioned are very much open to fiddling and self-citation. What do you say about this? VH: We’re not advocating replacing the impact factor. However, it is always better to have more data, more metrics.

Question: I print out my articles. How will that affect things?

Missed most of the rest of the discussion because of a phone that wouldn’t stop ringing – see the Twitter hashtag #solo09 for all the gory details.

FriendFeed Discussion


Google Wave: Just another ripple or science communication tsunami? (Science Online London 2009)

Cameron Neylon, Chris Thorpe, Ian Mulvany

Google Wave is a new tool for communication and collaboration on the web that will be released later this year. For this session we plan a live demo of the prerelease version of Google Wave to show off the potential for scientists.

What can you do with a wave? Make robots, embed waves into blogs, build gadgets. Robots (server side) can inspect data within a wave, then go and do something about it and change the content within the wave. For the geeks: it’s powered by webhooks. You can put waves anywhere, into any HTML file. Changes are immediately propagated to every embedded wave. Therefore, if you make a comment on a waved blog, that comment appears wherever people have requested it. It makes flame wars almost immediate 🙂
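A robot in this model is essentially a webhook: the wave server POSTs it an event describing a change, and the robot replies with operations to apply back to the wave. Here is a toy handler showing the shape of that idea – this is not the real Wave API, and all the field names are invented:

```python
def handle_wave_event(event):
    """React to a submitted blip, in the style of a wave robot.

    `event` mimics the JSON a wave server might POST to a robot's
    webhook. Returns a list of operations for the server to apply.
    """
    if event.get("type") != "BLIP_SUBMITTED":
        return []  # nothing to do for other event types
    text = event["blip"]["text"]
    if text.strip().startswith("?greet"):  # a made-up trigger command
        return [{
            "op": "append_text",
            "blip_id": event["blip"]["id"],
            "text": "\nHello from a robot!",
        }]
    return []

ops = handle_wave_event(
    {"type": "BLIP_SUBMITTED", "blip": {"id": "b1", "text": "?greet all"}}
)
```

The “?guardian” robot demoed later in the session presumably follows the same pattern: spot a trigger string in the blip text, fetch something, and write the result back into the wave.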

Gadgets (client side) extend the functionality of waves, and are xml-based and store their data within a Wave. Changes can be replayed and are stored on a per-user/wavelet basis.

Cameron then live-demoed a wave by writing something “like an email” and showed how it propagated to other users. (Ian said “o noes! i iz in ur wave editing ur text”. Highly amusing. But they’re just showing versioned instant messaging right now – cool, but I would like to see more.) He can invoke the Guardian robot with “?guardian”, and the search results are put right back into the wave. There’s also a robot for ChemSpider, and another for producing LaTeX figures (Watexy).

They also showed Igor, a robot which helps retrieve citations. Also Graphy which, as the name suggests, produces basic graphs from text that look suspiciously like what you might want an SBML pathway to look like!

The entire Google Wave system is going to be open-sourced. Most of the client architecture is HTML 5 and Javascript. Google had a robot (not public) that would translate into another language as you typed – supposedly quite resource-hungry.

What would make people use it who aren’t geeks? At the moment, it is difficult to get used to using the interface. Also, it doesn’t yet integrate with email as we know it. However, Cameron Neylon says that it’s easier than it looks to use, so once they sort the interface it should become popular.

IM: If Google wave is as easy to install by institutions as a wiki setup, then it might work and really help collaborations and sharing. Even more so if Wave successfully integrates email.

More short notes about the demo and discussion:

  • CN: I have the feeling it will be very good for collaborative note-taking during talks.
  • People can edit each other’s comments, and there is versioning so you can see how things have changed.
  • Wave is much more efficient in terms of resources – not a whole series of gets, but instead a few puts (if I understand this correctly).
  • One problem: Google Wave can’t be used offline. Is there any way to get some limited functionality offline?

Phil Lord suggested that google wave might be good for collaborative ontology development. (I agree!)

FriendFeed Discussion


Far out: Speculations on science communication 50 years from now (Science Online London 2009)

John Gilbey

This session will discuss future models for online science communication – but on a timescale well beyond the usual technology horizon. To judge the role of science communication in possible futures, we need to assess how research itself will be carried out in the future. In many scenarios online communication becomes the core enabling force – rather than a useful adjunct – and we can speculate as to the form that communication might best take.

He will be discussing science communication “in the broadest sense”.

(Who is he? A science fiction author; a former research scientist. You may have seen his work in Nature, New Scientist, Times Higher Education, Guardian, Nature Physics. He is speaking only on behalf of himself, and not on behalf of any of his employers.)

He distills everything he’s learnt about the scientific process into the fictional “University of Rural England (URE)”, where things are not always as they seem, and where students and faculty suffer the same weaknesses. Then he switches to a synopsis of the first issue of Nature in 1869, in whose editorial TH Huxley said that people in 50 years would look at the back issues of Nature “not without a smile”. We’re in danger of losing that connection, he says.

Then he moves on to talking about Second Life, and speaks about physical representations of a virtual space being captured in a digital media and re-presented back in SL. 🙂 To his generation, internet/computers/etc are still the future, even though they’re here. To younger people, they are the present – this is a different way of thinking.

Three options for the future: 1) steady state 2) step change (significant developments) 3) surprise parties (major unexpected advances completely changing the game). So, back to URE, the fictional place where he has sci-fi story ideas: machine-enhanced clairvoyance for science quality auditors; network developments expose a temporal portal to allow historic (dead) research leaders to be employed on projects; digitally-supported thought control of higher mammals. Speculation 1: in 50 years’ time, the world political, economic and social structure will change radically. In that case, who will our sponsors be for research? How “free” will the science community be? Will science be encouraged to engage with the wider social environment? If it was your job on the line, would you lie or toe the party line?

Will you suffer for your integrity?

Speculation 2: Virtual reality in some form will become ubiquitous in society across the globe: location becomes irrelevant, scientists become nomadic, opportunities for citizen science increase, social involvement with science grows. Speculation 3: significant environment events will spur major increases in research activity: science profile is raised significantly, there is a greater need for communication of science. Speculation 4: Society crashes totally following an unrecoverable Internet failure.

Email him in the next week to vote as to which scenario you’d like to see the URE address, and he’ll do his best to get it into print! He’s at gilbey@bcs.org.uk

(This was one of my favorite talks of today.)

Question: Will our universities be around in 50 years’ time? JG: I think they will be, but in a radically different form. “VR” classes, for one.

FriendFeed Discussion

And there was at least one other Science Online London attendee blogging this presentation – take a look!


Cat herding: The challenges and rewards… (Science Online London 2009)

…of managing online scientific communities

Arikia Millikan, Corie Lok, Ijad Madisch

This session will provide you with an inside look into how online science communities are built and maintained. We will discuss how to manage expectations, social/cultural issues, the role of moderation, differences between science communities and ‘other communities’, and how to encourage diversity/debate whilst maintaining some sort of order. You’ll come away with tips on how to successfully build community and maintain it throughout flame wars and other tribulations.

Arikia Millikan (scienceblogs):

Scienceblogs started as a network, and ended up as a community. It was a very successful project. To do things well, you need a diverse selection of bloggers, which makes the cat herding more difficult. To have a successful blogging community, you first need a solid technological foundation. Secondly, you need acknowledgement, accessibility and analytics (it can be a very good motivational tool to see who is looking at your site and how often). Higher up in the hierarchy, you need to allow identity and individuality for your bloggers. When all these aspects are present you have a good community, but if any of these components fail you start to have problems.

If needs aren’t met, a lot of the energy that normally gets sent to achieving science is made more destructive. This is when you get the dreaded flame wars, which has happened in the past at scienceblogs. This is where the community management skills come in. Something important to know about such situations is that they are not always bad – you can learn important lessons from them. There are also a lot of rewarding aspects working at scienceblogs: it’s never boring, and there’s lots of nice science, and it is emotionally rewarding (all the benefits of a strong community).

Corie Lok (Nature Network):

She described what she’s seen work in a community of bloggers. Bloggers in such a community need to remain online and engage with the commenters. Friends can help you do this. You have to find the right balance in the volume of your postings. The main thing that the bloggers struggle with is incentive: it’s hard to get payback from the mainstream community. Seeding the blogging community with people other scientists want to interact with has been very important in making the community successful. Forums such as the Ask the Nature Editor forum, and one on fluorescence imaging in the life sciences, have been really useful.

There’s a new website for Parkinson’s set up by the Michael J Fox foundation that has been really successful, and is a really great model of how things could proceed.

Ijad Madisch (co-founder of ResearchGATE):

He’s originally a medical doctor and is also working in Computer Science. ResearchGATE helps researchers with targeted, rapid-response Q&A and provides efficient literature search and professional bookmarking. For institutional users, ResearchGATE offers a comprehensive communication platform and collaboration tools for promoting inter-disciplinary research.

The greatest challenges are to serve a variety of disciplines and to perform community management on a large scale. However, they’re proud of the global online community they’ve created, and the feedback they’ve received. There is a new API coming soon to enable scientists to add their own applications to the ResearchGATE platform.

Discussion:

Question: What percentage of your registered users are active? IM: 30-35% log in at least once per month in ResearchGATE. Similar numbers for Nature Network.

Question (Matt Brown): How do you guard against spam attacks, and what do you do about potentially libelous (or similar) comments, or other legal problems? CL: There is spam software in place. In terms of legal aspects, UK libel law puts publishers in a difficult position: publishers are better off not moderating content in a legal sense. So you have to find a balance. IM: They don’t have that many spam problems, but they have created a reporting system. ResearchGATE has one lawyer, working full time. AM: They’ve had some pretty bad spam problems in the past, but the community helps out a lot with this. Bloggers also moderate their own comments. Legally, there haven’t been many problems yet.

Question: Is there an optimum group size for a blogging or online research community? CL: She hasn’t seen any correlation between group size and activity. AM: There may be capacity issues – as the group gets larger, it’s harder to sustain the interpersonal aspects of it.

Question: What is the effect of networks like this on scientists’ productivity (it would be great if that information were published more)? Are the networks ever acknowledged in papers? CL: There have been fruitful collaborations that came out of Nature Network. IM: Collaborations at ResearchGATE have resulted in at least one paper. However, less tangible things such as discussions are harder to quantify.

Question (Cameron Neylon): It would be really nice to see some serious social anthropology happening with these communities, and even for the communities themselves to consider funding.

Question: There are a number of social networks available. How do you coordinate where and when to pull the information from? CL: Over time, we just see which ones survive. But it’s good to have choice. IM: They want to connect ResearchGATE to science-related comments on FF and Twitter. AM: It’s quite exciting right now to see what features and usability and tools will take off and be successful.

FriendFeed Discussion


Breakout 1: What is a scientific paper? (Science Online London 2009)

Lee-Ann Coleman, Katharine Barnes, Enrico Balli, Theo Bloom

Is the traditional paper format derived from the printed paper still appropriate today? How can new kinds of content such as audio, video, 3D structures, etc. be integrated into a research paper? Can a scientific paper contain just datasets or descriptions of a method? And how does free access to a paper change the way we use the information contained in a paper?

From Katharine Barnes (Nature Protocols):


What is a scientific paper at Nature Protocols? Peer-reviewed and edited articles. Network Protocols are not papers, but are more like blogs in that they’re published and comments are invited. They believe movies about protocols are very important; specifically, JoVE is a good resource.

Other innovations at NPG include Nature Precedings, Nature Chemistry, and ongoing improvements to articles: for instance, the presentation of an article could be improved, statistics about the article could be made visible, and so on. They maintain a traditional view of the basic unit of publication (the peer-reviewed and edited paper), but are keen to enhance it as much as possible with additional material. Her question to us: how far can we go from the model of the traditional paper?

From Theo Bloom (PLoS Biology):

Science publishing has come a long way in the past 50 years, but in her opinion the current definition of a “paper” no longer really works, and it’s time for a radical overhaul. For instance, for “Finishing the euchromatic sequence of the human genome”, Nature simply couldn’t fit all the authors on the first page. Sometimes the results are just too big for the traditional paper format, or indeed for any journal to host; perhaps central databases could provide a snapshot of the data at the point in time associated with publication of an article.

There’s also much that can be done for the visualization of figures in papers and the display of specialized media types. How best to match the data to the experiment? You need to datestamp and store results appropriately. In an era of machine-readable factoids, how and where does the author express a view? A Crick-and-Watson-style one-page view of the data? The time is ripe to integrate references with databases for real-time analysis; see in particular what Shotton et al. did for a paper originally published in PLoS Neglected Tropical Diseases.

From Enrico Balli:

SISSA started publishing scientific content in 1991, and since then has had to consider reprints, archiving and more. In 1997 they started publishing journals, which was not the original intent (if I am interpreting the talk correctly). These were online-only journals, and since the content is not physical, it can be very different from what could otherwise be published in print.

They have normal proceedings, but they also publish posters. Is a poster a paper? Can it be reviewed? They also publish lectures (video and/or slides), some in collaboration with the Institute of Physics. They publish manuals as well, which are not “papers”: how can you really review and edit manuals for software? Another instance of the author problem arose with the ATLAS experiment at the CERN Large Hadron Collider, where more than 5000 people worked on the project. What does it mean to have such author lists?

They are thinking about a new project due out next year called “The Journal of Stuff”. All the details haven’t been worked out yet. It is a kind of “un-Journal”.

Discussion:

Creating enhanced content is more costly than just typing, so who should bear those costs? Should you attempt to preserve that information long-term, and how do you future-proof it? TB: the people who fund the research should fund its dissemination; this is the standard model for open-access journals. KB: some researchers find it very difficult to locate the original data, which isn’t how it should be. Publishers are not obliged to provide digital libraries with their information.

Audience: It can be very hard to organize the data in a way that others can understand, e.g. when there are petabytes of data. It also seems like what constitutes a paper in the biological field (as opposed to physics) is much more constrained.

Peter Murray-Rust: It’s good, but what’s happening is not nearly enough. He feels the scientific paper is appalling at communicating in the modern world: dense text, for example, is not very communicative. Universities cannot afford to innovate because they have to publish with conventional, high-impact journals. This means that publishers are actually holding back innovation.

Phil Lord: The idea that publishing data in a plausibly useful way is cheap is wrong: it can be quite expensive. Of course, it’s completely worth it, both to help others and to help yourself in a couple of years. There’s a bifurcation: on the one hand a relatively content-free paper for the RAE, and on the other the data in a database, and we don’t really know how to link the two together. For the moment we still have to publish papers, as the RAE and similar exercises demand it.

(Didn’t catch the name): Agreed with Phil: the description of the research is a very important item to archive. Corrections also need attention – they should be logged and tracked. What we really need are clear, agreed, annotated databases for all these papers.

TB: Giving credit back to the source of information, e.g. the original paper describing a new knockout mouse, isn’t always done but should be. This problem often arises with journals that limit the number of references.

Cameron Neylon: The problem isn’t with the paper (publishing a summary or discussion of some research), but with the journal. Filtering and peer review are useful, but the use of the “legacy” paper format is not useful.

Question: Will open source and open access come together in the publishing world? TB: PLoS’s publishing platform is open source, and they try to use open-source software where possible; any software PLoS publishes has to be open source.

Question: What about redefining papers as open-source software, so that a paper is constantly changing and undergoing version changes, as software does? TB: The versions must be date-stamped.

FriendFeed Discussion