I live-blogged Cameron Neylon’s talk today at Newcastle University, and I did it in a Wave. There were a few pluses and a number of minuses. Still, it’s early days yet, and I’m willing to take a few hits and see if things get better (perhaps by trying to write my own robots, who knows?). In effect, today was just an exercise, and what I wrote in the Wave could equally well have been written directly in this blog.
(You’ll get the context of this post if you read my previous post on playing around with Google Wave. Others have since had a similar experience to mine. Even so, I’m still smiling, most of the time!)
Pluses: The Wave was easy to write in, and easy to create. It was a very similar experience to my normal WordPress blogging experience.
Minuses: I wanted to make the Wave public from the start, but have yet to succeed. Adding public@a.googlewave.com or public@a.gwave.com just didn’t work: nothing I tried was effective. Also, copy-and-paste simply failed when I tried to copy the content of the Wave from Iron into my WordPress post in Firefox: while I could copy into other windows and editors, I couldn’t copy into WordPress. When I logged into Wave via Firefox, copy-and-paste worked, but it carried over the highlighting created by my selecting the text, and I then couldn’t un-highlight the Wave! What follows was originally a very colorful copy of my notes; I’ve since removed the highlighting to make it more readable.
I’d like to embed the Wave here directly. In theory, I can do this with the following command:
[wave id="googlewave.com!w%252BtZ-uDfrYA.2"]
Unfortunately, it seems the Wavr plugin isn’t available on wordpress.com. So, I’ll just post the content of the Wave below, so you can all read about Cameron Neylon’s fantastic presentation today, even if my first experiment in Wave wasn’t quite what I expected. Use the Wave id above to add this Wave to your inbox if you’d like to discuss his presentation or fix any of my mistakes. It should be public, but I’m having some issues with that, too!
Cameron Neylon’s talk on Capturing Process and Science Online. Newcastle University, 15 October 2009.
Please note that all the mistakes are mine, and no-one else’s. I’m happy to fix anything people spot!
We’re either on top of a dam about to burst, or under it about to get flooded. He showed a graph of data entering GenBank. Interestingly, the graph is no longer exponential, because most of the sequence data isn’t going into GenBank but is being put elsewhere.
The human scientist does not scale, but the web does! The scientist needs help with their data, their analysis, etc. They’ll go to a computer scientist for help, and the CS person gives them a load of technological mumbo jumbo that they are suspicious of. What they need is someone to interpret between the computer scientist and the biologist. They may try an ontologist; however, that isn’t always productive either, as the message they’re getting is that they’re being told how to do things, which doesn’t go down very well. People are shouting, but not communicating. This is because each of them wants something different: scientists want to record what’s happening in the lab, the ontologist wants to ensure that communication works, and the CS person wants to be able to take the data and do cool stuff with it.
Scientists are worried that other people might want to use their work, but let’s just assume they think that sharing data is exciting. Scientists want to capture first and communicate second, ontologists want to communicate, and CS people want to process. There are lots of appropriate ways to publish on the web; however, useful sharing is harder than publishing. We need an agreed structure to do the communication, because machines need structure. However, that’s not the way humans work: humans tell stories. We’ve created a disconnect between these two things. The journal article is the story, but it doesn’t necessarily provide access to all of the science.
So, we need to capture research objects, publish those objects, and capture the structure through the storytelling. He used the MyTea project as an example: a fully semantic (RDF-backed) laboratory record for synthetic chemistry. This is a structured discipline with very consistent workflows. The system was tablet-based, is effective, and is still being used. However, it didn’t work for molecular biology, bioengineering, etc., which cover a much wider range of things than chemistry. So Cameron and others got some money to take MyTea (a highly structured and specific system) and extend it into molecular biology. Could they make it more general and less structured? One thing that immediately stands out as unstructured and flexible is the blog, so they thought they could turn a blog into a lab notebook. Blogs already have time stamps and authors, but little revision history, so that was built into the new system.
However, was this unstructured system a recipe for disaster? Well, yes, to start with. What warrants a post, for example? Should a day be one post? An experiment? There was little in the way of context or links, and people who also kept a physical lab book ended up with huge lists of lab book references. So, even though there were plenty of good things (Google indexing, etc.), it was still too messy. However, as more information was added, help came from an unexpected source: post metadata. They found that the pull-down menus for templates were being populated with the titles of posts, so they used the metadata from the posts to generate the pull-down menus. In the act of choosing a post from the menu, a link is created from that post to the new page made by the template. The templates depend on the metadata, and because the templates are labor-saving, users will put in metadata! Templates feed on metadata, which feeds the templates, and so on: a reinforcing system.
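The template and metadata feedback loop is easier to see in code. Below is a minimal Python sketch. It is not the lab notebook's actual implementation. The `Post` class, the `type` metadata key and both helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """A hypothetical lab-notebook blog post carrying free-form metadata."""
    title: str
    metadata: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # titles of posts this post links to


def template_menu(posts, required_type):
    """Fill a template's pull-down menu with the titles of posts whose
    metadata says they are of the type the template needs."""
    return [p.title for p in posts if p.metadata.get("type") == required_type]


def create_from_template(posts, title, chosen, required_type):
    """Make a new post from a template; picking an entry from the menu
    automatically links the new post to the chosen post."""
    if chosen not in template_menu(posts, required_type):
        raise ValueError(f"{chosen!r} is not offered by this template")
    new_post = Post(title, {"type": "procedure"}, [chosen])
    posts.append(new_post)
    return new_post


posts = [
    Post("Plasmid prep 12", {"type": "material"}),
    Post("Plasmid prep 13"),  # no metadata, so the template never offers it
]
print(template_menu(posts, "material"))
pcr = create_from_template(posts, "PCR run 3", "Plasmid prep 12", "material")
print(pcr.links)
```

"Plasmid prep 13" carries no metadata, so the template never offers it. The only way to get the labor-saving menu entry, and the automatic link that comes with it, is to add the metadata. That is the reinforcing loop described above.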
An ontology was “self-assembled” out of this research work and the metadata used for the templates. When their terms were compared to the Sequence Ontology, they found some exact matches, as well as some places where they identified possible errors in the Sequence Ontology (e.g. conflation of purpose into one term). They capture first, and the structure gets added afterwards. They can then map their process and ontologies onto agreed vocabularies for the purpose of a particular story. They do this because they want to communicate with other communities and researchers who are interested in their work.
So, you need tools to do this. Luckily, there are tools available that exploit structure where it already exists (like they’ve done in their templates, aka workflows). You can imagine instruments as bloggers (take the human out of the loop). However, we also need tools to tell stories: to wire up the research objects into particular stories / journal articles. This allows people who are telling different stories to connect to the same objects. You could aggregate a set of web objects into one feed, and link them together with specific predicates such as vocabs, relationships, etc. This isn’t very narrative, though. So, we need tools that interact with people while they’re doing things – hence Google Wave.
An example is Igor, the Google Wave citation robot. You’re having a “conversation” with this robot: it offers you links, choices, etc., while the look and feel remain those of writing a document. Another is the ChemSpider robot, written by Cameron. Here, you can create linked data without knowing you’ve done it. The robots will automatically link your story to the research objects behind it. Robots can build on each other’s output, even if they weren’t intended to work together: for example, Janey-robot plus Graphy. If you pull the results from a series of robots into a new Wave, the entire provenance from the original Wave is retained, including over time. Workflows, data, or workflows plus data can be shared.
Where does this take us? Let’s say we type “the new rt-pcr sample”. The system could check for previous rt-pcr samples and choose the most recent one to link to in the text (after asking you if you’re sure). As a result of typing this (and agreeing with the robot), another robot will consult a MIBBI standard to get the required minimum information checklist and create a table based on that checklist, adding links all the while as you type. The structure is captured because the system knows you’re talking about an rt-pcr reaction, and this is easier than writing it all out by hand. When you get a primer, you drop it into your database of primers (which is also a Wave), and it can then be automatically linked in your text. This allows you to tell a structured story.
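Here is a rough sketch of the rt-pcr scenario. It assumes samples are simple (identifier, kind, date) records. Nothing here is a real Wave robot API: `suggest_link`, `checklist_table` and the sample data are hypothetical. The checklist field names are placeholders, not the actual MIBBI checklist.

```python
from datetime import date

# Hypothetical earlier samples: (identifier, kind, date recorded).
SAMPLES = [
    ("sample-017", "rt-pcr", date(2009, 9, 30)),
    ("sample-021", "rt-pcr", date(2009, 10, 12)),
    ("sample-019", "western blot", date(2009, 10, 14)),
]

# Placeholder field names only; a real robot would fetch the appropriate
# MIBBI checklist rather than hard-code anything.
PLACEHOLDER_CHECKLIST = ["template", "primers", "cycling conditions"]


def suggest_link(text, samples, confirm):
    """If the typed text mentions a known sample kind, propose the most recent
    sample of that kind and return it only if the user confirms."""
    lowered = text.lower()
    matches = [s for s in samples if s[1] in lowered]
    if not matches:
        return None
    identifier, _kind, _when = max(matches, key=lambda s: s[2])
    return identifier if confirm(identifier) else None


def checklist_table(fields):
    """Create an empty field -> value table for the user to fill in."""
    return {name: "" for name in fields}


link = suggest_link("the new rt-pcr sample", SAMPLES, confirm=lambda sample_id: True)
print(link)
print(checklist_table(PLACEHOLDER_CHECKLIST))
```

The confirmation step is passed in as a function. This keeps the "are you sure?" question with whatever interface talks to the user, rather than hard-coding it into the robot.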
Natural user interaction: easy user interaction with web services and databases. You have to be careful, though: you don’t want to be going back to the chemical database every time you type “He”, “is”, etc. In the Wave, you could somehow state that you’re NOT doing arsenic chemistry (the robot could learn and save your preferences on a per-user, per-Wave basis). There are problems with Wave: one is the client interface, another is user understanding. In the client, some strange decisions have been made; it seems to have been designed the way people in Google think. However, the client is just a client: specialized clients, or just better clients, will be some of the first useful tools. In terms of user understanding, none of us quite understands yet what Wave is.
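One way to picture per-user, per-Wave preferences is a robot that keeps a set of ignored symbols for each (user, Wave) pair. This is a hypothetical Python sketch, not how the ChemSpider robot actually works. The `ChemLinker` class and its tiny symbol table are assumptions for illustration.

```python
import string

# A tiny, hypothetical subset of element symbols the robot might look up.
ELEMENT_SYMBOLS = {"He": "helium", "As": "arsenic", "Na": "sodium", "Cl": "chlorine"}


class ChemLinker:
    """Hypothetical robot state remembering, per user and per Wave, which
    symbols should never trigger a database lookup."""

    def __init__(self):
        self.ignored = {}  # (user, wave) -> set of symbols to skip

    def ignore(self, user, wave, symbol):
        self.ignored.setdefault((user, wave), set()).add(symbol)

    def lookups(self, user, wave, text):
        """Return the words that would be sent to the chemical database."""
        skip = self.ignored.get((user, wave), set())
        words = (w.strip(string.punctuation) for w in text.split())
        return [w for w in words if w in ELEMENT_SYMBOLS and w not in skip]


linker = ChemLinker()
text = "He added Na and Cl. As expected, it dissolved."
print(linker.lookups("alice", "wave-1", text))
linker.ignore("alice", "wave-1", "As")  # "I'm not doing arsenic chemistry"
linker.ignore("alice", "wave-1", "He")  # "He" is a pronoun in my notes
print(linker.lookups("alice", "wave-1", text))
print(linker.lookups("bob", "wave-1", text))  # Bob's preferences are untouched
```

Keying the preferences on both user and Wave means that switching off arsenic in one notebook does not hide it anywhere else.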
We’re not getting any smarter. Experimentalists need help, and many recognize this and are hoping to use these new technologies. To provide help, we need structure so machines can understand things. However, we need to recognize and leverage the fact that humans tell stories. We need structure, but we need to use that structure in a narrative. Try to remember that capture and communication are two different things.
Chris Rawlings (also speaking: Catherine Canevet and Paul Fisher)
A BBSRC-funded research collaboration between Newcastle, Manchester, and Rothamsted: ONDEX and Taverna. Demo: integration and augmentation of the yeast metabolome model (Nature Biotech, October 2008, 26(10)). Presented: Taverna and ONDEX. In ONDEX, everything can be seen as a network. To help with this, ONDEX contains an ontology of concept classes, relation types, and additional properties. Their example is the yeast jamboree data integration. They have both specific (e.g. KEGG) and generic (e.g. tab-delimited) parsers to load in data.
When ONDEX works with Taverna, instead of using the pipeline manager you use the ONDEX web services to access ONDEX from Taverna. This means you can use Taverna to pull data into ONDEX. So, first parse the jamboree data into ONDEX and remove currency metabolites (e.g. ATP, NAD). Add publications to the graph, so that domain experts can view and manually curate the data. Next, annotate the graph using network analysis results. Then switch to Taverna and identify the orphans discovered in ONDEX. Retrieve the enzymes relating to the orphans, assemble the PubMed query, and add the hits back to the ONDEX graph. Finally, have a look at the completed visualization. Use the ONDEX pipeline manager to upload data: it’s all in a GUI, which is good.
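The first few steps of the pipeline can be illustrated on a toy graph. This Python sketch is not ONDEX or Taverna code. The metabolite graph, the enzyme annotations and the helper functions are hypothetical. Treating an "orphan" as a metabolite left with no connections once the currency metabolites are removed is my own assumption about what the term meant here.

```python
# Examples of ubiquitous "currency" metabolites, as mentioned in the talk.
CURRENCY = {"ATP", "NAD"}

# Hypothetical metabolite graph: an edge joins two metabolites sharing a reaction.
EDGES = [
    ("glucose", "glucose-6-phosphate"),
    ("glucose", "ATP"),
    ("glucose-6-phosphate", "fructose-6-phosphate"),
    ("compound X", "ATP"),
    ("compound Y", "NAD"),
]

# Hypothetical metabolite -> enzyme annotations.
ENZYMES = {"compound X": ["enzyme A"], "compound Y": ["enzyme B", "enzyme C"]}


def remove_currency(edges, currency):
    """Drop every edge that touches a currency metabolite."""
    return [(a, b) for a, b in edges if a not in currency and b not in currency]


def find_orphans(edges, kept_edges, currency):
    """Metabolites left with no connections once currency metabolites are gone."""
    nodes = {n for edge in edges for n in edge} - currency
    connected = {n for edge in kept_edges for n in edge}
    return sorted(nodes - connected)


def pubmed_query(orphans, enzymes):
    """Build a simple OR query from the enzymes related to the orphans."""
    names = sorted({e for m in orphans for e in enzymes.get(m, [])})
    return " OR ".join(f'"{name}"' for name in names)


kept = remove_currency(EDGES, CURRENCY)
orphans = find_orphans(EDGES, kept, CURRENCY)
print(orphans)
print(pubmed_query(orphans, ENZYMES))
```

Metabolites like ATP and NAD link almost everything to everything else, which is why they are removed before looking for metabolites that are genuinely unconnected.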
Then followed a live demo.
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!
This week I attended a great two-hour session run by the brand-spanking new Great North Museum (GNM), designed to encourage collaboration between Newcastle University researchers and the GNM. In addition, ideas for using this type of collaboration for outreach to the community (e.g. schoolkids) were welcome. There have already been some useful research collaborations between the university and the museum, and they want to encourage even more.
The GNM was formed from a number of museums (e.g. the Hancock and the Hatton Gallery) under the auspices of many different groups, including Newcastle University (a full list is available). It opened its doors last week, over the school holidays. I work in a university building just across the street from the Great North Museum: Hancock, and every time I looked there was a queue stretching down to the road. You can see an example of this on Simon’s Twitpic. It has received more than 67,000 visitors in its first week. Congratulations! I have to say that the museum is really impressive from the outside, and looks great on the inside. I haven’t given myself the full tour yet, but I will be doing so soon.
While at the event, I learned some interesting things about the contents of the GNM that I thought might be of general interest. The GNM has over 500,000 items in its collection, but even with the revamp of the museums there is only space to display 3,500 of them. They have a taxidermist on-site, as they still receive roadkill and the occasional other animal to prepare for the collection.
Their collection covers a wide array of natural history and archaeology, and includes:
- birds and bird eggs, including a Great Auk egg
- an extensive collection of molluscs, including 1000s of type specimens
- sea slug specimens and figures
- insects, most of which are stored in their original Victorian cabinets
- an osteology collection which includes moa, great hawks and dodos
- game heads
- botany specimens and drawings, including an extensive herbarium with lichens and north-eastern seaweed
- paleozoology, including a Carboniferous tetrapod (crocodile-like amphibian), with predominantly local geology and lots of type material, some of which is on display (recent improvements in the display cases’ environments now allow this)
- paleobotany including a big fossilized tree trunk, a bunch of specimens from the 1830s and 100s of thin sections of fossils
- minerals
- ethnography material, including some original items from Captain Cook
- Egyptology
- extensive Roman archaeology from Hadrian’s Wall
- prehistoric archaeology
- Anglo-Saxon and medieval collections
- Greek and Etruscan art and archaeology
- fine art in the Hatton collections and original Bewick prints and blocks
- a large archive which includes letters from people like Mary Anning, Richard Owen and Charles Darwin
The oldest item in the archaeology collection is an 11,000-year-old Paleolithic flint blade found in the region. There is also a prehistoric gallery, and the Hadrian’s Wall gallery is the largest in the GNM. The museum also houses the Shefton collection of about 1,000 Greek and Etruscan items.
In terms of collaboration and outreach, a couple of points came across clearly amongst the case studies and discussions:
- The museum can be used to teach biodiversity and conservation
- Using the items in the museum, important research can be (and is being) re-created. For instance, it was museum collections of bird eggs that helped researchers figure out that eggshells were thinning due to DDT ingestion by birds.
- Collaboration between researchers at the university and the museum can lead to truly interesting work being done. Showcasing university research in the museum, engaging with schools and the wider community, and performing research with the help of the museum are the sorts of things that were discussed.
I like having a museum on my (work) doorstep, and hope to find some way to work with it. Enjoy your visit!
Congratulations to the Newcastle Uni iGEM Team 2008!
November 10, 2008
Congratulations, Bug Busters! You didn't just get a gold star, you got a gold award!
Though I was not involved, many of my friends were part of the Newcastle University iGEM 2008 team, either as supervisors or students. You can read more on the Newcastle University iGEM entry wiki page. Of the 84 teams competing, only 16 won gold medals, including, from the UK, Edinburgh, Imperial and Newcastle.
From the overview of the team's wiki page:
"We aimed to develop a diagnostic biosensor for detecting pathogens.
We wanted this to be cheaply and readily available for deployment in
areas where access to medical resources, such as refrigeration and
sophisticated laboratories, is limited or absent. We chose to use Bacillus subtilis
as a method of delivery due to its ability to sporulate. The sensor
bacteria could then be dried down as spores, which are very stable and
extremely resilient to hostile environmental conditions, and rehydrated
when required. The ambient temperature of much of the developing world
is ideal for the growth of Bacillus spp. without the use of incubation equipment.
Gram-positive bacteria communicate using quorum communication
peptides. Research has shown that these peptides are extremely
strain-specific. We chose to engineer B. subtilis 168 to detect
four Gram-positive pathogens by their quorum communication peptides.
The different combinations of quorum communication peptides would be
sensed by the engineered bacterium, and this signal converted into a
visual output as fluorescent proteins such as mCherry, GFP, CFP and
YFP." Read more.
Well done!
P.S. Looks like kudos to my old alma mater, Rice University, too! Congrats!
3 Bioinformatics Research Associate Positions: Newcastle University
December 18, 2007
There are three bioinformatics jobs (one in pure bioinformatics, one in network analysis, and another in modelling/mathematical biology) currently available within CISBAN, an interdisciplinary centre studying the systems biology of ageing and nutrition. The full particulars are posted both on Nature Jobs and on the Newcastle University Job Vacancies web pages.
Below are links to the various job advertisements, as well as summaries of the jobs themselves. This is a summary of the three Nature Jobs postings, put together on a single page for easy perusal. The closing date for all of these positions is 11 January 2008. This is a great opportunity, though I may be speaking from a biased perspective as I work at CISBAN and find it an interesting and challenging workplace.
-
Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health
Research Positions
Level F: £25,134 – £32,796 p.a.
Level G: £33,779 – £40,335 p.a.
We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.
Research Associate (Bioinformation/Computing Scientist – Applications)
To develop and maintain the computing software and hardware infrastructure for systems biology, including a central web portal integrating applications for data capture, storage and visualisation, and high performance computing systems and databases, including a large Linux cluster.
Job reference: A1091R
Posts are tenable until 30 September 2010.
Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk).
Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.
Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).
Committed to Equal Opportunities
-
Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health
Research Positions
Level F: £25,134 – £32,796 p.a.
Level G: £33,779 – £40,335 p.a.
We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.
Research Associate (Bioinformatician – Network Analysis)
To research and develop novel methods of representing and integrating molecular and cellular data as networks and apply this methodology to identify novel proteins and elucidate novel pathways involved in the process of cellular ageing and senescence.
Job reference: A1090R
Posts are tenable until 30 September 2010.
Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk).
Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.
Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).
Committed to Equal Opportunities
-
Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health
Research Positions
Level F: £25,134 – £32,796 p.a.
Level G: £33,779 – £40,335 p.a.
We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.
Research Associate (Modeller/Mathematical Biologist)
To develop models of molecular and cellular mechanisms of ageing and to explore links between ageing, development and evolution from a life-course perspective. This post will also involve collaboration within the EU Network of Excellence LifeSpan, linking development and ageing.
Job reference: A1092R
Posts are tenable until 30 September 2010.
Enquiries for the post may be directed to Professor Tom Kirkwood, Institute for Ageing and Health (email: tom.kirkwood@ncl.ac.uk).
Further particulars for this post can be found on the University’s web page.
Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).
Committed to Equal Opportunities
Of GelML and MFO
October 8, 2007
A couple of papers from here at Newcastle University have appeared over the past couple of weeks. Here's a summary of them both.
- Data Standards
From "An Update on Data Standards for Gel Electrophoresis" in Practical Proteomics, Issue 1, September 2007, by Andrew R. Jones and Frank Gibson.
From the abstract: "We report on standards development by the Gel Analysis Workgroup of the
Proteomics Standards Initiative. The workgroup develops reporting
requirements, data formats and controlled vocabularies for experimental
gel electrophoresis, and informatics performed on gel images. We
present a tutorial on how such resources can be used and how the
community should get involved with the on-going projects. Finally, we
present a roadmap for future developments in this area."
It provides a summary of ongoing work in the gel electrophoresis and gel informatics fields in terms of data and metadata standardization. This includes work on MIAPE GE and MIAPE GI, two checklists of the minimal information required for these types of experiments and analyses. For both GE and GI, there are data formats (GelML and GelInfoML respectively, both extensions of FuGE) and a suggested controlled vocabulary (sepCV). More information can be found at http://www.psidev.info.
Frank works in the CARMEN neuroscience project here at Newcastle, and Andy is in Liverpool and works on, among other things, FuGE. CARMEN collaborates with the SyMBA project, which was originally developed by me and a few others within Neil Wipat's Integrative Bioinformatics Group here at Newcastle, but which is now a SourceForge project at http://symba.sf.net. Andy Jones is a co-author with me, Neil Wipat, Matt Pocock and Olly Shaw on an upcoming SyMBA paper.
- Semantic Data Integration
A paper that was presented at the Integrative Bioinformatics Conference 2007 by me and my co-authors, Matt Pocock and Neil Wipat, is now available from the Journal of Integrative Bioinformatics website.
Allyson L. Lister, Matthew Pocock, Anil Wipat. Integration of constraints documented in SBML, SBO, and the SBML Manual facilitates validation of biological models. Journal of Integrative Bioinformatics, 4(3):80, 2007.
Integrative Bioinformatics 2007 Day 2: Multi-value networks, Banks et al.
September 11, 2007
Other than where specified, these are my notes from the IB07 Conference, and not expressions of opinion. Any errors are probably just due to my own misunderstanding.
This talk was about multi-value networks, high-level Petri nets, and how they differ from Boolean networks. Formal methods are required to model and analyse complex regulatory interactions. Boolean networks offer a good starting point, but are often too simplistic. Multi-value networks (MVNs) are qualitative, and are seen as a middle ground between differential equation models and Boolean networks.
He has applied high-level Petri net techniques and a wide range of analysis tools. In MVNs, entities take a range of values (0…n). Each entity has a neighbourhood of other entities that affect it, and the behaviour of each entity is described using state tables. However, we can't really analyse these tables directly: that's where Petri nets come in. Petri nets have a graphical notation with mathematical semantics and can model choice, synchronization and concurrency. High-level Petri nets provide an expressive framework with data types and an equational description of behaviour. There is a wide range of analysis techniques and tool support, e.g. model checking. Petri nets use a token-based system.
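Here is a minimal Python sketch of a two-entity MVN that makes the state-table idea concrete. The entities, their levels and the rules are made up; this is not the E. coli model from the talk. Updating every entity at once (synchronous update) is also an assumption, since MVNs can be given other update semantics.

```python
from itertools import product

LEVELS = {"A": 3, "B": 2}  # A takes values 0..2, B takes values 0..1
NEIGHBOURS = {"A": ("A", "B"), "B": ("A",)}  # who influences whom

# State tables: the tuple of neighbour values maps to the entity's next value.
# Hypothetical rule: A rises one level while B is off and falls while B is on;
# B switches on only when A is at its maximum level.
TABLES = {
    "A": {
        (a, b): min(a + 1, LEVELS["A"] - 1) if b == 0 else max(a - 1, 0)
        for a, b in product(range(LEVELS["A"]), range(LEVELS["B"]))
    },
    "B": {(a,): 1 if a == LEVELS["A"] - 1 else 0 for a in range(LEVELS["A"])},
}


def step(state):
    """Update every entity at once by looking up its state table."""
    return {
        entity: TABLES[entity][tuple(state[n] for n in NEIGHBOURS[entity])]
        for entity in state
    }


def trajectory(state, steps):
    """Return the list of states visited, starting with `state`."""
    states = [state]
    for _ in range(steps):
        state = step(state)
        states.append(state)
    return states


for s in trajectory({"A": 0, "B": 0}, 5):
    print(s)
```

Simulating one trajectory like this shows a single behaviour, here a five-state cycle. On its own it does not let you reason about every possible behaviour, and that is the gap Petri net analysis fills.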
Their approach was as follows. They defined a set of state transition tables that completely define the model. Equational definitions are extracted from these tables, and then a Petri net is constructed. They also apply multi-valued logic minimization to each state transition table to simplify the information from the tables. Construction of the high-level Petri net begins with a single place for each entity, connected to a central transition. The transition encodes the equational specification of the network's behaviour. Each place x is connected to the transition node with an input arc x and an output arc x′.
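The single-transition construction can be sketched as follows. The sketch assumes each place holds exactly one token whose colour is the entity's current value. The equations are hypothetical stand-ins for minimised state tables. The breadth-first search is only a toy version of the state-space exploration that real Petri net tools perform.

```python
from collections import deque

# One place per entity, each holding a single coloured token (the entity's value).
# The central transition's behaviour is given by one equation per place; these
# hypothetical equations stand in for the minimised state tables.
EQUATIONS = {
    "A": lambda m: min(m["A"] + 1, 2) if m["B"] == 0 else max(m["A"] - 1, 0),
    "B": lambda m: 1 if m["A"] == 2 else 0,
}


def fire(marking):
    """Fire the central transition: read the token x on every place through its
    input arc and put back the new token x' through its output arc."""
    return {place: equation(marking) for place, equation in EQUATIONS.items()}


def freeze(marking):
    """Turn a marking into a hashable, order-independent key."""
    return tuple(sorted(marking.items()))


def reachable(initial):
    """Breadth-first exploration of every marking reachable from `initial`."""
    seen = {freeze(initial)}
    queue = deque([initial])
    while queue:
        successor = fire(queue.popleft())
        if freeze(successor) not in seen:
            seen.add(freeze(successor))
            queue.append(successor)
    return [dict(key) for key in sorted(seen)]


states = reachable({"A": 0, "B": 0})
print(len(states))
# Check a simple property over the whole state space: B is only ever on when A > 0.
print(all(s["A"] > 0 for s in states if s["B"] == 1))
```

Because there is only one transition, firing it updates every entity together. Real tools would add richer property checks (e.g. temporal-logic model checking) on top of this kind of state space.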
They demonstrated this with carbon starvation in E. coli. Exponential growth occurs where there is sufficient carbon, but the bacteria enter a stationary phase when the carbon is depleted. The model is validated by checking known properties. Then, you can look at dynamic properties. A mutant analysis was also done, where you can "knock out" or overexpress key genes and observe the effect.
Finally, they compared the model with its Boolean network equivalent. There are differences, which lead to some interesting questions: how much detail is required in the model? Is the model representable in the Boolean domain?
My opinion: A great, interesting talk that flowed well and was easy to understand. Slides were a little overfull, but it didn't detract. A natural speaker.
Questionnaire Design
March 2, 2007
I spent today in a 1-day
course on Questionnaire Design organized by the Newcastle University Staff Development Unit, and run by Dr. Pamela Campanelli, a Survey Methods
consultant and UK Chartered Statistician. While I won’t recreate her slides
here, as that would be long, irrelevant and possibly infringe some copyrights,
I wanted to present some of the most interesting comments she had to make on the design and analysis of questionnaires and the responses returned.
I signed up to this course as my PhD project includes, as one of its
(smaller) objectives, the comparison of the perceived level of collaboration
between the various research groups within the Centre I belong to both before
and after my PhD project is made available. Part of that project is to provide
an application accessible to all researchers that will
automatically use the output of certain research groups to inform the research
of other groups. (Yes, I am being deliberately vague here.)
In summary, the ability to provide my target audience with a simple, clear
questionnaire that will additionally produce responses that can be
statistically analyzed in a useful manner is important. As I have no previous
experience writing a questionnaire, a crash-course seemed like a good idea.
Forgive any errors in the points that follow: I am sure they are all due to my
lack of comprehension rather than to the quality of the training course!
Of most relevance to me, Pam mentioned that when designing a questionnaire that will be given at multiple time points (i.e. before and after my work is available to the researchers), you should use an identical questionnaire every time, so that any changes in the responses are not due to the questionnaire design.
The most important thing I learnt from the day’s training
is this: always think very carefully
about what you want to ask, and ensure that every question you ask has a
relevant objective and is written with an eye for balancing brevity and clarity
(with clarity being the more important of the two). For instance, in English
“you” may be plural or singular, and which is intended should be made clear.
Equally, words like “doctor” have many meanings: your GP, your specialist, a
PhD. Some may even check “yes” to a question asking if they have seen their
doctor if they have been to the surgery/office and seen the nurse, or even
if they have chatted with their doctor on a chance meeting at the grocery
store.
Pam mentioned a resource that has been useful to her in the
past, called the CASS Question Bank (http://qb.soc.surrey.ac.uk).
This presents – for free – the information in the
data archive. Not only might a question you wish to use already be written,
but in some cases you can see how often such a question was answered (and
perhaps also the frequencies of each possible answer). It should be noted, however, that just because a question or questionnaire has been published doesn’t mean it is perfect. Also, there is no “ideal response rate” for questionnaires that can be applied across the board; instead, the rate will naturally differ between countries and even between academic disciplines (or other groupings). Further, the people who actually respond to questionnaires (when left to their own devices) have different traits from those who don’t.
Incentives were also discussed, as I had toyed with the
idea of encouraging people to fill out my questionnaire by holding a prize draw of chocolate for respondents. Interestingly, Pam mentioned that prize draws can be the worst of the incentive choices available. One study (sorry, I didn’t catch the reference) compared promising a guaranteed prize of great value with giving a much smaller prize before the respondent filled out the form. The control response rate (no incentives)
was 50%. Where the respondents were guaranteed $50 if they sent back the form,
the response rate rose to 57%. However, when $5 was included in the initial
posting with the questionnaire, the response rate rose to 67%! Whether it was
the respondent’s belief in reciprocity or their feelings of guilt, it seems
that providing the carrot at the same time as the stick was useful. On a
smaller scale, including a tea bag (as was done by a PhD student) proved popular as well.
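The figures above can be compared with a little back-of-the-envelope arithmetic in Python. The sketch assumes 100 questionnaires mailed per condition and ignores postage. The cost comparison is my own illustration, not something reported in the course.

```python
CONTROL_RATE = 0.50  # response rate with no incentive

# condition: (response rate, incentive in dollars, paid to everyone mailed?)
CONDITIONS = {
    "promised $50": (0.57, 50, False),
    "prepaid $5": (0.67, 5, True),
}


def summarise(mailed=100):
    """Compare each incentive with the control for `mailed` questionnaires."""
    rows = {}
    for name, (rate, amount, prepaid) in CONDITIONS.items():
        responses = rate * mailed
        people_paid = mailed if prepaid else responses  # promised prize: respondents only
        rows[name] = {
            "extra_points": round((rate - CONTROL_RATE) * 100),
            "relative_gain_pct": round((rate / CONTROL_RATE - 1) * 100),
            "cost": round(people_paid * amount),
        }
    return rows


for name, row in summarise().items():
    print(name, row)
```

On these assumptions, the prepaid $5 adds 17 percentage points for $500. Promising $50 adds 7 points and costs $2,850, because everyone who responds must be paid.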
Memory is often overestimated. Reports vary about how large working memory is, but both 7 ± 2 items and 5 ± 2 items were mentioned. As Pam suggested, imagine a scenario where you are at a restaurant and the waiter is telling you the specials. Most people find it difficult to keep more than 5 or 6 specials in their head: after that, they start forgetting the earlier items. This holds just as true for self-completion questionnaires (which I’m interested in) as for questionnaires in general. Therefore, the more clauses
in a question, or the more radio buttons in a range of possible responses, the
less likely that the responder will answer with their “correct” answer. In a
similar vein, you should try not to force respondents to do mathematics in
their head (“How often per day, on average, do you visit the coffee lounge at work?”).
The more mathematics you make them do, the less likely their answer will be the one they intended. Instead, it is better to ask a couple of simpler questions from which the designer can calculate the value.
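As an example of letting the designer do the maths, the coffee-lounge question could be split into two simpler ones. The first might be “On how many of the last five working days did you visit the coffee lounge?” and the second “On a day when you visit, how many times do you usually go?”. This wording and the helper below are my own hypothetical illustration, not from the course.

```python
def average_visits_per_day(days_visited, visits_per_visit_day, working_days=5):
    """Derive the average number of daily visits from two simpler answers:
    how many working days you visited, and how many times on such a day."""
    if not 0 <= days_visited <= working_days:
        raise ValueError("days_visited must be between 0 and working_days")
    if visits_per_visit_day < 0:
        raise ValueError("visits_per_visit_day cannot be negative")
    return days_visited * visits_per_visit_day / working_days


# A respondent who went on 4 of the last 5 working days, typically twice each day.
print(average_visits_per_day(4, 2))
```

A respondent who visited on 4 of 5 days, twice each time, ends up with an average of 1.6 visits per day without having to work it out themselves.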
She also said that the most common problem she encounters is trying to ask too many questions with a single item, her example being “Would you like to be rich and famous?”: this question is all right for those who want either both or neither, but not for those who want one but not the other.
Most interesting of all are the social aspects of
questionnaire design. If you have a range of 5 possible answers for a question
(very positive, generally positive, neutral, generally negative, very
negative), you need to decide whether you want to force your respondents to
take a side. To do this, you remove the
“neutral” option, forcing the respondents to get off the fence. You should also be
sparing in your use of “don’t know” as an option, as many people will use that
in preference to thinking about the question. Also, in many cases it is simply
not appropriate: for instance, “don’t know” is not really
applicable to the question “How happy are you with your new TV?”. Further, vague,
subjective quantifiers should be avoided wherever possible. Words like “often”,
“sometimes” and “rarely” mean different things to different people. Instead,
measuring frequencies with phrases like “every day” and “about once a week” is better, though such phrases may not be suitable if the respondent’s behavior is not
regular. Questions using these words must be written clearly so that
respondents can make a decision easily. Finally, numeric scales should at a
minimum have the midpoint and the two extremes named with appropriate adjectives.
If, for instance, you have the range 0-10 and have not marked 5 as the
midpoint, some people may mistake the scale for a unipolar (any number over 0
is positive) rather than a bipolar one (any number over 5 is positive). The course covered many more topics than I've mentioned here; below are the references she recommended for further reading.
Suggested references (the starred reference was the one she mentioned most):
Tourangeau et al. (2000), The Psychology of Survey Response.
Fowler, F. J. Jr. (1995), Improving Survey Questions: Design and Evaluation. Sage.
(*)
Dillman, D. (2007), Mail and Internet Surveys: The Tailored Design Method, 2nd Edition. Wiley.
Fowler, F. J. Jr. (2002), Survey Research Methods, 3rd Edition. Sage.
Czaja, R. and Blair, J. (2005), Designing Surveys: A Guide to Decisions and Procedures. Pine Forge Press.