Meetings & Conferences

Questionnaire Design

I spent today in a 1-day
course on Questionnaire Design organized by the Newcastle University Staff Development Unit, and run by Dr. Pamela Campanelli, a Survey Methods
consultant and UK Chartered Statistician. While I won’t recreate her slides
here, as that would be long, irrelevant and possibly infringe some copyrights,
I wanted to present some of the most interesting comments she had to make on the design and analysis of questionnaires and the responses returned.

          I signed up to this course as my PhD project includes, as one of its
(smaller) objectives, the comparison of the perceived level of collaboration
between the various research groups within the Centre I belong to both before
and after my PhD project is made available. Part of that project is to provide
an application accessible to all researchers that will
automatically use the output of certain research groups to inform the research
of other groups. (Yes, I am being deliberately vague here.)
In summary, the ability to provide my target audience with a simple, clear
questionnaire that will additionally produce responses that can be
statistically analyzed in a useful manner is important. As I have no previous
experience writing a questionnaire, a crash-course seemed like a good idea.
Forgive any errors in the points that follow: I am sure they are all due to my
lack of comprehension rather than to the quality of the training course!

Of most relevance to me, Pam mentioned that when a questionnaire
will be given at multiple time points (i.e. before and
after my work is available to the researchers), you should use an identical
questionnaire every time you provide it, so that changes in the responses
cannot be attributed to changes in the questionnaire design.

          The most important thing I learnt from the day’s training
is this: always think very carefully
about what you want to ask, and ensure that every question you ask has a
relevant objective and is written with an eye for balancing brevity and clarity
(with clarity being the more important of the two). For instance, in English
“you” may be plural or singular, and which is intended should be made clear.
Equally, words like “doctor” have many meanings: your GP, your specialist, a
PhD. Some may even check “yes” to a question asking if they have seen their
doctor if they have been to the surgery/office and seen the nurse, or even
if they have chatted with their doctor on a chance meeting at the grocery.

Pam mentioned a resource that has been useful to her in the
past, called the CASS Question Bank.
This presents – for free – the information in the
data archive. Not only might a question you wish to use already be written,
but in some cases you can see how often such a question was answered (and
perhaps also the frequencies of each possible answer). It should be noted,
however, that just because a question or questionnaire has been published doesn’t
mean it is perfect. Also, there is no “ideal response rate” for questionnaires that
can be applied across the board. Instead, the rate will naturally differ
between country and even academic discipline (or other grouping). Further, the
people who actually respond to questionnaires have different traits than those
who don’t respond (when under their own recognizance).

          Incentives were also discussed, as I had toyed with the
idea of encouraging people to fill out my questionnaire by having a prize draw
for respondents for chocolate. Interestingly, Pam mentioned that prize draws
can be the worst of the incentive choices available. One study (sorry, I didn’t
catch the reference) compared promising a guaranteed prize of great value with
giving a much smaller prize before
the respondent filled out the form. The control response rate (no incentives)
was 50%. Where the respondents were guaranteed $50 if they sent back the form,
the response rate rose to 57%. However, when $5 was included in the initial
posting with the questionnaire, the response rate rose to 67%! Whether it was
the respondent’s belief in reciprocity or their feelings of guilt, it seems
that providing the carrot at the same time as the stick was useful. On a
smaller scale, including a tea bag (as was done by a PhD student) proved popular as well.

Memory is often overestimated. Reports vary about how large
working memory is, but both 7 +/- 2 items and 5 +/-
2 items were mentioned. As Pam suggested, imagine a scenario where you are at a restaurant and
the waiter is telling you the specials. Most people find it difficult to keep
more than 5 or 6 specials in their head: after that, they start forgetting the
earlier items. This holds just as true for self-completion questionnaires (which
I’m interested in), and questionnaires in general. Therefore, the more clauses
in a question, or the more radio buttons in a range of possible responses, the
less likely that the responder will answer with their “correct” answer. In a
similar vein, you should try not to force respondents to do mathematics in
their head (“How often per day, on average, do you visit the coffee lounge at work?”).
The more mathematics you make them do, the less likely their answer will be the
one they intended. Instead, a couple of simpler questions from which the designer can calculate the value is better.

          She also says that the most common problem she encounters
is trying to answer too many questions with a single item, with her example being “Would you like
to be rich and famous?”: this sentence is alright for those who want either
both or neither, but is not appropriate for those who want one or the other.

          What is most interesting are the social aspects of
questionnaire design. If you have a range of 5 possible answers for a question
(very positive, generally positive, neutral, generally negative, very
negative), you need to decide whether you want to force your respondents to
take a side. To do this, you remove the
“neutral” option, forcing the respondents to get off the fence. You should also be
sparing in your use of “don’t know” as an option, as many people will use that
in preference to thinking about the question. Also, in many cases it is simply
not appropriate: for instance, “don’t know” is not really
applicable to the question “How happy are you with your new TV?”. Further, vague,
subjective quantifiers should be avoided wherever possible. Words like “often”,
“sometimes” and “rarely” mean different things to different people. Instead,
measuring frequencies with phrases like “every day” and “about once a week” is
better, though they may not be suitable if the respondent’s behavior is not
regular. Questions using these words must be written clearly so that
respondents can make a decision easily. Finally, numeric scales should at a
minimum have the midpoint and the two extremes named with appropriate adjectives.
If, for instance, you have the range 0-10 and have not marked 5 as the
midpoint, some people may mistake the scale for a unipolar (any number over 0
is positive) rather than a bipolar one (any number over 5 is positive).

The course covered many more topics than I've mentioned here. Below are the references she recommended for further reading.

References Suggested (the
starred reference was the one she mentioned the most)

et al. (2000), The Psychology of Survey Response.

F.J. Jr. (1995), Improving Survey Questions: Design and Evaluation. Sage.

Dillman, D. (2007), Mail and Internet Surveys: The Tailored Design Method, 2nd Edition.

Fowler, F. J. Jr. (2002), Survey Research Methods, 3rd Edition.

Czaja, Ronald and Blair, J. (2005), Designing Surveys – a guide to decisions and procedures. Pine Forge.



CISBAN Meetings & Conferences

North East Regional e-Science Centre/Digital Curation Centre Collaborative Workshop

The North East Regional e-Science Centre (NEReSC)/Digital Curation Centre (DCC) Collaborative Workshop was held today, the 5th of February, at Newcastle University. The DCC's main role is to "support and promote continuing improvement in the quality of data curation and of associated digital preservation". The aim of the NEReSC is to identify, fund and support high-quality projects with leading industrial and academic partners. The NEReSC was established in July 2001, funded by the DTI through the UK Core e-Science programme, to provide expertise in e-Science and to instigate and run a set of industrially focused projects.

The first two speakers, Paul Watson and Liz Lyon, gave short introductions about their respective organizations. Paul Watson is the head of NEReSC, and Liz Lyon is the Associate Director for Community Development at the DCC.

Liz spoke of how the DCC are interested in seeing what work is being done at Newcastle University in the context of digital curation and preservation, and perhaps developing partnerships with like-minded projects at the University. The DCC has already held 2 conferences on the subject of digital curation, the last one being last November (2006) in Glasgow. At that conference they also launched the electronic journal "International Journal of Digital Curation". It is a good move, as curation and data preservation can be difficult to publish on in the more standard biology journals.

Paul Watson outlined the incredible need of the scientific community to have reliable archives of published data. He mentioned his so-called "Bowker's Standard" Scientific Data Life-Cycle, which is less of a life-cycle and really more of a gradual tailing-off. Step one is to collect the data, step two is to publish the data, and step three is to gradually lose the original data as machines get turned off and students leave for greener pastures. It is humorous, but does show a real problem in the life sciences. Data for published articles should be preserved: otherwise, published papers draw conclusions from unpublished data, other groups are unable to reproduce an experiment, and the data cannot be re-used.

After these introductory speeches, there were 3 talks from Newcastle researchers on projects that involve archiving and curation. First, I spoke on the CISBAN data management strategy, which included an introduction to the CISBAN Data Portal and Integrator (slides for the DPI are available through that link).

Then, Paul Watson spoke again, this time on CARMEN. There is a multitude of neuroscience data (molecular, anatomical, neurophysiological, and behavioural to name just a few categories) in many different locations with a variety of restrictions on their publishing and availability. There are a few efforts underway to try to unify data formats and archiving, but it is difficult to overcome the cultural barriers (multiple communities acting independently; concerns from researchers about the consequences of sharing data) as well as the technical ones (multiple proprietary data formats; the great volume of data; the need for standardized detailed metadata). Hopefully CARMEN and sister efforts such as BIRN and Neuro Commons (via Scientific Commons I believe, but don't quote me on it!) will be able to make real strides in this area in the coming years.

Finally, Patrick Olivier spoke on his work at the Culture Lab, part of the Institute of Ageing and Health at Newcastle University. They research ways of having the humanities, social sciences and the arts inform and aid computing, and vice versa.

The afternoon was scheduled for presentations from the DCC and general discussion. Unfortunately, I had a prior engagement with another meeting, and had to bow out. However, there was lots of energy in the morning, with many people from both groups asking questions and getting involved. Digital curation, archiving and preservation is an area which every research group should be interested in. It is very easy to forget that, unless you have some sort of data policy in your group, chances are that the data sitting on your computer is JUST on your computer, and is therefore precariously stored indeed.



Meetings & Conferences Semantics and Ontologies Standards

3rd OBI Workshop: Day 5

Today was mainly a wrap-up session: we went over the Milestones and Development Branches of OBI to make sure everyone was happy with the work planned for the coming months. I've also made an OBI Google Calendar.



Meetings & Conferences Standards

OBI Workshop Day 3-4

You can see my post on day 3 here:

And on day 4 here: 

Meetings & Conferences Semantics and Ontologies Standards

3rd OBI Workshop: Day 4

Day 4 consisted mainly of discussions regarding the nature of a study versus investigation. These terms themselves are loaded, and what one person thinks of and calls a study may be exactly what others call investigation. In the end, after about 3/4 of the day devoted to the subject, the answer was…. ah, but that would be telling! Read through the notes of the day, and then at the end, all will be revealed.

  • Those on the instrument branch (and more generally, all OBI development branches where appropriate) should remember the following:
    • Don't ignore what the vendors are doing – they'll have term lists.
    • Don't ignore what the communities have done (e.g. metabolomics and proteomics).
  • The full specification for who is assigned to which branches, and how we will physically create the branches, is available on the branch OBI Wiki page.

Study Design Use Case – Drawing terms from the Community – Philippe Rocca-Serra

  • Very interesting, but I wasn't able to take many notes due to updating the OBI wiki on the Branch implementation. If anyone has any good notes, either let me know and I'll include them here, or add them yourself as a comment!
  • The take-home message is that trying to reconcile many groups' different meanings for the same words is difficult, but drawing a large set of terms from literature and communities is very important.
  • By showing us what he has done, it gives all branch groups an idea of the work involved in working on their branches.

Branch best-practices

  • If a relation is missing while developing a branch, should it be referred to the relation ontology immediately?
    • Try to define it as a group before submitting to RO, and ask for help if required. The definition includes restrictions on where these relations can connect to and from.
    • Far too many ontology projects collapse because they have far too many relations.
  • Should put a use-case of OBI on the wiki, for example, post use-cases of composition of study design
  • When branches create terms, at the same time as they make the terms, useful community alternative terms should be added. When you get an OBI term, you will immediately get many other associated terms.
  • Not only would you have a special interface for the biologists, but you would also be able to have computer people have a special structured interface to the literature via these alternative terms

Finding a real home for Information_Entities

  • At the moment BFO has qualities, information entities (generically dependent) and realizable entities (specifically dependent) as children of dependent continuant (DC). But generically dependent is not the same as the current definition of DC, and therefore needs to be made a sibling called generically_dependent_continuant.
    • Digital objects can exist in many different places and be identical. No other DC objects can be identical in this way.
    • DC and GDC will be disjoint.

Solving Study versus Investigation

  • If we are doing OBI, then we need an investigation design
  • Consensus was quickly reached on what an Investigation is:
    • Investigation is a process in which data are gathered from one or more protocol applications with the goal of reaching conclusions. It is guided by the realization of an investigation_design.
  • However, whether or not a study was necessary at all was a real debate.
    • One group thought a study was just a smaller version of an Investigation, and could therefore be dealt with by having investigations within investigations.
    • The other defined a study as everything done to materials prior to actually doing the assays
  • Therefore, a child of protocol_application was created for this purpose.
    • A sample_preparation_for_assay is a protocol_application including material_enrollments and biomaterial_transformations. There is an alternative_term for this called "study".
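To show how an alternative term could let a community word such as "study" lead straight to the OBI term, here is a minimal sketch. It is only an illustration under my own assumptions: the `Term` and `TermRegistry` classes are hypothetical and are not part of OBI or of any ontology tool, and only the two term names and the "study" alternative term come from the notes above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    """One ontology term, its is_a parent and its community alternative terms."""
    name: str
    parent: Optional[str] = None
    alternative_terms: Set[str] = field(default_factory=set)


class TermRegistry:
    """Hypothetical lookup table: find a term by name or by alternative term."""

    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}
        self._by_alternative: Dict[str, str] = {}

    def add(self, term: Term) -> None:
        self._terms[term.name] = term
        for alternative in term.alternative_terms:
            # Alternative terms are matched case-insensitively.
            self._by_alternative[alternative.lower()] = term.name

    def resolve(self, label: str) -> Optional[Term]:
        if label in self._terms:
            return self._terms[label]
        name = self._by_alternative.get(label.lower())
        return self._terms[name] if name is not None else None


registry = TermRegistry()
registry.add(Term("protocol_application"))
registry.add(Term("sample_preparation_for_assay",
                  parent="protocol_application",
                  alternative_terms={"study"}))

found = registry.resolve("study")
print(found.name, "is_a", found.parent)
```

Someone who thinks in terms of a "study" is pointed at sample_preparation_for_assay, and from there at its parent, without needing to know the OBI name beforehand.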



Meetings & Conferences Semantics and Ontologies Standards

3rd OBI Workshop: Day 3

Today was a highly informative combination of talks and further improvement of OBI. Hopefully, you'll find these musings on the day's work helpful at either jogging your own memory of the events, or in giving you an idea what went on in our heads.

Outside OBO Ontologies – How do we integrate and/or make use of them?

  • Can we, at the moment or in future, place parent classes for all OBO ontologies in OBI? Definitely not now, as they don't share the same ULO (Upper Level Ontology). Some work is being done by the OBO-UBO group on mapping OBO ontologies to ULOs like BFO. (See the OBO-UBO web page for more information.)

    • In a related question, should all OBO ontologies use BFO? It would make integration a much more straightforward process. In my opinion, this would be a great idea in the long term; however, practicalities may prevent it. 🙂

  • Should things like BioTop be integrated into OBO, under BFO but before OBI? In my opinion (though today was the first time I have read about BioTop, so it isn't the most informed one), in our case probably not, as resolving the three may be problematic. However, some terms or ideas might be useful to share.

Formal OWL, aka making OBI Formally correct

  • Should be assigned to someone/some people for later, after more classes have been created. There is simply too much flux in the file at the moment. Get the graphs in place first, perhaps working on some complex relations as you go. Further, the definitions must explicitly hold information on creating these relations, irrespective of whether you make the relationships as you go or at the end.

  • BFO and OBI use different metadata tags, and there should be a shared set of tags.

    • The metadata tags used in BFO are part of snap/span, I think. We would need to bring up the idea of metadata resolution (if possible, and if we all agree it should be pursued) with that group too.

  • Barry Smith will bring OBI's information object and plan terms to the BFO group.

  • A milestone has been added (see the OBI Wiki) to hammer out exact implementation of the metadata list, and to work with other communities as appropriate (e.g. BFO, OBO Foundry = Barry, M Ashburner, Suzie, Chris M.).

Clinical Trial Ontology – Simona Carini & Barry Smith


  • Rctbank is a clinical trial db, holding information on all published clinical trials (from journal articles).

  • Its purpose is to provide enough information to allow evaluation of these trials.

  • RCT = randomized controlled trials

  • Epoch and Clinical Trial Ontology (CTO) are the other two that are being developed.

  • Barry Smith is involved in CTO, and it is therefore being built with OBI in mind, but it is still very small.

  • RCT and Epoch aren’t close to being OBO/OBI compliant.

    • Developed

    • Their choices are in conflict with the choices we’ve made.

    • That does NOT mean that they aren't eminently useful (which they are), just that merging would be problematic.

  • There has been agreement between Epoch and RCT that all should work towards a CTO that will work within the OBI framework.

    • This necessary reconciliation is one of the goals of the CTO workshop in May.

  • There are people
    claiming to develop a CTO but it is actually a CT database
    ontology (I missed the name of the people being referred to here). It isn’t
    the same beast. Understanding the data is not equivalent to
    understanding the processes in a trial.

RCT Schema – Barry

  • Built independently of OWL or Protégé, and is more correctly a database schema, though it is called an ontology.

  • Top-level class:

    • 2° study

    • Trial-details

    • Trial

    • Concept

      • Subclasses

  • Not the right way to do it – it is unbalanced: there is no place for a study, though there is a place for a 2° study.

  • 2° study seems to be at the wrong level in the hierarchy.

  • It is unclear what trial details means.

  • it is unclear what
    trial details means

  • When the same term (or portion of a term) is repeated over and over, it is often a sign of a mistake or of redundancy.

  • One of the
    children of population concept is population.

    • An ontology is important because its is_a hierarchy can be reasoned over: Population is NOT a population concept and is NOT a concept.
    • Reasoning is
      blocked here “from both directions”

    • Further, a recruitment
      flowchart is not a population concept

  • These things, like
    population concept, are headers/labels/conveniences, but they are not
    ontological forms. Some options for restructuring could be the following two things:

  • Population/protocol/design
    is_a continuant is_a entity

  • Trial is_a
    occurrent is_a entity

  • Not all RCT terms have
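To see why the placement of population matters for reasoning, here is a small sketch that simply follows is_a links upwards. It is my own illustration, not something shown in the talk: the `ancestors` function is hypothetical, the first hierarchy is my reading of the RCT schema as described above (with population concept sitting under Concept), and the second uses the restructuring options from the notes.

```python
from typing import Dict, List


def ancestors(term: str, is_a: Dict[str, str]) -> List[str]:
    """Follow is_a links upwards and return every inferred parent in order."""
    chain: List[str] = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:
            raise ValueError("cycle in is_a hierarchy")
        seen.add(term)
        chain.append(term)
    return chain


# My reading of the RCT schema: population sits under a label-like class.
rct_style = {
    "population": "population concept",
    "population concept": "concept",
}

# The restructuring options from the notes.
restructured = {
    "population": "continuant",
    "protocol": "continuant",
    "design": "continuant",
    "trial": "occurrent",
    "continuant": "entity",
    "occurrent": "entity",
}

print(ancestors("population", rct_style))     # ['population concept', 'concept']
print(ancestors("population", restructured))  # ['continuant', 'entity']
```

With the first hierarchy anything that follows is_a links concludes that a population is a concept, which is exactly the inference objected to above. With the second, the same walk gives continuant and entity.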

Epoch Ontology (Dave Parrish in charge of it) – Barry Smith

  • There are parts of
    this ontology that don’t belong in the CTO, but do belong in OBI

  • Originally
    developed to support the immune tolerance network (ITN), a big
    clinical trial resource: they fund, implement, monitor and assess
    clinical trials, and provide data services.

    • Informatics dept
      of ITN perform operations (generation and collection) -> data
      management -> analysis

  • They have an
    ontology of the kind of analytical steps their software needs to
    perform, and it helps them configure the software application.

  • For example, elements are claimed to be nouns, and represent the physical objects of the system. Classes of elements are domain types, containers, relationships. These are not always physical objects – they’re sometimes processes. Also, they are not always nouns.

  • Fits in with the
    community milestones, i.e. we could get many terms from the clinical trials community.

Branches have been assigned. See the OBI Branches Wiki Page for up to date information.


  • Mapping between
    current terms in various OBO ontologies to BFO

    • E.g. GO
      biological process is_a span:process

  • Gramene has
    already developed an environmental ontology in a plant context,
    which we should remember and hopefully incorporate useful terms in the first round of community term dates.

More general

  • Have moved all terms
    that would fall under PATO out of the ontology, e.g. state and
    anything under quality.

  • Do we really need
    "in vitro state" as well as "in vitro"? Terms such as
    these are always tied to objects like cells – these are not design
    as much as the state of the cells.

    • Is in vivo
      a location or a state? You can take in vitro cells and put
      them into “vivo”, and they are still in vitro cells,
      which means in vitro is a BFO quality.

  • The interior of
    your gut is the site for your gut bacteria. The interior of gut (IG)
    is also a type/node in the FMA (as a location). IG has qualities
    (shape, etc). In addition to these qualities it has others that
    determine its roles (having certain pressure, pH value). How to
    distinguish what FMA means from what an environment ontology means?

  • If we remove
    in-vivo_state, we run into problems with multiple inheritance. We
    needed to separate out the state of a biomaterial from the
    biomaterial itself, i.e. don’t have in-vivo_material as a child of

  • What terms do we
    need to use to describe diseases?

    • Disease (hook for
      disease ontology), disease_symptoms, disease_stages,

  • Ended up going through the entire ontology, resolving many problems. There is a new OWL file, but it is not yet ready for public consumption, so it won't be posted here until it is available from the official OBI pages.

There is general consensus among the workshop attendees that a very large amount of work is getting done, and there is a lot of positive feeling that the Milestones developed this week are giving us hard dates for inclusion of many more terms. The addition of terms can only truly start once the high-level structure has been decided, and this workshop has moved in great leaps and bounds towards a final structure of the higher levels of OBI. The "higher levels" have been generally defined at this meeting as the top two levels of OBI below BFO. This is what was completed today: the two levels directly below BFO have been studied by the group and cleaned.



Meetings & Conferences Standards

OBI Workshop Day 2

At the moment I am trying out an alternative blogging site, and so you can see this post on that site.

Meetings & Conferences Standards

OBI Workshop Day 1

At the moment I am trying out an alternative blogging site, and so you can see this post on that site.

Meetings & Conferences Training Camp Day 1

Attached to the end of the BioSysBio conference, the 2nd Training camp is running from 13-15 January, 2007, with about 25-30 people in attendance. The first day involved a series of talks, followed by a discussion of how to increase the amount of collaboration between and other organizations (among other things). After the work finished, there was a lively meal at a nearby Italian restaurant (which included, partway through dinner, a travelling troupe of accordionist, guitarist and tambourinist – though the tambourine was mainly used for collecting donations!). What follows is a summary of what was discussed on Day 1.


  • Nicolas Le Novère – BioModels projects. The EBI side
    • MIRIAM (Minimum Information Requested In the Annotation of biochemical Models) was launched in 2004, and has three parts:
      • reference correspondence. Your model must be in a public machine-readable format and must be described in a single location (this is the reference description). The structure of the model must reflect the biological processes listed in the reference description. All quantitative attributes must be defined, and finally the model should get the same result as that described in the reference description.
      • attribute annotation. Your model must be named, and the reference description must have a citation. Both the authors of the model and the model creator must be credited, and the date and time of creation and last modification, as well as a statement about the terms of distribution must be included.
      • external resource annotation. To unambiguously relate a piece of knowledge to a model constituent, the referenced info must be described using a triplet of data-type, identifier and qualifier. To aid in this process, the community must agree on a set of standard valid data types.
    • The MIRIAM database at the EBI stores a list of data resources and their URIs. It is accessible via web browser, web services and XML download.
  • Michael Hucka – SBML, a summary of current happenings and plans
    • The ability to exchange models is critical, and requires a common file format (which didn’t happen until 2000).
    • SBML is defined and created using UML, and then released as XML schema. Therefore, it is targeted at XML but mostly independent of it. Intended to be machine-readable, not human-readable.
    • SBML Levels are meant to co-exist. Each level introduces new features and enriches existing features. In this way, it gains power at the expense of simplicity.
      • Level 1 is mostly basic compartmental modeling.
      • Level 2 has more features including user-defined functions, events, types, initial conditions and constraints.
      • Level 3 in development.
    • SBML is now the de facto standard: it is supported by >100 systems and accepted by journals including Nature, BMC and PLoS. It is also used in textbooks (such as Darren Wilkinson’s book on Stochastic Modelling in Systems Biology) and courses.
    • It is important to capture meaning in a model by associating ontology terms to objects within the SBML model. When not using ontologies, the terms are completely unregulated and not very useful for programmatic analysis.
  • Herbert Sauro – Human-readable model definition language (and Frank Bergmann)
    • Meant to generate new SBML rather than combine existing SBML models.
    • A model is made up of modules, which define a rate process. A module describes a process by a set of one or more reactions. State variables inside a module are either local or accessible through the module interface.
    • Targeting synthetic biology people as well as systems biology people. For biologists and computer scientists and engineers.
    • (“Editor’s” note: some from these groups may not be immediately comfortable with this language, which looks a little like pseudo-code.)
  • Lena Strömbäck – Standards, exchange and databases
    • Provided an overview of her work on comparison of all currently-used data exchange formats for molecular interactions, including BioPAX, PSI MI and SBML.
  • Dagmar Koehn – Working with different pathway standards
    • There are already applications out there which perform various conversions relevant to the systems biology modelling world, including:
      • comparing XML schemas: Clio, COMA++
      • comparing OWL models: SAMBO, COMA++, Protege
      • comparing different formats: some converters for XML to OWL, PSI MI to BioPAX.
    • She has worked on creating an application to perform this third comparison.
    • Has discovered that transformation is only possible in one direction, from XML schema to OWL model, because OWL is a semantically-aware format, while XML is just syntactic. She is developing an application that can deal with XML to OWL to XML again.
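Going back to the external resource annotation part of MIRIAM from Nicolas's talk, the triplet of data-type, identifier and qualifier could be pictured roughly as below. This is purely my own sketch: the class, the set of agreed data types, and the example identifier and qualifier are placeholders I made up, not the MIRIAM specification or the contents of the MIRIAM database.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the community-agreed set of valid data types.
AGREED_DATA_TYPES = {"UniProt", "Gene Ontology"}


@dataclass(frozen=True)
class ResourceAnnotation:
    """A (data-type, identifier, qualifier) triplet linked to a model constituent."""
    data_type: str
    identifier: str
    qualifier: str

    def __post_init__(self) -> None:
        # Only data types the community has agreed on are accepted.
        if self.data_type not in AGREED_DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type!r}")
        if not self.identifier:
            raise ValueError("identifier must not be empty")


# Placeholder identifier and qualifier, for illustration only.
annotation = ResourceAnnotation("UniProt", "EXAMPLE-ID", "is")
print(annotation)
```

The check against the agreed list is the part that corresponds to the community needing a standard set of valid data types: an annotation using a data type nobody else recognises is rejected up front.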

The discussion covered the possibility of talking with the FuGE and OBI people on linking SBML models to the experiments they are connected to; the need for a formal way to describe the simulation process, as “run the simulation for 1000 sec” can give completely different results to “run the simulation for 1 sec”; and the possibility of using evidence tagging for annotation sections of SBML by extending the GO annotation attribution, as the people at Newcastle University are doing.

Meetings & Conferences

3rd Integrative Bioinformatics Workshop Day 2 (5 September 2006)

The second day of IB2006 was the longest of the three days, and the only “full” day. From my point of view only, the talks were more relevant and interesting to my work. The second evening was also the conference dinner, which was very sociable and the conversations continued straight through the dinner and into late in the night back at the conference hotel. But back to the day itself: there was a fantastic keynote by Pedro Mendes, and a number of other interesting talks. The highlights are presented below.

Top-down modeling of biochemical networks, a grand challenge of systems biology (Pedro Mendes)

Systems biology, in his view, is the study of a system through synthesis or analysis, using quantitative and/or high-throughput data. The origins of systems biology go back as early as the 1940s, with a large amount of work done in the 1960s-70s. It didn’t really take off during this time due to a lack of computing power and a lack of experimental “ability” for getting the large amounts of data required.

Pedro is interested in the top-down modeling approach because there is a large amount of data, with a lot of numbers, and people naturally want to make models from them. Many people think this isn’t the way to build models, but he believes otherwise. In bottom-up modeling you start with a small number of known reactions and variables, while in top-down modeling you start at a coarse-grained level, with loads of data, and you try to work “backwards” (compared to traditional modeling procedures) to find the steps that will produce the HTP data you started with. In other words, it derives elementary parts from studying the whole.

BASIS (Colin Gillespie)

Colin gave an interesting talk on the availability and usefulness of a web-based stochastic simulator. You can create, access and run SBML models via web services (and a web-page front-end to the web services). Their aim is to make their own models available to other researchers and also to provide a framework for others to build their own models. In general, their models can be envisaged as networks of individual biochemical mechanisms. Each mechanism is represented by a system of chemical equations, quantified by substrate and product concentrations and the associated reaction rates. The connected series of reactions is then simulated in time. Simulation may be stochastic or deterministic depending on species concentration. They have funding for another 6 years and are planning many additions to the tool.
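To make the idea of stochastically simulating a set of chemical equations a little more concrete, here is a minimal sketch of Gillespie's direct method in Python. This is my own illustration and not BASIS code: the function name, the way reactions are written as tuples, and the toy A -> B reaction with its rate constant are all assumptions made up for the example.

```python
import math
import random


def gillespie(x0, reactions, t_end, seed=0):
    """Simulate a reaction network with Gillespie's direct method.

    x0:        dict of species name -> initial molecule count
    reactions: list of (rate_constant, reactants, products), where
               reactants and products are dicts of species -> stoichiometry
    Returns a list of (time, state) tuples, one per reaction event.
    """
    rng = random.Random(seed)
    state = dict(x0)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        # Propensity of each reaction: rate constant times the number of
        # distinct combinations of reactant molecules currently available.
        propensities = []
        for k, reactants, _ in reactions:
            a = k
            for species, n in reactants.items():
                a *= math.comb(state[species], n)
            propensities.append(a)
        total = sum(propensities)
        if total == 0:
            break  # nothing can react any more
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which reaction fires, weighted by its propensity.
        index = rng.choices(range(len(reactions)), weights=propensities)[0]
        _, reactants, products = reactions[index]
        for species, n in reactants.items():
            state[species] -= n
        for species, n in products.items():
            state[species] = state.get(species, 0) + n
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical toy model: 100 molecules of A converting irreversibly to B.
toy_reactions = [(0.5, {"A": 1}, {"B": 1})]
trajectory = gillespie({"A": 100, "B": 0}, toy_reactions, t_end=1000.0, seed=1)
print("events simulated:", len(trajectory) - 1)
print("final state:", trajectory[-1][1])
```

A deterministic simulation would instead integrate rate equations for the concentrations; the stochastic version matters most when molecule counts are small enough that individual reaction events are visible.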

Multi-model inference of network properties from incomplete data (Michael Stumpf)

Estimates for rates of false positives range from 20 to 60%, and in connection with this he recalled a quote he once read stating that gene expression is as close to scientific fraud as is accepted by the scientific establishment. At least at the moment, it appears to be a trade-off between data quality and data quantity. In other words, you must take noise into account in any analytical work you do.

For most species, you only have interaction data for a subset of the proteome. Missing such data means that you can get quite different networks (currently known versus “actual” network). This affects summary statistics, among many other things. They discovered that, generally, inference for networks comprising less than 80% of the full graph should be treated with caution; above that value, however, the inference model they developed is very useful. Given a subnet, it is possible to predict some properties of the true network if we know the sampling process (independent of the process by which the network has grown). Between data sets, there seems to be a huge difference in how each experimental lab has mapped parts of the interactome. Overall, however, performing this test on multiple subnets from different PPI experiments is a good way of estimating total interactome size. There are limitations, though: it ignores multiple splice variants and domain architecture, so any organism affected by these will not necessarily have as good a result. By interrogating all these different models and averaging over them, useful estimates of total interactome size are possible. Useful estimates can even be retrieved when using partial data, as long as the number of nodes is at least 1000.
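As a small illustration of why the sampling process matters for summary statistics, the sketch below builds a random network, keeps each node with a fixed probability, and compares the mean degree of the full network with that of the sampled subnet. This is not the inference method from the talk: the random-graph model, the network size, the sampling fraction and all function names are assumptions chosen only for the example.

```python
import random


def random_graph(n_nodes, edge_prob, rng):
    """Return an undirected random graph as a set of (u, v) edges with u < v."""
    return {
        (u, v)
        for u in range(n_nodes)
        for v in range(u + 1, n_nodes)
        if rng.random() < edge_prob
    }


def mean_degree(nodes, edges):
    """Mean degree of a graph: twice the edge count over the node count."""
    return 2 * len(edges) / len(nodes) if nodes else 0.0


def sample_subnet(nodes, edges, fraction, rng):
    """Keep each node independently with probability `fraction`, and keep
    only the edges whose two endpoints both survive."""
    kept = {u for u in nodes if rng.random() < fraction}
    kept_edges = {(u, v) for (u, v) in edges if u in kept and v in kept}
    return kept, kept_edges


rng = random.Random(42)
nodes = set(range(500))
edges = random_graph(500, 0.04, rng)  # stand-in for the "actual" network
sub_nodes, sub_edges = sample_subnet(nodes, edges, 0.5, rng)  # "known" subnet

full = mean_degree(nodes, edges)
sub = mean_degree(sub_nodes, sub_edges)
print(f"mean degree, full network:  {full:.2f}")
print(f"mean degree, sampled half:  {sub:.2f}")
print(f"ratio (sampled / full):     {sub / full:.2f}")
```

Under this particular sampling scheme the subnet's mean degree shrinks roughly in proportion to the sampling fraction, which is the kind of known distortion that lets you reason back from a subnet to the full network when the sampling process is understood.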

Other interesting talks included Stuart Moodie’s discussion of the current state of affairs in standardizing systems biology graphical notation and visualization (SBGN, Kitano and others), Jan Baumbach’s work on performing knowledge ‘transfers’ for transcriptional regulatory networks from a model species to 3 other similar species important to human pathogen studies, Jan Kuentzer’s biological information system using both C++ and Java called BN++, an eye-opening overview of the current status of biocomputing at Singapore’s Biopolis by Gunaretnam Rajagopal, a lovely swooping demo of a targeted projection pursuit tool for gene expression visualization by Joe Faith, and a wonderfully presented (which in my mind equates to “easily understood” because of her skill as a speaker) statistical talk on modeling microarray data and interpreting and communicating biological results by Yvonne Pittelkow. (Yes, a couple of those were from day one, but they still deserved a mention!)