Categories
Meetings & Conferences, Semantics and Ontologies, Software and Tools

TT47: Semantic Data Integration for Systems Biology Research (ISMB 2009)

Chris Rawlings. Also speaking: Catherine Canevet and Paul Fisher

BBSRC-funded research collaboration in Newcastle, Manchester, and Rothamsted: ONDEX and Taverna. Demo: integration and augmentation of the yeast metabolome model (Nature Biotech, October 2008, 26(10)). Presented: Taverna and ONDEX. In ONDEX, everything can be seen as a network. To help with this, ONDEX contains an ontology of concept classes, relation types, and additional properties. Their example is yeast jamboree data integration. They have both specific (e.g. KEGG) and generic (e.g. tab-delimited) parsers to load in data.

When ONDEX works with Taverna, you use the ONDEX web services and access ONDEX from Taverna instead of using the pipeline manager. This means you can use Taverna to pull data into ONDEX. So, first parse the jamboree data into ONDEX and remove currency metabolites (e.g. ATP, NAD). Add publications to the graph, which domain experts can then view and use to manually curate the data. Next, annotate the graph using network analysis results. Then switch to Taverna and identify the orphans discovered in ONDEX. Retrieve the enzymes relating to the orphans, assemble the PubMed query, and add the hits back to the ONDEX graph. Finally, have a look at the completed visualization. Use the ONDEX pipeline manager to upload data – it’s all in a GUI, which is good.
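To make the "remove currency metabolites, then look for orphans" steps a little more concrete, here is a minimal sketch in plain Python. It does not use the ONDEX or Taverna APIs at all: the reaction table, the metabolite names and the function names are all made up for illustration, and I am assuming that an "orphan" means a metabolite that takes part in only one reaction, which may not be the definition used in the talk.

```python
from collections import Counter

# Hypothetical toy network: reaction id -> (substrates, products).
# None of this comes from the yeast jamboree data.
REACTIONS = {
    "R1": ({"glucose", "ATP"}, {"G6P", "ADP"}),
    "R2": ({"G6P"}, {"F6P"}),
    "R3": ({"F6P", "ATP"}, {"FBP", "ADP"}),
    "R4": ({"X", "NAD"}, {"Y", "NADH"}),
}

# Currency metabolites link almost every reaction, so they hide the real topology.
CURRENCY = {"ATP", "ADP", "NAD", "NADH"}


def remove_currency(reactions, currency):
    """Return a copy of the network with currency metabolites stripped out."""
    return {
        rid: (substrates - currency, products - currency)
        for rid, (substrates, products) in reactions.items()
    }


def find_orphans(reactions):
    """Assumed definition: metabolites that occur in exactly one reaction."""
    counts = Counter()
    for substrates, products in reactions.values():
        counts.update(substrates | products)
    return sorted(m for m, n in counts.items() if n == 1)


cleaned = remove_currency(REACTIONS, CURRENCY)
orphans = find_orphans(cleaned)
print(orphans)
```

Without the currency-removal step, ATP and ADP would tie R1 and R3 together even though they share no pathway-specific metabolite, which is presumably why the step comes first.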

Then followed a live demo.


Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!

Categories
Outreach

The Great North Museum: encouraging collaboration, teaching and outreach


This week I attended a great two-hour session run by the brand-spanking new Great North Museum (GNM) designed to encourage collaboration between Newcastle University researchers and the GNM. In addition, ideas for using this type of collaboration in the form of outreach to the community (e.g. schoolkids) were welcome. There have already been some useful research collaborations between the university and the museum, and they want to encourage even more.

The GNM was formed from a number of museums (e.g. the Hancock, and the Hatton Gallery) and under the auspices of many different groups including Newcastle University (a full list is available). It opened its doors last week, over the school holidays. I work in the university building that sits just across the street from the GNM: Hancock building, and every time I looked there was a queue stretching down to the road. You can see an example of this on Simon’s Twitpic (pictured above). It has received more than 67,000 visitors in its first week. Congratulations! I have to say that the museum is really impressive from the outside, and looks great on the inside. I haven’t given myself the full tour yet, but I will be doing so soon.

While at the event today, I learned some interesting things about the contents of the GNM, and I thought it might be of general interest. The GNM has over 500,000 items in its collection, of which there is only space for 3,500 to be displayed, even with the revamp of the museums. They have a taxidermist on-site, as they still get roadkill and the occasional other type of animal to prepare for the collection.

Their collection covers a wide array of natural history and archaeology, and includes:

  • birds and bird eggs, including a Great Auk egg
  • an extensive collection of molluscs, including 1000s of type specimens
  • sea slug specimens and figures
  • insects, most of which are stored in their original Victorian cabinets
  • an osteology collection which includes moa, great hawks and dodos
  • game heads
  • botany specimens and drawings, including an extensive herbarium with lichens and north-eastern seaweed
  • paleozoology, including a Carboniferous tetrapod (crocodile-like amphibian), with predominantly local geology and lots of type material, some of which is on display – recent improvements in display cases’ environments now allow this
  • paleobotany including a big fossilized tree trunk, a bunch of specimens from the 1830s and 100s of thin sections of fossils
  • minerals
  • ethnography material, including some original items from Captain Cook
  • Egyptology
  • extensive Roman archaeology from Hadrian’s Wall
  • prehistoric archaeology
  • Anglo-Saxon and medieval collections
  • Greek and Etruscan art and archaeology
  • fine art in the Hatton collections and original Bewick prints and blocks
  • a large archive which includes letters from people like Mary Anning, Richard Owen and Charles Darwin

The oldest item in the archaeology collection is an 11,000-year-old Paleolithic flint blade found in the region. There is also a prehistoric gallery at the GNM, and the Hadrian’s Wall gallery is the largest at the GNM. The museum also houses the Shefton collection of about 1,000 Greek and Etruscan items.

In terms of collaboration and outreach, a couple of points came across clearly amongst the case studies and discussions:

  1. The museum can be used to teach biodiversity and conservationism
  2. Using the items in the museum, re-creations of important research can be done (and are being done). For instance, it was museum collections of bird eggs that helped researchers figure out that eggshells were thinning due to DDT ingestion by birds
  3. Collaboration between researchers at the university and the museum can lead to truly interesting work being done. Showcasing university research in the museum, engaging with schools and the wider community, and performing research with the help of the museum are the sorts of things that were discussed.

I like having a museum on my (work) doorstep, and hope to find some way to work with it. Enjoy your visit!

Categories
Uncategorized

Congratulations to the Newcastle Uni iGEM Team 2008!

Congratulations, Bug Busters! You didn't just get a gold star, you got a gold award!

Though I was not involved, many of my friends were part of the Newcastle University iGEM 2008 team, either as supervisors or students. You can read more on the Newcastle University iGEM entry wiki page. Of the 84 teams competing, only 16 won gold medals, including, from the UK, Edinburgh, Imperial and Newcastle.

From the overview of the team's wiki page:

"We aimed to develop a diagnostic biosensor for detecting pathogens.
We wanted this to be cheaply and readily available for deployment in
areas where access to medical resources, such as refrigeration and
sophisticated laboratories, is limited or absent. We chose to use Bacillus subtilis
as a method of delivery due to its ability to sporulate. The sensor
bacteria could then be dried down as spores, which are very stable and
extremely resilient to hostile environmental conditions, and rehydrated
when required. The ambient temperature of much of the developing world
is ideal for the growth of Bacillus spp. without the use of incubation equipment.

Gram-positive bacteria communicate using quorum communication
peptides. Research has shown that these peptides are extremely
strain-specific. We chose to engineer B. subtilis 168 to detect
four Gram-positive pathogens by their quorum communication peptides.
The different combinations of quorum communication peptides would be
sensed by the engineered bacterium, and this signal converted into a
visual output as fluorescent proteins such as mCherry, GFP, CFP and
YFP."

Well done!

P.S. Looks like kudos to my old alma mater, Rice University, too! Congrats!


Categories
CISBAN

3 Bioinformatics Research Associate Positions: Newcastle University

There are three bioinformatics jobs (one in pure bioinformatics, one in network analysis, and another in modelling/mathematical biology) currently available within CISBAN, an interdisciplinary centre studying the systems biology of ageing and nutrition. The full particulars are posted both on Nature Jobs and on the Newcastle University Job Vacancies web pages.

Below are links to the various job advertisements, as well as summaries of the jobs themselves. This is a summary of the three Nature Jobs postings, put together on a single page for easy perusal. The closing date for all of these positions is 11 January 2008. This is a great opportunity, though I may be speaking from a biased perspective as I work at CISBAN and find it an interesting and challenging workplace.

  1. Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health

    Research Positions

    Level F: £25,134 – £32,796 p.a.
    Level G: £33,779 – £40,335 p.a.

    We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.

    Research Associate (Bioinformation/Computing Scientist – Applications)

    To develop and maintain the computing software and hardware infrastructure for systems biology, including a central web portal integrating applications for data capture, storage and visualisation and high performance computing systems and databases, including a large Linux cluster.

    Job reference: A1091R

    Posts are tenable until 30 September 2010.

    Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk).
    Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.

    Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).

    Committed to Equal Opportunities

  2. Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health

    Research Positions

    Level F: £25,134 – £32,796 p.a.
    Level G: £33,779 – £40,335 p.a.

    We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.

    Research Associate (Bioinformatician – Network Analysis)

    To research and develop novel methods of representing and integrating molecular and cellular data as networks and apply this methodology to identify novel proteins and elucidate novel pathways involved in the process of cellular ageing and senescence.

    Job reference: A1090R

    Posts are tenable until 30 September 2010.

    Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk).
    Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.

    Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).

    Committed to Equal Opportunities

  3. Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health

    Research Positions

    Level F: £25,134 – £32,796 p.a.
    Level G: £33,779 – £40,335 p.a.

    We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC, to participate in studies of the mechanisms responsible for ageing and how they are affected by nutrition. Ageing is recognised internationally as a ‘grand challenge’ and is a field prioritised for growth. This post offers opportunities to work in an intensely multidisciplinary, world-class centre and contribute to the development and application of systems science.

    Research Associate (Modeller/Mathematical Biologist)

    To develop models of molecular and cellular mechanisms of ageing and to explore links between ageing, development and evolution from a life-course perspective. This post will also involve collaboration within the EU Network of Excellence LifeSpan, linking development and ageing.

    Job reference: A1092R

    Posts are tenable until 30 September 2010.

    Enquiries for the post may be directed to Professor Tom Kirkwood, Institute for Ageing and Health (email: tom.kirkwood@ncl.ac.uk).
    Further particulars for this post can be found on the University’s web page.

    Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director, Institute for Ageing and Health, Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).

    Committed to Equal Opportunities


Categories
CISBAN, Data Integration, Semantics and Ontologies, Software and Tools, Standards

Of GelML and MFO

A couple of papers from here at Newcastle University have appeared over the past couple of weeks. Here's a summary of them both.

  • Data Standards
    From "An Update on Data Standards for Gel Electrophoresis" in Practical Proteomics Issue 1, September 2007, and by Andrew R. Jones and Frank Gibson.
    From the abstract: "We report on standards development by the Gel Analysis Workgroup of the
    Proteomics Standards Initiative. The workgroup develops reporting
    requirements, data formats and controlled vocabularies for experimental
    gel electrophoresis, and informatics performed on gel images. We
    present a tutorial on how such resources can be used and how the
    community should get involved with the on-going projects. Finally, we
    present a roadmap for future developments in this area."
    The paper summarizes ongoing work on data and metadata standardization in the gel electrophoresis and gel informatics fields. This includes work on MIAPE GE and MIAPE GI, two checklists of the minimal information required about these types of experiments and analyses. For both GE and GI, there are data formats (GelML and GelInfoML, respectively, both extensions of FuGE) and a suggested controlled vocabulary (sepCV). More information can be found at http://www.psidev.info.
    Frank works in the CARMEN neuroscience project here at Newcastle, and Andy is in Liverpool and works on, among other things, FuGE. CARMEN collaborates with the SyMBA project, which was originally developed by me and a few others within Neil Wipat's Integrative Bioinformatics Group here at Newcastle but which is now a SourceForge project at http://symba.sf.net. Andy Jones is a co-author with me, Neil Wipat, Matt Pocock and Olly Shaw on an upcoming SyMBA paper.
  • Semantic Data Integration
    A paper that was presented at the Integrative Bioinformatics Conference 2007 by me and my co-authors, Matt Pocock and Neil Wipat, is now available from the Journal of Integrative Bioinformatics website.
    Allyson L. Lister, Matthew Pocock, Anil Wipat. Integration of constraints documented in SBML, SBO, and the SBML Manual facilitates validation of biological models. Journal of Integrative Bioinformatics, 4(3):80, 2007.


Categories
Meetings & Conferences

Integrative Bioinformatics 2007 Day 2: Multi-value networks, Banks et al.

Other than where specified, these are my notes from the IB07 Conference, and not expressions of opinion. Any errors are probably just due to my own misunderstanding. 🙂

A talk about multi-value networks, high-level Petri nets, and the differences from Boolean networks. Formal methods are required to model and analyse complex regulatory interactions. Boolean networks offer a good starting point, but are often too simplistic. Multi-value networks (MVNs) are qualitative, and are seen as a middle ground between differential equation models and Boolean networks.

He has applied high-level Petri net techniques and a wide range of analysis tools. In MVNs, entities assume a range of values (0…n). Each entity has a neighbourhood of other entities that affect it, and the behaviour of each entity is described using state tables. However, we can't really analyse this directly: that's where Petri nets come in. They have a graphical notation with mathematical semantics and can model choice, synchronization and concurrency. They have an expressive framework with data types and equational descriptions of behaviour. There is a wide range of analysis techniques and tool support, e.g. model checking. Petri nets work by moving tokens between places.

Their approach was as follows. They have defined a set of state transition tables that completely define the model. Equational definitions are extracted from these tables, and then a Petri net is constructed. They also apply multi-value logic minimization to each state transition table to simplify the information from the tables. Construction of the high-level Petri net begins with a single place for each entity, connected to a central transition. The transition encodes the equational specification of the network behaviour. Each place "x" is connected to the transition node with an input arc "x" and an output arc "x".
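To check my own understanding of the "one place per entity, one central transition" idea, here is a rough sketch in Python. It is not the speaker's tooling or model: the two entities, their neighbourhoods and their state tables are invented, and the "Petri net" is reduced to a dictionary of places (the marking) plus a single function that plays the role of the central transition.

```python
from itertools import product

LEVELS = (0, 1, 2)  # each hypothetical entity takes a value in 0..2

# Neighbourhood: which entities influence each entity (illustrative only).
NEIGHBOURHOOD = {
    "signal": ("signal",),
    "response": ("signal", "response"),
}

# State transition tables: neighbourhood values -> next value of the entity.
TABLES = {
    # signal simply keeps its current level
    "signal": {(s,): s for s in LEVELS},
    # response moves one level per step towards the signal level
    "response": {
        (s, r): r + (s > r) - (s < r) for s, r in product(LEVELS, repeat=2)
    },
}


def fire(marking):
    """Fire the central transition once.

    Every place is read (input arcs) and every place is rewritten
    (output arcs) in a single synchronous step.
    """
    return {
        entity: TABLES[entity][tuple(marking[n] for n in NEIGHBOURHOOD[entity])]
        for entity in marking
    }


def trajectory(marking, steps):
    """Return the list of markings visited over the given number of firings."""
    states = [marking]
    for _ in range(steps):
        marking = fire(marking)
        states.append(marking)
    return states


for state in trajectory({"signal": 2, "response": 0}, 3):
    print(state)
```

A real high-level Petri net tool would add the analysis side (e.g. model checking) that a plain dictionary cannot give you; the sketch only shows how state tables become one synchronous update.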

They showed how this worked through carbon starvation in E. coli. Exponential growth occurs where there is sufficient carbon, but the cells enter a stationary phase when the carbon is depleted. The model is validated by checking known properties. Then, you can look at dynamic properties. A mutant analysis was also done, where you can "knock out" or overexpress key genes and observe the effect.
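The mutant analysis can be pictured as clamping one entity to a fixed level and comparing the resulting steady state with the unclamped one. The sketch below is only my illustration of that idea: the three entities and their update rules are hypothetical and are not the E. coli model from the talk.

```python
MAX_LEVEL = 2  # hypothetical entities take values 0..2


def step(state, clamp=None):
    """One synchronous update of a made-up three-entity network."""
    nxt = {
        "nutrient": state["nutrient"],               # external input, held constant
        "regulator": MAX_LEVEL - state["nutrient"],  # rises as nutrient falls
        "growth": min(state["nutrient"], MAX_LEVEL - state["regulator"]),
    }
    nxt.update(clamp or {})  # a clamped entity ignores its own update rule
    return nxt


def steady_state(state, clamp=None, max_steps=20):
    """Iterate until the state stops changing, or give up after max_steps."""
    state = {**state, **(clamp or {})}
    for _ in range(max_steps):
        nxt = step(state, clamp)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no steady state reached")


start = {"nutrient": 2, "regulator": 2, "growth": 0}
wild_type = steady_state(start)
overexpressed = steady_state(start, clamp={"regulator": MAX_LEVEL})
knockout = steady_state(start, clamp={"regulator": 0})

print("wild type:     ", wild_type)
print("overexpression:", overexpressed)
print("knockout:      ", knockout)
```

In this toy network, overexpressing the regulator blocks growth even when nutrient is plentiful, while knocking it out makes no difference at high nutrient levels.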

Finally, they do a model comparison with the Boolean network equivalent of this model. There are differences, which leads to some interesting questions: how much detail is required in the model? Is the model representable in the Boolean domain?

My opinion: A great, interesting talk that flowed well and was easy to understand. Slides were a little overfull, but it didn't detract. A natural speaker.


Categories
Meetings & Conferences

Questionnaire Design

I spent today in a 1-day course on Questionnaire Design organized by the Newcastle University Staff Development Unit, and run by Dr. Pamela Campanelli, a Survey Methods consultant and UK Chartered Statistician. While I won’t recreate her slides here, as that would be long, irrelevant and possibly infringe some copyrights, I wanted to present some of the most interesting comments she had to make on the design and analysis of questionnaires and the responses returned.

I signed up to this course as my PhD project includes, as one of its (smaller) objectives, the comparison of the perceived level of collaboration between the various research groups within the Centre I belong to both before and after my PhD project is made available. Part of that project is to provide an application accessible to all researchers that will automatically use the output of certain research groups to inform the research of other groups. (Yes, I am being deliberately vague here.) In summary, the ability to provide my target audience with a simple, clear questionnaire that will additionally produce responses that can be statistically analyzed in a useful manner is important. As I have no previous experience writing a questionnaire, a crash-course seemed like a good idea. Forgive any errors in the points that follow: I am sure they are all due to my lack of comprehension rather than to the quality of the training course!

Of most relevance to me, Pam mentioned that when a questionnaire will be given at multiple time points (i.e. before and after my work is available to the researchers), you should use an identical questionnaire every time, so that changes in the responses cannot be attributed to changes in the questionnaire design.

The most important thing I learnt from the day’s training is this: always think very carefully about what you want to ask, and ensure that every question you ask has a relevant objective and is written with an eye for balancing brevity and clarity (with clarity being the more important of the two). For instance, in English “you” may be plural or singular, and which is intended should be made clear. Equally, words like “doctor” have many meanings: your GP, your specialist, a PhD. Some may even check “yes” to a question asking if they have seen their doctor if they have been to the surgery/office and seen the nurse, or even if they have chatted with their doctor on a chance meeting at the grocery store.

Pam mentioned a resource that has been useful to her in the past, called the CASS Question Bank (http://qb.soc.surrey.ac.uk). This presents – for free – the information in the data archive. Not only might a question you wish to use already be written, but in some cases you can see how often such a question was answered (and perhaps also the frequencies of each possible answer). It should be noted, however, that just because a question or questionnaire has been published doesn’t mean it is perfect. Also, there is no “ideal response rate” for questionnaires that can be applied across the board. Instead, the rate will naturally differ between countries and even academic disciplines (or other groupings). Further, the people who actually respond to questionnaires have different traits than those who don’t respond (when under their own recognizance).

Incentives were also discussed, as I had toyed with the idea of encouraging people to fill out my questionnaire by holding a prize draw for chocolate among the respondents. Interestingly, Pam mentioned that prize draws can be the worst of the incentive choices available. One study (sorry, I didn’t catch the reference) compared promising a guaranteed prize of greater value with giving a much smaller prize before the respondent filled out the form. The control response rate (no incentives) was 50%. Where the respondents were guaranteed $50 if they sent back the form, the response rate rose to 57%. However, when $5 was included in the initial posting with the questionnaire, the response rate rose to 67%! Whether it was the respondent’s belief in reciprocity or their feelings of guilt, it seems that providing the carrot at the same time as the stick was useful. On a smaller scale, including a tea bag (as was done by a PhD student) proved popular as well.

Memory is often overestimated. Reports vary about how large working memory is, but both 7 +/- 2 items and 5 +/- 2 items were mentioned. As Pam suggested, imagine a scenario where you are at a restaurant and the waiter is telling you the specials. Most people find it difficult to keep more than 5 or 6 specials in their head: after that, they start forgetting the earlier items. This holds just as true for self-completion questionnaires (which I’m interested in), and questionnaires in general. Therefore, the more clauses in a question, or the more radio buttons in a range of possible responses, the less likely that the respondent will answer with their “correct” answer. In a similar vein, you should try not to force respondents to do mathematics in their head (“How often per day, on average, do you visit the coffee lounge at work?”). The more mathematics you make them do, the less likely their answer will be the one they intended. Instead, a couple of simpler questions from which the designer can calculate the value is better.
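As a small illustration of letting the designer do the arithmetic, the sketch below derives the per-day average from two simpler answers. The two questions and the function name are my own invention, not something presented on the course.

```python
def visits_per_day(visits_last_week: int, days_at_work_last_week: int) -> float:
    """Average coffee-lounge visits per working day.

    Derived from two hypothetical simpler questions:
      1. "On how many days were you at work last week?"
      2. "How many times did you visit the coffee lounge last week?"
    """
    if days_at_work_last_week <= 0:
        raise ValueError("respondent was not at work last week; no average exists")
    if visits_last_week < 0:
        raise ValueError("number of visits cannot be negative")
    return visits_last_week / days_at_work_last_week


# Example: 12 visits over 4 working days
print(visits_per_day(12, 4))
```

The respondent only has to recall two counts, and the division (including the awkward zero-days case) is handled once, by the person analysing the responses.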

She also says that the most common problem she encounters is trying to answer too many questions with a single item, with her example being “Would you like to be rich and famous?”: this sentence is alright for those who want either both or neither, but is not appropriate for those who want one or the other.

What is most interesting are the social aspects of questionnaire design. If you have a range of 5 possible answers for a question (very positive, generally positive, neutral, generally negative, very negative), you need to decide whether you want to force your respondents to take a side. To do this, you remove the “neutral” option, forcing the respondents to get off the fence. You should also be sparing in your use of “don’t know” as an option, as many people will use that in preference to thinking about the question. Also, in many cases it is simply not appropriate: for instance, “don’t know” is not really applicable to the question “How happy are you with your new TV?”. Further, vague, subjective quantifiers should be avoided wherever possible. Words like “often”, “sometimes” and “rarely” mean different things to different people. Instead, measuring frequencies with words like “everyday” and “about once a week” is better, though they may not be suitable if the respondent’s behavior is not regular. Questions using these words must be written clearly so that respondents can make a decision easily. Finally, numeric scales should at a minimum have the midpoint and the two extremes named with appropriate adjectives. If, for instance, you have the range 0-10 and have not marked 5 as the midpoint, some people may mistake the scale for a unipolar one (any number over 0 is positive) rather than a bipolar one (any number over 5 is positive).

The course covered many more topics than I've mentioned here. Below are the references she recommended for further reading.

References Suggested (the starred reference was the one she mentioned the most)

Tourangeau et al. (2000), The Psychology of Survey Response.

Fowler, F. J. Jr. (1995), Improving Survey Questions: Design and Evaluation. Sage.

(*) Dillman, D. (2007), Mail and Internet Surveys: The Tailored Design Method, 2nd Edition. Wiley.

Fowler, F. J. Jr. (2002), Survey Research Methods, 3rd Edition. Sage.

Czaja, Ronald and Blair, J. (2005), Designing Surveys – a guide to decisions and procedures. Pine Forge Press.
