Whole-Genome Reference Networks for the Community


Srinivasan et al. use this paper as a call to the community to begin developing whole-genome reference networks for key model organisms. The paper is part review (it summarizes methods of network generation and analysis) and part call to arms, arguing that reference networks are needed. It begins by describing systems biology as “the science of quantitatively defining and analyzing” functional modules, or components of biological systems.

There are many different definitions of systems biology, but the twin pillars of data integration and the study of biological systems at various levels of granularity appear in most of them. A focus on integration and top-down research, rather than the more traditional reductionist point of view, is also often mentioned.

The authors then divide systems biology into three broad categories: high-level networks of the interactome or metabolome, deterministic models of kinetics and diffusion, and finally stochastic models of variation in cell lines. This division would be slightly clearer if they specified continuous deterministic models and discrete stochastic models. I realize that these adjectives are generally assumed for these model types, but since it is whether a model is discrete or continuous that increases its complexity, stating them explicitly would be useful.

They collapse many different types of network data into a single global interaction network, stating that it would be prohibitively expensive to try to prise out all of the sub-graphs, as variables such as time or sub-cellular location are often not simple to pull out on their own. This “lowest common denominator” method of network generation is not ideal but, the authors maintain, it does provide more information than a simple genome sequence. In their networks, nodes represent proteins and edge weights are probabilities of association between proteins.
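To make the "single global network" idea concrete, here is a minimal sketch of collapsing several evidence-specific networks into one weighted graph. The noisy-OR combination rule, the function name and the example proteins are my own assumptions for illustration; the paper's actual integration method may well differ.

```python
from collections import defaultdict


def collapse_networks(evidence_networks):
    """Merge evidence-specific networks into one global weighted network.

    Each input maps a protein pair to a probability of association.
    Probabilities are combined with a noisy-OR, which assumes the evidence
    sources are independent (an illustrative assumption, not the paper's rule).
    """
    prob_no_link = defaultdict(lambda: 1.0)
    for network in evidence_networks:
        for (a, b), p in network.items():
            # frozenset makes the edge undirected: (a, b) == (b, a)
            prob_no_link[frozenset((a, b))] *= 1.0 - p
    return {edge: 1.0 - q for edge, q in prob_no_link.items()}


# Hypothetical evidence from two different experiment types.
coexpression = {("P1", "P2"): 0.5, ("P2", "P3"): 0.2}
two_hybrid = {("P2", "P1"): 0.5}
global_network = collapse_networks([coexpression, two_hybrid])
```

Note what is lost: once the two sources are merged, the global edge no longer says which experiment type, time point or compartment supported it, which is exactly the "lowest common denominator" trade-off.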

Noise is a real problem in most of these high-throughput data sets, and such data sets are not all created equal: one group may make a very good gene expression data set, and another may not. How can variable quality of data be dealt with? Early efforts focused on integrating multiple networks and only taking those nodes and edges that were present in more than one network. After that, methods of network generation that used “gold standards” created better integrated networks.
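The early "present in more than one network" approach can be sketched as a simple vote over edges. This is only an illustration under my own assumptions (undirected edges, an arbitrary `min_support` parameter and made-up node names), not code from the paper.

```python
from collections import Counter


def consensus_edges(networks, min_support=2):
    """Keep edges reported by at least `min_support` of the input networks."""
    support = Counter()
    for edges in networks:
        # Deduplicate within a network so each source votes once per edge.
        support.update({frozenset(edge) for edge in edges})
    return {edge for edge, count in support.items() if count >= min_support}


# Three hypothetical networks from different groups.
networks = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("C", "D"), ("A", "B")],
]
kept = consensus_edges(networks)
```

The weakness is visible even here: an edge seen once in an excellent data set is thrown away, while one seen twice in poor data sets is kept, which is why weighting sources against a gold standard was the natural next step.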

Descriptions of network analyses (rather than network creation) focus on network alignment and experiment prioritization. The latter is a general term for pulling out elements of the network that haven’t been experimentally verified, such as likely additions to known pathways or important disease genes. The former is an interesting extension of genome sequence alignment to networks. In network alignments, conserved modules of nodes are identified if they have “both conserved primary sequences and conserved pair-wise interactions between species”. They specifically mention Graemlin, a tool they have developed that can identify conserved functional modules across multiple networks.
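As a rough illustration of the two conditions in that quote, the sketch below keeps an interaction in one species only if both proteins have a counterpart in the other species and those counterparts also interact. It is emphatically not Graemlin's algorithm: the ortholog table stands in for sequence conservation, and all names are hypothetical.

```python
def conserved_interactions(edges_a, edges_b, orthologs):
    """Return interactions in species A that are also present in species B.

    `orthologs` maps a protein in A to its counterpart in B, standing in
    for "conserved primary sequence" (an assumption for this sketch).
    """
    edges_b_set = {frozenset(edge) for edge in edges_b}
    conserved = set()
    for u, v in edges_a:
        if u in orthologs and v in orthologs:
            if frozenset((orthologs[u], orthologs[v])) in edges_b_set:
                conserved.add(frozenset((u, v)))
    return conserved


# Hypothetical toy networks for two species.
edges_a = [("a1", "a2"), ("a2", "a3")]
edges_b = [("b2", "b1")]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}
shared = conserved_interactions(edges_a, edges_b, orthologs)
```

A real aligner has to search for whole conserved modules and cope with many-to-many orthology and noisy edges; this only shows the pairwise test at the heart of the idea.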

Finally, they suggest that the reference networks should show only those reactions present in the “‘average cell’ of a given organism near the median of the norm of reaction”.

While they acknowledge that, like the reference human genome sequence, such a creation is a “useful fiction”, it is my opinion that finding the average cell will be much more difficult, and perhaps less illuminating, than its equivalent in the sequencing world. Further, describing what is “normal” is truly difficult, and will vary from species to species. The PATO / quality ontology people (http://obofoundry.org/cgi-bin/detail.cgi?quality) have known about the problems facing the “average” phenotype for a while now. I do, however, like their idea of storing the reference networks using RDF, as that seems a fitting format for networks. Overall, a laudable goal but one which will need some more thinking about. I’ve tried to run Graemlin using one of their example searches, and it didn’t run (at least today), and the main author’s website won’t load for me today, though one of the other authors’ pages did work.
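On the RDF point, here is a small sketch of why the format fits: each weighted interaction becomes a handful of subject-predicate-object triples, written here as N-Triples without any library. The `example.org` namespace and the `participant` and `probability` predicates are placeholders I made up, not a vocabulary proposed in the paper.

```python
EX = "http://example.org/refnet/"  # hypothetical namespace for this sketch
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def edges_to_ntriples(edges):
    """Serialize (protein_a, protein_b, probability) edges as N-Triples.

    Each interaction gets its own node so that the probability can be
    attached to the edge rather than to either protein.
    """
    lines = []
    for i, (a, b, p) in enumerate(sorted(edges), start=1):
        node = f"<{EX}interaction/{i}>"
        lines.append(f"{node} <{EX}participant> <{EX}protein/{a}> .")
        lines.append(f"{node} <{EX}participant> <{EX}protein/{b}> .")
        lines.append(f'{node} <{EX}probability> "{p}"^^<{XSD_DOUBLE}> .')
    return "\n".join(lines)


ntriples = edges_to_ntriples([("P1", "P2", 0.75)])
print(ntriples)
```

Annotation tracks describing deviations from the norm could then be added as further triples about the same interaction nodes, without changing the ones already there.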

All in all, a useful review of recent network methods in bioinformatics, and an interesting goal described. Low-noise reference networks for key model organisms, together with the annotation tracks that would describe deviations from the norm, are a good idea.

Topics for discussion (aka leading questions): More fine-grained reference implementations are available, such as Reactome. Reactome provides a curated database of human biological pathways, with inferred orthologous events for 22 other organisms. Do we need reference networks when we’re gradually growing our knowledge of reference pathways? Are reference networks of “normal” organism states helpful? How do we define average? Would the median of the norm of reaction be different under different environmental conditions? What if what one group considers an average cell differs from another group’s average cell? Having reference networks would mean easier comparisons of different network analysis programs. Would this end up being a major use of the networks? Would such comparisons just lead to network analysis programs that fit the reference network, but do not work in a generic manner? What do others think?

Srinivasan, B.S., Shah, N.H., Flannick, J.A., Abeliuk, E., Novak, A.F., Batzoglou, S. (2007). Current progress in network research: toward reference networks for key model organisms. Briefings in Bioinformatics, 8(5), 318-332. DOI: 10.1093/bib/bbm038


3 Bioinformatics Research Associate Positions: Newcastle University

There are three bioinformatics jobs (one in pure bioinformatics, one in network analysis, and another in modelling/mathematical biology) currently available within CISBAN, an interdisciplinary centre studying the systems biology of ageing and nutrition. The full particulars are posted both on Nature Jobs and on the Newcastle University Job Vacancies web pages.

Below are links to the various job advertisements, as well as summaries of the jobs themselves. This is a summary of the three Nature Jobs postings, put together on a single page for easy perusal. The closing date for all of these positions is 11 January 2008. This is a great opportunity, though I may be speaking from a biased perspective as I work at CISBAN and find it an interesting and challenging workplace.

  1. Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health

    Research Positions

    Level F: £25,134 – £32,796 p.a.
    Level G: £33,779 – £40,335 p.a.

    We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC,
    to participate in studies of the mechanisms responsible for ageing and
    how they are affected by nutrition. Ageing is recognised
    internationally as a ‘grand challenge’ and is a field prioritised for
    growth. This post offers opportunities to work in an intensely
    multidisciplinary, world-class centre and contribute to the development
    and application of systems science.

    Research Associate (Bioinformation/Computing Scientist – Applications)

    To develop and maintain the computing software and hardware infrastructure
    for systems biology, including a central web portal integrating
    applications for data capture, storage and visualisation and high
    performance computing systems and databases, including a large Linux
    cluster.

    Job reference: A1091R

    Posts are tenable until 30 September 2010.

    Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk).
    Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.

    Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director,
    Institute for Ageing and Health, Henry Wellcome Laboratory for
    Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email:
    tom.kirkwood@ncl.ac.uk).

    Committed to Equal Opportunities

  2. Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health

    Research Positions

    Level F: £25,134 – £32,796 p.a.
    Level G: £33,779 – £40,335 p.a.

    We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC,
    to participate in studies of the mechanisms responsible for ageing and
    how they are affected by nutrition. Ageing is recognised
    internationally as a ‘grand challenge’ and is a field prioritised for
    growth. This post offers opportunities to work in an intensely
    multidisciplinary, world-class centre and contribute to the development
    and application of systems science.

    Research Associate (Bioinformatician – Network Analysis)

    To research and develop novel methods of representing and integrating
    molecular and cellular data as networks and apply this methodology to
    identify novel proteins and elucidate novel pathways involved in the
    process of cellular ageing and senescence.

    Job reference: A1090R

    Posts are tenable until 30 September 2010.

    Enquiries for the post may be directed to Dr Anil Wipat, School of Computing Science (email: anil.wipat@ncl.ac.uk).
    Further particulars for this post can be found on the University’s web page at http://www.ncl.ac.uk/vacancies/list.phtml?category=Research.

    Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director,
    Institute for Ageing and Health, Henry Wellcome Laboratory for
    Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email:
    tom.kirkwood@ncl.ac.uk).

    Committed to Equal Opportunities

  3. Centre for Integrated Systems Biology of Ageing and Nutrition, Institute for Ageing and Health

    Research Positions

    Level F: £25,134 – £32,796 p.a.
    Level G: £33,779 – £40,335 p.a.

    We seek scientists to join CISBAN, an exciting new research centre established following a major award (£6.4m) from BBSRC and EPSRC,
    to participate in studies of the mechanisms responsible for ageing and
    how they are affected by nutrition. Ageing is recognised
    internationally as a ‘grand challenge’ and is a field prioritised for
    growth. This post offers opportunities to work in an intensely
    multidisciplinary, world-class centre and contribute to the development
    and application of systems science.

    Research Associate (Modeller/Mathematical Biologist)

    To develop models of molecular and cellular mechanisms of ageing and to
    explore links between ageing, development and evolution from a
    life-course perspective. This post will also involve collaboration
    within the EU Network of Excellence LifeSpan, linking development and ageing.

    Job reference: A1092R

    Posts are tenable until 30 September 2010.

    Enquiries for the post may be directed to Professor Tom Kirkwood, Institute for Ageing and Health (email: tom.kirkwood@ncl.ac.uk). Further particulars for this post can be found on the University’s web page.

    Applications should be submitted by 11 January 2008 to Professor Tom Kirkwood, CISBAN Director,
    Institute for Ageing and Health, Henry Wellcome Laboratory for
    Biogerontology Research, Newcastle University, Newcastle upon Tyne NE4 6BE (email: tom.kirkwood@ncl.ac.uk).

    Committed to Equal Opportunities
