Integrative Bioinformatics 2007, Day 3: Searls Keynote

David Searls (GlaxoSmithKline Pharmaceuticals, USA)

Other than where specified, these are my notes from the IB07 Conference, and not expressions of opinion. Any errors are probably just due to my own misunderstanding.

A metaphor for systems biology (SB): the organizing paradigms of systems and languages map neatly onto each other. The "parts list" is the vocabulary, or lexicon. "Connectivity" is the rule-based syntax that determines how words may be arranged. "Function" is the subject matter of semantics. Analogous organizing paradigms can be found in a number of related domains, including systems (componentry, connectivity & behaviour, which match the previous three terms). The equivalent triple for proteins is sequence, structure & function. In some ways, you can think of proteins as systems themselves. The semantics, or meaning, of a system is separate from its pragmatics, which is what it does, usually in the larger context of a discourse. This matches the distinction between the pure function of a protein and its actual role.

How does complexity arise in biological networks? Pleiotropy & redundancy. Networks ramify by overlapping function. Pleiotropy (multifunctionality) is common, as in the case of "moonlighting" proteins. Redundancy of function is the flip side of pleiotropy, and such redundancy (full or partial) contributes robustness. Linguists have similar terms for wordnets: polysemy and synonymy.

Network emergent properties: In connecting pathways into networks, it has been suggested that important novel properties emerge. This idea has started to take hold in the biology community. The phrase can be traced back to the early part of the 20th century and the idea of the unity of science. Reductionism: science generally proceeds by reduction to fundamental components and behaviours. Emergence: complex systems are thought to demonstrate this, such that "the whole is greater than the sum of the parts", and such behaviour could not be predicted a priori. Reductionism seems to be "under fire" at the moment, with the mood being that something completely new and different must be best. However, it is probably fairer to say that the claim of systems biology is only that reductionism will no longer do by itself.

The 19th-century logician Gottlob Frege set up competing principles of "meaning": firstly, compositionality (the meaning of the whole is a function of the meaning of the parts), and secondly, contextuality (no meaning can exist independently of the overall context). Contextuality can be dealt with in a compositional framework if you know how much context will be necessary. For instance, substrings have variable pronunciations: "ough" has 6 different pronunciations, but by looking at the letters around it you can determine the pronunciation. But how many more letters do we need to look at? The same thing happens with whole words: "does" (the verb) vs "does" (the plural of "doe"). How do we use the context to determine pronunciation here? In proteins, the string ASVKQVS is part of a beta-sheet in an amino-peptidase, and an alpha-helix in a guanylate kinase.

From a compositional viewpoint, examine the example of artificial neural networks, where you imitate life to try to "learn" functions of many variables. Minsky & Papert showed that early nets couldn't classify some functions, such as exclusive-OR (that is, X or Y but not both), but adding a "hidden layer" of neurons fixed this. This seems to be a case for emergent properties. But is it really? If you design the net from scratch, ab initio, you can get the same behaviour, so it is just a case of simple logic. However, could emergence simply be a matter of scale? Would imponderable properties arise in larger hidden layers?
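To make the "simple logic" point concrete, here is a minimal sketch (my own illustration, not something shown in the talk) of a hand-designed net with one hidden layer that computes exclusive-OR. The weights and thresholds are chosen by hand rather than learned, and the names `step` and `xor_net` are just labels I have made up for this example.

```python
def step(x):
    """Threshold unit: outputs 1 when its weighted input is positive, else 0."""
    return 1 if x > 0 else 0


def xor_net(x, y):
    """Two-layer threshold network for XOR with hand-picked (not learned) weights."""
    # Hidden layer: one unit behaves like OR, the other like AND.
    h_or = step(x + y - 0.5)
    h_and = step(x + y - 1.5)
    # Output unit: fires when OR is on and AND is off.
    return step(h_or - h_and - 0.5)


# Evaluate the network on all four binary inputs.
truth_table = {(x, y): xor_net(x, y) for x in (0, 1) for y in (0, 1)}
print(truth_table)
```

Nothing here is mysterious: each hidden unit is an ordinary logic gate, and the output is a composition of the two, which is the sense in which a small hidden layer can be designed ab initio.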

There are some interesting parallels between neural network (NN) research of 20 years ago and omic datasets of 5 years ago. Firstly, in NNs there was a belief/concern that a net is a "black box" whose architecture is opaque to interpretation (researchers then worked on rule extraction). In omics, profiles may emerge in the absence of any clues to the mechanism. Secondly, in NNs, if hidden layers are too big, nets tend not to generalize but just to memorize (overfit). In omics, high-dimensional data with few samples can allow statistical artifacts. Finally, in NNs, nets learn differently upon being retrained (nondeterminism). Luckily, these concerns about NNs faded over time.

In what ways might complex biological systems resist reductionist description? Firstly, dependency (too highly interconnected to afford discrete, mechanistic explanations), and secondly, ambiguity (too pleiotropic and nondeterministic for definitive or tractable analysis). Dependency certainly exists in biological systems: nucleotide base pairs embody dependencies in structural RNAs. Secondary (2°) structure is an abstraction of this. Dependencies are "stretched" by linearizing the primary sequence. Also, side-chain interactions embody dependencies in folded protein chains. Here, 2° structure is a modular abstraction. The dependencies are parallel / antiparallel orientations and chirality.
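As a small illustration of base-pair dependencies (my own sketch, not from the talk), RNA 2° structure is commonly written in dot-bracket notation, where matching parentheses mark paired positions and dots mark unpaired ones. A stack is enough to recover the pairs, and the distance between partners shows how far a dependency is "stretched" along the linear sequence. The function name `base_pairs` and the toy hairpin string are assumptions made up for this example.

```python
def base_pairs(dot_bracket):
    """Return sorted (i, j) index pairs from a dot-bracket structure string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)              # open a dependency
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))  # close the most recent open one
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


hairpin = "((..))"  # hypothetical toy structure, not a real RNA
pairs = base_pairs(hairpin)
# How far apart each pair of dependent positions sits in the linear sequence.
stretch = [j - i for i, j in pairs]
print(pairs, stretch)
```

Because a stack suffices, this only handles strictly nested pairings; crossing pairings would need more than one bracket type and a more powerful parser.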

Folding ambiguity example: attenuators use alternative RNA 2° structures by exploiting the syntactic ambiguity of the underlying grammar. He then introduced the Chomsky Hierarchy, but it was quite a complex table and cannot be reproduced here. Not only is the Chomsky Hierarchy useful for understanding modularity, but integrated circuits (ICs) are abstracted hierarchical modules and should also be considered.

Rosetta Stone Proteins: proteins that interact or participate in the same pathway are often fused. Catalogues of fusions can predict function. Circuit design has steadily evolved to higher levels of abstraction & modularity: standard cell VLSI design uses libraries of validated, reusable circuit building blocks. Full custom design is reserved for optimization, and hardware description languages (HDLs) let chips be designed in much the same way as software is written. Hardware and software are therefore a continuum: microcode, programmable gate arrays, etc. Some bioinformaticists have written pseudocode to describe biological pathways. In 1968 computer scientist Edsger Dijkstra wrote a now-classic short note entitled "Go To Statement Considered Harmful". In it he criticized programming constructs that allow undisciplined jumps in flow of control, leading to so-called "spaghetti code", which made larger programs unwieldy. He thereby helped to launch the structured programming movement, which enforced a strictly nested modularity for more manageable growth, debugging, modification, etc.

Does nature write spaghetti code? Well, it looks like pasta to the uninitiated 😉 However, if you actually look at how things are put together, you'll notice that it probably doesn't. Protein domains combine predominantly by concatenation or insertion, as seen in pyruvate kinase. Do proteins interleave? Very rarely do proteins seem to have interleaved domain structures, like D-maltodextrin binding protein with three inter-domain crossings (perhaps due to a translocation?). So, it seems quite rare. This puts biological structure and human language at the same level in the Chomsky hierarchy.
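The difference between concatenation, insertion and interleaving can be sketched in a few lines (again my own illustration, not from the talk). If you label each stretch of the chain with the domain it belongs to, concatenation looks like A B, insertion like A B A, and interleaving like A B A B. The check below flags any crossing pattern of the form a..b..a..b. The function name `is_interleaved` and the layouts are hypothetical toy examples, not real domain assignments for the proteins mentioned above.

```python
def is_interleaved(segments):
    """True if domain labels along the chain contain a crossing a..b..a..b."""
    open_domains = []  # stack of domains that may still be resumed
    closed = set()     # domains finished because an enclosing domain resumed
    for label in segments:
        if label in closed:
            # A finished domain reappears: its segments cross another domain's.
            return True
        if label in open_domains:
            # Resuming an enclosing domain closes everything nested inside it.
            while open_domains[-1] != label:
                closed.add(open_domains.pop())
        else:
            open_domains.append(label)
    return False


# Hypothetical layouts, read N-terminus to C-terminus.
layouts = {
    "concatenation": ["A", "B"],
    "insertion": ["A", "B", "A"],
    "interleaved": ["A", "B", "A", "B"],
}
results = {name: is_interleaved(segs) for name, segs in layouts.items()}
print(results)
```

The check needs nothing more than a stack, which mirrors the "strictly nested modularity" of structured programming: only the interleaved layout breaks the nesting.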

Organizing paradigms for linguistics can readily extend to proteins and systems. Abstracted, hierarchical modularity is a means to support "controlled" growth in complexity, in both design and evolution. The Chomsky hierarchy offers tools to measure and analyze this complexity. Proteins and systems form a continuum, exhibiting both compositionality and contextuality (but emergence…? Perhaps not).

The "Computational Thinking" movement has been growing in recent years, and such work could help people who aren't used to thinking in terms of modularization.

My opinion: Fantastic talk! Great way to start the day.
