Thomas Lengauer, Max-Planck Institute for Informatics
Background in mathematics and Computer Science. A very good example of a cross-disciplinary researcher. He is a founding member of the ECCB.
The talk starts at the point where the drugs are already on the market and asks how they can support personalized medicine: which drugs should be given to which AIDS patients? AIDS has killed over 25 million people since 1981, and 33 million people were infected with HIV as of 2007 (source: UNAIDS). AIDS awareness campaigns have waned in recent years, and as a consequence infection rates are rising again. The AIDS virus has a small genome, only 10,000 nucleotides. It attaches to the cell via surface proteins, enters the cell, sheds its capsid and exposes its RNA genome. Reverse transcriptase then makes double-stranded DNA, which is transported into the nucleus and spliced into the genome, where it becomes an inseparable part of the infected cell. It can sit there for a long time until the cell divides, and then the cell machinery builds the viral particle. The virus borrows a bit of the cellular membrane for its shell, and then there is a maturation phase, at which stage the protease is important. The virus then eventually becomes infective again. The AIDS virus is by far the best-understood virus.
There are a number of drugs that block the fusion of the virus with the cell, 17 that block reverse transcriptase, etc. The virus is extremely dynamic in its rate of evolution. An AIDS patient can have a turnover of 10 billion virus particles per day, and there are many variants of the virus: a drug may be effective against the wild type (wt), but then a minority population that it does not suppress will grow. So, what do you do? This is the main medical question. We don’t have a drug cocktail that can catch all of them, and no drug therapy works forever. In current drug therapy (HAART), different classes of drugs are combined.
In the past, they’ve built mutation tables: a global collection of clinical experience. An expert group builds the table, because resistance may have developed and they don’t want to subject patients to now-useless drugs. Such tables have limited expressivity, though: interdependencies between mutations cannot be captured by mutation tables. Rule-based expert systems do exactly that (medical communities call these "algorithms", which is wrong: they are rule-based expert systems!). But is this kind of resistance analysis objective?
Experimental resistance data comes in two kinds. Phenotypic data: extract the virus, culture it and expose it to drugs in rising concentrations; comparing the resulting curves shows at which drug concentrations resistance occurs, and this is summarised as the resistance factor. This data is too expensive and too slow to produce for clinicians. Genotypic data: sequence the genome of the viral variant.
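To make the curve comparison concrete for myself, here is a minimal Python sketch. It assumes the common definition of the resistance factor as the ratio of the IC50 (the drug concentration that halves viral replication) of the patient's virus to that of a reference virus; the talk did not spell out the formula. The helper names `ic50` and `resistance_factor` and all measurements are invented for illustration.

```python
import math

def ic50(concentrations, replication):
    """Estimate the concentration at which replication drops to 50%.

    Interpolates linearly in log10(concentration) between the two
    measurements that bracket 50% replication.
    """
    points = list(zip(concentrations, replication))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= 50.0 >= r_hi and r_lo != r_hi:
            frac = (r_lo - 50.0) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("replication never crosses 50% in the measured range")

def resistance_factor(ic50_patient, ic50_reference):
    """Fold change in IC50 relative to the reference (wild-type) virus."""
    return ic50_patient / ic50_reference

# Invented dose-response data: % replication at rising drug concentrations.
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
wild_type = [95.0, 50.0, 10.0, 2.0, 1.0]
variant = [99.0, 95.0, 50.0, 10.0, 2.0]

rf = resistance_factor(ic50(concs, variant), ic50(concs, wild_type))
print(f"resistance factor: {rf:.1f}")
```

A real assay would fit a sigmoid curve rather than interpolate between two points; the interpolation just keeps the sketch short.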
Analyzing the current virus. They’re doing multivariate statistical learning alongside more traditional techniques. The training data consists of genotype-phenotype pairs for 1000+ HIV variants. The quality criteria are predictive power and interpretability. There are two approaches: regression and classification (grouping into categories). Classification comes with cut-offs (resistant/susceptible), but things aren’t always that simple. The classic interpretable model is a decision tree, where you find the mutation that best separates resistant from susceptible viruses, and then continue analogously in the two resulting data subsets. Example: the protease inhibitor Saquinavir. http://www.geno2pheno.org is so far the most used clinical tool.
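The first step of growing such a tree can be sketched in a few lines of Python. This assumes information gain (entropy reduction) as the measure of "best separates", which is a common choice but not one the talk confirmed. The mutation labels `mutA`, `mutB`, `mutC` and the six samples are placeholders, not real data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(samples, labels, mutation):
    """Entropy reduction from splitting on presence/absence of one mutation."""
    with_mut = [lab for s, lab in zip(samples, labels) if mutation in s]
    without = [lab for s, lab in zip(samples, labels) if mutation not in s]
    if not with_mut or not without:
        return 0.0  # the split does not separate anything
    n = len(labels)
    remainder = (len(with_mut) / n) * entropy(with_mut) + (len(without) / n) * entropy(without)
    return entropy(labels) - remainder

def best_split(samples, labels):
    """Return the mutation whose presence best separates the two classes."""
    mutations = sorted(set().union(*samples))
    return max(mutations, key=lambda m: information_gain(samples, labels, m))

# Placeholder genotypes (sets of mutations) with invented phenotype labels.
samples = [{"mutA", "mutB"}, {"mutA"}, {"mutA", "mutC"}, {"mutB"}, {"mutC"}, set()]
labels = ["resistant", "resistant", "resistant",
          "susceptible", "susceptible", "susceptible"]

root = best_split(samples, labels)
print("split on:", root)
```

To build the full tree you would repeat `best_split` on the subset carrying the chosen mutation and on the subset without it, stopping when a subset is pure or too small.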
The genotype is aligned to the wt and mutations are identified. They use a linear SVM for regression: the output has a line for each drug with the estimated resistance factor, its normalization as a Z-score, and the scored mutations. Some mutations that confer resistance to some drugs (e.g. 76V) actually confer re-sensitisation to others, and therefore would have a positive effect.
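As a rough illustration of what "scored mutations" means in a linear model, here is a Python sketch in which each mutation contributes a weight to the predicted (log) resistance factor, and the prediction is then expressed as a Z-score against a reference distribution. The weights, bias and reference values are invented; in the real system they would come from training the linear SVM, and I am assuming (not quoting) that the Z-score is taken relative to some reference population of viruses. `mutA` and `mutB` are placeholder labels; 76V is given a negative weight only to mirror the re-sensitisation remark above.

```python
import statistics

# Invented per-mutation weights for one hypothetical drug.
# A negative weight means the mutation lowers the predicted resistance.
WEIGHTS = {"mutA": 0.8, "mutB": 0.5, "76V": -0.4}
BIAS = 0.1

def predicted_log_rf(mutations):
    """Linear model: bias plus the summed weights of the observed mutations."""
    return BIAS + sum(WEIGHTS.get(m, 0.0) for m in mutations)

def z_score(value, reference_values):
    """Standardise a prediction against a reference distribution."""
    mean = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)
    return (value - mean) / sd

# Invented predictions for a reference population (assumption, see lead-in).
reference = [0.0, 0.1, 0.2, 0.1, 0.1]

score_without = predicted_log_rf({"mutA", "mutB"})
score_with = predicted_log_rf({"mutA", "mutB", "76V"})
print(f"without 76V: z = {z_score(score_without, reference):.1f}")
print(f"with 76V:    z = {z_score(score_with, reference):.1f}")
```

The appeal of the linear model for interpretability is visible here: each weight can be shown to the clinician next to the mutation it belongs to.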
Estimating the viral evolution. Often the virus follows specific mutational paths into resistance, and these are partly known from clinical practice. Can they find such paths in their database? They have lots of patients, but only a few time points on each patient (no longitudinal data). The TAM1 path is found by seeing that the virus does *not* follow every possible path (Thanks Ruchira, I missed that in the talk). They model the viral evolution towards resistance with tree structures, where every tree represents several alternatives for viral evolution. One tree collects the noise in the data, and the results can be mapped along a timeline. In this way you can get a probability of resistance in a quantitative time frame. Therapy optimization with THEO.
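A much simplified Python sketch of one such tree, as I understood the idea: each edge carries the probability that the child mutation appears once the parent is present, so the probability of reaching a mutation is the product of the edge probabilities on the path from the wild type. This covers a single tree only; the mixture of several trees, the noise tree and the mapping to a timeline are not modelled here. The tree shape, the labels and the probabilities are all invented.

```python
# child -> (parent, probability that the child mutation appears given the parent)
TREE = {
    "mutA": ("wild type", 0.6),
    "mutB": ("mutA", 0.5),
    "mutC": ("mutB", 0.4),
    "mutD": ("wild type", 0.2),
}

def path_to(mutation):
    """Return the mutations on the path from the wild type to `mutation`."""
    path = []
    node = mutation
    while node != "wild type":
        path.append(node)
        node = TREE[node][0]
    return list(reversed(path))

def probability_reached(mutation):
    """Product of the edge probabilities along the path from the root."""
    prob = 1.0
    for node in path_to(mutation):
        prob *= TREE[node][1]
    return prob

print(path_to("mutC"))
print(f"P(mutC reached) = {probability_reached('mutC'):.2f}")
```

The tree encodes the observation from the talk that not every order of mutations occurs: in this toy tree `mutC` can only appear after `mutA` and `mutB`.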
They have since gone European with their data: Euresist database (started Oct 2008). They built 3 prediction engines, of which THEO is one. Error in classifying the therapy into effective/not effective without THEO is above 24% and with THEO is 15%. Practices and labs that treat 2/3 of the AIDS patients in Germany use the geno2pheno software, and the server for it is accessed from about 30 countries.
Coreceptor usage. 1% of the Caucasian population does not have this coreceptor (CCR5). These people cannot be infected by HIV. A couple of different coreceptors are targeted, and the virus can switch once you’re in therapy. People with the CCR5 deletion don’t get AIDS, so the virus first goes through here, but it later switches to the other one, CXCR4 (sentence from Ruchira, as I missed it; see FF discussion below for source).
Genotypic prediction of viral tropism: the input is around 35 aa of the V3 loop of the viral surface protein gp120. The output is a score that is larger the more likely the virus is to use CXCR4. The method used is an SVM. Accuracy increases if structural data is added. There are two kinds of data: clonal (in a research setting) and bulk data (clinical routine). Sensitivity goes from 80% to 40% if you move from clonal to bulk data. Adding info on clinical correlates raises prediction accuracy. The power of predicting clinical follow-up is much higher than that of predicting the tropism phenotype. Move away from Sanger sequencing to increase accuracy and use ultra-deep sequencing, which yields 1000s of sequences per patient sample.
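For reference, this is how a cut-off on such a score turns into sensitivity and specificity figures like the ones quoted. The Python sketch below uses invented scores and labels, and the function name `sensitivity_specificity` is my own; it is not taken from the tool.

```python
def sensitivity_specificity(scores, uses_cxcr4, cutoff):
    """Classify as CXCR4-using when score >= cutoff and compare with the truth.

    Assumes the data contains at least one positive and one negative sample.
    """
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, uses_cxcr4):
        predicted = score >= cutoff
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # share of CXCR4-using viruses that are detected
    specificity = tn / (tn + fp)  # share of other viruses correctly left alone
    return sensitivity, specificity

# Invented prediction scores and measured phenotypes.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]
uses_cxcr4 = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(scores, uses_cxcr4, cutoff=0.5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is why sensitivity figures are normally quoted at a fixed specificity.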
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!