Keynote: A global view on protein expression based on the Human Protein Atlas (ISMB 2009)
Mathias Uhlen, Royal Institute of Technology
Introduction: He works a lot on affinity reagents. He invented and developed pyrosequencing technology (http://en.wikipedia.org/wiki/Pyrosequencing), which is now used in 454 sequencing.
Systematic biology introduction. 18th century – biologists. 19th – chemists (1/3 of elements discovered in Sweden in this century). 20th – physicists and, at the end, computer scientists. He'd now like to say that the 21st century is the century of medicine. He spends most of his time on proteins, which are more complicated from a computational POV, but that does make it more fun. An impressive log-scale plot of the number of bases sequenced since 1965, from pyrosequencing in 1998 all the way through to PacBio in 2010. Therefore we can now talk about personalized genomics. Bioinformatics is the key in the new era of genomics.
Systems biology/omics is going to be fantastic in the next 10 years. Genomics will continue to be a fundamental resource. Image of a contradictory sign in Paris: you know where you want to go, but not how to get there. So there is a real need for protein probes (antibodies). This isn't easy, and there is a nice article about it in Nature, July 7 2007, “The generation game”. Therefore they have the Human Antibody Initiative (HAI). One of the efforts at the HAI was to look at commercially available antibodies: they analyzed >5000 antibodies from 51 commercial companies and looked at the success rate. For some companies virtually 100% of the antibodies worked, for others 0%. On average, about 50% of the antibodies seem to work (Berglund et al. 2008).
So they developed Antibodypedia, a portal for validated antibodies. If you have 2 antibodies against the same target, you can compare their results across various assay platforms, so he wants to develop paired antibodies for every protein target. This will take a while. They also perform epitope mapping of antibodies (Nature Methods 2008). Epitope mapping leads to therapeutic targets, including Her2 (Rockberg and Uhlen 2009, Molecular Oncology, in press).
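To illustrate why a second antibody against the same target is useful, here is a minimal sketch (not the HAI's or Antibodypedia's actual validation procedure) that scores how often two antibodies give the same detected/not-detected call on the same samples. The function names, the 0.8 agreement threshold, and the staining calls are all hypothetical, made up only for illustration.

```python
def concordance(calls_a: dict[str, bool], calls_b: dict[str, bool]) -> float:
    """Fraction of shared samples on which both antibodies give the same call."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two antibodies share no samples")
    agree = sum(calls_a[s] == calls_b[s] for s in shared)
    return agree / len(shared)


def supports_each_other(calls_a: dict[str, bool], calls_b: dict[str, bool],
                        threshold: float = 0.8) -> bool:
    # Hypothetical rule: the pair counts as mutually supporting if they
    # agree on at least `threshold` of the samples they were both tested on.
    return concordance(calls_a, calls_b) >= threshold


# Made-up staining calls (True = staining detected) for two antibodies
ab1 = {"liver": True, "kidney": True, "brain": False, "skin": False, "lung": True}
ab2 = {"liver": True, "kidney": True, "brain": False, "skin": True, "lung": True}

print(concordance(ab1, ab2))          # 0.8
print(supports_each_other(ab1, ab2))  # True
```

Disagreement on a sample (here "skin") is exactly the kind of result a paired antibody would flag for follow-up.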
The HPR project (Human Proteome Resource): http://www.proteinatlas.org/ . A public multidisciplinary resource involving the systematic generation of antibodies. 65 researchers at KTH Stockholm, 25 in Uppsala, 15 in India, and a couple of other places including China. It's a factory: clone generation -> protein factory -> immunization -> affinity purification -> Human Protein Atlas portal. The gene factory makes about 200 clones per week and is in full production. The project generates 2 TB of data per week.
The antigen design uses PRESTIGE, a bioinformatics approach to selecting antigens using the protein epitope signature tag (PrEST). 19832 genes initiated. They have an automated annotation system for cells, and use pathologists for tissues. They also work on high-throughput (HT) subcellular localization. With confocal microscopy they can get “exquisite” resolution: fantastic images, but difficult to scale up to HT. They have an SVM that seems to be able to annotate 28 different parts of the cell. Current output: 50 proteins per week, 50000 images. All of it goes into the HPA.
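The notes don't say how PRESTIGE scores candidate regions. As a minimal sketch, assuming the goal is a fragment of the target protein that shares as little sequence as possible with other proteins (so antibodies raised against it are less likely to cross-react), one could slide a fixed-length window along the target and count the window's k-mers that also occur in the other sequences. The window length, k, the function names, and the toy sequences are hypothetical; the real pipeline applies its own criteria.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_antigen_window(target: str, others: list[str], window: int,
                          k: int = 3) -> tuple[int, str]:
    """Return (start, fragment) of the window sharing the fewest k-mers with `others`.

    Hypothetical scoring: count the window's k-mers that appear anywhere in
    the other proteins. Ties go to the earliest window.
    """
    if window > len(target):
        raise ValueError("window longer than target sequence")
    background: set[str] = set()
    for seq in others:
        background |= kmers(seq, k)
    best_start, best_score = 0, None
    for start in range(len(target) - window + 1):
        fragment = target[start:start + window]
        score = len(kmers(fragment, k) & background)
        if best_score is None or score < best_score:
            best_start, best_score = start, score
    return best_start, target[best_start:best_start + window]


# Toy example: the poly-A stretch is shared with another "protein", so the
# chosen window is the first one that avoids it entirely.
print(select_antigen_window("AAAAAAWYHQNC", ["AAAAAAAA"], window=6))  # (4, 'AAWYHQ')
```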
The Human Protein Atlas update. All data publicly available. Expression data is not yet downloadable, but they hope to change that in the future. About 2/3 of the data comes from their own antibodies, and 1/3 from commercial antibodies. They doubled the number of antibodies last year. They have about 1/3 of the human proteins (based on UniProt). About a further 50% of the human genes are in the pipeline. About 11% have been started and failed, so they need to start again. That leaves only 6% not yet started, which they will start this year. Most recent release: 7 million images. The next 5 years are also about getting the paired antibodies mentioned earlier. All antibodies are available to the public via Prestige Antibodies.
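As a quick consistency check, the coverage fractions quoted for the atlas add up to roughly 100% of human protein-coding genes. Turning them into gene counts needs a total; the sketch below assumes the ~20,000 UniProt estimate quoted in the talk, so the counts are rough approximations, not numbers given in the presentation.

```python
# Fractions of human protein-coding genes, as quoted in the talk
status = {
    "in the atlas": 1 / 3,
    "in the pipeline": 0.50,
    "started but failed": 0.11,
    "not yet started": 0.06,
}

# Assumption: ~20,000 protein-coding genes (the UniProt estimate from the talk)
TOTAL_GENES = 20_000

total_fraction = sum(status.values())
print(f"total: {total_fraction:.1%}")  # total: 100.3%
for label, fraction in status.items():
    print(f"{label}: ~{round(fraction * TOTAL_GENES):,} genes")
```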
They’ve also started on the Rodent Brain Protein Atlas. The majority of antibodies developed for the human system also work for rodents.
Global expression analysis. How many proteins are expressed in a given cell? How large is the proteome? Ensembl puts the number of protein-coding genes at up to 23,000 and UniProt at 20,000, so the true number is probably within that range. In how many cells is a particular protein expressed? Housekeeping proteins are expressed in lots of cells, tissue-specific proteins in few. They do the analysis using various subfractions of antibodies. They looked at global expression in 45 human cell lines, and at global expression in 3 cell lines via immunofluorescence (IF). Very few proteins are cell-type specific, as confirmed by expression profiles in those 3 cell lines via IF and via Cytoscape visualization. Looking at three tissues instead gives a similar picture, with more proteins expressed in only one tissue, though still fewer than are expressed in all three (50% are expressed in all 3 tissues). <2% are expressed exclusively in a single cell type (84 proteins), including some previously uncharacterized ones. PROSPECTS: PROteomics SPECification in Time and Space. They look at MCF-7, a human breast cancer cell line. They are also working on next-generation sequencing of cDNAs from the U2-OS human cell line.
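The categories used here (expressed everywhere, in a group of cell types, or in a single cell type) can be made concrete with a small classifier over a presence/absence table. This is a minimal sketch, not the HPA's actual annotation scheme: the category names, the function, the cell-line labels, and the detection data are all hypothetical.

```python
from collections import Counter


def expression_category(detected_in: set[str], all_types: set[str]) -> str:
    """Classify a protein by the set of cell types in which it was detected."""
    n = len(detected_in & all_types)
    if n == 0:
        return "not detected"
    if n == len(all_types):
        return "expressed in all"      # housekeeping-like
    if n == 1:
        return "cell-type specific"
    return "group specific"            # detected in some, but not all


# Hypothetical cell lines and detection calls
cell_types = {"line_1", "line_2", "line_3"}
profiles = {
    "protein_a": {"line_1", "line_2", "line_3"},
    "protein_b": {"line_1", "line_2", "line_3"},
    "protein_c": {"line_2"},
    "protein_d": {"line_1", "line_3"},
}

summary = Counter(expression_category(d, cell_types) for d in profiles.values())
for category, count in summary.items():
    print(f"{category}: {count}/{len(profiles)} ({count / len(profiles):.0%})")
```

The same function works unchanged for tissues instead of cell lines; only the set of sample labels changes.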
A high fraction of all proteins are expressed everywhere, with few cell-specific and group-specific proteins. Global expression profiling harmonizes well with the current concepts of embryology and histology. Tissue specificity is achieved by precise regulation of protein levels in space and time.
Biobank profiling (translational medicine): how to use the above for biomarker discovery. It's important to find biomarkers for early detection of disease. They have used suspension bead arrays and looked at patients with different kidney diseases. It took 4 hrs to run 26000 assays, looking at the targets and the signals from each assay. The ones that perform well are then used in the plasma analysis, so biomarkers can be identified very quickly. They found 2 really good candidate biomarkers for the disease, which now need to be tested in larger clinical cohorts. Next-gen plasma profiling for biomarker discovery… They're part of ENGAGE.
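A common first pass in this kind of plasma profiling is to rank each bead-array target by how well its signal separates patients from controls. The sketch below swaps in Welch's t-statistic for that ranking; the statistic, the function names, the target names, and the intensities are all assumptions for illustration, not the method or data from the talk. The last line works out the throughput implied by 26000 assays in 4 hours.

```python
import math
from statistics import mean, variance


def welch_t(cases: list[float], controls: list[float]) -> float:
    """Welch's t-statistic for the difference in mean signal (cases minus controls)."""
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


def rank_candidates(signals: dict[str, tuple[list[float], list[float]]]) -> list[tuple[str, float]]:
    """Rank targets by absolute t-statistic, strongest separation first."""
    scores = {name: welch_t(cases, controls) for name, (cases, controls) in signals.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical bead-array intensities: (patients, controls) per target
signals = {
    "target_x": ([9.1, 8.7, 9.4, 8.9], [5.2, 5.8, 5.5, 5.1]),
    "target_y": ([6.0, 6.3, 5.7, 6.1], [5.9, 6.2, 6.0, 5.8]),
    "target_z": ([3.1, 2.8, 3.4, 2.9], [4.9, 5.3, 5.1, 4.7]),
}
for name, t in rank_candidates(signals):
    print(f"{name}: t = {t:.2f}")

# Throughput quoted in the talk: 26000 assays in 4 hours
print(f"{26000 / 4:.0f} assays per hour")  # 6500 assays per hour
```

With thousands of targets, a real screen would also need a multiple-testing correction before calling anything a candidate, which is one reason the candidates still have to be confirmed in larger cohorts.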
Nature, “The big ome” – 24 April 2008, editorial – he found it balanced. “Proteomics Ponders Prime Time”, Science, 26 September 2008, in response to the Nature article.
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!