Metrics for the Human Proteome Project 2013-2014 and strategies for finding missing proteins.

Lane L, Bairoch A, Beavis RC, Deutsch EW, Gaudet P, Lundberg E, Omenn GS

J. Proteome Res. 13 (1) 15-20 [2014-01-03; online 2013-12-25]

One year ago the Human Proteome Project (HPP) leadership designated neXtProt as the basis for the project's baseline metrics, with a total of 13,664 proteins validated at protein evidence level 1 (PE1) by mass spectrometry, antibody capture, Edman sequencing, or 3D structures. Corresponding chromosome-specific data were provided from PeptideAtlas, GPMdb, and Human Protein Atlas (HPA). This year, the neXtProt total is 15,646, and the other resources, which are inputs to neXtProt, have high-quality identifications and additional annotations for 14,012 proteins in PeptideAtlas, 14,869 in GPMdb, and 10,976 in HPA. We propose to remove 638 genes from the denominator that are "uncertain" or "dubious" in Ensembl, UniProt/SwissProt, and neXtProt. That leaves 3,844 "missing proteins", currently having no or inadequate documentation, to be found from a new denominator of 19,490 protein-coding genes. We present those tabulations and web links and discuss current strategies to find the missing proteins.
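
The counts quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the "missing proteins" figure is the new denominator minus the current neXtProt PE1 total, which the quoted numbers are consistent with. The variable and function names are hypothetical and do not come from the paper or from any neXtProt tool.

```python
# Figures quoted in the abstract.
PE1_BASELINE = 13_664      # neXtProt PE1 proteins one year earlier
PE1_CURRENT = 15_646       # neXtProt PE1 proteins this year
REMOVED_GENES = 638        # "uncertain" or "dubious" genes dropped from the denominator
NEW_DENOMINATOR = 19_490   # protein-coding genes after that removal


def missing_proteins(denominator: int, confident: int) -> int:
    """Assumed definition: genes in the denominator without PE1-level evidence."""
    return denominator - confident


missing = missing_proteins(NEW_DENOMINATOR, PE1_CURRENT)
gained = PE1_CURRENT - PE1_BASELINE
coverage = PE1_CURRENT / NEW_DENOMINATOR
# Derived here, not stated in the abstract: the denominator before the removal.
implied_old_denominator = NEW_DENOMINATOR + REMOVED_GENES

print(f"missing proteins:      {missing:,}")
print(f"PE1 gain over one year: {gained:,}")
print(f"PE1 coverage:          {coverage:.1%}")
print(f"implied old denominator: {implied_old_denominator:,}")
```

Under that assumption the computed value matches the 3,844 missing proteins reported in the abstract.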

