TY - JOUR
T1 - Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation
AU - Kalyuzhnyy, Anton
AU - Eyers, Patrick A.
AU - Eyers, Claire E.
AU - Bowler-Barnett, Emily
AU - Martin, Maria J.
AU - Sun, Zhi
AU - Deutsch, Eric W.
AU - Jones, Andrew R.
N1 - Publisher Copyright: © 2022 American Chemical Society. All rights reserved.
PY - 2022/6/3
Y1 - 2022/6/3
N2 - Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence (the majority of sites in PSP). We estimate that around 62 000 Ser, 8000 Thr, and 12 000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86 000 Ser, 50 000 Thr, and 26 000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.
AB - Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence (the majority of sites in PSP). We estimate that around 62 000 Ser, 8000 Thr, and 12 000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86 000 Ser, 50 000 Thr, and 26 000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.
KW - PeptideAtlas
KW - PhosphoSitePlus
KW - UniProt
KW - database
KW - evolutionary conservation
KW - false discovery rate
KW - mass spectrometry
KW - phosphopeptides
KW - phosphoproteomics
KW - phosphorylation
KW - phosphosites
KW - proteome
KW - proteomics
UR - http://www.scopus.com/inward/record.url?scp=85130741106&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.2c00131
DO - 10.1021/acs.jproteome.2c00131
M3 - Article
C2 - 35532924
AN - SCOPUS:85130741106
SN - 1535-3893
VL - 21
SP - 1510
EP - 1524
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 6
ER -