TY - JOUR
T1 - Detection of the Arabidopsis Proteome and Its Post-translational Modifications and the Nature of the Unobserved (Dark) Proteome in PeptideAtlas
AU - van Wijk, Klaas J.
AU - Leppert, Tami
AU - Sun, Zhi
AU - Kearly, Alyssa
AU - Li, Margaret
AU - Mendoza, Luis
AU - Guzchenko, Isabell
AU - Debley, Erica
AU - Sauermann, Georgia
AU - Routray, Pratyush
AU - Malhotra, Sagunya
AU - Nelson, Andrew
AU - Sun, Qi
AU - Deutsch, Eric W.
N1 - Publisher Copyright: © 2023 American Chemical Society.
PY - 2024/1/5
Y1 - 2024/1/5
N2 - This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource (build 2023-10) providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected post-translational modifications (PTMs), and metadata. 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ∼0.6 million unique peptides and 18,267 proteins at the highest confidence level and 3396 lower-confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins, and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome: the “dark” proteome. This dark proteome is highly enriched for E3 ligases, transcription factors, and certain (e.g., CLE, IDA, PSY) but not other (e.g., THIONIN, CAP) signaling peptide families. A machine learning model trained on RNA expression data and protein properties predicts the probability that proteins will be detected. The model aids in the discovery of proteins with short half-lives (e.g., SIG1,3 and ERF-VII TFs) and in developing strategies to identify the missing proteins. PeptideAtlas is linked to TAIR, tracks in JBrowse, and several other community proteomics resources.
KW - Arabidopsis
KW - E3 ligases
KW - PeptideAtlas
KW - ProteomeXchange
KW - machine learning
KW - post-translational modifications
KW - signaling peptides
UR - http://www.scopus.com/inward/record.url?scp=85179174463&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.3c00536
DO - 10.1021/acs.jproteome.3c00536
M3 - Article
AN - SCOPUS:85179174463
SN - 1535-3893
VL - 23
SP - 185
EP - 214
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 1
ER -