TY - JOUR
T1 - A chromosome-centric human proteome project (C-HPP) to characterize the sets of proteins encoded in chromosome 17
AU - Liu, Suli
AU - Im, Hogune
AU - Bairoch, Amos
AU - Cristofanilli, Massimo
AU - Chen, Rui
AU - Deutsch, Eric W.
AU - Dalton, Stephen
AU - Fenyo, David
AU - Fanayan, Susan
AU - Gates, Chris
AU - Gaudet, Pascale
AU - Hincapie, Marina
AU - Hanash, Samir
AU - Kim, Hoguen
AU - Jeong, Seul Ki
AU - Lundberg, Emma
AU - Mias, George
AU - Menon, Rajasree
AU - Mu, Zhaomei
AU - Nice, Edouard
AU - Paik, Young Ki
AU - Uhlen, Mathias
AU - Wells, Lance
AU - Wu, Shiaw Lin
AU - Yan, Fangfei
AU - Zhang, Fan
AU - Zhang, Yue
AU - Snyder, Michael
AU - Omenn, Gilbert S.
AU - Beavis, Ronald C.
AU - Hancock, William S.
PY - 2013/1/4
Y1 - 2013/1/4
N2 - We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial-derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 missing proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of missing proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information.
In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.
AB - We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial-derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 missing proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of missing proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information.
In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.
KW - Chromosome 17 parts list
KW - Chromosome-centric Human Proteome Project
KW - ERBB2
KW - Oncogene
UR - http://www.scopus.com/inward/record.url?scp=84874081594&partnerID=8YFLogxK
U2 - 10.1021/pr300985j
DO - 10.1021/pr300985j
M3 - Review article
C2 - 23259914
AN - SCOPUS:84874081594
SN - 1535-3893
VL - 12
SP - 45
EP - 57
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 1
ER -