The statistics of experimental and putative PTM sites in dbPTM

Because the contents of several online PTM resources are not directly accessible, a total of eleven PTM-related biological databases are integrated into dbPTM. To resolve the heterogeneity among data collected from different sources, the reported modification sites are mapped to UniProtKB protein entries by sequence comparison. Given the high throughput of mass spectrometry-based methods in post-translational proteomics, this update also includes manually curated MS/MS-identified peptides associated with PTMs, collected from research articles through a literature survey.

First, a table of PTM-related keywords is constructed by referring to the UniProtKB PTM list and the annotations of RESID. Then, all fields in the PubMed database are searched using the keywords in this table, and the full text of the matching research articles is downloaded. To cover the various proteomic identification experiments, a text-mining system is developed to survey the full-text literature that potentially describes the site-specific identification of modified sites.

Furthermore, to determine the locations of PTMs on a full-length protein sequence, the experimentally verified MS/MS peptides are mapped to UniProtKB protein entries based on their database identifiers (IDs) and sequence identity. During data mapping, MS/MS peptides that cannot be aligned exactly to a protein sequence are discarded. Finally, each mapped PTM site is annotated with its supporting literature (PubMed ID).

All PTM types are categorized by the modified amino acid, and a positive set is provided for each in tab-delimited format. Each record in a positive set contains the UniProt ID, the modified position, the PTM description, and the sequence window from 6 amino acids upstream to 6 amino acids downstream of the modified site. For PTM types that occur at the protein N-terminus or C-terminus, however, the sequences are extracted with a window of 0 ~ +10 or -10 ~ 0, respectively (position 0 is the modified site).
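The peptide mapping and window extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline dbPTM actually uses: the function names, the 1-based position convention, the use of the first exact match when a peptide occurs more than once, and the `-` padding for windows that run past the ends of the protein are all choices made for this example.

```python
from typing import Optional


def map_peptide(protein_seq: str, peptide: str, site_offset: int) -> Optional[int]:
    """Map a modified residue in an MS/MS peptide to a 1-based protein position.

    `site_offset` is the 0-based index of the modified residue in the peptide.
    Peptides that do not match the protein sequence exactly are discarded
    (None is returned). Assumption: the first exact match is used.
    """
    start = protein_seq.find(peptide)
    if start == -1:
        return None
    return start + site_offset + 1


def extract_window(protein_seq: str, position: int, flank: int = 6, pad: str = "-") -> str:
    """Return the -flank ~ +flank window around a 1-based modified position.

    Assumption: positions beyond either end of the protein are padded with `pad`.
    """
    idx = position - 1
    left = protein_seq[max(0, idx - flank):idx]
    right = protein_seq[idx + 1:idx + 1 + flank]
    return left.rjust(flank, pad) + protein_seq[idx] + right.ljust(flank, pad)


def extract_terminal_window(protein_seq: str, position: int, terminus: str, length: int = 10) -> str:
    """Return the 0 ~ +length (N-terminal) or -length ~ 0 (C-terminal) window."""
    idx = position - 1
    if terminus == "N":
        return protein_seq[idx:idx + length + 1]
    if terminus == "C":
        return protein_seq[max(0, idx - length):idx + 1]
    raise ValueError("terminus must be 'N' or 'C'")


# Made-up sequence and peptide, for illustration only.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
site = map_peptide(protein, "QISFVK", site_offset=2)
if site is not None:
    print(site, extract_window(protein, site))
```

Returning `None` for a peptide without an exact match mirrors the rule that such peptides are discarded rather than aligned approximately.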

PTM Type | Number of experimental sites | Number of literature references | Download
Phosphorylation | 571,032 | 52,381 | Windows, MAC / Linux
Acetylation | 137,442 | 21,251 | Windows, MAC / Linux
Ubiquitination | 118,495 | 1,130 | Windows, MAC / Linux
Succinylation | 17,596 | 62 | Windows, MAC / Linux
Methylation | 17,483 | 8,806 | Windows, MAC / Linux
Malonylation | 8,736 | 14 | Windows, MAC / Linux
N-linked Glycosylation | 7,916 | 1,842 | Windows, MAC / Linux
O-linked Glycosylation | 6,340 | 3,785 | Windows, MAC / Linux
Sumoylation | 5,450 | 178 | Windows, MAC / Linux
S-nitrosylation | 4,203 | 324 | Windows, MAC / Linux
Glutathionylation | 4,161 | 92 | Windows, MAC / Linux
Amidation | 2,907 | 896 | Windows, MAC / Linux
Hydroxylation | 1,725 | 285 | Windows, MAC / Linux
Pyrrolidone carboxylic acid | 908 | 529 | Windows, MAC / Linux
Glutarylation | 767 | 3 | Windows, MAC / Linux
Palmitoylation | 1,094 | 382 | Windows, MAC / Linux
Gamma-carboxyglutamic acid | 439 | 87 | Windows, MAC / Linux
Crotonylation | 368 | 6 | Windows, MAC / Linux
Oxidation | 359 | 24 | Windows, MAC / Linux
Myristoylation | 279 | 182 | Windows, MAC / Linux
C-linked Glycosylation | 255 | 17 | Windows, MAC / Linux
Sulfation | 251 | 120 | Windows, MAC / Linux
Formylation | 250 | 40 | Windows, MAC / Linux
Citrullination | 122 | 19 | Windows, MAC / Linux
GPI-anchor | 82 | 47 | Windows, MAC / Linux
Nitration | 77 | 15 | Windows, MAC / Linux
S-diacylglycerol | 57 | 48 | Windows, MAC / Linux
Carboxylation | 40 | 38 | Windows, MAC / Linux
Lipoylation | 35 | 29 | Windows, MAC / Linux
Carbamidation | 22 | 1 | Windows, MAC / Linux
Neddylation | 11 | 4 | Windows, MAC / Linux
Pyruvate | 9 | 6 | Windows, MAC / Linux
S-linked Glycosylation | 6 | 5 | Windows, MAC / Linux
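
A downloaded positive set could be read along the following lines. This is only a sketch: it assumes the four tab-delimited fields appear in the order in which they are described (UniProt ID, modified position, PTM description, sequence window) and that any line without a numeric position is a header or blank line to be skipped. The `PtmSite` and `read_positive_set` names and the sample record are invented for the example.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class PtmSite(NamedTuple):
    uniprot_id: str
    position: int      # 1-based modified position (assumed)
    description: str   # PTM description
    window: str        # sequence window around the modified site


def read_positive_set(handle: TextIO) -> Iterator[PtmSite]:
    """Yield one PtmSite per well-formed tab-delimited record."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4 or not row[1].strip().isdigit():
            continue  # skip blank lines, headers, or malformed records
        uniprot_id, position, description, window = (field.strip() for field in row[:4])
        yield PtmSite(uniprot_id, int(position), description, window)


# Made-up record, for illustration only; a real file would be opened with open(path).
sample = io.StringIO("P12345\t13\tPhosphoserine\tAKQRQISFVKSHF\n")
for site in read_positive_set(sample):
    print(site.uniprot_id, site.position, site.window)
```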

The Benchmark Data Set for PTM Analyses

Because MS/MS-based experiments are labor-intensive, a variety of computational methods have been proposed to identify putative PTM sites from protein sequence. With so many PTM prediction methods available, it is difficult to determine the best prediction tool from cross-validation performance alone. Although most of these studies report independent testing results for their prediction methods, there is no standard dataset for comparing the predictive power of the various PTM prediction tools. Therefore, this update compiles non-homologous benchmark datasets for evaluating the predictive power of PTM site prediction tools. These datasets can guide users who need to predict PTM sites with high sensitivity (Sn), high specificity (Sp), or balanced Sn and Sp.
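
For reference, sensitivity and specificity follow the standard definitions Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The sketch below shows how a tool's predictions on a benchmark set might be scored. The `evaluate` function and the label encoding (1 for a positive site, 0 for a negative site) are assumptions made for this example, not part of dbPTM.

```python
from typing import Dict, Sequence


def evaluate(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity (Sn) and specificity (Sp) for binary site predictions.

    Both sequences use 1 for a PTM site and 0 for a non-PTM site.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    # Guard against empty classes rather than dividing by zero.
    sn = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"Sn": sn, "Sp": sp}


# Toy labels and predictions, for illustration only.
scores = evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
print(scores)
```

Reporting Sn and Sp separately, rather than overall accuracy, matters here because many of the benchmark sets contain far more negative than positive sites.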

PTM Type | Number of proteins | Number of positive sites | Number of negative sites | Download
Phosphorylation by CDK | 1,020 | 1,503 | 29,823 | Windows, MAC / Linux
Phosphorylation by MAPK | 857 | 1,270 | 22,436 | Windows, MAC / Linux
Phosphorylation by PKA | 905 | 1,209 | 29,813 | Windows, MAC / Linux
Phosphorylation by PKC | 691 | 943 | 24,207 | Windows, MAC / Linux
Phosphorylation by CK2 | 511 | 819 | 15,387 | Windows, MAC / Linux
Phosphorylation by CAMKL | 454 | 556 | 20,129 | Windows, MAC / Linux
Phosphorylation by GSK | 291 | 397 | 10,328 | Windows, MAC / Linux
Phosphorylation by AKT | 351 | 380 | 14,617 | Windows, MAC / Linux
Phosphorylation by CAMK2 | 254 | 366 | 12,575 | Windows, MAC / Linux
Phosphorylation by CK1 | 174 | 339 | 5,808 | Windows, MAC / Linux
Phosphorylation by RSK | 221 | 215 | 6,985 | Windows, MAC / Linux
Phosphorylation by GRK | 77 | 147 | 2,310 | Windows, MAC / Linux
Phosphorylation by PKG | 126 | 145 | 7,311 | Windows, MAC / Linux
Phosphorylation by DYRK | 109 | 142 | 3,470 | Windows, MAC / Linux
Phosphorylation by MAPKAPK | 100 | 125 | 3,096 | Windows, MAC / Linux
Phosphorylation by DMPK | 99 | 109 | 3,533 | Windows, MAC / Linux
Phosphorylation by PKD | 88 | 97 | 3,401 | Windows, MAC / Linux
Phosphorylation by PDK1 | 77 | 93 | 2,274 | Windows, MAC / Linux
Phosphorylation by SGK | 63 | 77 | 3,057 | Windows, MAC / Linux
Phosphorylation by RAD53 | 29 | 75 | 1,560 | Windows, MAC / Linux
Phosphorylation by DAPK | 51 | 53 | 1,284 | Windows, MAC / Linux
Phosphorylation by PKN | 26 | 50 | 866 | Windows, MAC / Linux
Phosphorylation by CAMK1 | 34 | 44 | 2,342 | Windows, MAC / Linux
Phosphorylation by MLCK | 20 | 34 | 484 | Windows, MAC / Linux
Phosphorylation by NDR | 28 | 32 | 1,096 | Windows, MAC / Linux
Acetylation | 5,646 | 14,407 | 8,704 | Windows, MAC / Linux
Citrullination | 66 | 76 | 1,501 | Windows, MAC / Linux
C-linked Glycosylation | 39 | 113 | 159 | Windows, MAC / Linux
Crotonylation | 20 | 117 | 36 | Windows, MAC / Linux
Formylation | 130 | 172 | 1,452 | Windows, MAC / Linux
Gamma-carboxyglutamic acid | 54 | 319 | 553 | Windows, MAC / Linux
Glutarylation | 217 | 725 | 2,543 | Windows, MAC / Linux
Glutathionylation | 1,493 | 3,555 | 6,617 | Windows, MAC / Linux
Hydroxylation | 201 | 1,270 | 2,900 | Windows, MAC / Linux
Lipoylation | 28 | 29 | 779 | Windows, MAC / Linux
Malonylation | 2,768 | 7,635 | 17,371 | Windows, MAC / Linux
Methylation | 5,438 | 14,686 | 36,501 | Windows, MAC / Linux
Nitration | 61 | 64 | 983 | Windows, MAC / Linux
N-linked Glycosylation | 1,969 | 2,517 | 8,330 | Windows, MAC / Linux
O-linked Glycosylation | 1,298 | 4,470 | 37,969 | Windows, MAC / Linux
S-diacylglycerol | 23 | 57 | 59 | Windows, MAC / Linux
S-nitrosylation | 1,434 | 3,592 | 5,803 | Windows, MAC / Linux
Succinylation | 2,599 | 5,049 | 5,526 | Windows, MAC / Linux
Sumoylation | 1,432 | 5,191 | 16,066 | Windows, MAC / Linux
Ubiquitination | 4,453 | 9,767 | 8,579 | Windows, MAC / Linux