The PTM-related Information
About 5% of Swiss-Prot proteins have known tertiary structures in the PDB. For proteins without known tertiary structures, two previously published tools, RVP-net (Shandar Ahmad, et al., 2003) and PSIPRED (McGuffin LJ, et al., 2000), were applied to predict solvent accessibility and secondary structure, respectively. RVP-net is a feed-forward neural network that predicts a real-valued Accessible Surface Area (ASA), ranging from 0% to 100%, for each amino acid residue based on its neighborhood information. We applied the RVP-net program to predict the real-valued ASA for the amino acid residues of all Swiss-Prot proteins.
The Benchmark Data Set for PTM Analyses
Because MS/MS-based experiments are labor-intensive, a variety of computational methods have been proposed to identify putative PTM sites from protein sequence. With so many PTM prediction methods available, it is difficult to determine the best prediction tool from cross-validation performance alone. Although most of these studies report independent testing results for their prediction methods, there is no standard dataset for comparing the predictive power of the various PTM prediction tools. Therefore, this update compiles non-homologous benchmark datasets to evaluate the predictive power of PTM site prediction tools. The evaluation provides suggestions to users who need to predict PTM sites with high sensitivity (Sn), high specificity (Sp), or balanced Sn and Sp.
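As an illustration of how Sn and Sp could be computed for one prediction tool on a benchmark set, the following minimal Python sketch uses the standard definitions Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The function name, the representation of sites as (protein ID, position) pairs, and the example data are assumptions made for this sketch; they are not taken from the dbPTM benchmark itself.

```python
def evaluate_predictions(predicted, positives, negatives):
    """Return (Sn, Sp) for one prediction tool on one benchmark set.

    All three arguments are sets of (protein_id, position) pairs:
      predicted -- sites the tool reports as modified
      positives -- benchmark sites known to be modified
      negatives -- benchmark sites treated as unmodified
    """
    tp = len(predicted & positives)   # modified sites that were found
    fn = len(positives - predicted)   # modified sites that were missed
    fp = len(predicted & negatives)   # unmodified sites wrongly reported
    tn = len(negatives - predicted)   # unmodified sites correctly left out
    sn = tp / (tp + fn) if positives else float("nan")
    sp = tn / (tn + fp) if negatives else float("nan")
    return sn, sp


# Made-up example data (hypothetical protein IDs and positions).
positives = {("P1", 10), ("P1", 20), ("P2", 5), ("P2", 8)}
negatives = {("P1", 30), ("P1", 40), ("P2", 15), ("P2", 18), ("P2", 50)}
predicted = {("P1", 10), ("P1", 20), ("P2", 5), ("P1", 30)}

sn, sp = evaluate_predictions(predicted, positives, negatives)
```

A tool tuned for high Sp would report fewer sites from the negative set at the cost of missing some true sites, while a tool tuned for high Sn would do the opposite.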
The Experimentally Verified PTM Sites
Because the contents of several online PTM resources are not accessible, a total of eleven biological databases related to PTMs are integrated in dbPTM. To resolve the heterogeneity among the data collected from different sources, the reported modification sites are mapped to UniProtKB protein entries using sequence comparison. Given the high throughput of mass spectrometry-based methods in post-translational proteomics, this update also includes manually curated MS/MS-identified peptides associated with PTMs, collected from research articles through a literature survey. First, a table of PTM-related keywords is constructed by referring to the UniProtKB/SwissProt PTM list (http://www.uniprot.org/docs/ptmlist.txt) and the annotations of RESID. Then, all fields in the PubMed database are searched using the keywords in this table, and the full text of the matching research articles is downloaded. To cover the various proteomic identification experiments, a text-mining system is developed to survey the full-text literature for articles that potentially describe the site-specific identification of modified sites. Furthermore, in order to determine the locations of PTMs on a full-length protein sequence, the experimentally verified MS/MS peptides are mapped to UniProtKB protein entries based on their database identifier (ID) and sequence identity. During data mapping, MS/MS peptides that cannot be aligned exactly to a protein sequence are discarded. Finally, each mapped PTM site is annotated with its corresponding literature reference (PubMed ID).
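The peptide mapping step described above can be sketched roughly as follows. This is a minimal Python illustration, not the dbPTM pipeline itself: the record layout, the function name, the example sequence, and the placeholder identifiers are all assumptions. The sketch looks up the protein by its ID, requires an exact substring match, discards peptides that do not match, and carries the literature reference along with each mapped site. For simplicity it uses only the first exact occurrence of a peptide in the protein.

```python
def map_peptides(records, sequences):
    """Map modified residues in peptides to 1-based protein positions.

    records   -- list of dicts with keys:
                 "protein_id", "peptide", "mod_offsets" (0-based offsets of
                 modified residues within the peptide), and "pmid"
    sequences -- dict mapping protein ID to full-length protein sequence
    Returns a list of (protein_id, position, residue, pmid) tuples.
    """
    mapped = []
    for rec in records:
        seq = sequences.get(rec["protein_id"])
        if seq is None:
            continue  # unknown protein ID: nothing to map against
        start = seq.find(rec["peptide"])  # exact match only
        if start == -1:
            continue  # peptide does not align exactly: discard it
        for offset in rec["mod_offsets"]:
            index = start + offset  # 0-based index in the protein
            mapped.append((rec["protein_id"], index + 1, seq[index], rec["pmid"]))
    return mapped


# Made-up example data (hypothetical ID, sequence, and reference labels).
sequences = {"P_EXAMPLE": "MKTAYSPEKLSPTR"}
records = [
    {"protein_id": "P_EXAMPLE", "peptide": "YSPEK",
     "mod_offsets": [1], "pmid": "PMID-EXAMPLE-1"},
    {"protein_id": "P_EXAMPLE", "peptide": "YSPEQ",  # no exact match
     "mod_offsets": [1], "pmid": "PMID-EXAMPLE-2"},
]

mapped_sites = map_peptides(records, sequences)
```

Here the first peptide maps its modified serine to position 6 of the protein, while the second peptide is dropped because it differs from the protein sequence by one residue.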
The HMM-Predicted PTM Sites
In this update, a KinasePhos-like method was applied to 20 types of PTM with enough experimentally verified PTM sites (more than 30 sites). To reduce the number of false positive predictions, we set the predictive parameters to the values at which the prediction specificity is 100%, and then scanned all Swiss-Prot protein sequences for potential PTM sites. Each predicted PTM site record consists of the Swiss-Prot ID, modified location, PTM description, HMMER bit score, and HMMER E-value, in tab-delimited format.
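A record in this tab-delimited format could be read with a short Python sketch such as the one below. It assumes that the five fields appear in the order listed above, one record per line, with no header row; the class name, field names, and the example line are invented for illustration and are not taken from an actual dbPTM file.

```python
from dataclasses import dataclass


@dataclass
class PredictedSite:
    swissprot_id: str
    position: int        # modified location in the protein sequence
    description: str     # PTM description
    bit_score: float     # HMMER bit score
    e_value: float       # HMMER E-value


def parse_predicted_site(line):
    """Parse one tab-delimited record into a PredictedSite."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    swissprot_id, position, description, bit_score, e_value = fields
    return PredictedSite(
        swissprot_id=swissprot_id,
        position=int(position),
        description=description,
        bit_score=float(bit_score),
        e_value=float(e_value),
    )


# Made-up example line (hypothetical ID and scores).
example_line = "P_EXAMPLE\t15\tPhosphoserine\t12.4\t1.5e-05\n"
site = parse_predicted_site(example_line)
```

Converting the score columns to floats makes it straightforward to re-filter the predictions with a stricter bit score or E-value cutoff if needed.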