ProtDCal-Suite

Protein codification and applications

The ProtDCal suite is a web interface that provides a set of tools for studying proteins.

The main module, ProtDCal, is a software tool for data mining of protein data. ProtDCal generates a machine-learning-friendly vector from the structural information of each protein. These vectors can then be used as input for pattern recognition techniques to develop models linking structure to activity or function data.

Additional modules provide access to tools for predicting the likelihood of protein-protein and protein-peptide interactions, for identifying enzymatic proteins, and for predicting lysine methylation sites. Planned developments include the design of antibacterial peptides and the prediction of other post-translational modifications.

We will continue to incorporate new applications based on ProtDCal features into this suite. We therefore encourage authors who have used ProtDCal's descriptors to develop new methods to contact us about implementing their algorithms in the suite.


ProtDCal

This tool uses a divide-and-conquer methodology: it extracts properties from diverse groups of residues and aggregates each of them into particular descriptors. In this way, a large feature vector is produced for every structure or sequence, and its elements balance local and global characteristics of the protein. Such a vector can be used effectively for machine learning analyses by means of proper attribute selection and modeling techniques (SVM, Random Forest, ANN, etc.).
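The group-then-aggregate idea can be illustrated with a minimal Python sketch. It is not ProtDCal's implementation: the residue groups, the choice of aggregation operators, the descriptor naming scheme, and the `describe` function are all assumptions made for illustration. The only real data used is the Kyte-Doolittle hydropathy scale.

```python
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy index, one value per standard amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical residue groups (assumption: not ProtDCal's actual grouping).
GROUPS = {
    "all": set(KD),
    "nonpolar": set("AVLIMFWPG"),
    "polar": set("STNQCY"),
    "charged": set("DEKRH"),
}

# Hypothetical aggregation operators applied to each group.
AGGREGATORS = {"mean": mean, "var": pvariance, "sum": sum}


def describe(sequence):
    """Return a dict of descriptors: one per (group, aggregator) pair."""
    seq = sequence.upper()
    features = {}
    for group_name, members in GROUPS.items():
        # Property values of the residues that belong to this group;
        # residues outside the 20 standard amino acids are skipped.
        values = [KD[aa] for aa in seq if aa in members]
        for agg_name, agg in AGGREGATORS.items():
            # Empty groups get 0.0 so that every vector has the same length.
            features[f"KD_{group_name}_{agg_name}"] = agg(values) if values else 0.0
    return features


if __name__ == "__main__":
    for name, value in describe("MKTAYIAKQR").items():
        print(f"{name}\t{value:.3f}")
```

With one property, four groups, and three aggregators, the sketch yields 12 descriptors per sequence. Adding more properties, groups, or operators multiplies the vector length, which is why attribute selection matters before modeling.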

The code is implemented in Java, which makes it well suited to combination with powerful machine learning packages such as Weka.
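As an illustration of how descriptor vectors could be handed to Weka, the following Python sketch writes a small table of descriptors to Weka's ARFF text format. It is only a sketch under stated assumptions: the `to_arff` function, the descriptor names, and the class labels are hypothetical, and nothing here describes the output format that ProtDCal itself produces.

```python
def to_arff(relation, rows, class_values):
    """Serialize labelled descriptor vectors as ARFF text.

    rows: list of (descriptors dict, class label) pairs; every dict is
          assumed to hold the same keys with numeric values.
    class_values: the allowed labels of the nominal class attribute.
    """
    names = sorted(rows[0][0])  # fixed attribute order for every row
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in names]
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for descriptors, label in rows:
        values = ",".join(f"{descriptors[name]:.4f}" for name in names)
        lines.append(f"{values},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical descriptor values and labels, for illustration only.
    demo_rows = [
        ({"f1": 1.0, "f2": -0.5}, "enzyme"),
        ({"f1": 0.25, "f2": 2.0}, "non_enzyme"),
    ]
    print(to_arff("demo", demo_rows, ["enzyme", "non_enzyme"]))
```

The resulting text can be saved with an `.arff` extension and opened in Weka, where attribute selection and classifiers can be applied to the descriptor columns.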


Protein analysis tools

ABP-Finder

Identifies antibacterial peptides and the Gram-staining type of targeted bacteria

PPI-Affinity

Predicts the binding affinity of protein-peptide and protein-protein complexes

Permits protein engineering

PPI-Detect

Predicts the interaction likelihood of protein-protein and protein-peptide pairs

Enzyme Identifier

Identifies enzymes from amino acid sequences (FASTA) and from 3D structures (PDB)

Pred-NGlyco

Predicts N-glycosylation sites

MethylSight

Predicts lysine methylation sites in the human proteome