PPI-Affinity

Help content

PPI-Affinity is a tool that uses support vector machine (SVM) predictors of binding affinity to screen datasets of protein-protein and protein-peptide complexes, and to generate and rank mutants of a given structure.

MODULES

Module #1: Binding Affinity predictor

This module applies the prediction models directly to a set of user-provided PDB files containing protein-protein or protein-peptide complexes. It characterizes, in terms of binding affinity, complexes whose structures were obtained from external sources.

Input parameters

Parameter	Description
Job name	A name to identify the job.
Email	An email address to receive job-related notifications.
Structure(s)	The three-dimensional structure(s) containing at least two protein sequences.
Ligand (protein or peptide)	The chain ID of the ligand in the structure(s). The ligand must be the same for all the uploaded PDB files. The ligand is treated as a peptide if it has fewer than 30 residues; otherwise it is treated as a protein. The binding affinity is then estimated with the protein-peptide or the protein-protein model of PPI-Affinity, accordingly.
Output order	Indicates the output order according to the binding affinity values.
Applicability Domain (AD)	Filters the output complexes according to their projection into the applicability domain (AD) of the PPI-Affinity models. AD 1st-99th: a case is Out of the AD when at least one descriptor value falls outside the range defined by the 1st and 99th percentiles of the training data. AD 100th: a case is Out of the AD when at least one descriptor value falls outside the full range of the training data for the PPI-Affinity model.
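The two applicability domain criteria described above can be illustrated with a minimal sketch. It is not the server's implementation: the descriptor name, the data layout (one list of training values per descriptor) and the percentile interpolation method are assumptions made for illustration only.

```python
from statistics import quantiles


def ad_bounds(training_values, percentile_range=True):
    """Return the (low, high) bounds of one descriptor.

    percentile_range=True  -> 1st and 99th percentiles ("AD 1st-99th")
    percentile_range=False -> full training range ("AD 100th")
    """
    if percentile_range:
        # quantiles(n=100) returns 99 cut points; the first and last are
        # the 1st and 99th percentiles. The interpolation method is an
        # assumption, the help page does not specify one.
        cuts = quantiles(training_values, n=100, method="inclusive")
        return cuts[0], cuts[98]
    return min(training_values), max(training_values)


def ad_label(descriptors, training, percentile_range=True):
    """Return "Out" if at least one descriptor is outside its bounds."""
    for name, value in descriptors.items():
        low, high = ad_bounds(training[name], percentile_range)
        if value < low or value > high:
            return "Out"
    return "In"


# Hypothetical descriptor with training values 0, 1, ..., 100.
training = {"descriptor_1": list(range(101))}
case = {"descriptor_1": 0.5}
print(ad_label(case, training, percentile_range=True))   # Out
print(ad_label(case, training, percentile_range=False))  # In
```

A case like this one, outside the percentile range but inside the full training range, would be reported as Out for AD 1st-99th and In for AD 100th.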

Output

The output of the module is a table with the following information (example values shown below):

#	Instance	Score (kcal/mol)	AD 1st-99th	AD 100th
1	1an1	-12.0	In	In
2	1jtd	-11.4	In	In
3	1t6b	-10.1	In	In
4	1t63	-9.5	In	In
5	1sb0	-8.1	Out	In

where:

#: Position of the structure in the ranking
Instance: Name of the instance (protein-protein/peptide structure)
Score (kcal/mol): The predicted binding free energy of the complex (kcal/mol)
AD 1st-99th: Projection of the instance into the applicability domain 1st-99th
AD 100th: Projection of the instance into the applicability domain 100th

The output example corresponds to a job with five structures. Each PDB file contains the chains A and B (ligand). The selected output order was "The best binders first".
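As a rough illustration of the "The best binders first" order, the sketch below ranks the example scores from the table above. It assumes that a more negative binding free energy means a better binder, which is consistent with the example ranking; the function name and the `best_first` flag are hypothetical.

```python
def rank_complexes(scores, best_first=True):
    """Return (position, instance, score) rows ordered by binding affinity.

    scores: dict mapping instance name -> score in kcal/mol.
    Assumption: more negative scores correspond to stronger binding.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1],
                     reverse=not best_first)
    return [(pos, name, score)
            for pos, (name, score) in enumerate(ordered, start=1)]


# Example values taken from the output table of this module.
scores = {"1sb0": -8.1, "1t63": -9.5, "1an1": -12.0,
          "1t6b": -10.1, "1jtd": -11.4}

for pos, name, score in rank_complexes(scores):
    print(pos, name, score, sep="\t")
```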

Module #2: Build & Predict

This module serves the same purpose as Module #1. However, instead of coordinate files of the complexes of interest, it requires only a template file (in PDB format) and a list of amino acid sequences (in FASTA format). The server builds a structural homology model for each sequence in the list using MODELLER, and then scores all the complexes with the binding affinity predictor. This module is particularly suitable for screening structurally similar complexes for which no structures have been reported.

Input parameters

Parameter	Description
Job name	A name to identify the job.
Email	An email address to receive job-related notifications.
Modeller license key	A MODELLER license key must be provided to use this module. For further information, please consult the MODELLER registration page.
Structure	The structure (a PDB file containing at least two chains) to be used as the template to create homology models.
Amino acid sequences	A file in FASTA format with the amino acid sequences of the chains contained in the template structure. Each FASTA header must contain only the chain ID of the corresponding sequence. For example, the sequence file of a structure with two chains A and B could be as follows: >A NFLCVVYVAVHVIYTVNLYSSVWALTS >B ILAFLLAISLDRYL (in the file, each header and each sequence goes on its own line). This option is suitable for template structures that contain missing residues. The sequences must be listed in the same order as they appear in the PDB file.
Ligand (protein or peptide)	The chain ID of the ligand in the structure. The ligand is treated as a peptide if it has fewer than 30 residues; otherwise it is treated as a protein. The binding affinity of each complex is then estimated with the protein-peptide or the protein-protein model of PPI-Affinity, accordingly.
Mutants	A file (in FASTA format) containing the candidate derivatives of the ligand.
Number of output derivatives	Number of derivatives to output from the list of candidates. If the field is left empty, all the candidates will be returned.
Output order	The order of the output derivatives according to their binding affinity to the target.
Applicability Domain (AD)	Filters the output complexes according to their projection into the applicability domain (AD) of the PPI-Affinity models. AD 1st-99th: a case is Out of the AD when at least one descriptor value falls outside the range defined by the 1st and 99th percentiles of the training data. AD 100th: a case is Out of the AD when at least one descriptor value falls outside the full range of the training data for the PPI-Affinity model.
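The requirements on the sequence file (headers holding only a chain ID, sequences in template order) and the 30-residue peptide threshold can be checked before submitting a job. The following is a minimal sketch under stated assumptions: the function names are hypothetical, chain IDs are taken to be single characters as in the PDB format, and the server may apply different or additional checks.

```python
def parse_chain_fasta(text):
    """Parse FASTA text into a list of (chain_id, sequence) tuples."""
    records, chain_id, parts = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if chain_id is not None:
                records.append((chain_id, "".join(parts)))
            chain_id, parts = line[1:].strip(), []
        elif chain_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts.append(line)
    if chain_id is not None:
        records.append((chain_id, "".join(parts)))
    return records


def check_against_template(records, template_chain_order):
    """Raise ValueError if headers are not chain IDs in template order."""
    ids = [chain_id for chain_id, _ in records]
    for chain_id in ids:
        if len(chain_id) != 1:
            raise ValueError(f"header {chain_id!r} is not a single chain ID")
    if ids != list(template_chain_order):
        raise ValueError(f"chain order {ids} differs from the template")


def ligand_kind(sequence, threshold=30):
    """Fewer than 30 residues -> peptide model, otherwise protein model."""
    return "peptide" if len(sequence) < threshold else "protein"


# The two-chain example from the parameter table, with B as the ligand.
fasta = ">A\nNFLCVVYVAVHVIYTVNLYSSVWALTS\n>B\nILAFLLAISLDRYL\n"
records = parse_chain_fasta(fasta)
check_against_template(records, ["A", "B"])
print(ligand_kind(dict(records)["B"]))  # peptide (14 residues)
```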

Output

The output of the module is a table with the following information (example values shown below):

#	Instance	Score (kcal/mol)	AD 1st-99th	AD 100th
1	1aqcB.pdb-1aqc_III_D_1.pdb	-7.6	In	In
2	derivative_2.pdb	-7.7	In	In
3	derivative_5.pdb	-7.7	In	In
4	derivative_4.pdb	-7.6	In	In
5	derivative_3.pdb	-7.4	In	In
6	derivative_1.pdb	-7.2	In	In

where:

#: Position of the derivative in the ranking
Instance: Name of the derivative
Score (kcal/mol): The predicted binding free energy of the built complex (kcal/mol)
AD 1st-99th: Projection of the instance into the applicability domain 1st-99th
AD 100th: Projection of the instance into the applicability domain 100th

The output example corresponds to a job with the file 1aqcB.pdb-1aqc_III_D_1.pdb as the template structure. The complex contains the chains A and B (ligand). A FASTA file with the amino acid sequences of five derivatives was provided by the user. The selected output order was "The best binders first". Note that the reference structure always appears at the top of the ranking, regardless of its score.
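The ordering in this example, with the reference structure pinned to the first row and the derivatives sorted behind it, can be sketched as follows. This is an illustration only: the function name is hypothetical, and whether the "Number of output derivatives" limit counts the reference row is an assumption (here it does not).

```python
def rank_with_reference(reference, derivatives, n_output=None):
    """Return (position, instance, score) rows, reference always first.

    reference:   (name, score) of the template structure.
    derivatives: list of (name, score) tuples.
    n_output:    number of derivatives to keep; None keeps all of them.
    Assumption: more negative scores are better binders.
    """
    ordered = sorted(derivatives, key=lambda item: item[1])  # stable sort
    if n_output is not None:
        ordered = ordered[:n_output]
    rows = [reference] + ordered
    return [(pos, name, score)
            for pos, (name, score) in enumerate(rows, start=1)]


# Scores taken from the example table of this module.
reference = ("1aqcB.pdb-1aqc_III_D_1.pdb", -7.6)
derivatives = [("derivative_1.pdb", -7.2), ("derivative_2.pdb", -7.7),
               ("derivative_3.pdb", -7.4), ("derivative_4.pdb", -7.6),
               ("derivative_5.pdb", -7.7)]

for row in rank_with_reference(reference, derivatives):
    print(*row, sep="\t")
```

The reference keeps position 1 even though two derivatives have a more negative score.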

Module #3: Protein Engineering

This module automatically generates and scores mutations at the interface of a complex, with the aim of optimizing its binding affinity.

Input parameters

Parameter	Description
Job name	A name to identify the job.
Email	An email address to receive job-related notifications.
Modeller license key	A MODELLER license key must be provided to use this module. For further information, please consult the MODELLER registration page.
Structure	The structure (a PDB file containing at least two chains) to be used as the template to create homology models.
Amino acid sequences	A file in FASTA format with the amino acid sequences of the chains contained in the template structure. Each FASTA header must contain only the chain ID of the corresponding sequence. For example, the sequence file of a structure with two chains A and B could be as follows: >A NFLCVVYVAVHVIYTVNLYSSVWALTS >B ILAFLLAISLDRYL (in the file, each header and each sequence goes on its own line). This option is suitable for template structures that contain missing residues. The sequences must be listed in the same order as they appear in the PDB file.
Ligand (protein or peptide)	Once the structure is specified, its chains are listed as chain ID (number of residues), e.g., for a structure with two chains: A (16 aa) and B (319 aa). From the list shown, the user must select the chain whose sequence is to be optimized. The selected ligand is treated as a peptide if it contains fewer than 30 residues; otherwise it is treated as a protein. The binding affinity is then estimated with the protein-peptide or the protein-protein model of PPI-Affinity, accordingly.
Number of derivatives	Number of derivatives (of the amino acid sequence of the ligand) to be generated (maximum = 10000).
Modifications	Maximum number of modifications per derivative. A modification is the deletion or mutation of an amino acid in the sequence; whether it occurs depends on the deletion and mutation probabilities. Consequently, the final number of modifications per derivative is between 0 and the number specified in this field.
Molecular weight	The maximum molecular weight of each derivative. If the value is 0, the molecular weight is not checked; otherwise, the molecular weight of the output derivatives will not exceed the specified value.
Deletion probability	The probability of deleting a residue (default value = 0.5).
Mutation probability	The probability of mutating a residue (default value = 0.5).
Type of mutations	Mutation types are defined by the following residue groups. Conservative: Polar = "NCQHSTG", Acid = "DE", Basic = "KR", Non-polar = "AILMPV", Aromatic = "WYF". Conservative extended: Polar extended = "NCQHSTGDEKR", Non-polar extended = "AILMPVWYF". Unrestrained: All residues = "NCQHSTGDEKRAILMPVWYF".
Modification probability per amino acid	Mutation/deletion probability for each amino acid. A value of 0 means that the residue remains unchanged in all the generated derivatives. A value of 1 indicates that changes to the residue are desirable. Note that only residues with at least one contact with the target chain are shown. A ligand residue is considered to be in contact with the target if its alpha carbon (CA) lies less than 10 Å from the CA atom of at least one residue of the target chain.
Number of output derivatives	Number of derivatives to output from the final pool of derivatives. If the field is left empty, all the derivatives will be returned.
Output order	Indicates the order of the output derivatives according to their binding affinity to the target.
Applicability Domain (AD)	Filters the output complexes according to their projection into the applicability domain (AD) of the PPI-Affinity model. AD 1st-99th: a case is Out of the AD when at least one descriptor value falls outside the range defined by the 1st and 99th percentiles of the training data. AD 100th: a case is Out of the AD when at least one descriptor value falls outside the full range of the training data for the PPI-Affinity model.
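The contact criterion used to decide which ligand residues are listed under "Modification probability per amino acid" can be sketched as below. This is an illustrative sketch, not the server's code: it assumes the CA coordinates have already been extracted from the PDB file, and the function name and data layout are hypothetical.

```python
import math


def contact_residues(ligand_ca, target_ca, cutoff=10.0):
    """Return the ligand residue numbers in contact with the target.

    ligand_ca: dict mapping ligand residue number -> (x, y, z) of its CA.
    target_ca: list of (x, y, z) CA coordinates of the target chain.
    A residue is in contact if its CA is closer than `cutoff` (in Å)
    to the CA of at least one target residue (strict inequality).
    """
    return [res for res, xyz in ligand_ca.items()
            if any(math.dist(xyz, other) < cutoff for other in target_ca)]


# Made-up coordinates for three ligand residues and two target residues.
ligand_ca = {1: (0.0, 0.0, 0.0), 2: (30.0, 0.0, 0.0), 3: (16.0, 0.0, 0.0)}
target_ca = [(6.0, 0.0, 0.0), (50.0, 0.0, 0.0)]

print(contact_residues(ligand_ca, target_ca))  # [1]
```

Residue 3 lies exactly 10 Å from the nearest target CA and is therefore not counted, since the criterion is a strict "less than".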

Output

The output of the module is a table with the following information (example values shown below):

#	Instance	Score (kcal/mol)	AD 1st-99th	AD 100th
1	1aqcB.pdb-1aqc_III_D_1.pdb	-7.6	In	In
2	derivative_2.pdb	-8.1	Out	In
3	derivative_1.pdb	-7.9	Out	In
4	derivative_4.pdb	-7.5	Out	In
5	derivative_3.pdb	-7.2	Out	In
6	derivative_5.pdb	-7.1	Out	In

where:

#: Position of the derivative in the ranking
Instance: Name of the derivative
Score (kcal/mol): The predicted binding free energy of the built complex (kcal/mol)
AD 1st-99th: Projection of the instance into the applicability domain 1st-99th
AD 100th: Projection of the instance into the applicability domain 100th

The output example corresponds to a job with the file 1aqcB.pdb-1aqc_III_D_1.pdb as the template structure. The complex contains the chains A and B (ligand). Five derivatives were generated, built and scored with the Protein Engineering module. The sequences of the derivatives were created with the default values of the tool. The selected output order was "The best binders first". Note that the score of the reference structure will appear at the top of the ranking regardless of its value.
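To make the interplay of the Modifications, Deletion probability, Mutation probability, Type of mutations and per-residue probability parameters more concrete, here is a minimal sketch of how one derivative sequence could be generated. The actual sampling scheme of the server is not described on this page, so the procedure below (at most one modification per position, deletion tried before mutation, independent random draws) is an assumption, and all function and variable names are hypothetical.

```python
import random

# Residue groups as listed under "Type of mutations".
CONSERVATIVE = ["NCQHSTG", "DE", "KR", "AILMPV", "WYF"]
CONSERVATIVE_EXTENDED = ["NCQHSTGDEKR", "AILMPVWYF"]
UNRESTRAINED = ["NCQHSTGDEKRAILMPVWYF"]


def make_derivative(seq, max_mods, groups=CONSERVATIVE, p_del=0.5,
                    p_mut=0.5, site_prob=None, rng=random):
    """Return one derivative of `seq` with 0..max_mods modifications.

    site_prob maps a 0-based position to its modification probability;
    positions that are not listed default to 1.0 (assumed behaviour).
    """
    site_prob = site_prob or {}
    residues = list(seq)  # None will mark a deleted residue
    candidates = [i for i in range(len(seq)) if site_prob.get(i, 1.0) > 0]
    chosen = rng.sample(candidates, min(max_mods, len(candidates)))
    for pos in chosen:
        if rng.random() >= site_prob.get(pos, 1.0):
            continue  # this position is left unchanged
        if rng.random() < p_del:
            residues[pos] = None
        elif rng.random() < p_mut:
            group = next((g for g in groups if residues[pos] in g), None)
            if group is not None:
                options = [aa for aa in group if aa != residues[pos]]
                residues[pos] = rng.choice(options)
    return "".join(aa for aa in residues if aa is not None)


rng = random.Random(0)  # fixed seed only to make the example repeatable
print(make_derivative("ILAFLLAISLDRYL", max_mods=3, rng=rng))
```

Because each selected position is modified at most once and every modification is subject to a random draw, the number of changes per derivative falls between 0 and `max_mods`, as stated for the Modifications parameter.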


General notes:

If a job completes without errors but the output is empty, the most likely cause is that filtering by Applicability Domain (AD) was selected and all the cases were found to be out of the AD of the models.

A tool to predict and optimize the binding affinity of protein-peptide and protein-protein complexes

Module #1: Binding Affinity predictor

Input parameters

Output

Module #2: Build & Predict

Input parameters

Output

Module #3: Protein Engineering

Input parameters

Output