EPIC-AMP

AMP & MIC Predictor

Enter an amino acid sequence (between 10 and 100 valid amino acids) or upload a FASTA file to predict AMP class and MIC values.

Select Bacteria for MIC Prediction

Choose one or more bacteria for which you'd like to predict the Minimum Inhibitory Concentration (MIC) of the peptide. MIC prediction is only performed if the sequence is classified as an Antimicrobial Peptide (AMP).

E. coli, P. aeruginosa, S. aureus, K. pneumoniae
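The two-stage behaviour described above (classify first, predict MIC only for AMPs) can be sketched as follows. This is a minimal illustration, not the tool's actual code: `predict_peptide`, `classify`, and `mic_regressors` are hypothetical names, and the stand-in models in the usage lines are dummies.

```python
from typing import Callable, Dict, Iterable, Optional


def predict_peptide(
    sequence: str,
    classify: Callable[[str], bool],
    mic_regressors: Dict[str, Callable[[str], float]],
    bacteria: Iterable[str],
) -> Dict[str, Optional[object]]:
    """Classify a peptide, then predict MIC only if it is an AMP."""
    if not classify(sequence):
        # Non-AMP: MIC prediction is skipped entirely.
        return {"is_amp": False, "mic": None}
    # AMP: run one regressor per selected bacterium.
    mic = {name: mic_regressors[name](sequence) for name in bacteria}
    return {"is_amp": True, "mic": mic}


# Usage with dummy stand-ins for the real models (hypothetical values).
dummy_regressors = {"E. coli": lambda seq: 8.0, "S. aureus": lambda seq: 16.0}
amp_result = predict_peptide("KKLLKKLLKK", lambda seq: True, dummy_regressors, ["E. coli"])
non_amp_result = predict_peptide("KKLLKKLLKK", lambda seq: False, dummy_regressors, ["E. coli"])
```

Passing the models in as callables keeps the gating logic independent of whichever classifier and regressors are used.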

Please enter your email address for detailed results (required):

Estimated processing time: around 30 seconds.

Classification

The prediction will appear here.

Additional Details

The detailed report will appear here.

Classifier Metrics

The following table presents the performance metrics of our antimicrobial peptide classification model. These metrics were obtained on a held-out test set, separate from the training data.

Metric	Value	Description
Accuracy	0.963	The overall correctness of the model, representing the proportion of correctly classified sequences (both AMP and Non-AMP).
Precision	0.964	Out of all sequences predicted as AMPs, the proportion that are truly AMPs. A high precision indicates fewer false positives.
Recall	0.963	Out of all actual AMP sequences, the proportion that the model correctly identifies. A high recall indicates fewer false negatives.
F1-Score	0.963	The harmonic mean of precision and recall, providing a balanced measure of the model's performance.
Validation Accuracy	0.968	The accuracy of the model on a separate validation dataset, used during model development to tune hyperparameters.
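
For readers who want to see how the metrics in the table relate to each other, the sketch below computes them from confusion-matrix counts. The counts are made-up illustration values, not the tool's test-set results, and the table does not state whether its precision and recall are for the AMP class alone or averaged over both classes; this sketch treats AMP as the positive class, matching the descriptions above.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 with AMP as the positive class."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("At least one sample is required")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 90 true AMPs found, 10 false alarms,
# 5 AMPs missed, 95 Non-AMPs correctly rejected.
example = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```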

Regressor Metrics (MIC Prediction Models)

The Minimum Inhibitory Concentration (MIC) prediction is handled by separate regressor models. Their performance metrics for each bacterium are detailed below:

Bacterium	MSE (log)	MSE	R²	MAE	Pearson	Kendall
E. coli	0.0481	0.4864	0.7023	0.1375	0.8394	0.6725
P. aeruginosa	0.0517	0.5227	0.6864	0.1233	0.8311	0.6922
S. aureus	0.0517	0.4988	0.6828	0.1472	0.8278	0.6536
K. pneumoniae	0.0538	0.4292	0.7416	0.1479	0.8693	0.7194
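
The regression metrics in the table can be reproduced for any pair of true and predicted value lists with a few lines of plain Python. This is a sketch only: the input values are invented, and the Kendall implementation is the simple tau-a variant without tie correction, which may differ from the variant used to produce the table.

```python
import math
from itertools import combinations


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def kendall_tau_a(y_true, y_pred):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    pairs = list(combinations(range(len(y_true)), 2))
    for i, j in pairs:
        product = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
        if product > 0:
            score += 1  # pair ranked in the same order
        elif product < 0:
            score -= 1  # pair ranked in opposite order
    return score / len(pairs)


# Invented values for illustration only.
true_values = [1.0, 2.0, 3.0, 4.0]
predicted_values = [1.5, 2.0, 2.5, 4.5]
```

Pearson measures linear agreement, while Kendall only compares rankings, which is why the two can differ noticeably for the same model.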

Model Interpretability

To understand how our model makes predictions, we use SHAP (SHapley Additive exPlanations) values globally and LIME (Local Interpretable Model-agnostic Explanations) locally. SHAP values help assess the overall contribution of each feature to model predictions across the dataset, while LIME explains the features influencing a single, specific prediction.
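
To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a toy three-feature function by averaging each feature's marginal contribution over all feature orderings. It is not the tool's pipeline: the SHAP library uses faster approximations, and the toy model, the all-zero baseline, and the function names here are assumptions chosen for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features not yet "added" take their baseline value. Only feasible
    for a handful of features, since there are n! orderings.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]
            value = predict(current)
            # Marginal contribution of this feature in this ordering.
            totals[feature] += value - previous
            previous = value
    return [total / len(orderings) for total in totals]


def toy_model(x):
    # One additive term and one interaction term.
    return 2 * x[0] + x[1] * x[2]


contributions = shapley_values(toy_model, instance=[1, 2, 3], baseline=[0, 0, 0])
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that gives SHAP its name.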

Global SHAP Plot

The SHAP plot below visualizes the features that most significantly influence the model's predictions on average across all data. Positive SHAP values generally contribute to the sequence being classified as AMP, while negative values contribute to Non-AMP classification.

Interpretation of Global SHAP Plot

Based on the SHAP plot, the model's predictions for Antimicrobial Peptides (AMPs) are influenced by a combination of sequence-based, structural, and biophysical features. Here’s a breakdown of the key features and their biological implications:

A. Sequence-Based Features

APAAC13 & APAAC5 (Amphiphilic Pseudo-Amino Acid Composition)

Description: These features encode hydrophobicity, charge, and side-chain properties, reflecting the amphiphilic nature of the peptide.
Interpretation: Higher APAAC values have a positive SHAP impact, indicating they are predictive of AMP activity. This suggests that the model values amphiphilic sequences, which are crucial for disrupting bacterial membranes.

Amino Acid Composition (M, C)

M (Methionine):

Description: Methionine is often associated with structural stability in peptides.

Interpretation: The plot suggests Methionine content has a positive influence, possibly contributing to the stability of AMP candidates.

C (Cysteine):

Description: Cysteine is known for forming disulfide bonds, which can stabilize AMP structures like defensins.

Interpretation: High cysteine content appears to have a positive SHAP impact. This could be because disulfide-stabilized AMPs often exhibit enhanced antimicrobial action.
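
The amino acid composition features discussed above (such as M and C) are simply the share of each residue in the sequence. A minimal sketch, assuming the composition is expressed as a fraction between 0 and 1 (the tool's feature extractor may scale it differently, for example as a percentage):

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict:
    """Return the fraction of each standard amino acid in the sequence."""
    if not sequence:
        raise ValueError("Sequence must not be empty")
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in AMINO_ACIDS}


composition = amino_acid_composition("MCCA")
methionine_fraction = composition["M"]  # 1 of 4 residues
cysteine_fraction = composition["C"]    # 2 of 4 residues
```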

B. Structural and Biophysical Features

HydrophobicityD3001:

Description: Represents the overall hydrophobicity of the peptide sequence.

Interpretation: Hydrophobicity is shown to be a critical feature. More hydrophobic peptides are strongly favored for AMP classification, aligning with the understanding that hydrophobicity is essential for membrane insertion and disruption.

PolarityD1001:

Description: Measures the polarity of the peptide sequence.

Interpretation: The model considers polarity, likely in balance with hydrophobicity. Optimal AMP efficacy is often linked to having both hydrophobic and polar residues to facilitate membrane interaction and solubility.

Solvent Accessibility (SolventAccessibilityD3001):

Description: Reflects how exposed the residues are to the solvent (water).

Interpretation: Solvent-accessible residues positively contribute to AMP prediction. This might indicate that exposed residues facilitate interaction with bacterial membranes or the surrounding environment.

Charge (ChargeD2001):

Description: Represents the net charge of the peptide.

Interpretation: Charge is a significant positive predictor. Most AMPs are cationic, enabling them to interact with negatively charged bacterial membranes. Higher charge, as indicated by positive SHAP values, typically enhances antimicrobial potency.

PolarizabilityD3001 & NormalizedVDWVD3001:

Description: These features relate to electronic distribution (Polarizability) and steric effects (NormalizedVDWVD) of the peptide.

Interpretation: These properties impact membrane penetration and peptide-membrane interactions. Their positive SHAP values suggest they are important for AMP activity, possibly by influencing how the peptide inserts into and destabilizes membranes.

C. Geary Autocorrelation Features

Geary autocorrelation descriptors measure how a physicochemical property of one residue relates to the same property of residues a fixed number of positions (the lag) further along the sequence. They therefore encode how that property is distributed along the peptide.
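
A Geary autocorrelation at lag d divides the mean squared difference between residues d positions apart by the overall variance of the property. The sketch below follows that standard definition. The Kyte-Doolittle hydrophobicity scale is an assumption made for illustration; the tool's feature extractor may use a different scale. Note that a lag of 30 requires a sequence longer than 30 residues, and how shorter sequences are handled is not described here.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def geary_autocorrelation(sequence: str, lag: int, scale=KYTE_DOOLITTLE) -> float:
    """Geary autocorrelation of a residue property at the given lag."""
    n = len(sequence)
    if not 0 < lag < n:
        raise ValueError("lag must be at least 1 and shorter than the sequence")
    values = [scale[aa] for aa in sequence]
    mean = sum(values) / n
    variance_term = sum((v - mean) ** 2 for v in values) / (n - 1)
    if variance_term == 0:
        raise ValueError("property is constant along the sequence")
    # Mean squared difference between residues `lag` positions apart.
    lagged_term = sum(
        (values[i] - values[i + lag]) ** 2 for i in range(n - lag)
    ) / (2 * (n - lag))
    return lagged_term / variance_term


alternating_lag1 = geary_autocorrelation("ILIL", lag=1)
alternating_lag2 = geary_autocorrelation("ILIL", lag=2)
```

Values below 1 indicate that residues at that lag tend to have similar property values, and values above 1 indicate that they tend to differ.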

GearyAuto_Hydrophobicity30:

Description: Encodes the clustering of hydrophobic residues within a spatial lag of 30.

Interpretation: Higher values of hydrophobicity autocorrelation are positively associated with AMP prediction. This likely reflects the importance of hydrophobic clustering in forming amphipathic structures, such as α-helices, which are common and effective in AMPs.

GearyAuto_Steric30 & GearyAuto_Steric29:

Description: Reflect peptide backbone flexibility and steric properties at spatial lags of 30 and 29.

Interpretation: These features have a positive SHAP impact, suggesting that a certain degree of peptide backbone flexibility may be beneficial for AMP action. Flexibility could enhance the peptide's ability to interact with and adapt to diverse bacterial membrane structures.

GearyAuto_ResidueASA30:

Description: Indicates the autocorrelation of accessible surface area of residues at a spatial lag of 30.

Interpretation: Higher autocorrelation of residue accessible surface area is favored. This could mean that a consistent pattern of residue exposure, possibly with higher exposure of charged or polar residues, improves bacterial targeting or membrane interaction.

Sample Run

You can test the model with various sequences to understand its behavior. Here are a few examples:

Test 1: Long Sequence of 99 AAs (P-AMP) - MEKAALIFIGLLLFSTCTQILAQSCNNDSDCTNLKCATKNIKCEQNKCQCLDERYIRAISLNTRSPRCNVQSCIDHCKAIGEVIYVCFTYHCYCRKPPM

Test 2: Long Sequence of 99 AAs (Non-AMP) - MKSLLPLAILAALAVAALCYESHESMESYEVSPFTTRRNANTFISPQQRWHAKAQERVRELNKPAQEINREACDDYKLCERYALIYGYNAAYNRYFRQR

Test 3: Short Sequence of 51 AAs (P-AMP) - SLQGGAPNFPQPSQQNGGWQVSPDLGRDDKGNTRGQIEIQNKGKDHDFNAG

Test 4: Short Sequence of 50 AAs (Non-AMP) - MKPLKQKVSITLDEDVIKNLKTLAEECDRSLSQYINLILKEHLKNLDQQ

Test 5: Invalid Characters (e.g., X) - MEKAALIFIGLLLFSTCTQILAQSC(XX).

About This Tool

This web application provides a user-friendly interface for classifying amino acid sequences as either Antimicrobial Peptides (AMPs) or Non-Antimicrobial Peptides (Non-AMPs). AMPs are crucial components of the innate immune system, offering defense against a wide range of pathogens.

Model Selection Criteria

To identify the most effective model, our team rigorously evaluated over 225 combinations of feature extraction and selection methods across four different machine learning models for each target organism. The evaluation included both classification and regression tasks, with a focus on predicting antimicrobial activity and estimating minimum inhibitory concentration (MIC) values. The final model was selected based on the following criteria:

High Accuracy, F1-score, and Validation Accuracy: The classifier reached an accuracy of 0.963 and an F1-score of 0.963 on a held-out test set, with a validation accuracy of 0.968, demonstrating its ability to generalize to unseen data.

Robustness to Sequence Length Variations: The model performs consistently well on sequences of varying lengths, within the specified range (10-100 amino acids).

Generalization Ability: The model was trained and evaluated on diverse datasets to ensure its ability to classify a broad range of AMP sequences.

Robust Regression Capability: Assessed using Mean Squared Error (MSE), R-squared (R²), Pearson correlation, and Kendall’s tau, the regressors showed good agreement between predicted and true MIC values, with R² between 0.68 and 0.74 and Pearson correlation between 0.83 and 0.87 across the four bacteria.

Intended Use

This tool is intended for research and educational purposes. It can be a valuable resource for researchers studying antimicrobial peptides and developing new therapeutic strategies. However, it is not a replacement for laboratory-based experimental validation.

Our Team

Ali Magdi - Computational Biologist

Ahmed Amr - Computational Biologist

Omar Loay - Molecular Biologist

Prof. Eman Badr - Full Professor and Director of BCBU

Acknowledgements

We extend our gratitude to the following organizations for their support:

Bioinformatics and Computational Biology Unit at Zewail City.

The Centre for Genomics at Zewail City.

Contact

For questions, inquiries, or feedback, please reach out to us at: epicamp.sup@gmail.com

Usage Guide

Follow these steps to use the AMP Classifier:

Step 1: Prepare Your Sequence

You can either enter the amino acid sequence directly or upload a FASTA file.

Ensure your sequence contains only standard amino acid characters (ACDEFGHIKLMNPQRSTVWY).

The sequence length must be between 10 and 100 amino acids.
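
If you want to check a sequence before submitting it, the two rules above can be sketched as follows. The function name is hypothetical, and stripping whitespace and upper-casing the input are assumptions of this sketch; the tool itself may treat lowercase letters or embedded spaces differently.

```python
VALID_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 10
MAX_LENGTH = 100


def validate_sequence(raw: str) -> str:
    """Return the cleaned sequence, or raise ValueError if it breaks a rule."""
    # Assumption: whitespace is dropped and letters are upper-cased first.
    sequence = "".join(raw.split()).upper()
    invalid = sorted(set(sequence) - VALID_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"Invalid characters: {', '.join(invalid)}")
    if not MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
        raise ValueError(
            f"Sequence length {len(sequence)} is outside "
            f"{MIN_LENGTH}-{MAX_LENGTH} amino acids"
        )
    return sequence


cleaned = validate_sequence("SLQGGAPNFPQPSQQNGGWQVSPDLGRDDKGNTRGQIEIQNKGKDHDFNAG")
```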

Step 2: Input the Sequence & Email

Direct Input: Type or paste your sequence into the text area provided.

FASTA File Upload: Click the "Choose File" button and select your .fasta, .fa, or .fna file.

The character count will update as you type or upload a sequence.

Enter Your Email: Input your email address in the "Your Email Address (for Report)" field. This is required to receive the PDF report.
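
For the FASTA upload option above, the expected file layout is a header line starting with ">" followed by one or more sequence lines. The sketch below shows one simple way to read such a file's text; it is an illustration with a hypothetical function name, not the parser the tool uses, and how the tool handles files with several records is not stated here.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("FASTA text must start with a '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("No FASTA records found")
    return records


fasta_text = ">pep1 example\nMKPL\nKQKV\n>pep2\nACDE\n"
records = parse_fasta(fasta_text)
```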

Step 3: Analyze the Sequence

Click the "Submit" button.

A message indicating the estimated processing time will be visible above the buttons, and the button itself will show elapsed time.

Step 4: View the Results

Once the analysis is complete, the prediction (AMP or Non-AMP) will be displayed in the "AMP Classification" box.

A link to "Download Detailed Report (PDF)" will appear in the "Additional Details" box if the report was generated successfully.

The PDF report will also be automatically sent to the email address you provided. Check your inbox and spam folder. The status of the email will be shown below the email input field.

Step 5: Clear the Input

To analyze another sequence, click the "Clear Input" button. This will reset the input fields and results.

Troubleshooting

Invalid Characters: If you see an error message about invalid characters, double-check that your sequence contains only standard amino acid characters.

Sequence Length: Ensure your sequence is between 10 and 100 amino acids long.

Invalid FASTA Format: If you encounter an error with a FASTA file, make sure the file is correctly formatted.

Email Not Received: Ensure you entered your email address correctly. Also, check your spam/junk folder. If the email service encounters an issue, the "Download Detailed Report (PDF)" link should still be available if the report itself was generated.