ProteinMPNN¶
Predicts possible amino acid sequences for a given protein backbone, returning all generated sequences and the top-scoring ones.
Quick Start¶
- Prepare your input PDB structure(s).
- Set the desired model parameters (e.g., model version, number of sequences).
- Connect the node and run it to generate protein sequences.
Setup Guide¶
1. Prepare Input Data¶
- Ensure your PDB input is a dictionary `{pdb_id: pdb_content}`.
- Optionally, specify chains to design or amino acids to omit.
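A minimal sketch of how the `{pdb_id: pdb_content}` dictionary could be built from PDB files on disk. The helper name `load_pdb_inputs` and the choice of the file stem as `pdb_id` are assumptions made for illustration; the node itself only requires the dictionary shape described above.

```python
from pathlib import Path


def load_pdb_inputs(paths):
    """Build a {pdb_id: pdb_content} dictionary from PDB file paths.

    Hypothetical helper: using the file stem as pdb_id is an assumption
    of this sketch, not something the node prescribes.
    """
    pdbs = {}
    for path in map(Path, paths):
        pdb_id = path.stem
        if pdb_id in pdbs:
            # Two files with the same stem would silently overwrite each other.
            raise ValueError(f"duplicate pdb_id: {pdb_id}")
        pdbs[pdb_id] = path.read_text()
    return pdbs
```

Calling `load_pdb_inputs(["1abc.pdb", "2xyz.pdb"])` would yield a dictionary keyed by `1abc` and `2xyz`, each holding the full text of the corresponding file.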
2. Configure Node Parameters¶
- Select model version and adjust generation settings (e.g., batch size, temperature).
- Set reproducibility seed if needed.
Basic Usage¶
Sequence Design for Protein Backbones¶
- Generate diverse amino acid sequences for a given protein structure.
- Restrict design to specific chains or omit certain amino acids.
- Use batch mode for multiple structures.
Configuration¶
Required Inputs¶
Field | Description | Type | Example |
---|---|---|---|
pdb | Structures to run through the model. | PDB | {"A": "..."} |
ca_only | Whether to parse CA-only structures and use CA-only models. | BOOLEAN | False |
model_name | ProteinMPNN model name. | COMBO | v_48_020 |
use_soluble_model | Whether to load weights trained on soluble proteins only. | BOOLEAN | False |
backbone_noise | Std. dev. of Gaussian noise to add to backbone atoms. | FLOAT | 0.0 |
num_seq_per_target | Number of sequences to generate per protein. | INT | 10 |
num_best_seq_per_target | Number of top-scoring sequences to return. | INT | 1 |
batch_size | Batch size. Reduce if running out of GPU memory. | INT | 1 |
max_length | Max sequence length. | INT | 200000 |
sampling_temp | Sampling temperatures for amino acids (comma separated); higher values give more diverse sequences. | STRING | 0.1,0.2,0.3 |
omit_amino_acids | Amino acids to omit (comma separated). | STRING | X |
chains_to_design | Chains to design (comma separated, empty for all). | STRING | A,B |
seed | Base seed value for randomness. | INT | 42 |
mode | Mode to run the node in (MOCK, PROD, TEST). | COMBO | PROD |
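
Several of the inputs above are comma-separated strings. The sketch below shows one way to turn them into lists before or after passing them to the node. The helper names are hypothetical, and rejecting non-positive temperatures is an assumption of this sketch rather than documented node behavior.

```python
def split_csv(value):
    """Split a comma-separated string into stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_sampling_temps(value):
    """Parse a sampling_temp string such as "0.1,0.2,0.3" into floats.

    Assumption: temperatures must be positive, since sampling typically
    divides by the temperature.
    """
    temps = [float(item) for item in split_csv(value)]
    if not temps or any(t <= 0 for t in temps):
        raise ValueError("sampling_temp needs at least one positive value")
    return temps


# Example values taken from the table above.
temps = parse_sampling_temps("0.1,0.2,0.3")
chains = split_csv("A,B")   # an empty string gives [], i.e. design all chains
omitted = split_csv("X")
```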
Optional Inputs¶
None
Outputs¶
Field | Description | Example |
---|---|---|
seqs.fasta | All generated sequences. | >A_seq_0\n... |
best_seqs.fasta | Top-scoring generated sequences. | >A_best_seq_0\n... |
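
Both outputs are FASTA text, so they can be read with a few lines of plain Python. The following is a minimal sketch; `parse_fasta` is a hypothetical helper, and the header format is assumed to follow the examples in the table above.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Applied to the contents of `seqs.fasta`, this returns one tuple per generated sequence, which makes it easy to filter or deduplicate designs downstream.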
Best Practices¶
Sequence Generation¶
- Use higher sampling temperatures for more diverse sequences.
- Adjust batch size based on available GPU memory.
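To see why higher temperatures give more diverse sequences, consider a temperature-scaled softmax over per-residue scores. The sketch below uses made-up logits for four candidate amino acids and is only an illustration of the general technique; it is an assumption that the node's sampling follows this form.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Illustrative logits only, not taken from a real model.
logits = [2.0, 1.0, 0.5, 0.0]
cold = softmax_with_temperature(logits, 0.1)
warm = softmax_with_temperature(logits, 1.0)
print(entropy(cold), entropy(warm))
```

At the low temperature nearly all probability mass sits on the highest-scoring residue, so repeated samples look alike; at the higher temperature the distribution flattens and samples vary more.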
Model Selection¶
- Choose the model version and soluble weights according to your protein type.
- For CA-only structures, ensure the model supports CA-only mode; the soluble weights and v_48_030 are not yet available as CA-only models.
Troubleshooting¶
Common Issues¶
- Error: CA-SolubleMPNN model is not available yet.: Set either `ca_only` or `use_soluble_model` to False.
- Error: CA-v_48_030 model is not available yet.: Use a different model or set `ca_only` to False.
- num_best_seq_per_target > num_seq_per_target: Ensure the number of best sequences does not exceed total generated.
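
The three issues above can be caught before running the node. The sketch below is a hypothetical pre-flight check, not part of the node; the function name and messages are illustrative, and the conditions are inferred from the error descriptions in this section.

```python
def check_settings(ca_only, use_soluble_model, model_name,
                   num_seq_per_target, num_best_seq_per_target):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if ca_only and use_soluble_model:
        # Corresponds to "CA-SolubleMPNN model is not available yet."
        problems.append("Set either ca_only or use_soluble_model to False.")
    if ca_only and model_name == "v_48_030":
        # Corresponds to "CA-v_48_030 model is not available yet."
        problems.append("Use a different model or set ca_only to False.")
    if num_best_seq_per_target > num_seq_per_target:
        problems.append(
            "num_best_seq_per_target must not exceed num_seq_per_target."
        )
    return problems
```

For example, `check_settings(True, True, "v_48_020", 10, 1)` reports the CA-only and soluble-weights conflict, while the defaults shown in the Required Inputs table produce an empty list.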
Need Help?¶
- ProteinMPNN GitHub
- Contact support for further assistance.