
ProteinMPNN

Designs protein sequences for given 3D backbone structures using ProteinMPNN. Supports CA-only parsing, soluble-only weights, temperature-controlled sampling, residue omission, chain-specific design, batching, and multiple model variants. Returns all generated sequences and the top-scoring subset as FASTA.

Usage

Use this node to generate and rank amino acid sequences compatible with input protein structures. Provide one or more PDB structures, choose the ProteinMPNN model and options (e.g., sampling temperatures, amino acids to omit, chains to design), then pass the output FASTA to downstream evaluation or structure prediction workflows.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| pdb | True | PDB | Protein backbone structures to design sequences against. Typically a collection mapping structure names to PDB-formatted text. | {'my_protein_A.pdb': 'ATOM ...', 'complex_AB.pdb': 'ATOM ...'} |
| ca_only | True | BOOLEAN | If true, parse CA-only structures and use CA-only models. | False |
| model_name | True | v_48_002 \| v_48_010 \| v_48_020 \| v_48_030 | ProteinMPNN model variant to use. | v_48_020 |
| use_soluble_model | True | BOOLEAN | If true, use weights trained on soluble proteins only. | False |
| backbone_noise | True | FLOAT | Standard deviation of Gaussian noise added to backbone atoms to increase sequence diversity (0.0–1.0). | 0.1 |
| num_seq_per_target | True | INT | How many sequences to generate per input structure. | 10 |
| num_best_seq_per_target | True | INT | How many top-scoring sequences to select per input after generation. Must be <= num_seq_per_target. | 1 |
| batch_size | True | INT | Batch size for generation. Reduce if you encounter GPU memory limits. | 1 |
| max_length | True | INT | Maximum total sequence length allowed. | 200000 |
| sampling_temp | True | STRING | Comma-separated list of temperatures controlling sampling diversity (e.g., 0.2,0.25,0.5). Higher values increase diversity. | 0.1,0.15,0.2,0.25,0.3 |
| omit_amino_acids | True | STRING | Comma-separated list of one-letter amino acid codes to exclude from generation. | C,M,W |
| chains_to_design | True | STRING | Comma-separated list of chain identifiers to design. Leave empty to design all chains. | A,B |
| seed | True | INT | Base random seed for reproducibility. | 42 |
| mode | True | MOCK \| PROD \| TEST | Execution mode. MOCK returns predefined sequences for quick demos, TEST minimizes generation for speed, PROD runs full generation. | PROD |
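
As a rough illustration of how these fields fit together, the sketch below collects the example values from the table into one Python mapping. The function name `example_inputs` and the use of a plain dict are assumptions made for illustration only; how the node actually receives its inputs depends on the host workflow system.

```python
def example_inputs(pdb_texts):
    """Collect the example values from the Inputs table into one mapping.

    `pdb_texts` maps structure names to PDB-formatted text, as described
    for the pdb field. This helper is hypothetical, not part of the node.
    """
    return {
        "pdb": dict(pdb_texts),
        "ca_only": False,
        "model_name": "v_48_020",
        "use_soluble_model": False,
        "backbone_noise": 0.1,
        "num_seq_per_target": 10,
        "num_best_seq_per_target": 1,  # must be <= num_seq_per_target
        "batch_size": 1,
        "max_length": 200000,
        "sampling_temp": "0.1,0.15,0.2,0.25,0.3",
        "omit_amino_acids": "C,M,W",
        "chains_to_design": "A,B",  # leave empty to design all chains
        "seed": 42,
        "mode": "PROD",
    }


# Placeholder PDB text; real inputs would contain full ATOM records.
inputs = example_inputs({"complex_AB.pdb": "ATOM ..."})
```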

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| seqs.fasta | FASTA | All generated sequences across temperatures and samples in FASTA format. | >my_protein_A_0 ACDEFGHIK... >my_protein_A_1 ... |
| best_seqs.fasta | FASTA | Top-scoring subset of generated sequences per input structure in FASTA format. | >my_protein_A_best_0 ACDEFGHIK... |
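
To hand either output to a downstream step, the FASTA text usually has to be split into records first. The following is a minimal sketch of a generic FASTA parser, assuming each header sits on its own line starting with `>`. The function name `parse_fasta` and the sample sequences are illustrative and not produced by the node.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may wrap; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Illustrative text only; real headers and sequences come from the node.
sample = ">my_protein_A_0\nACDEFGHIK\n>my_protein_A_1\nACDEF\nGHIR\n"
records = parse_fasta(sample)
```

Each record can then be passed on individually, for example to a structure prediction step that takes one sequence at a time.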

Important Notes

  • CA-only and soluble-only configurations cannot be used together. At least one of ca_only and use_soluble_model must be false.
  • The model v_48_030 is not available for CA-only. If ca_only is true, choose another model.
  • num_best_seq_per_target must be less than or equal to num_seq_per_target.
  • In TEST mode, num_seq_per_target is forced to 1 to speed up runs.
  • sampling_temp, omit_amino_acids, and chains_to_design accept comma-separated inputs; they are internally normalized for the service.
  • Timeout scales with the number of input structures; very large batches or long sequences can require more time.
  • Batch size influences memory usage; lower it if encountering resource limits.
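
The compatibility rules in these notes can be checked before submitting a run. The sketch below is a hypothetical client-side pre-check, not the node's own validation; the function name `check_options` and its messages are assumptions, and only the three constraints stated above are encoded.

```python
MODEL_NAMES = {"v_48_002", "v_48_010", "v_48_020", "v_48_030"}


def check_options(ca_only, use_soluble_model, model_name,
                  num_seq_per_target, num_best_seq_per_target):
    """Return a list of human-readable problems; empty means no conflict found."""
    problems = []
    if model_name not in MODEL_NAMES:
        problems.append(f"unknown model_name: {model_name}")
    if ca_only and use_soluble_model:
        problems.append("ca_only and use_soluble_model cannot both be true")
    if ca_only and model_name == "v_48_030":
        problems.append("v_48_030 is not available for CA-only")
    if num_best_seq_per_target > num_seq_per_target:
        problems.append("num_best_seq_per_target exceeds num_seq_per_target")
    return problems


ok = check_options(False, False, "v_48_020", 10, 1)
bad = check_options(True, True, "v_48_030", 1, 2)
```

Returning a list rather than raising on the first failure lets a caller report every conflict at once.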

Troubleshooting

  • Error: CA-SolubleMPNN model is not available yet. Set either ca_only or use_soluble_model to False. Fix by disabling one of the two options.
  • Error: CA-v_48_030 model is not available yet. Set ca_only to False or pick another model_name. Fix by selecting v_48_002, v_48_010, or v_48_020.
  • Error: Expected num_best_seq_per_target <= num_seq_per_target. Fix by reducing num_best_seq_per_target or increasing num_seq_per_target.
  • Generated sequences show little diversity or are identical: increase sampling_temp values or backbone_noise to boost diversity.
  • Out-of-memory or long runtimes: reduce batch_size, decrease num_seq_per_target, or limit max_length.
  • Unexpected or invalid list inputs: ensure sampling_temp, omit_amino_acids, and chains_to_design are comma-separated without extra spaces; e.g., 0.1,0.2 or A,B.
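
One way to avoid malformed list inputs is to clean them before passing them to the node. The helpers below are a minimal sketch of such client-side cleanup; the names `split_list`, `normalize_list`, and `parse_temps` are hypothetical, and this is not the normalization the service performs internally.

```python
def split_list(value):
    """Split a comma-separated string, dropping whitespace and empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def normalize_list(value):
    """Rebuild a comma-separated string without extra spaces."""
    return ",".join(split_list(value))


def parse_temps(value):
    """Convert a sampling_temp string to floats; raises ValueError on bad input."""
    return [float(item) for item in split_list(value)]


temps = normalize_list(" 0.1, 0.2 ")
chains = normalize_list("A , B")
```

Calling `parse_temps` on the sampling_temp string before submission surfaces non-numeric entries early, since `float` raises `ValueError` for them.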
