ProteinMPNN

Generates protein sequences conditioned on input protein backbones using the ProteinMPNN family of models. Supports full-atom (vanilla), CA-only, and soluble-protein–specialized variants with controls for sampling temperature, backbone noise, chain targeting, and reproducibility. Returns all generated sequences in a single FASTA output.

Usage

Use this node to redesign sequences for one or more input protein structures. Provide PDB backbones and select the ProteinMPNN variant that matches your structure type: vanilla for full-atom PDBs, ca_only for CA-only PDBs, and soluble for designs biased toward soluble proteins. Tune the sampling temperature for diversity, optionally prohibit specific amino acids, and restrict design to specific chains. Place the node after structure generation or refinement steps, or use it in sequence-optimization pipelines; it outputs a single consolidated FASTA containing all sequences.

Inputs

| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| pdb | True | PDB | One or more protein structures to design against. Expects a mapping of names to PDB contents; each entry is treated as an independent target. | {"my_target": "ATOM ... END"} |
| model_name | True | STRING | ProteinMPNN model variant to use. Vanilla models use all atoms, CA-only models accept CA-only structures, and soluble models bias toward soluble proteins. | vanilla_v_48_020 |
| backbone_noise | True | FLOAT | Standard deviation of Gaussian noise added to backbone atoms to promote sequence diversity and robustness. Range 0.0–1.0. | 0.1 |
| num_seq_per_target | True | INT | Number of sequences to generate per input structure. | 10 |
| max_length | True | INT | Maximum allowed sequence length. Acts as a guardrail for very large designs. | 200000 |
| sampling_temp | True | STRING | Comma-separated temperature schedule for sampling amino acids (e.g., "0.2, 0.25, 0.5"). Higher values increase diversity. At least one value is required. | 0.1 |
| omit_amino_acids | True | STRING | Comma-separated list of amino acids to prohibit during design (e.g., "A,C"). Use single-letter codes. | X |
| chain_list | True | STRING | Comma-separated list of chain IDs to design. Leave empty to design all chains. | A,B |
| seed | True | INT | Base random seed for reproducibility. Unsigned 32-bit range. | 42 |
| mode | True | STRING | Execution mode. MOCK returns bundled example results, PROD runs the managed service, TEST runs quickly with minimal outputs. | PROD |
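
The inputs above can be assembled and sanity-checked before they are handed to the node. The following is a minimal sketch, not the node's own API: `build_inputs` is a hypothetical helper, its defaults are copied from the Example column, and it checks only the ranges and mode values stated in the table.

```python
# Hypothetical pre-flight helper. Field names come from the Inputs table;
# the function itself is an illustration and not part of the node.
UINT32_MAX = 2**32 - 1
VALID_MODES = {"MOCK", "PROD", "TEST"}


def build_inputs(pdb_by_name, model_name="vanilla_v_48_020", backbone_noise=0.1,
                 num_seq_per_target=10, max_length=200000, sampling_temp="0.1",
                 omit_amino_acids="X", chain_list="", seed=42, mode="PROD"):
    """Return the node inputs as a plain dict after checking documented ranges."""
    if not pdb_by_name:
        raise ValueError("pdb must map at least one name to PDB contents")
    if not 0.0 <= backbone_noise <= 1.0:
        raise ValueError("backbone_noise must be within 0.0-1.0")
    if not 0 <= seed <= UINT32_MAX:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of MOCK, PROD, TEST")
    return {
        "pdb": dict(pdb_by_name),
        "model_name": model_name,
        "backbone_noise": backbone_noise,
        "num_seq_per_target": num_seq_per_target,
        "max_length": max_length,
        "sampling_temp": sampling_temp,
        "omit_amino_acids": omit_amino_acids,
        "chain_list": chain_list,  # empty string means "design all chains"
        "seed": seed,
        "mode": mode,
    }


inputs = build_inputs({"my_target": "ATOM ... END"}, chain_list="A,B")
```

How the resulting dict is passed to the node depends on the host workflow tool, which this page does not describe.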

Outputs

| Field | Type | Description | Example |
|---|---|---|---|
| seqs.fasta | FASTA | FASTA-formatted sequences generated for all input structures and requested samples. | >my_target_design_0 MADEUPSEQUENCE... >my_target_design_1 MADEUPSEQUENCE... |
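
Because every design lands in one FASTA file, downstream steps usually need to split it back into individual records. A minimal sketch of such a parser follows; `parse_fasta` is a hypothetical helper, and the header names and sequences in the usage lines are placeholders modeled on the Example column, not real node output.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA text, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Placeholder content shaped like the documented example.
example = ">my_target_design_0\nMADEUP\nSEQUENCE\n>my_target_design_1\nMADEUPSEQUENCE\n"
designs = parse_fasta(example)
```

The exact header format produced by the node may carry more information than the example shows, so treat the header as an opaque key unless you have inspected a real output file.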

Important Notes

  • Model name must include a valid version suffix (e.g., v_48_002, v_48_010, v_48_020, v_48_030) and a supported prefix: vanilla, ca_only, or soluble.
  • If mode is TEST, the node forces num_seq_per_target to 1 to ensure fast execution.
  • Sampling temperature cannot be empty; at least one numeric value is required.
  • Set chain_list to target only specific chains; leave empty to design all chains present in the PDB.
  • Use ca_only models only with CA-only PDB inputs; use vanilla for full-atom PDBs.
  • The execution timeout scales with the number of targets in pdb, so larger sets are given proportionally more time.
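
The model-name rule and the TEST-mode override in the notes above can be checked locally before submitting a run. This is a sketch under one explicit assumption: the four suffixes listed in the note (002, 010, 020, 030) are treated as the complete set, although the note presents them only as examples. Both function names are hypothetical.

```python
import re

# Assumption: the suffixes named in the notes are the only valid ones.
MODEL_NAME_RE = re.compile(r"(vanilla|ca_only|soluble)_v_48_(002|010|020|030)")


def check_model_name(model_name):
    """Return (prefix, suffix) for a valid name, or raise ValueError."""
    match = MODEL_NAME_RE.fullmatch(model_name)
    if match is None:
        raise ValueError(f"invalid model_name: {model_name!r}")
    return match.group(1), match.group(2)


def effective_num_seq(num_seq_per_target, mode):
    """Mirror the documented rule that TEST mode forces one sequence per target."""
    return 1 if mode == "TEST" else num_seq_per_target


prefix, suffix = check_model_name("vanilla_v_48_020")
```

If the service accepts additional suffixes, loosen the second group of the pattern accordingly.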

Troubleshooting

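  • Invalid model name error: Ensure model_name matches the pattern {prefix}_v_48_{suffix}, where prefix is one of vanilla, ca_only, or soluble and suffix is a valid version suffix such as 002, 010, 020, or 030 (e.g., vanilla_v_48_020).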
  • Sampling temperature empty: Provide at least one value (e.g., "0.1"); use commas to specify a schedule.
  • Unexpected residues or omissions: Verify omit_amino_acids is a comma-separated list of one-letter codes (e.g., "A,C").
  • Design limited to some chains only: Confirm chain_list is set correctly (e.g., "A,B") or left empty to include all chains.
  • Timeouts or long runtimes: Reduce num_seq_per_target, simplify sampling_temp, or submit fewer pdb targets at once.
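
Several of the issues above come from malformed comma-separated strings. The sketch below shows one way to parse and check them ahead of time; the helper names are hypothetical, and the rule that each amino-acid code is a single letter is taken from the field description rather than from the service's actual validation.

```python
def parse_sampling_temp(value):
    """Parse "0.2, 0.25, 0.5" into floats; at least one value is required."""
    temps = [float(part) for part in value.split(",") if part.strip()]
    if not temps:
        raise ValueError("sampling_temp needs at least one numeric value")
    return temps


def parse_list(value):
    """Split a comma-separated string such as "A,B"; empty input gives []."""
    return [part.strip() for part in value.split(",") if part.strip()]


def check_omit_amino_acids(value):
    """Return the one-letter codes, rejecting entries like "Ala"."""
    codes = parse_list(value)
    bad = [code for code in codes if len(code) != 1 or not code.isalpha()]
    if bad:
        raise ValueError(f"expected one-letter codes, got: {bad}")
    return codes


temps = parse_sampling_temp("0.2, 0.25, 0.5")
omitted = check_omit_amino_acids("A,C")
chains = parse_list("A,B")   # an empty chain_list would yield [] (all chains)
```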