
ProteinMPNN

Generates protein sequences conditioned on input protein backbones using the ProteinMPNN family of models. Supports vanilla, CA-only, and soluble-trained variants, with controls for backbone noise, sampling temperature, and residue and chain constraints. The node can call a remote SaaS service, return mock data, or run in a test mode that minimizes runtime.

Usage

Use this node when you have one or more protein structures and want to design compatible amino acid sequences. Provide PDB structures, choose a ProteinMPNN model variant, and tune parameters like sampling temperature and number of sequences per target. Integrate it after structure preparation and before downstream evaluation or scoring nodes that analyze or filter designed sequences.

Inputs

  • pdb (PDB, required): Protein structure(s) to design against, given as one or more named PDB entries. Sequences are generated conditioned on each entry's backbone. Example: {'design_1': '', 'design_2': ''}
  • model_name (STRING, required): ProteinMPNN model variant to use. Vanilla models use the full backbone (N, CA, C, O); CA-only models accept structures that contain only CA atoms; soluble models use weights trained on soluble proteins. Must match one of the supported names. Example: vanilla_v_48_020
  • backbone_noise (FLOAT, required): Standard deviation, in Å, of the Gaussian noise added to backbone atom coordinates during design. Higher values can increase sequence diversity; 0.0 disables it. Example: 0.1
  • num_seq_per_target (INT, required): Number of sequences to generate per input structure. Example: 10
  • max_length (INT, required): Maximum total sequence length the node will process. Example: 200000
  • sampling_temp (STRING, required): Comma-separated sampling temperatures for amino acid selection. Higher values increase diversity. The node parses the list and generates sequences at each temperature. Example: 0.1,0.15,0.2,0.25,0.3
  • omit_amino_acids (STRING, required): Comma-separated single-letter codes of amino acids that must not appear in generated sequences. Example: C,M,W
  • chain_list (STRING, required): Comma-separated IDs of the chains to design. Leave empty to design all chains. Example: A,B
  • seed (INT, required): Base seed for the stochastic components, so that runs are reproducible. Example: 42
  • mode (STRING, required): Execution mode. MOCK returns predefined results, PROD calls the live service, and TEST minimizes runtime by forcing num_seq_per_target to 1. Example: PROD
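The string-typed fields are easy to get subtly wrong. The Python sketch below builds a complete parameter set and normalizes sampling_temp, omit_amino_acids, and chain_list into the comma-separated form described above. The helper names (temps_field, omit_field, chains_field) are hypothetical, the restriction to the 20 standard residue codes is an assumption, and how the resulting dictionary is handed to the node depends on your workflow platform, so that step is not shown.

```python
from typing import Iterable, Optional

# Assumption: omit_amino_acids accepts only the 20 standard one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def temps_field(temps: Iterable[float]) -> str:
    """Format temperatures as the comma-separated string sampling_temp expects."""
    values = [float(t) for t in temps]
    if not values:
        raise ValueError("sampling_temp needs at least one temperature, e.g. 0.1")
    return ",".join(str(v) for v in values)


def omit_field(residues: Iterable[str]) -> str:
    """Normalize to uppercase single-letter codes joined by commas, with no spaces."""
    codes = [r.strip().upper() for r in residues]
    bad = [c for c in codes if c not in STANDARD_AA]
    if bad:
        raise ValueError(f"not single-letter amino acid codes: {bad}")
    return ",".join(codes)


def chains_field(chains: Optional[Iterable[str]] = None) -> str:
    """Join chain IDs; an empty string tells the node to design all chains."""
    return ",".join(c.strip() for c in (chains or []))


# Hypothetical parameter set using the documented field names and examples.
params = {
    "model_name": "vanilla_v_48_020",
    "backbone_noise": 0.1,
    "num_seq_per_target": 10,
    "max_length": 200000,
    "sampling_temp": temps_field([0.1, 0.15, 0.2]),     # "0.1,0.15,0.2"
    "omit_amino_acids": omit_field(["c", " M", "W"]),  # "C,M,W"
    "chain_list": chains_field(["A", "B"]),            # "A,B"
    "seed": 42,
    "mode": "PROD",
}
```

Normalizing on the client side means an input such as "c, M" is sent as "C,M", which avoids the spacing problems listed under Troubleshooting.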

Outputs

  • seqs.fasta (FASTA): All generated sequences, for every provided structure and temperature, in FASTA format. Example:
    >design_1_temp0.1_seq0
    ACDEFGHIKLMNPQRSTVWY
    >design_1_temp0.2_seq1
    ACDFFGHIKLMNPQKSTVWY
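Because all structures and temperatures land in a single seqs.fasta, downstream steps often need to split it. This sketch parses the file and groups sequences by structure name and temperature. It assumes headers follow the pattern in the example above (<name>_temp<T>_seq<i>); the live service may use a different header format, so unrecognized headers are kept under the key (header, None) rather than dropped. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed header layout, taken from the documented example: <name>_temp<T>_seq<i>
HEADER_RE = re.compile(r"(?P<design>.+)_temp(?P<temp>\d+(?:\.\d+)?)_seq(?P<idx>\d+)")


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; a sequence may span several lines."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # lines before the first header are discarded
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_sequences(text: str) -> Dict[Tuple[str, Optional[float]], List[str]]:
    """Group sequences by (structure name, sampling temperature)."""
    groups: Dict[Tuple[str, Optional[float]], List[str]] = defaultdict(list)
    for header, seq in read_fasta(text):
        match = HEADER_RE.fullmatch(header)
        if match is None:
            groups[(header, None)].append(seq)  # keep unrecognized headers
        else:
            groups[(match["design"], float(match["temp"]))].append(seq)
    return dict(groups)


example = """>design_1_temp0.1_seq0
ACDEFGHIKLMNPQRSTVWY
>design_1_temp0.2_seq1
ACDFFGHIKLMNPQKSTVWY
"""
for (name, temp), seqs in group_sequences(example).items():
    print(name, temp, len(seqs))  # design_1 0.1 1, then design_1 0.2 1
```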

Important Notes

  • Model name must follow supported patterns, e.g., vanilla_v_48_002, ca_only_v_48_010, soluble_v_48_030. Invalid names will cause an error.
  • In TEST mode, num_seq_per_target is forced to 1 to speed up execution.
  • Sampling temperature cannot be empty; provide at least one value (e.g., 0.1).
  • Leave chain_list empty to design all chains; otherwise specify chains as a comma-separated list (e.g., A,B).
  • Use omit_amino_acids with single-letter codes (e.g., C,M). These residues will not appear in outputs.
  • Backbone noise adds Gaussian jitter to backbone atoms; set to 0.0 to disable.
  • MOCK mode returns deterministic, precomputed data useful for UI testing and workflow prototyping.
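The model-name rules and the TEST override can be checked before a run. In the upstream ProteinMPNN weights, v_48_020 denotes a model with 48 nearest-neighbor edges trained with 0.20 Å backbone noise. The sketch below assumes this node's names follow the same convention and that every variant accepts the suffixes 002, 010, 020, and 030; the exact list the service accepts is not documented here, so treat the regular expression, and the hypothetical helpers parse_model_name and effective_num_seq, as assumptions.

```python
import re
from dataclasses import dataclass

# Assumption: all three variants accept the four upstream noise levels.
MODEL_RE = re.compile(r"(?P<variant>vanilla|ca_only|soluble)_v_48_(?P<noise>002|010|020|030)")
MODES = {"MOCK", "PROD", "TEST"}


@dataclass(frozen=True)
class ModelSpec:
    variant: str                 # "vanilla", "ca_only" or "soluble"
    train_noise_angstrom: float  # e.g. suffix "020" -> 0.20 Å


def parse_model_name(name: str) -> ModelSpec:
    """Split a model name into variant and training noise, or raise ValueError."""
    match = MODEL_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported model_name: {name!r}")
    return ModelSpec(match["variant"], int(match["noise"]) / 100)


def effective_num_seq(mode: str, num_seq_per_target: int) -> int:
    """Sequences per structure actually generated: TEST mode forces 1."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    return 1 if mode == "TEST" else num_seq_per_target
```

Failing fast on a malformed name is cheaper than waiting for the remote service to reject it.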

Troubleshooting

  • Invalid model name: Ensure model_name is one of the supported options (e.g., vanilla_v_48_020).
  • Empty sampling_temp: Provide at least one value, such as 0.1, or a comma-separated list such as 0.1,0.2,0.25.
  • Unexpected amino acids appearing: Verify omit_amino_acids uses single-letter codes separated by commas and contains no spaces.
  • No sequences for specific chains: If you intended to design only some chains, ensure chain_list is set (e.g., A,B). Leave empty to target all chains.
  • Too few sequences in TEST mode: TEST forces num_seq_per_target to 1; switch to PROD or MOCK and adjust num_seq_per_target.
  • Input size too large: Reduce max_length or split large targets if the combined sequence length approaches limits.
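Several of the problems above can be caught before the node runs. The sketch below is a hypothetical preflight function that checks a parameter dictionary (keyed by the input field names) and the PDB text for empty or non-numeric temperatures, badly formatted omit_amino_acids, requested chains that are absent from a structure, and the TEST-mode cap. It reads chain IDs from column 22 of ATOM records, following the fixed-width PDB convention; the requirement for uppercase codes is an assumption.

```python
from typing import Dict, List, Set

# Assumption: the node expects uppercase one-letter codes for the 20 standard residues.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def pdb_chain_ids(pdb_text: str) -> Set[str]:
    """Chain IDs found in ATOM records (column 22 of the fixed-width PDB format)."""
    return {
        line[21]
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) > 21
    }


def preflight(params: Dict, pdb: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were detected."""
    problems: List[str] = []

    # Empty or non-numeric sampling_temp.
    temps = [t.strip() for t in params.get("sampling_temp", "").split(",") if t.strip()]
    if not temps:
        problems.append("sampling_temp is empty; provide e.g. '0.1' or '0.1,0.2,0.25'")
    for t in temps:
        try:
            float(t)
        except ValueError:
            problems.append(f"sampling_temp value {t!r} is not a number")

    # omit_amino_acids: comma-separated, no spaces, uppercase single-letter codes.
    omit = params.get("omit_amino_acids", "")
    if " " in omit:
        problems.append("omit_amino_acids contains spaces; use e.g. 'C,M,W'")
    for code in (c.strip() for c in omit.split(",")):
        if code and code not in STANDARD_AA:
            problems.append(f"omit_amino_acids entry {code!r} is not an uppercase single-letter code")

    # chain_list: each requested chain should exist in each structure.
    chains = [c.strip() for c in params.get("chain_list", "").split(",") if c.strip()]
    for name, text in pdb.items():
        missing = sorted(set(chains) - pdb_chain_ids(text))
        if missing:
            problems.append(f"{name}: chain(s) {missing} not found in ATOM records")

    # TEST mode caps the number of sequences per structure.
    if params.get("mode") == "TEST" and params.get("num_seq_per_target", 1) > 1:
        problems.append("mode is TEST, so num_seq_per_target will be forced to 1")

    return problems
```

An empty chain_list produces no chain warnings, matching the rule that an empty value designs all chains.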