ProteinMPNN

Generates protein sequences conditioned on an input protein backbone structure using the ProteinMPNN family of models. Supports vanilla (full-backbone), CA-only, and soluble-protein variants, with configurable sampling temperature, amino-acid omissions, chain targeting, and backbone noise. The node runs in one of three modes: MOCK (precomputed outputs), PROD (service-backed), or TEST (fast, minimal settings).

Usage

Use this node to design sequences for one or more input PDB backbones. Choose the appropriate ProteinMPNN variant (vanilla, ca_only, or soluble) and set sampling temperatures to control diversity. Provide optional constraints such as omitting specific amino acids or restricting design to certain chains. In larger workflows, feed a PDB dictionary into this node and route the resulting FASTA sequences to downstream evaluation or structure prediction.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| pdb | True | PDB | Dictionary of PDB structures to design against. Keys are structure identifiers (e.g., filenames or IDs); values are PDB text content. | {"targetA": "ATOM ...\nEND\n", "targetB": "ATOM ...\nEND\n"} |
| model_name | True | STRING | ProteinMPNN model variant to use. Vanilla models use the full backbone (N, CA, C, O atoms); ca_only models expect CA-only structures; soluble models are trained on soluble proteins. | vanilla_v_48_020 |
| backbone_noise | True | FLOAT | Standard deviation of Gaussian noise added to backbone atom coordinates to encourage robustness and diversity. | 0.1 |
| num_seq_per_target | True | INT | Number of sequences to generate per input structure. | 10 |
| max_length | True | INT | Maximum sequence length allowed for generated designs. | 200000 |
| sampling_temp | True | STRING | Comma-separated sampling temperatures for amino acids. Higher values yield more diverse sequences. | 0.1,0.15,0.2 |
| omit_amino_acids | True | STRING | Comma-separated list of one-letter amino acid codes to exclude from generation. | C,M,W |
| chain_list | True | STRING | Comma-separated list of chain IDs to design. If empty, all chains are designed. | A,B |
| seed | True | INT | Base random seed, for reproducible sampling. | 42 |
| mode | True | STRING | Execution mode. MOCK: returns predefined mock outputs. PROD: runs the live service. TEST: runs quickly with minimal settings. | PROD |
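
As a rough illustration of how the fields above fit together, the sketch below assembles them into a plain Python dictionary. The helper name `build_proteinmpnn_inputs` is hypothetical, and the default values are simply the example values from the table, not documented defaults. How the dictionary is actually handed to the node depends on your workflow tooling and is not shown here.

```python
from typing import Dict, Sequence


def build_proteinmpnn_inputs(
    pdb: Dict[str, str],
    model_name: str = "vanilla_v_48_020",
    backbone_noise: float = 0.1,
    num_seq_per_target: int = 10,
    max_length: int = 200000,
    sampling_temps: Sequence[float] = (0.1,),
    omit_amino_acids: Sequence[str] = (),
    chain_list: Sequence[str] = (),
    seed: int = 42,
    mode: str = "PROD",
) -> dict:
    """Assemble the node's input fields (hypothetical helper).

    List-like arguments are joined into the comma-separated STRING
    form that the Inputs table describes.
    """
    return {
        "pdb": pdb,
        "model_name": model_name,
        "backbone_noise": backbone_noise,
        "num_seq_per_target": num_seq_per_target,
        "max_length": max_length,
        "sampling_temp": ",".join(str(t) for t in sampling_temps),
        "omit_amino_acids": ",".join(omit_amino_acids),
        # An empty chain_list means "design all chains".
        "chain_list": ",".join(chain_list),
        "seed": seed,
        "mode": mode,
    }


inputs = build_proteinmpnn_inputs(
    {"targetA": "ATOM ...\nEND\n"},
    sampling_temps=[0.1, 0.15, 0.2],
    omit_amino_acids=["C", "M", "W"],
    chain_list=["A", "B"],
)
```

Keeping temperatures and chain IDs as Python lists until the last step avoids hand-editing comma-separated strings, which is where stray spaces and empty entries tend to creep in.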

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| seqs.fasta | FASTA | Generated protein sequences in FASTA format for all input structures and samples. | >targetA_design_0\nMKT...\n>targetA_design_1\nGAS...\n>targetB_design_0\nVLK... |
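
Downstream steps usually need the sequences grouped by input structure. The following is a minimal sketch of parsing the FASTA text and grouping records by target. It assumes headers follow the `<target>_design_<index>` pattern seen in the example above; the real header format may differ, and the function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_by_target(
    records: List[Tuple[str, str]], marker: str = "_design_"
) -> Dict[str, List[str]]:
    """Group sequences by target ID.

    Assumes headers look like '<target>_design_<index>'; a header
    without the marker is used as the target ID unchanged.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for header, seq in records:
        target = header.rsplit(marker, 1)[0]
        grouped[target].append(seq)
    return dict(grouped)


fasta = ">targetA_design_0\nMKT\n>targetA_design_1\nGAS\n>targetB_design_0\nVLK\n"
designs = group_by_target(parse_fasta(fasta))
```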

Important Notes

  • Model selection: The model_name must be one of the supported variants (e.g., vanilla_v_48_002/010/020/030, ca_only_v_48_002/010/020, soluble_v_48_002/010/020/030). CA-only models require CA-only structures.
  • Sampling temperatures: sampling_temp must not be empty. Provide a comma-separated list (e.g., 0.1,0.15,0.2).
  • Chain targeting: Leave chain_list empty to design all chains; otherwise, specify chains as a comma-separated list (e.g., A,B).
  • Omitted residues: omit_amino_acids uses one-letter codes; listed residues will not be sampled.
  • TEST mode behavior: In TEST mode, num_seq_per_target is forced to 1 to speed up runs.
  • Timeout scaling: Service timeout scales with the number of input structures; large pdb sets will take longer.
  • Backbone noise: Increasing backbone_noise can improve robustness but may affect fit to the original backbone.
  • Sequence length: Designs longer than max_length are rejected.
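
Several of the notes above can be checked before submitting a run. The sketch below is one way to do that, under stated assumptions: the model list is built from the variants named in the first note (which are given as examples, so the node may accept others), the amino-acid check is limited to the 20 standard one-letter codes, and all function names are hypothetical.

```python
from typing import List

# Built from the variants named in the notes; the real list may be longer.
_NOISE_LEVELS = {
    "vanilla": ["002", "010", "020", "030"],
    "ca_only": ["002", "010", "020"],
    "soluble": ["002", "010", "020", "030"],
}
SUPPORTED_MODELS = {
    f"{family}_v_48_{level}"
    for family, levels in _NOISE_LEVELS.items()
    for level in levels
}

# Assumption: only the 20 standard one-letter codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_model_name(model_name: str) -> str:
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model_name: {model_name!r}")
    return model_name


def parse_sampling_temp(value: str) -> List[float]:
    """Parse a comma-separated temperature string; it must not be empty."""
    temps = [float(t) for t in value.split(",") if t.strip()]
    if not temps:
        raise ValueError("sampling_temp must contain at least one value")
    return temps


def parse_omit_amino_acids(value: str) -> List[str]:
    """Parse comma-separated one-letter codes; an empty string omits nothing."""
    codes = [c.strip().upper() for c in value.split(",") if c.strip()]
    unknown = [c for c in codes if c not in STANDARD_AA]
    if unknown:
        raise ValueError(f"Not one-letter amino acid codes: {unknown}")
    return codes


def effective_num_seq(num_seq_per_target: int, mode: str) -> int:
    """TEST mode forces a single sequence per target."""
    return 1 if mode == "TEST" else num_seq_per_target
```

Running these checks locally catches the most common configuration errors without waiting on a service call.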

Troubleshooting

  • Invalid model name: If you see an error about model_name, ensure it matches one of the provided options and formatting (e.g., vanilla_v_48_020).
  • Empty sampling_temp: Provide at least one temperature (e.g., 0.1). Comma separation is required for multiple values.
  • Incorrect chain IDs: If designs target wrong chains or fail, verify chain_list matches the chains present in the PDBs. Leave empty to design all.
  • CA-only mismatch: Using a ca_only model with an all-atom PDB (or vice versa) can cause failures. Pick the model that matches your structure type.
  • Service timeout or no output: For large pdb sets or high num_seq_per_target, allow more time for the run or reduce the number of inputs. Try TEST mode to validate the configuration quickly.
  • Unexpected amino acids: If disallowed residues appear, confirm omit_amino_acids uses correct one-letter codes separated by commas.
  • Fewer sequences than expected: In TEST mode, output is intentionally limited to 1 per target. In other modes, check max_length and constraints that may limit sampling.
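
Some of these checks can be run against the PDB text and the returned sequences directly. The following is a minimal sketch, assuming standard fixed-column PDB ATOM records (atom name in columns 13-16, chain ID in column 22). The helper names are hypothetical and are not part of the node.

```python
from typing import List, Set


def chains_in_pdb(pdb_text: str) -> Set[str]:
    """Collect chain IDs from ATOM records (column 22 of the PDB format)."""
    return {
        line[21]
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) > 21
    }


def missing_chains(chain_list: str, pdb_text: str) -> List[str]:
    """Return the requested chain IDs that do not occur in the structure."""
    requested = [c.strip() for c in chain_list.split(",") if c.strip()]
    present = chains_in_pdb(pdb_text)
    return [c for c in requested if c not in present]


def is_ca_only(pdb_text: str) -> bool:
    """True if every ATOM record is a CA atom (atom name in columns 13-16)."""
    names = {
        line[12:16].strip()
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and len(line) >= 16
    }
    return names == {"CA"}


def disallowed_residues(sequence: str, omit_amino_acids: str) -> List[str]:
    """Return the omitted one-letter codes that still appear in a sequence."""
    omitted = {c.strip().upper() for c in omit_amino_acids.split(",") if c.strip()}
    return sorted(set(sequence.upper()) & omitted)
```

For example, a non-empty result from `missing_chains("A,B", pdb_text)` points to a chain_list problem, and comparing `is_ca_only(pdb_text)` with the chosen model family helps spot a CA-only mismatch before the run.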