ProteinMPNN

Generates protein sequences conditioned on input protein backbones using the ProteinMPNN family of models. Supports full-atom (vanilla), CA-only, and soluble-protein–specialized variants with controls for sampling temperature, backbone noise, chain targeting, and reproducibility. Returns all generated sequences in a single FASTA output.

Usage

Use this node to redesign sequences for one or more input protein structures. Provide PDB backbones and select the ProteinMPNN variant that matches your structure type (vanilla for full-atom PDBs, ca_only for CA-only PDBs, soluble for soluble-focused design). Tune the sampling temperature to control diversity, optionally prohibit specific amino acids, and restrict the design to specific chains. The node fits after structure generation or refinement steps, or inside sequence optimization pipelines, and it returns one consolidated FASTA containing all sequences.

Inputs

Field	Required	Type	Description	Example
pdb	True	PDB	One or more protein structures to design against. Expects a mapping of names to PDB contents; each entry is treated as an independent target.	{"my_target": "ATOM ... END"}
model_name	True	STRING	ProteinMPNN model variant to use. Vanilla models use full backbone coordinates (N, CA, C, O), CA-only models accept CA-only structures, and soluble models bias toward soluble proteins.	vanilla_v_48_020
backbone_noise	True	FLOAT	Standard deviation of Gaussian noise added to backbone atoms to promote sequence diversity and robustness. Range 0.0–1.0.	0.1
num_seq_per_target	True	INT	Number of sequences to generate per input structure.	10
max_length	True	INT	Maximum allowed sequence length. Acts as a guardrail for very large designs.	200000
sampling_temp	True	STRING	Comma-separated temperature schedule for sampling amino acids (e.g., "0.2, 0.25, 0.5"). Higher values increase diversity. At least one value is required.	0.1
omit_amino_acids	True	STRING	Comma-separated list of amino acids to prohibit during design (e.g., "A,C"). Use single-letter codes.	X
chain_list	True	STRING	Comma-separated list of chain IDs to design. Leave empty to design all chains.	A,B
seed	True	INT	Base random seed for reproducibility. Unsigned 32-bit range.	42
mode	True	STRING	Execution mode. MOCK returns bundled example results, PROD runs the managed service, TEST runs quickly with minimal outputs.	PROD
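
To illustrate how the fields above fit together, the following minimal sketch assembles and sanity-checks an input payload. The helper name build_proteinmpnn_inputs and the plain-dict payload shape are assumptions made for this example, not the node's actual API. Only the field names, ranges, and default example values are taken from the table.

```python
def build_proteinmpnn_inputs(
    pdb,
    model_name="vanilla_v_48_020",
    backbone_noise=0.1,
    num_seq_per_target=10,
    max_length=200000,
    sampling_temp="0.1",
    omit_amino_acids="X",
    chain_list="",
    seed=42,
    mode="PROD",
):
    """Hypothetical helper: collect the documented fields into one dict."""
    # pdb is documented as a mapping of target names to PDB contents.
    if not isinstance(pdb, dict) or not pdb:
        raise ValueError("pdb must be a non-empty mapping of names to PDB text")
    # Documented range for backbone_noise is 0.0-1.0.
    if not 0.0 <= backbone_noise <= 1.0:
        raise ValueError("backbone_noise must be within 0.0-1.0")
    # Documented as an unsigned 32-bit value.
    if not 0 <= seed <= 2**32 - 1:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    # At least one sampling temperature is required.
    if not sampling_temp.strip():
        raise ValueError("sampling_temp cannot be empty")
    if mode not in ("MOCK", "PROD", "TEST"):
        raise ValueError("mode must be MOCK, PROD, or TEST")
    return {
        "pdb": pdb,
        "model_name": model_name,
        "backbone_noise": backbone_noise,
        "num_seq_per_target": num_seq_per_target,
        "max_length": max_length,
        "sampling_temp": sampling_temp,
        "omit_amino_acids": omit_amino_acids,
        "chain_list": chain_list,  # empty string means "design all chains"
        "seed": seed,
        "mode": mode,
    }


inputs = build_proteinmpnn_inputs({"my_target": "ATOM ... END"}, chain_list="A,B")
```

How the resulting dict is passed to the node depends on the host workflow system, which this page does not specify.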

Outputs

Field	Type	Description	Example
seqs.fasta	FASTA	FASTA-formatted sequences generated for all input structures and requested samples.	>my_target_design_0 MADEUPSEQUENCE... >my_target_design_1 MADEUPSEQUENCE...
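
Because all designs arrive in a single FASTA file, downstream steps usually need to split it back into records. The sketch below is one simple way to do that with the standard library only. The function name parse_fasta is an assumption for this example, and the header naming scheme (such as my_target_design_0) is taken from the example above rather than from a guaranteed format.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">my_target_design_0\nMKV\nLLA\n>my_target_design_1\nGGS\n"
for name, seq in parse_fasta(example):
    print(name, len(seq))
```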

Important Notes

Model name must include a valid version suffix (e.g., v_48_002, v_48_010, v_48_020, v_48_030) and a supported prefix: vanilla, ca_only, or soluble.
If mode is TEST, the node forces num_seq_per_target to 1 to ensure fast execution.
Sampling temperature cannot be empty; at least one numeric value is required.
Set chain_list to target only specific chains; leave empty to design all chains present in the PDB.
Use ca_only models only with CA-only PDB inputs; use vanilla for full-atom PDBs.
The execution timeout scales with the number of targets in pdb, so larger sets are allowed more time to finish.
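
The model name rule and the TEST mode override in the notes above can be checked before submitting a job. The following is a minimal sketch and not the node's own validation code: the regular expression assumes a three-digit suffix after v_48_, and the node may accept only a narrower set of prefix and suffix combinations. Both function names are hypothetical.

```python
import re

# Assumption: prefix, then "v_48_", then a three-digit suffix such as 020.
MODEL_NAME_RE = re.compile(r"^(vanilla|ca_only|soluble)_v_48_\d{3}$")


def is_valid_model_name(model_name):
    """Return True if model_name looks like prefix_v_48_NNN."""
    return MODEL_NAME_RE.fullmatch(model_name) is not None


def effective_num_seq(mode, num_seq_per_target):
    """Mirror the documented TEST behavior: one sequence per target."""
    return 1 if mode == "TEST" else num_seq_per_target


print(is_valid_model_name("vanilla_v_48_020"))
print(effective_num_seq("TEST", 10))
```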

Troubleshooting

Invalid model name error: Ensure model_name follows the pattern prefix_v_48_NNN (e.g., vanilla_v_48_020), where prefix is one of vanilla, ca_only, or soluble and NNN is a supported suffix such as 002, 010, 020, or 030.
Sampling temperature empty: Provide at least one value (e.g., "0.1"); use commas to specify a schedule.
Unexpected residues or omissions: Verify omit_amino_acids is a comma-separated list of one-letter codes (e.g., "A,C").
Design limited to some chains only: Confirm chain_list is set correctly (e.g., "A,B") or left empty to include all chains.
Timeouts or long runtimes: Reduce num_seq_per_target, shorten the sampling_temp schedule, or submit fewer pdb targets at once.
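
Several of the issues above come from malformed comma-separated strings, so parsing them locally first can catch errors early. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the accepted one-letter codes are assumed to be the 20 standard amino acids plus X, and the node's own parsing may differ in detail.

```python
# Assumption: 20 standard one-letter codes plus X (unknown residue).
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_sampling_temp(value):
    """Parse "0.2, 0.25, 0.5" into floats; at least one value is required."""
    temps = [float(part) for part in value.split(",") if part.strip()]
    if not temps:
        raise ValueError("sampling_temp needs at least one numeric value")
    return temps


def parse_omit_amino_acids(value):
    """Parse "A,C" into a list of one-letter codes, rejecting unknown ones."""
    codes = [part.strip().upper() for part in value.split(",") if part.strip()]
    bad = [code for code in codes if code not in VALID_CODES]
    if bad:
        raise ValueError(f"not one-letter amino acid codes: {bad}")
    return codes


def parse_chain_list(value):
    """Parse "A,B" into chain IDs; an empty list means all chains."""
    return [part.strip() for part in value.split(",") if part.strip()]


print(parse_sampling_temp("0.2, 0.25, 0.5"))
print(parse_omit_amino_acids("A,C"))
print(parse_chain_list(""))
```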