ProteinMPNN

Designs protein sequences for given 3D backbone structures using ProteinMPNN. Supports CA-only parsing, soluble-only weights, temperature-controlled sampling, residue omission, chain-specific design, batching, and multiple model variants. Returns all generated sequences and the top-scoring subset as FASTA.

Usage

Use this node to generate and rank amino acid sequences compatible with input protein structures. Provide one or more PDB structures, choose the ProteinMPNN model and options (e.g., sampling temperatures, omit residues, chains to design), then integrate the output FASTA into downstream evaluation or structure prediction workflows.

Inputs

Field	Required	Type	Description	Example
pdb	True	PDB	Protein backbone structures to design sequences against. Typically a collection mapping structure names to PDB-formatted text.	{'my_protein_A.pdb': 'ATOM ...', 'complex_AB.pdb': 'ATOM ...'}
ca_only	True	BOOLEAN	If true, parse CA-only structures and use CA-only models.	False
model_name	True	v_48_002 | v_48_010 | v_48_020 | v_48_030	ProteinMPNN model variant to use.	v_48_020
use_soluble_model	True	BOOLEAN	If true, use weights trained on soluble proteins only.	False
backbone_noise	True	FLOAT	Standard deviation of Gaussian noise added to backbone atoms to increase sequence diversity (0.0–1.0).	0.1
num_seq_per_target	True	INT	How many sequences to generate per input structure.	10
num_best_seq_per_target	True	INT	How many top-scoring sequences to select per input after generation. Must be <= num_seq_per_target.	1
batch_size	True	INT	Batch size for generation. Reduce if you encounter GPU memory limits.	1
max_length	True	INT	Maximum total sequence length allowed.	200000
sampling_temp	True	STRING	Comma-separated list of temperatures controlling sampling diversity (e.g., 0.2,0.25,0.5). Higher values increase diversity.	0.1,0.15,0.2,0.25,0.3
omit_amino_acids	True	STRING	Comma-separated list of one-letter amino acid codes to exclude from generation.	C,M,W
chains_to_design	True	STRING	Comma-separated list of chain identifiers to design. Leave empty to design all chains.	A,B
seed	True	INT	Base random seed for reproducibility.	42
mode	True	MOCK | PROD | TEST	Execution mode. MOCK returns predefined sequences for quick demos, TEST minimizes generation for speed, PROD runs full generation.	PROD
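
As an illustration of how the fields above fit together, the following minimal sketch assembles an inputs mapping in Python. The helper names `load_pdb_inputs` and `build_node_inputs` are hypothetical and not part of the node; the values are taken from the Example column of the table and are not necessarily the node's defaults.

```python
from pathlib import Path


def load_pdb_inputs(paths):
    """Map each file name to its PDB-formatted text (hypothetical helper)."""
    return {Path(p).name: Path(p).read_text() for p in paths}


def build_node_inputs(pdb, **overrides):
    """Return an inputs mapping using the Example-column values.

    Assumption: these are example values from the table, not confirmed defaults.
    """
    inputs = {
        "pdb": pdb,
        "ca_only": False,
        "model_name": "v_48_020",
        "use_soluble_model": False,
        "backbone_noise": 0.1,
        "num_seq_per_target": 10,
        "num_best_seq_per_target": 1,
        "batch_size": 1,
        "max_length": 200000,
        "sampling_temp": "0.1,0.15,0.2,0.25,0.3",
        "omit_amino_acids": "C,M,W",
        "chains_to_design": "A,B",
        "seed": 42,
        "mode": "PROD",
    }
    # Reject field names that do not appear in the Inputs table.
    unknown = sorted(set(overrides) - set(inputs))
    if unknown:
        raise KeyError(f"Unknown input field(s): {', '.join(unknown)}")
    inputs.update(overrides)
    return inputs
```

For example, `build_node_inputs(load_pdb_inputs(["my_protein_A.pdb"]), chains_to_design="")` would request design of all chains while keeping the other example values.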

Outputs

Field	Type	Description	Example
seqs.fasta	FASTA	All generated sequences across temperatures and samples in FASTA format.	>my_protein_A_0 ACDEFGHIK... >my_protein_A_1 ...
best_seqs.fasta	FASTA	Top-scoring subset of generated sequences per input structure in FASTA format.	>my_protein_A_best_0 ACDEFGHIK...
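
Downstream steps usually need the FASTA outputs as (header, sequence) pairs. The sketch below is a minimal, dependency-free parser; `parse_fasta` is a hypothetical helper, and the header names in the usage note simply mirror the examples in the table. Real headers may carry additional fields, so treat the header as an opaque string.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling `parse_fasta` on the contents of seqs.fasta would yield one tuple per generated sequence, for instance a header such as `my_protein_A_0` paired with its amino acid string.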

Important Notes

CA-only and soluble-only configurations cannot be used together. Set either ca_only or use_soluble_model to false.
The model v_48_030 is not available for CA-only. If ca_only is true, choose another model.
num_best_seq_per_target must be less than or equal to num_seq_per_target.
In TEST mode, num_seq_per_target is forced to 1 to speed up runs.
sampling_temp, omit_amino_acids, and chains_to_design accept comma-separated inputs; the node normalizes them internally before passing them to the service.
Timeout scales with the number of input structures; very large batches or long sequences can require more time.
Batch size influences memory usage; lower it if encountering resource limits.
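
The compatibility rules above can be checked before submitting a run. The following is a minimal sketch of such a pre-flight check, not the node's actual implementation; `validate_options` is a hypothetical function, and the messages are modeled on the errors listed under Troubleshooting.

```python
# Models that the notes above list as unavailable in CA-only mode.
CA_UNAVAILABLE_MODELS = {"v_48_030"}


def validate_options(ca_only, use_soluble_model, model_name,
                     num_seq_per_target, num_best_seq_per_target):
    """Raise ValueError if the options violate the documented constraints."""
    if ca_only and use_soluble_model:
        raise ValueError(
            "CA-SolubleMPNN model is not available yet. "
            "Set either ca_only or use_soluble_model to False."
        )
    if ca_only and model_name in CA_UNAVAILABLE_MODELS:
        raise ValueError(
            f"CA-{model_name} model is not available yet. "
            "Set ca_only to False or pick another model_name."
        )
    if num_best_seq_per_target > num_seq_per_target:
        raise ValueError(
            "Expected num_best_seq_per_target <= num_seq_per_target."
        )
```

Running this check locally surfaces configuration mistakes before any time is spent waiting on the service. It does not model TEST mode, where num_seq_per_target is forced to 1.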

Troubleshooting

Error: "CA-SolubleMPNN model is not available yet. Set either ca_only or use_soluble_model to False." Fix: disable one of the two options.
Error: "CA-v_48_030 model is not available yet. Set ca_only to False or pick another model_name." Fix: select v_48_002, v_48_010, or v_48_020, or set ca_only to false.
Error: "Expected num_best_seq_per_target <= num_seq_per_target." Fix: reduce num_best_seq_per_target or increase num_seq_per_target.
Generated sequences are nearly identical or lack diversity: raise the sampling_temp values or backbone_noise.
Out-of-memory or long runtimes: reduce batch_size, decrease num_seq_per_target, or limit max_length.
Unexpected or invalid list inputs: ensure sampling_temp, omit_amino_acids, and chains_to_design are comma-separated without extra spaces; e.g., 0.1,0.2 or A,B.
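
To avoid malformed list inputs, the comma-separated strings can be cleaned up before they are passed to the node. The helpers below are a hypothetical sketch of such client-side cleanup; they do not describe the node's internal normalization.

```python
def normalize_csv(value):
    """Strip whitespace around items and drop empty items."""
    return ",".join(part.strip() for part in value.split(",") if part.strip())


def parse_temperatures(value):
    """Return sampling temperatures as floats; raises ValueError on bad items."""
    temps = [float(t) for t in normalize_csv(value).split(",") if t]
    if any(t <= 0 for t in temps):
        # Assumption: a non-positive sampling temperature is never intended.
        raise ValueError("sampling_temp values must be positive")
    return temps


def normalize_amino_acids(value):
    """Upper-case one-letter codes and reject anything that is not one letter."""
    codes = [c.upper() for c in normalize_csv(value).split(",") if c]
    for code in codes:
        if len(code) != 1 or not code.isalpha():
            raise ValueError(f"Not a one-letter amino acid code: {code!r}")
    return ",".join(codes)
```

For example, `normalize_csv(" A , B ")` produces `A,B`, which matches the format the fields expect.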