ProteinMPNN
Designs protein sequences for given 3D backbone structures using ProteinMPNN. Supports CA-only parsing, soluble-only weights, temperature-controlled sampling, residue omission, chain-specific design, batching, and multiple model variants. Returns all generated sequences and the top-scoring subset as FASTA.

Usage
Use this node to generate and rank amino acid sequences compatible with input protein structures. Provide one or more PDB structures, choose the ProteinMPNN model and options (e.g., sampling temperatures, omit residues, chains to design), then integrate the output FASTA into downstream evaluation or structure prediction workflows.
Inputs
| Field | Required | Type | Description | Example | 
|---|---|---|---|---|
| pdb | True | PDB | Protein backbone structures to design sequences against. Typically a collection mapping structure names to PDB-formatted text. | {'my_protein_A.pdb': 'ATOM ...', 'complex_AB.pdb': 'ATOM ...'} | 
| ca_only | True | BOOLEAN | If true, parse CA-only structures and use CA-only models. | False | 
| model_name | True | v_48_002 \| v_48_010 \| v_48_020 \| v_48_030 | ProteinMPNN model variant to use. | v_48_020 | 
| use_soluble_model | True | BOOLEAN | If true, use weights trained on soluble proteins only. | False | 
| backbone_noise | True | FLOAT | Standard deviation of the Gaussian noise added to backbone atoms to increase sequence diversity. Accepted range: 0.0–1.0. | 0.1 | 
| num_seq_per_target | True | INT | How many sequences to generate per input structure. | 10 | 
| num_best_seq_per_target | True | INT | How many top-scoring sequences to select per input after generation. Must be <= num_seq_per_target. | 1 | 
| batch_size | True | INT | Batch size for generation. Reduce if you encounter GPU memory limits. | 1 | 
| max_length | True | INT | Maximum total sequence length allowed, in residues. | 200000 | 
| sampling_temp | True | STRING | Comma-separated list of temperatures controlling sampling diversity (e.g., 0.2,0.25,0.5). Higher values increase diversity. | 0.1,0.15,0.2,0.25,0.3 | 
| omit_amino_acids | True | STRING | Comma-separated list of one-letter amino acid codes to exclude from generation. | C,M,W | 
| chains_to_design | True | STRING | Comma-separated list of chain identifiers to design. Leave empty to design all chains. | A,B | 
| seed | True | INT | Base random seed for reproducibility. | 42 | 
| mode | True | MOCK \| PROD \| TEST | Execution mode. MOCK returns predefined sequences for quick demos, TEST minimizes generation for speed, PROD runs full generation. | PROD | 
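
As a rough illustration of how the fields above fit together, the sketch below collects one value per input into a plain Python dictionary. The dictionary itself and the name `example_inputs` are hypothetical; the keys mirror the Field column and the values are copied from the Example column, so this is not the node's actual call signature.

```python
# Hypothetical container for the node inputs. Keys mirror the Field column
# of the Inputs table; values are copied from its Example column.
example_inputs = {
    "pdb": {"my_protein_A.pdb": "ATOM ..."},  # placeholder PDB text
    "ca_only": False,
    "model_name": "v_48_020",
    "use_soluble_model": False,
    "backbone_noise": 0.1,
    "num_seq_per_target": 10,
    "num_best_seq_per_target": 1,
    "batch_size": 1,
    "max_length": 200000,
    "sampling_temp": "0.1,0.15,0.2,0.25,0.3",
    "omit_amino_acids": "C,M,W",
    "chains_to_design": "A,B",
    "seed": 42,
    "mode": "PROD",
}
```
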
Outputs
| Field | Type | Description | Example | 
|---|---|---|---|
| seqs.fasta | FASTA | All generated sequences across temperatures and samples in FASTA format. | >my_protein_A_0 ACDEFGHIK... >my_protein_A_1 ... | 
| best_seqs.fasta | FASTA | Top-scoring subset of generated sequences per input structure in FASTA format. | >my_protein_A_best_0 ACDEFGHIK... | 
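
For downstream steps, the FASTA outputs usually need to be split into individual records. The following is a minimal sketch of a generic FASTA reader, assuming standard `>` header lines; the helper name `parse_fasta` and the sample sequences are illustrative only, and the exact header format emitted by the node may differ from the examples in the table.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder content, not real node output.
sample = ">my_protein_A_0\nACDEFGHIK\n>my_protein_A_1\nACDEF\nGHIL\n"
records = parse_fasta(sample)
```

Returning a list of tuples rather than a dictionary keeps records with repeated headers instead of silently overwriting them.
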
Important Notes
- CA-only and soluble-only configurations cannot be used together. Set at least one of ca_only or use_soluble_model to false.
- The model v_48_030 is not available for CA-only. If ca_only is true, choose another model.
- num_best_seq_per_target must be less than or equal to num_seq_per_target.
- In TEST mode, num_seq_per_target is forced to 1 to speed up runs.
- sampling_temp, omit_amino_acids, and chains_to_design accept comma-separated inputs; they are internally normalized for the service.
- Timeout scales with the number of input structures; very large batches or long sequences can require more time.
- Batch size influences memory usage; lower it if encountering resource limits.
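
The compatibility rules in these notes can be checked before submitting a run. The sketch below is a hypothetical client-side pre-check, not the node's own validation code; the function name `check_inputs` and its return format are assumptions made for illustration.

```python
def check_inputs(ca_only, use_soluble_model, model_name,
                 num_seq_per_target, num_best_seq_per_target):
    """Return a list of human-readable problems; an empty list means no conflict found."""
    problems = []
    # CA-only and soluble-only weights cannot be combined.
    if ca_only and use_soluble_model:
        problems.append("ca_only and use_soluble_model cannot both be true")
    # v_48_030 is not available for CA-only parsing.
    if ca_only and model_name == "v_48_030":
        problems.append("model v_48_030 is not available when ca_only is true")
    # The selected subset cannot be larger than the generated set.
    if num_best_seq_per_target > num_seq_per_target:
        problems.append("num_best_seq_per_target must be <= num_seq_per_target")
    return problems


ok = check_inputs(False, False, "v_48_020", 10, 1)
bad = check_inputs(True, True, "v_48_030", 1, 2)
```

Collecting every problem in one pass, rather than raising on the first, lets a caller report all conflicting settings at once.
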
Troubleshooting
- Error: "CA-SolubleMPNN model is not available yet. Set either ca_only or use_soluble_model to False." Fix: disable one of the two options.
- Error: "CA-v_48_030 model is not available yet. Set ca_only to False or pick another model_name." Fix: select v_48_002, v_48_010, or v_48_020, or set ca_only to false.
- Error: "Expected num_best_seq_per_target <= num_seq_per_target." Fix: reduce num_best_seq_per_target or increase num_seq_per_target.
- Generated sequences show little diversity or are identical: raise the sampling_temp values or backbone_noise to increase diversity.
- Out-of-memory or long runtimes: reduce batch_size, decrease num_seq_per_target, or limit max_length.
- Unexpected or invalid list inputs: ensure sampling_temp, omit_amino_acids, and chains_to_design are comma-separated without extra spaces; e.g., 0.1,0.2 or A,B.
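
One way to avoid malformed list inputs is to clean the strings before passing them to the node. The helpers below are a sketch under that assumption; `normalize_csv` and `parse_temperatures` are hypothetical names and do not reproduce the node's internal normalization.

```python
def normalize_csv(value):
    """Strip whitespace around items and drop empty entries: ' A, B ,' -> 'A,B'."""
    items = [item.strip() for item in value.split(",")]
    return ",".join(item for item in items if item)


def parse_temperatures(value):
    """Convert a comma-separated temperature string into floats.

    Raises ValueError if any entry is not a valid number.
    """
    return [float(t) for t in normalize_csv(value).split(",") if t]


chains = normalize_csv(" A, B ,")
temps = parse_temperatures("0.1, 0.2")
```

Running `parse_temperatures` on a sampling_temp string before submission surfaces non-numeric entries as a `ValueError` early, rather than at run time.
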
Example Pipelines
