ProteinMPNN
Generates protein sequences conditioned on input protein backbones using the ProteinMPNN family of models. Supports full-atom (vanilla), CA-only, and soluble-protein–specialized variants with controls for sampling temperature, backbone noise, chain targeting, and reproducibility. Returns all generated sequences in a single FASTA output.

Usage
Use this node to redesign sequences for one or more input protein structures. Provide PDB backbones and select the ProteinMPNN variant that matches your structure type: vanilla for full-atom PDBs, ca_only for CA-only PDBs, or soluble for designs biased toward soluble proteins. Tune the sampling temperature for diversity, optionally omit specific amino acids, and target specific chains. The node fits after structure generation or refinement, or inside a sequence-optimization pipeline; it outputs a single consolidated FASTA containing all sequences.
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| pdb | True | PDB | One or more protein structures to design against. Expects a mapping of names to PDB contents; each entry is treated as an independent target. | {"my_target": "ATOM ... END"} |
| model_name | True | STRING | ProteinMPNN model variant to use. Vanilla models use all atoms, CA-only models accept CA-only structures, and soluble models bias toward soluble proteins. | vanilla_v_48_020 |
| backbone_noise | True | FLOAT | Standard deviation of Gaussian noise added to backbone atoms to promote sequence diversity and robustness. Range 0.0–1.0. | 0.1 |
| num_seq_per_target | True | INT | Number of sequences to generate per input structure. | 10 |
| max_length | True | INT | Maximum allowed sequence length. Acts as a guardrail for very large designs. | 200000 |
| sampling_temp | True | STRING | Comma-separated temperature schedule for sampling amino acids (e.g., "0.2, 0.25, 0.5"). Higher values increase diversity. At least one value is required. | 0.1 |
| omit_amino_acids | True | STRING | Comma-separated list of amino acids to prohibit during design (e.g., "A,C"). Use single-letter codes. | X |
| chain_list | True | STRING | Comma-separated list of chain IDs to design. Leave empty to design all chains. | A,B |
| seed | True | INT | Base random seed for reproducibility. Unsigned 32-bit range. | 42 |
| mode | True | STRING | Execution mode. MOCK returns bundled example results, PROD runs the managed service, TEST runs quickly with minimal outputs. | PROD |
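
The fields above can be collected into a single mapping before the node is invoked. The following is a minimal sketch, not the node's actual API: `build_inputs` is a hypothetical helper, the defaults are simply the example values from the table, and the checks cover only the ranges the table states (backbone noise within 0.0–1.0, an unsigned 32-bit seed, and the three listed modes).

```python
from typing import Any, Dict

UINT32_MAX = 2**32 - 1          # upper bound of the unsigned 32-bit seed range
MODES = {"MOCK", "PROD", "TEST"}  # execution modes listed in the Inputs table


def build_inputs(pdb: Dict[str, str], **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: assemble the input mapping and check documented ranges."""
    # Defaults mirror the Example column of the Inputs table (an assumption,
    # not defaults defined by the node itself).
    inputs: Dict[str, Any] = {
        "pdb": pdb,
        "model_name": "vanilla_v_48_020",
        "backbone_noise": 0.1,
        "num_seq_per_target": 10,
        "max_length": 200000,
        "sampling_temp": "0.1",
        "omit_amino_acids": "X",
        "chain_list": "",  # empty means "design all chains"
        "seed": 42,
        "mode": "PROD",
    }
    unknown = set(overrides) - set(inputs)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    inputs.update(overrides)

    if not inputs["pdb"]:
        raise ValueError("pdb needs at least one named structure")
    if not 0.0 <= inputs["backbone_noise"] <= 1.0:
        raise ValueError("backbone_noise must be within 0.0-1.0")
    if not 0 <= inputs["seed"] <= UINT32_MAX:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    if inputs["mode"] not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    return inputs


# Each pdb entry maps a name to PDB text and is treated as an independent target.
example_inputs = build_inputs({"my_target": "ATOM ... END"}, chain_list="A,B")
```

Rejecting unknown field names early is a design choice of this sketch; it catches typos such as `chain_lst` before a job is submitted.
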
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| seqs.fasta | FASTA | FASTA-formatted sequences generated for all input structures and requested samples. | >my_target_design_0 MADEUPSEQUENCE... >my_target_design_1 MADEUPSEQUENCE... |
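
Because all designs arrive in one FASTA file, downstream steps usually need to split it back into per-target groups. The sketch below assumes headers follow the `<target>_design_<index>` naming shown in the example row; that naming is inferred from the example, not a documented guarantee, and the sequences are made-up placeholders.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text; sequences may span lines."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_target(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group sequences by target name, assuming '<target>_design_<index>' headers."""
    grouped: Dict[str, List[str]] = {}
    for header, seq in records:
        # Headers without the assumed marker are kept under their full name.
        target = header.rsplit("_design_", 1)[0]
        grouped.setdefault(target, []).append(seq)
    return grouped


# Placeholder FASTA text standing in for the contents of seqs.fasta.
example_fasta = ">my_target_design_0\nMKTAYIAK\n>my_target_design_1\nMKSAYLAK\n"
designs = group_by_target(read_fasta(example_fasta))
```
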
Important Notes
- Model name must include a valid version suffix (e.g., v_48_002, v_48_010, v_48_020, v_48_030) and a supported prefix: vanilla, ca_only, or soluble.
- If mode is TEST, the node forces num_seq_per_target to 1 to ensure fast execution.
- Sampling temperature cannot be empty; at least one numeric value is required.
- Set chain_list to target only specific chains; leave empty to design all chains present in the PDB.
- Use ca_only models only with CA-only PDB inputs; use vanilla for full-atom PDBs.
- The execution timeout scales with the number of targets in pdb, so larger sets are given proportionally more time.
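
Several of these notes can be checked on the client side before a run is submitted. The sketch below is illustrative only: it assumes the four suffixes named above (002, 010, 020, 030) are the complete set, which the notes do not state, and `check_model_name` and `effective_num_seq` are hypothetical helper names rather than part of the node.

```python
import re

# Assumption: prefix and suffix sets are exactly those named in the notes.
MODEL_NAME_RE = re.compile(r"(vanilla|ca_only|soluble)_v_48_(002|010|020|030)")


def check_model_name(name: str, ca_only_input: bool = False) -> str:
    """Return the model prefix, or raise ValueError if the name looks invalid."""
    match = MODEL_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"invalid model_name: {name!r}")
    prefix = match.group(1)
    # ca_only models are meant for CA-only PDB inputs only.
    if prefix == "ca_only" and not ca_only_input:
        raise ValueError("ca_only models expect CA-only PDB inputs")
    return prefix


def effective_num_seq(mode: str, num_seq_per_target: int) -> int:
    """Mirror the documented TEST-mode behaviour: one sequence per target."""
    return 1 if mode == "TEST" else num_seq_per_target


prefix = check_model_name("vanilla_v_48_020")
```
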
Troubleshooting
- Invalid model name error: Ensure model_name consists of a prefix followed by a v_48 version suffix (e.g., vanilla_v_48_020), where the prefix is one of vanilla, ca_only, or soluble.
- Sampling temperature empty: Provide at least one value (e.g., "0.1"); use commas to specify a schedule.
- Unexpected residues or omissions: Verify omit_amino_acids is a comma-separated list of one-letter codes (e.g., "A,C").
- Design limited to some chains only: Confirm chain_list is set correctly (e.g., "A,B") or left empty to include all chains.
- Timeouts or long runtimes: Reduce num_seq_per_target, simplify sampling_temp, or submit fewer pdb targets at once.
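
Most of the issues above come from malformed comma-separated strings, so parsing them locally before submission can surface the problem early. This is a minimal sketch with hypothetical helper names; the accepted alphabet (the 20 standard one-letter codes plus X, which appears as the example value for omit_amino_acids) is an assumption, not the node's documented validation.

```python
from typing import List

# Assumption: 20 standard amino acids plus X for unknown residues.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")


def split_csv(raw: str) -> List[str]:
    """Split on commas, trimming whitespace and dropping empty tokens."""
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def parse_temps(raw: str) -> List[float]:
    """Parse a temperature schedule such as '0.2, 0.25, 0.5'."""
    values = [float(tok) for tok in split_csv(raw)]  # float() raises ValueError on junk
    if not values:
        raise ValueError("sampling_temp needs at least one numeric value")
    return values


def parse_omit(raw: str) -> List[str]:
    """Parse omitted residues such as 'A,C' and reject non one-letter codes."""
    codes = split_csv(raw)
    bad = [code for code in codes if code not in AMINO_ACIDS]
    if bad:
        raise ValueError(f"not one-letter amino acid codes: {bad}")
    return codes


def parse_chains(raw: str) -> List[str]:
    """Parse chain IDs such as 'A,B'; an empty list means all chains."""
    return split_csv(raw)


schedule = parse_temps("0.2, 0.25, 0.5")
```
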