ProteinMPNN
Generates protein sequences conditioned on input backbone structures using the ProteinMPNN family of models. Supports vanilla (full-backbone), CA-only, and soluble (trained on soluble proteins only) model variants, with controls for backbone noise, sampling temperature, chain targeting, and amino-acid omissions. Runs in PROD, MOCK, or TEST mode.

Usage
Use this node to design sequences for one or more protein backbones. Provide PDB structures, select a ProteinMPNN variant, and configure the sequence sampling parameters. A typical workflow is: prepare and clean the PDBs, optionally select the chains to design, set one or more temperatures to control diversity, and choose a mode. Use PROD for real results, TEST for quick sanity checks (1 sequence per target), or MOCK for predefined outputs.
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| pdb | True | PDB | Protein backbone structure(s) to design against. Provide one or more structures in PDB format; each entry corresponds to a target design. | {'my_design_1': 'ATOM ...\nTER\nEND\n', 'my_design_2': 'ATOM ...\nTER\nEND\n'} |
| model_name | True | STRING | ProteinMPNN model variant. Prefix determines mode (vanilla, ca_only, soluble) and suffix selects the model ID. Valid options: vanilla_v_48_002/010/020/030, ca_only_v_48_002/010/020, soluble_v_48_002/010/020/030. | vanilla_v_48_020 |
| backbone_noise | True | FLOAT | Standard deviation of Gaussian noise applied to backbone atoms to induce diversity. | 0.1 |
| num_seq_per_target | True | INT | Number of sequences to generate per input structure. | 10 |
| max_length | True | INT | Maximum allowed sequence length for design. | 200000 |
| sampling_temp | True | STRING | Comma-separated list of temperatures used during amino acid sampling. Higher values increase diversity. Must not be empty. | 0.1,0.15,0.2 |
| omit_amino_acids | True | STRING | Comma-separated one-letter codes for amino acids to exclude from designs. | C,M,W |
| chain_list | True | STRING | Comma-separated list of chain IDs to design. If empty, all chains are designed. | A,B |
| seed | True | INT | Base random seed for reproducibility. | 42 |
| mode | True | STRING | Execution mode. MOCK returns predefined results, PROD calls the live service, TEST runs quickly with minimal settings (forces 1 sequence per target). | PROD |
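
As an illustration of how the fields in the table fit together, the sketch below assembles one complete set of inputs as a plain Python dictionary, using the example values from the table. The helper name `build_inputs` and the choice of a dictionary as the container are assumptions made for this example; how the inputs are actually passed to the node depends on your workflow environment.

```python
def build_inputs(pdbs, model_name="vanilla_v_48_020", mode="PROD"):
    """Hypothetical helper: collect all ten input fields in one dict.

    Field names and example values come from the Inputs table;
    the dict layout itself is an assumption for illustration.
    """
    return {
        "pdb": dict(pdbs),                # name -> PDB text, one entry per target
        "model_name": model_name,         # prefix (variant) + model ID
        "backbone_noise": 0.1,            # std. dev. of Gaussian backbone noise
        "num_seq_per_target": 10,         # forced to 1 in TEST mode
        "max_length": 200000,
        "sampling_temp": "0.1,0.15,0.2",  # comma-separated, must not be empty
        "omit_amino_acids": "C,M,W",      # one-letter codes to exclude
        "chain_list": "A,B",              # empty string designs all chains
        "seed": 42,
        "mode": mode,                     # MOCK, PROD, or TEST
    }


inputs = build_inputs({"my_design_1": "ATOM ...\nTER\nEND\n"})
print(sorted(inputs))
```

Note that every field is listed as required, so `chain_list` is still supplied when all chains should be designed; it is simply left as an empty string.
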
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| seqs.fasta | FASTA | All generated sequences in FASTA format aggregated across inputs and samples. | >my_design_1_seq_0001 MKT...LL >my_design_1_seq_0002 GAS...VV |
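
Because the only output is a single aggregated FASTA file, downstream steps usually need to split it back into records. The following is a minimal sketch of a standard-library FASTA reader; the function name `parse_fasta` is hypothetical, and the short sequences in the example are made-up placeholders, not real node output.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder content that mimics the header style shown in the table.
example = ">my_design_1_seq_0001\nMKTLL\n>my_design_1_seq_0002\nGASVV\n"
records = parse_fasta(example)
```

Headers are returned verbatim, so any grouping by input structure has to rely on the header naming your run actually produces.
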
Important Notes
- Model selection: The model_name must contain a valid prefix and an ID (e.g., vanilla_v_48_020). Prefix controls behavior (vanilla, ca_only, soluble).
- Sampling temperatures: sampling_temp must be a non-empty, comma-separated list (e.g., 0.1,0.2). Empty values will cause an error.
- Chain targeting: Leave chain_list empty to design all chains; otherwise provide comma-separated chain IDs (e.g., A,B).
- Amino acid omissions: Use one-letter codes in omit_amino_acids to exclude residues from designs.
- TEST mode: Automatically overrides num_seq_per_target to 1 for faster runs.
- Multiple structures: You can submit multiple PDBs in one run; timeouts scale with the number of inputs.
- Output: Only the concatenated FASTA of all sequences is returned.
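
Several of the rules above can be checked before submitting a run. The sketch below is one possible client-side pre-check, not the node's own validation: it splits `model_name` into prefix and ID using the valid options listed in the Inputs table, parses `sampling_temp`, and mirrors the TEST-mode override. All function names are hypothetical.

```python
# Valid model IDs per prefix, taken from the Inputs table.
VALID_MODEL_IDS = {
    "vanilla": {"v_48_002", "v_48_010", "v_48_020", "v_48_030"},
    "ca_only": {"v_48_002", "v_48_010", "v_48_020"},
    "soluble": {"v_48_002", "v_48_010", "v_48_020", "v_48_030"},
}


def split_model_name(model_name):
    """Return (prefix, model_id) or raise ValueError for unsupported names."""
    for prefix, ids in VALID_MODEL_IDS.items():
        head = prefix + "_"
        if model_name.startswith(head) and model_name[len(head):] in ids:
            return prefix, model_name[len(head):]
    raise ValueError(f"unsupported model_name: {model_name!r}")


def parse_sampling_temp(sampling_temp):
    """Parse a comma-separated temperature string into a non-empty list of floats."""
    temps = [float(t) for t in sampling_temp.split(",") if t.strip()]
    if not temps:
        raise ValueError("sampling_temp must contain at least one temperature")
    return temps


def effective_num_seq(num_seq_per_target, mode):
    """Mirror the documented TEST-mode override of num_seq_per_target."""
    return 1 if mode == "TEST" else num_seq_per_target


print(split_model_name("ca_only_v_48_010"))
print(parse_sampling_temp("0.1,0.2"))
print(effective_num_seq(10, "TEST"))
```

Running these checks locally surfaces an invalid name or a blank temperature list before any request is made.
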
Troubleshooting
- Invalid model name: Ensure model_name matches one of the supported options (e.g., vanilla_v_48_020).
- Empty sampling_temp: Provide at least one temperature (e.g., 0.1). The field cannot be blank.
- Chain IDs not found: Verify chain_list matches chains present in your PDBs or leave it empty to design all chains.
- Excessive sequence length: If a design fails because a structure is longer than max_length, raise max_length or split very large structures into smaller parts.
- Unwanted residues: If certain amino acids should not appear in the designs, add their one-letter codes to omit_amino_acids (e.g., C).
- Reproducibility issues: Set seed to a fixed integer. Note that higher backbone_noise or multiple temperatures increase variability.
- Mock/test differences: MOCK returns predefined data; TEST limits output to 1 sequence per target; use PROD for real generation.
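
For the "Chain IDs not found" case, the chains present in a structure can be listed before the run. This is a minimal sketch that assumes standard fixed-column PDB records, where the chain identifier sits in column 22 of ATOM and HETATM lines; the function names are hypothetical and the two-atom structure is a made-up placeholder.

```python
def chains_in_pdb(pdb_text):
    """Collect chain IDs from ATOM/HETATM records (column 22, index 21)."""
    chains = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain_id = line[21]
            if chain_id.strip():  # skip records with a blank chain ID
                chains.add(chain_id)
    return chains


def missing_chains(chain_list, pdb_text):
    """Return requested chain IDs that do not occur in the structure."""
    requested = [c.strip() for c in chain_list.split(",") if c.strip()]
    present = chains_in_pdb(pdb_text)
    return sorted(c for c in requested if c not in present)


# Placeholder structure with one atom each in chains A and B.
pdb_text = (
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N\n"
    "ATOM      2  N   GLY B   1      12.560  14.100   3.250  1.00 20.00           N\n"
    "TER\nEND\n"
)
print(missing_chains("A,C", pdb_text))
```

An empty `chain_list` yields no missing chains, which matches the documented behavior of designing all chains when the field is empty.
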