Boltz Protein Sequence
Builds a protein entry for Boltz YAML workflows. It accepts a protein sequence in FASTA format and an optional MSA, supports multiple identical chain copies, and lets you specify residue-level modifications and mark the protein as a cyclic peptide. The node outputs a structured protein specification ready to be combined with other Boltz components.

Usage
Use this node to define a protein entity for downstream Boltz prediction or diffusion nodes. Provide a chain ID and a protein sequence (FASTA). Optionally add an A3M MSA, list additional chain IDs if the same sequence should be replicated across chains (e.g., B,C), declare post-translational modifications, and mark the protein as cyclic if applicable. Combine the output with ligand, constraint, template, and property builders, then feed the result into a Boltz combine/predict node.
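The combine/predict node's internals are not documented on this page. As a rough sketch of what feeding builder outputs into it could look like, the Python below assumes that each builder emits a list of single-key entity dicts (as `protein_sequence` does), and that a combine step concatenates them into a Boltz-style `version`/`sequences` document while dropping underscore-prefixed internal keys such as `_sequence_name`. `combine_entities` and the ligand entry are hypothetical.

```python
from typing import Any, Dict, List


def combine_entities(*entity_lists: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical merge of several builder outputs into one Boltz-style document."""
    sequences: List[Dict[str, Any]] = []
    for entities in entity_lists:
        for entity in entities:
            # Each entity is assumed to be a single-key dict, e.g. {"protein": {...}}.
            kind, fields = next(iter(entity.items()))
            # Assumption: underscore-prefixed keys are internal and not written to YAML.
            public = {k: v for k, v in fields.items() if not k.startswith("_")}
            sequences.append({kind: public})
    return {"version": 1, "sequences": sequences}


protein = [{"protein": {"id": "A", "sequence": "MKTAYIAK",
                        "_sequence_name": "my_protein", "msa": "empty"}}]
ligand = [{"ligand": {"id": "B", "ccd": "ATP"}}]  # hypothetical ligand builder output
doc = combine_entities(protein, ligand)
```

If PyYAML is installed, `yaml.safe_dump(doc, sort_keys=False)` renders this dict as YAML text while preserving key order.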
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| chain_id | True | STRING | Primary chain identifier for this protein. If the same sequence should appear on multiple chains, specify them via 'multiple_chains'. | A |
| sequence | True | FASTA | Protein sequence in FASTA format. A header line starting with '>' is optional; if present, it is used as the internal sequence name. The sequence must not be empty. | >my_protein MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ |
| msa | False | A3M | Multiple sequence alignment in A3M format. Leave empty to use the 'empty' MSA placeholder (Boltz's single-sequence mode). Can be provided directly as A3M text or as a mapping from ID to A3M content produced by an upstream MSA node; for a mapping, only the first entry is used. | >seq1/1-100 MKTAYIAKQRQISFVKSHF--RQLEERLGLIEVQ >seq2/1-100 MK-AYIAKQRQISFVKSHF--RQLEERLGLIEVQ |
| modifications | False | STRING | Residue modifications, one per line, in the format 'position:ccd_code'. Positions are 1-based. Lines with invalid formats are ignored. | 5:SEP 42:MSE |
| cyclic | False | BOOLEAN | Mark the protein as cyclic. | false |
| multiple_chains | False | STRING | Comma-separated list of additional chain IDs that should be identical copies of the provided sequence. | B,C |
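The FASTA rules for the `sequence` input (optional `>` header used as the name, non-empty sequence required) can be sketched in Python as follows. This is an illustration, not the node's code: `parse_fasta` is a hypothetical name, the default name `"protein"` is a placeholder because the node's actual default is not documented, and behavior for multi-record FASTA is an assumption.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str, default_name: str = "protein") -> Tuple[str, str]:
    """Split FASTA text with an optional '>' header into (name, sequence)."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the first non-empty header becomes the sequence name;
            # later headers are ignored and their residues are concatenated.
            if name is None:
                name = line[1:].strip() or None
            continue
        # Residue line: remove any internal whitespace and append.
        chunks.append("".join(line.split()))
    sequence = "".join(chunks)
    if not sequence:
        # Mirrors the documented "Protein sequence is required" error.
        raise ValueError("Protein sequence is required")
    return (name or default_name), sequence
```

For example, `parse_fasta(">my_protein\nMKTAYIAK\nQRQISF\n")` returns `("my_protein", "MKTAYIAKQRQISF")`, while a header-only input raises `ValueError`.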
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| protein_sequence | * | A list containing a single structured protein specification for Boltz YAML. The nested keys are 'protein.id' (a string for a single chain, or a list of chain IDs when 'multiple_chains' is set), 'protein.sequence', an internal 'protein._sequence_name' taken from the FASTA header, 'protein.msa' (A3M content or 'empty'), optional 'protein.modifications', and optional 'protein.cyclic'. | [{"protein": {"id": ["A","B","C"], "sequence": "MKTAYIAK...", "_sequence_name": "my_protein", "msa": "empty", "modifications": [{"position": 5, "ccd": "SEP"}], "cyclic": false}}] |
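The output shape can be reproduced with a small Python builder. This is a sketch that mirrors the example row above, not the node's implementation: `build_protein_spec` is a hypothetical name. It assumes that spaces around chain IDs in `multiple_chains` are trimmed and, following the example output, that `cyclic` is always written.

```python
from typing import Any, Dict, List, Optional


def build_protein_spec(
    chain_id: str,
    sequence: str,
    sequence_name: str,
    msa: Optional[str] = None,
    modifications: Optional[List[Dict[str, Any]]] = None,
    cyclic: bool = False,
    multiple_chains: str = "",
) -> List[Dict[str, Any]]:
    """Assemble a single-entry protein spec shaped like the protein_sequence output."""
    # Extra chain IDs: comma-separated, whitespace trimmed, empties dropped.
    extra = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    ids = [chain_id, *extra]
    protein: Dict[str, Any] = {
        # A single chain stays a string; replicated chains become a list.
        "id": ids if len(ids) > 1 else chain_id,
        "sequence": sequence,
        "_sequence_name": sequence_name,
        # Missing or empty MSA falls back to the 'empty' placeholder.
        "msa": msa or "empty",
    }
    if modifications:
        protein["modifications"] = modifications
    # Assumption: written even when False, matching the example output.
    protein["cyclic"] = bool(cyclic)
    return [{"protein": protein}]


spec = build_protein_spec(
    "A", "MKTAYIAK", "my_protein",
    modifications=[{"position": 5, "ccd": "SEP"}],
    multiple_chains="B, C",
)
```

Keeping `id` a plain string for the single-chain case matches the note that `protein.id` only becomes a list when `multiple_chains` is used.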
Important Notes
- FASTA handling: If a header (starting with '>') is present, it is used as the internal sequence name; otherwise a default name is applied.
- MSA default: When 'msa' is not provided or empty, the node sets the MSA to 'empty'.
- MSA input forms: The node accepts either raw A3M text or a mapping {id: a3m_content}; in the latter case, the first entry is used.
- Multiple chains: Use 'multiple_chains' (e.g., B,C) to replicate the same sequence across chains; the output protein.id becomes a list.
- Modifications format: Each line must be 'position:ccd_code' with a 1-based integer position; invalid lines are ignored.
- Cyclic proteins: Set 'cyclic' to true to indicate a cyclic peptide.
- Required sequence: The protein sequence must be non-empty after parsing; otherwise the node raises an error.
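The modification and MSA rules in these notes can be sketched as two small Python helpers. They illustrate the rules under stated assumptions and are not the node's code. `parse_modifications` and `resolve_msa` are hypothetical names. Rejecting position 0 follows from positions being 1-based, and returning the ignored lines is an addition that makes silently skipped modifications visible.

```python
import re
from collections.abc import Mapping
from typing import Any, Dict, List, Tuple, Union

# 'position:ccd_code', e.g. "5:SEP"; surrounding whitespace tolerated (assumption).
_MOD_LINE = re.compile(r"^\s*(\d+)\s*:\s*([A-Za-z0-9]+)\s*$")


def parse_modifications(text: str) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Parse one modification per line; return (modifications, ignored_lines)."""
    mods: List[Dict[str, Any]] = []
    ignored: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are neither parsed nor reported
        match = _MOD_LINE.match(line)
        if match and int(match.group(1)) >= 1:  # positions are 1-based
            mods.append({"position": int(match.group(1)), "ccd": match.group(2)})
        else:
            ignored.append(line)  # invalid lines are skipped, not raised
    return mods, ignored


def resolve_msa(msa: Union[str, Mapping, None]) -> str:
    """Return the A3M text to embed, or 'empty' when nothing usable is given."""
    if isinstance(msa, Mapping):
        # For an {id: a3m_content} mapping, only the first entry is used.
        msa = next(iter(msa.values()), None)
    if not msa or not msa.strip():
        return "empty"
    return msa


mods, ignored = parse_modifications("5:SEP\n42:MSE\nbad line\nX:SEP\n")
```

Here `mods` holds the two valid entries and `ignored` lists `"bad line"` and `"X:SEP"`, which would otherwise disappear without any error.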
Troubleshooting
- Error: Protein sequence is required: Ensure 'sequence' is provided and contains more than a header line; include at least one line of amino-acid residues.
- Invalid modification format: Malformed lines are ignored rather than reported, so a bad line silently drops that modification. Verify each line follows 'position:ccd_code' with an integer position (e.g., '5:SEP').
- Unexpected MSA behavior: If passing a dict from an upstream node, confirm it contains at least one entry; only the first entry is used. If the MSA is empty or not provided, the node sets it to 'empty'.
- Chain ID issues: Provide a single base 'chain_id' (e.g., A) and list the replicas in 'multiple_chains' as comma-separated IDs (e.g., B,C); spaces around each ID are trimmed.
- Sequence name mismatch: If other components rely on sequence names, make sure the FASTA header matches the name they expect; without a header, a default name is used.