Boltz Protein Sequence

Builds a protein entry for Boltz YAML workflows. It accepts a FASTA protein sequence and optional MSA, supports multiple identical chain copies, and allows specifying residue-level modifications and cyclic peptides. The node outputs a structured protein specification ready to be combined with other Boltz components.

Usage

Use this node to define a protein entity for downstream Boltz prediction or diffusion nodes. Provide a chain ID and a protein sequence (FASTA). Optionally add an A3M MSA, list additional chain IDs if the same sequence should be replicated across chains (e.g., B,C), declare post-translational modifications, and mark the protein as cyclic if applicable. Chain this node with the ligand, constraint, template, and property builders, then feed the result into a Boltz combine/predict node.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| chain_id | True | STRING | Primary chain identifier for this protein. If the same sequence should appear on multiple chains, specify them via 'multiple_chains'. | A |
| sequence | True | FASTA | Protein sequence in FASTA format. A header line starting with '>' is optional; if present, it is used as the internal sequence name. The sequence must not be empty. | >my_protein MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ |
| msa | False | A3M | Multiple sequence alignment in A3M format. Leave empty to use an 'empty' MSA placeholder. Can be provided directly as A3M text or as a mapping from ID to A3M content produced by an upstream MSA node. | >seq1/1-100 MKTAYIAKQRQISFVKSHF--RQLEERLGLIEVQ >seq2/1-100 MK-AYIAKQRQISFVKSHF--RQLEERLGLIEVQ |
| modifications | False | STRING | Residue modifications, one per line, in the format 'position:ccd_code'. Positions are 1-based. Lines with invalid formats are ignored. | 5:SEP 42:MSE |
| cyclic | False | BOOLEAN | Mark the protein as cyclic. | false |
| multiple_chains | False | STRING | Comma-separated list of additional chain IDs that should be identical copies of the provided sequence. | B,C |
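
The 'modifications' and 'multiple_chains' string formats can be illustrated with a small parser. This is a minimal sketch, not the node's actual code: the function names `parse_modifications` and `parse_chain_ids` are hypothetical, and trimming whitespace, rejecting positions below 1, and dropping duplicate chain IDs are assumptions about reasonable behavior.

```python
def parse_modifications(text: str) -> list:
    """Parse 'position:ccd_code' lines; lines that do not match are skipped."""
    mods = []
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 2:
            continue  # missing or extra colon: treated as invalid
        position, ccd = parts[0].strip(), parts[1].strip()
        try:
            pos = int(position)
        except ValueError:
            continue  # position is not an integer
        if pos < 1 or not ccd:
            continue  # positions are 1-based; the CCD code must be non-empty
        mods.append({"position": pos, "ccd": ccd})
    return mods


def parse_chain_ids(chain_id: str, multiple_chains: str = ""):
    """Return a single ID string, or a list when extra chains are given."""
    ids = [chain_id.strip()]
    for extra in multiple_chains.split(","):
        extra = extra.strip()
        if extra and extra not in ids:  # assumption: ignore blanks and duplicates
            ids.append(extra)
    return ids if len(ids) > 1 else ids[0]


mods = parse_modifications("5:SEP\n42:MSE\nnot a modification")
chains = parse_chain_ids("A", "B,C")
```

Here `mods` holds the two valid entries and `chains` is a three-element list, matching the shapes shown in the output example.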

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| protein_sequence | * | A list containing a single structured protein specification for Boltz YAML. The structure includes keys like 'protein.id' (string or list of chain IDs), 'protein.sequence', optional 'protein.msa', optional 'protein.modifications', and optional 'protein.cyclic'. | [{"protein": {"id": ["A","B","C"], "sequence": "MKTAYIAK...", "_sequence_name": "my_protein", "msa": "empty", "modifications": [{"position": 5, "ccd": "SEP"}], "cyclic": false}}] |
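
To make the output shape concrete, the following sketch assembles a structure with the same keys as the example above from already-parsed values. The function `build_protein_entry` and the default name "protein" are hypothetical placeholders; which optional keys the node omits is not documented here, so the choice to always emit 'msa' and 'cyclic' and to omit empty 'modifications' is an assumption.

```python
import json


def build_protein_entry(chain_ids, sequence, sequence_name="protein",
                        msa="empty", modifications=None, cyclic=False):
    """Return a one-element list holding a protein specification dict."""
    protein = {
        "id": chain_ids,                  # a string, or a list of chain IDs
        "sequence": sequence,
        "_sequence_name": sequence_name,  # "protein" is a placeholder default
        "msa": msa or "empty",            # fall back to the 'empty' placeholder
        "cyclic": bool(cyclic),
    }
    if modifications:                     # assumption: omit the key when empty
        protein["modifications"] = modifications
    return [{"protein": protein}]


entry = build_protein_entry(
    ["A", "B", "C"],
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    sequence_name="my_protein",
    modifications=[{"position": 5, "ccd": "SEP"}],
)
print(json.dumps(entry, indent=2))
```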

Important Notes

  • FASTA handling: If a header (starting with '>') is present, it is used as the internal sequence name; otherwise a default name is applied.
  • MSA default: When 'msa' is not provided or empty, the node sets the MSA to 'empty'.
  • MSA input forms: The node accepts either raw A3M text or a mapping {id: a3m_content}; in the latter case, the first entry is used.
  • Multiple chains: Use 'multiple_chains' (e.g., B,C) to replicate the same sequence across chains; the output protein.id becomes a list.
  • Modifications format: Each line must be 'position:ccd_code' with a 1-based integer position; invalid lines are ignored.
  • Cyclic proteins: Set 'cyclic' to true to indicate a cyclic peptide.
  • Required sequence: The protein sequence must be non-empty after parsing; otherwise the node raises an error.
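
The FASTA and MSA rules in these notes can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: `parse_fasta` and `normalize_msa` are hypothetical names, the default name "protein" is a placeholder, and using the full header text after '>' as the sequence name is an assumption.

```python
def parse_fasta(text: str, default_name: str = "protein") -> tuple:
    """Return (name, sequence); raise if no residues remain after parsing."""
    name = default_name
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use its text as the name, or keep the default
            name = line[1:].strip() or default_name
            continue
        residues.append(line)
    sequence = "".join(residues)
    if not sequence:
        raise ValueError("Protein sequence is required")
    return name, sequence


def normalize_msa(msa) -> str:
    """Accept raw A3M text or an {id: a3m_content} mapping."""
    if isinstance(msa, dict):
        # Mapping form: take the first entry (dicts keep insertion order)
        msa = next(iter(msa.values()), "")
    if not msa or not str(msa).strip():
        return "empty"  # placeholder used when no MSA is supplied
    return msa


name, sequence = parse_fasta(">my_protein\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
msa = normalize_msa({})
```

A header-only input such as `">my_protein"` raises `ValueError` in this sketch, mirroring the "Protein sequence is required" error described under Troubleshooting.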

Troubleshooting

  • Error 'Protein sequence is required': Ensure 'sequence' contains more than a header; include at least one line of amino-acid characters.
  • Invalid modification format: Verify each line follows 'position:ccd_code' and positions are integers (e.g., '5:SEP').
  • Unexpected MSA behavior: If passing a dict from an upstream node, confirm it contains at least one entry. If empty or not provided, the node sets MSA to 'empty'.
  • Chain ID issues: Provide a single base 'chain_id' (e.g., A) and add replicas in 'multiple_chains' as a comma-separated list with no spaces around the IDs (e.g., B,C).
  • Sequence name mismatch: If you rely on sequence names across components, ensure the FASTA header aligns with expectations; otherwise a default name is used.