
Boltz Protein Sequence

Builds a protein sequence entry for Boltz YAML workflows. It takes a required protein sequence in FASTA format and optionally accepts an MSA (A3M or plain text), residue modifications, a cyclic flag, and additional chain IDs for duplicating the chain. If the FASTA input has a header, it is parsed to name the sequence automatically; if no MSA is provided, msa defaults to 'empty'.
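The header handling and the empty-sequence check can be illustrated with a short parser. This is a minimal sketch under stated assumptions, not the node's actual code: the name `parse_fasta`, the rule of taking the first whitespace-delimited header token as the name, reading only the first record, and the placeholder default name "protein" are all hypothetical. The error messages are the ones listed under Troubleshooting.

```python
def parse_fasta(text, default_name="protein"):
    """Return (name, sequence) from FASTA text.

    Hypothetical sketch: the node's real default name is not documented,
    so default_name is a placeholder.
    """
    if text is None:
        raise ValueError("Protein sequence is required")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("Protein sequence cannot be empty")

    name = default_name
    if lines[0].startswith(">"):
        header = lines[0][1:].strip()
        if header:
            # Assumption: the first whitespace-delimited token is the name
            name = header.split()[0]
        lines = lines[1:]

    seq_lines = []
    for line in lines:
        if line.startswith(">"):
            break  # assumption: only the first FASTA record is used
        seq_lines.append(line)

    sequence = "".join(seq_lines)
    if not sequence:
        raise ValueError("Protein sequence cannot be empty")
    return name, sequence


print(parse_fasta(">my_protein demo\nMSEQ\nUENCE"))  # ('my_protein', 'MSEQUENCE')
```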

Usage

Use this node to create a protein sequence specification, which you then combine with other sequences, constraints, templates, and properties in a Boltz workflow. Typical usage: set the primary chain ID and the FASTA sequence, optionally attach an MSA, add residue modifications or mark the chain as cyclic, and list additional chain IDs when the same chain should appear more than once (e.g., homomers). Feed the resulting sequence list into a Boltz YAML Combiner or other downstream prediction nodes.
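A downstream combiner merges several such lists into one Boltz input. The sketch below is a hypothetical stand-in for that step, not the Boltz YAML Combiner itself: `combine_sequences` is an illustrative name, and it assumes keys beginning with `_` (such as `_sequence_name`) are internal and dropped before writing the file. The `version`/`sequences` layout follows the Boltz YAML input format. The sketch emits JSON, which YAML parsers generally accept because YAML 1.2 is a superset of JSON.

```python
import json


def combine_sequences(*sequence_lists, version=1):
    """Merge per-node sequence lists into one Boltz-style input document.

    Hypothetical sketch: keys starting with '_' (e.g. '_sequence_name') are
    assumed to be internal metadata and are dropped.
    """
    sequences = []
    for seq_list in sequence_lists:
        for entry in seq_list:
            # Each entry maps an entity type ("protein", ...) to its spec dict
            sequences.append({
                entity: {k: v for k, v in spec.items() if not k.startswith("_")}
                for entity, spec in entry.items()
            })
    return {"version": version, "sequences": sequences}


# Homodimer on chains A and B, plus a second protein on chain C
dimer = [{"protein": {"id": ["A", "B"], "sequence": "MSEQUENCE",
                      "_sequence_name": "my_protein", "msa": "empty"}}]
partner = [{"protein": {"id": "C", "sequence": "MPARTNER",
                        "_sequence_name": "partner", "msa": "empty"}}]

doc = combine_sequences(dimer, partner)
print(json.dumps(doc, indent=2))  # this text can be saved as e.g. input.yaml
```

Here one node's output supplies a homodimer (chains A and B) and a second node's output adds a different protein on chain C.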

Inputs

  • chain_id (required, STRING): Primary chain identifier for this protein (a single character or label). Example: A
  • sequence (required, FASTA): Protein sequence in FASTA format. If a header is present, it is used to derive the sequence name. Example:
      >my_protein
      MSEQUENCEAA...
  • msa (optional, A3M): Multiple sequence alignment in A3M format (or plain text). If omitted or empty, msa is set to 'empty'. Example:
      >seq1/1-100
      M---SE-QUEN-CE
      >seq2/1-100
      MABCSE-QUEN-CE
  • modifications (optional, STRING): Residue-level modifications, one per line, formatted as 'position:ccd_code'. Positions must be integers; invalid lines are ignored. Example:
      5:ABC
      12:XYZ
  • cyclic (optional, BOOLEAN): Mark the protein as cyclic. Example: true
  • multiple_chains (optional, STRING): Additional chain IDs (comma-separated) to duplicate this sequence across multiple chains. Example: B,C
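The `modifications` rule (integer position, a colon, a CCD code, one entry per line) can be sketched as follows. `parse_modifications` is a hypothetical name, and treating an empty CCD code as malformed is an assumption; the output shape matches the `modifications` list in the Outputs example.

```python
def parse_modifications(text):
    """Parse 'position:ccd_code' lines into Boltz-style modification dicts.

    Hypothetical sketch: lines without a colon, with a non-integer position,
    or with an empty CCD code (assumed malformed) are skipped.
    """
    mods = []
    if not text:
        return mods
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # malformed: no separator
        pos_str, ccd = (part.strip() for part in line.split(":", 1))
        try:
            position = int(pos_str)
        except ValueError:
            continue  # non-integer position
        if not ccd:
            continue  # assumption: empty CCD code is malformed
        mods.append({"position": position, "ccd": ccd})
    return mods


print(parse_modifications("5:ABC\n12:XYZ\nnot a modification"))
```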

Outputs

  • protein_sequence (type *): A list containing one protein sequence specification object, suitable for Boltz YAML assembly. Example:
      [{"protein": {"id": "A", "sequence": "MSEQUENCE...", "_sequence_name": "my_protein", "msa": "empty", "modifications": [{"position": 5, "ccd": "ABC"}], "cyclic": true}}]
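The mapping from inputs to this output object can be sketched as a single function. It is hypothetical (`build_protein_entry` is an illustrative name) and assumes that the primary `chain_id` comes first in the ID list when `multiple_chains` is set, that blank MSA text becomes 'empty', and that `modifications` and `cyclic` are written only when set. This sketch trims spaces around chain IDs; the node itself may not, which is why Troubleshooting recommends the form 'B,C'.

```python
def build_protein_entry(chain_id, sequence, name, msa=None,
                        modifications=None, cyclic=False, multiple_chains=""):
    """Assemble the single-entry list this node outputs (hypothetical sketch)."""
    # Extra chain IDs from a comma-separated string; whitespace is trimmed here
    extra = [c.strip() for c in (multiple_chains or "").split(",") if c.strip()]
    # 'id' becomes a list only when additional chains were given
    ids = [chain_id] + extra if extra else chain_id
    # Missing or blank MSA text falls back to the literal 'empty'
    msa_value = msa if msa and msa.strip() else "empty"

    protein = {"id": ids, "sequence": sequence,
               "_sequence_name": name, "msa": msa_value}
    if modifications:
        protein["modifications"] = modifications  # assumption: omitted when none
    if cyclic:
        protein["cyclic"] = True  # assumption: omitted when False
    return [{"protein": protein}]


print(build_protein_entry("A", "MSEQUENCE", "my_protein", multiple_chains="B,C"))
```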

Important Notes

  • Sequence is required and cannot be empty. If a FASTA header exists, it is used as the sequence name; otherwise a default name is assigned.
  • If MSA is provided as an uploaded A3M file, the file content is used; if omitted or blank, the node sets msa to 'empty'.
  • When multiple_chains is provided, the 'id' field becomes a list of chain IDs; otherwise it remains a single string.
  • Modifications must be provided as 'position:ccd_code' per line; non-integer positions or malformed lines are ignored.
  • This node outputs a list with a single protein entry, designed to be combined with other entries by downstream Boltz YAML assembly nodes.

Troubleshooting

  • Error: 'Protein sequence is required' or 'Protein sequence cannot be empty' — Ensure the FASTA input is provided and not blank.
  • Unexpected name or missing name — Confirm the FASTA header line starts with '>' and contains a valid identifier.
  • MSA not used — Provide a valid A3M file or non-empty text; otherwise the node sets msa to 'empty'.
  • Modifications ignored — Ensure each line follows 'position:ccd_code' with an integer position.
  • Duplicated chains not appearing — Verify multiple_chains uses comma-separated IDs without spaces (e.g., 'B,C').
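The checks above can be run before execution with a small pre-flight helper. `diagnose_inputs` is hypothetical and only re-applies the rules stated on this page; it collects warnings instead of raising, so every problem surfaces in one pass.

```python
def diagnose_inputs(sequence, modifications="", multiple_chains=""):
    """Return warnings for the problems listed under Troubleshooting (sketch)."""
    warnings = []
    if not sequence or not sequence.strip():
        warnings.append("Protein sequence is missing or blank")
    elif not sequence.lstrip().startswith(">"):
        warnings.append("No FASTA header: a default sequence name will be used")

    for raw in (modifications or "").splitlines():
        line = raw.strip()
        if not line:
            continue
        pos, sep, ccd = line.partition(":")
        try:
            int(pos)  # position must parse as an integer
            valid = bool(sep and ccd.strip())
        except ValueError:
            valid = False
        if not valid:
            warnings.append(f"Modification line will be ignored: {line!r}")

    if multiple_chains and " " in multiple_chains:
        warnings.append("multiple_chains contains spaces; use e.g. 'B,C'")
    return warnings


for warning in diagnose_inputs(">p\nMSEQ", "5:ABC\nbad", "B, C"):
    print(warning)
```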