Boltz Protein Sequence

Builds a protein sequence entry formatted for Boltz YAML workflows. It accepts a FASTA sequence, optional MSA content, and optional modification annotations, and it can assign the sequence to one or more chain IDs. When the FASTA input has a header, the node extracts the sequence name from it. If no MSA is provided, the MSA defaults to "empty".

Usage

Use this node when preparing protein inputs for Boltz structure prediction or design pipelines. Provide a FASTA sequence and optionally an MSA (A3M), specify chain labeling (single or multiple identical chains), and annotate modifications or cyclicity. Connect its output into downstream Boltz configuration builders or predictors.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| chain_id | True | STRING | Primary chain identifier for this protein. If multiple identical chains are needed, add them via multiple_chains. | A |
| sequence | True | FASTA | Protein sequence in FASTA format. A header (e.g., >name) is optional; if present, it is used as the internal sequence name. | >my_protein MKTFFVAGVGLAAG... |
| msa | False | A3M | Multiple sequence alignment content in A3M format. If omitted, the node sets MSA to "empty". Can be provided as raw text or as a single-key dict containing the content. | >seq1 MKTFFVAGVGLAAG- >seq2 MKTYFV-GVGLAAG- |
| modifications | False | STRING | Per-residue modifications, one per line, in the format position:ccd_code. Positions are 1-based integers; CCD is a chemical component identifier. | 5:SEP 12:MSE |
| cyclic | False | BOOLEAN | Marks the protein as cyclic when true. | true |
| multiple_chains | False | STRING | Comma-separated additional chain IDs to create identical copies of this protein. | B,C |
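
The modifications format described above can be illustrated with a small parser. This is a minimal sketch, not the node's actual code: the function name `parse_modifications` is hypothetical, and treating a line without a colon as invalid is an assumption (the documentation only states that lines whose position is not an integer are ignored with a warning).

```python
import warnings


def parse_modifications(text):
    """Parse 'position:ccd_code' lines into dicts (hypothetical helper)."""
    parsed = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no modification
        position_text, separator, ccd = line.partition(":")
        if not separator:
            # Assumption: a line without ':' is treated as invalid.
            warnings.warn(f"Ignoring invalid modification line: {line!r}")
            continue
        try:
            position = int(position_text.strip())  # positions are 1-based integers
        except ValueError:
            warnings.warn(f"Ignoring invalid modification line: {line!r}")
            continue
        parsed.append({"position": position, "ccd": ccd.strip()})
    return parsed


print(parse_modifications("5:SEP\n12:MSE"))
# [{'position': 5, 'ccd': 'SEP'}, {'position': 12, 'ccd': 'MSE'}]
```

In this sketch a line such as `foo:SEP` would be skipped with a warning rather than stopping the whole parse, which mirrors the documented behavior for non-integer positions.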

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| protein_sequence | * | A list containing one dictionary with the protein entry formatted for Boltz YAML. The "protein" dictionary holds the keys "id", "sequence", "_sequence_name", and "msa", plus "modifications" and "cyclic" when those are set. | [{"protein": {"id": ["A","B"], "sequence": "MKTFFV...", "_sequence_name": "my_protein", "msa": "empty", "modifications": [{"position": 5, "ccd": "SEP"}], "cyclic": true}}] |
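
To make the output shape concrete, the following sketch assembles an entry with the same keys as the example above. The function `build_protein_entry` and its parameter names are hypothetical and only mirror the documented fields; the node's real implementation may differ.

```python
import json


def build_protein_entry(chain_id, sequence, sequence_name="sequence", msa=None,
                        modifications=None, cyclic=False, multiple_chains=""):
    """Assemble a one-element list shaped like the documented output (sketch)."""
    # Additional chain IDs are comma-separated; empty pieces are dropped.
    extra_ids = [cid for cid in multiple_chains.split(",") if cid]
    # With extra chains the id is a list, otherwise a single string.
    entry_id = [chain_id] + extra_ids if extra_ids else chain_id

    protein = {
        "id": entry_id,
        "sequence": sequence,
        "_sequence_name": sequence_name,
        "msa": msa if msa else "empty",  # MSA defaults to "empty"
    }
    if modifications:
        protein["modifications"] = modifications  # only present when provided
    if cyclic:
        protein["cyclic"] = True  # flag is added only for cyclic proteins
    return [{"protein": protein}]


entry = build_protein_entry(
    "A", "MKTFFV", sequence_name="my_protein",
    modifications=[{"position": 5, "ccd": "SEP"}],
    cyclic=True, multiple_chains="B",
)
print(json.dumps(entry, indent=2))
```

Calling the sketch with only `chain_id` and `sequence` yields an entry whose `id` is the plain string and which has no `modifications` or `cyclic` keys.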

Important Notes

  • FASTA header handling: If the sequence starts with a FASTA header (e.g., >name), that name is captured as _sequence_name; otherwise, it defaults to "sequence".
  • MSA defaulting: If no MSA is provided, msa is set to "empty". If MSA is provided as a dict, the first key's value is used.
  • Multiple chains: Setting multiple_chains will produce an id list (e.g., ["A","B"]). Without it, id is a single string.
  • Modifications format: Each line must be position:ccd_code (e.g., 5:SEP). Lines failing integer parsing for position are ignored with a warning.
  • Cyclicity: Setting cyclic to true adds a cyclic flag to the protein entry.
  • Validation: An empty or whitespace-only sequence will raise an error.
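
The header handling, MSA defaulting, and validation rules above could be sketched as follows. Both helper names (`split_fasta`, `resolve_msa`) are hypothetical. The sketch assumes the header occupies its own line, that the name is the header text after `>`, and that a blank MSA string is treated like a missing one; none of these details are confirmed by this page.

```python
def split_fasta(text):
    """Return (name, sequence) from FASTA-like text (hypothetical helper)."""
    if not text or not text.strip():
        raise ValueError("Sequence is empty")
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    name = "sequence"  # documented default when no header is present
    if lines[0].startswith(">"):
        # Assumption: the whole header text after '>' becomes the name.
        name = lines[0][1:].strip() or "sequence"
        lines = lines[1:]
    sequence = "".join(lines)
    if not sequence:
        # Assumption: a header with no residues counts as an empty sequence.
        raise ValueError("Sequence is empty")
    return name, sequence


def resolve_msa(msa):
    """Return the A3M text, or "empty" when no MSA is given (sketch)."""
    if isinstance(msa, dict):
        # Documented rule: the first key's value is used.
        msa = next(iter(msa.values()), None)
    if msa is None or not str(msa).strip():
        return "empty"
    return msa


print(split_fasta(">my_protein\nMKTFFV\nAGVG"))  # ('my_protein', 'MKTFFVAGVG')
print(resolve_msa(None))                         # empty
```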

Troubleshooting

  • Empty sequence error: Ensure the FASTA input is not blank and contains valid amino acid characters after any header.
  • Invalid modification line ignored: Check that each modification line is in the exact format position:ccd_code (e.g., 12:MSE).
  • Unexpected MSA behavior: If passing MSA as an object/dict, ensure it contains a single entry with the raw A3M text. Otherwise, provide raw A3M text.
  • Multiple chains not applied: Verify multiple_chains uses comma-separated IDs without spaces (e.g., B,C) and that chain_id is set.
  • Sequence name missing: Provide a FASTA header line (e.g., >my_protein) if you want a custom name; otherwise it will default to "sequence".
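
Several of the issues above can be caught before the node runs. The pre-check below is a rough sketch with hypothetical helper names. It assumes the 20 standard one-letter amino acid codes are the accepted alphabet, which this page does not specify, so adjust `allowed` if your workflow permits other letters.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence, allowed=STANDARD_AMINO_ACIDS):
    """List (1-based position, character) pairs outside the allowed alphabet."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue not in allowed
    ]


def find_chain_id_problems(chain_id, multiple_chains):
    """Describe likely problems with the chain inputs (hypothetical checks)."""
    problems = []
    if not chain_id.strip():
        problems.append("chain_id is empty")
    if any(ch.isspace() for ch in multiple_chains):
        problems.append("multiple_chains contains whitespace; use e.g. 'B,C'")
    return problems


print(find_invalid_residues("MKTBFV"))       # [(4, 'B')]
print(find_chain_id_problems("A", "B, C"))
```

An empty result from both helpers means these particular checks found nothing; it does not guarantee that the node will accept the input.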