Boltz Protein Sequence

Builds a single protein entry for a Boltz YAML configuration. The node parses a FASTA-formatted protein sequence, takes the sequence name from the header when one is present, assigns one or more chain IDs, and attaches the supplied MSA or records 'empty' when none is supplied. Post-translational modifications and cyclicity are optional.

Usage

Use this node to define a protein component before assembling a full Boltz YAML. Connect its output to a list combiner or directly to a YAML combiner along with other components (ligands, constraints, templates, properties). Provide the protein sequence in FASTA format, optionally include an MSA (A3M) and any modifications, then combine and submit for prediction.

Inputs

Field | Required | Type | Description | Example
chain_id | True | STRING | Primary chain identifier for this protein. Must be unique across all chains in the final YAML. | A
sequence | True | FASTA | Protein sequence in FASTA format. The header (>name) is used as the sequence name if present; otherwise a default name is assigned. | >my_protein (line 1), ACDEFGHIKLMNPQRSTVWY (line 2)
msa | False | A3M | Multiple sequence alignment in A3M format. If omitted or empty, an 'empty' MSA is recorded. | {'my_alignment.a3m': '>seq1\nACD--EFG\n>seq2\nACDEQEFG'}
modifications | False | STRING | Post-translational or residue modifications, one per line, using 'position:ccd_code'. Positions are 1-based integers. | 10:SEP (line 1), 25:MSE (line 2)
cyclic | False | BOOLEAN | Marks the protein as cyclic when true. | False
multiple_chains | False | STRING | Comma-separated list of additional chain IDs, used to create identical copies of this protein. | B,C

Outputs

Field | Type | Description | Example
protein_sequence | * | A list containing one protein entry configured for Boltz YAML. This list can be combined with other lists and passed into the Boltz YAML combiner. | [{'protein': {'id': ['A', 'B'], 'sequence': 'ACDEFGHIKLMNPQRSTVWY', '_sequence_name': 'my_protein', 'msa': 'empty', 'modifications': [{'position': 10, 'ccd': 'SEP'}], 'cyclic': False}}]
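
To make the mapping from inputs to output concrete, here is a minimal Python sketch that reproduces the example entry above from the documented inputs. It is not the node's actual implementation: the function name build_protein_entry, the default name "protein", the pass-through handling of msa, and the choice to always emit the 'modifications' key are all assumptions made for illustration.

```python
def build_protein_entry(chain_id, sequence, msa=None, modifications="",
                        cyclic=False, multiple_chains=""):
    """Hypothetical helper mirroring the documented inputs and output shape."""
    lines = [ln.strip() for ln in sequence.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("Protein sequence is required")

    # ASSUMPTION: the node's real default name is not documented.
    name = "protein"
    if lines[0].startswith(">"):
        name = lines[0][1:].strip() or name
        lines = lines[1:]
    residues = "".join(lines)  # join wrapped sequence lines
    if not residues:
        raise ValueError("Protein sequence is required")

    # Primary chain ID first, then any additional comma-separated IDs.
    ids = [chain_id.strip()]
    ids += [c.strip() for c in multiple_chains.split(",") if c.strip()]

    # Keep only well-formed 'position:ccd_code' lines; others are skipped.
    mods = []
    for raw in modifications.splitlines():
        pos, sep, ccd = (part.strip() for part in raw.partition(":"))
        if sep and pos.isdecimal() and int(pos) >= 1 and ccd:
            mods.append({"position": int(pos), "ccd": ccd})

    return [{
        "protein": {
            "id": ids,
            "sequence": residues,
            "_sequence_name": name,
            # ASSUMPTION: a supplied MSA is passed through unchanged.
            "msa": msa if msa else "empty",
            "modifications": mods,
            "cyclic": bool(cyclic),
        }
    }]


entry = build_protein_entry(
    chain_id="A",
    sequence=">my_protein\nACDEFGHIKLMNPQRSTVWY",
    modifications="10:SEP",
    multiple_chains="B",
)
print(entry)
```

Returning a one-element list rather than a bare dict matches the documented output, which is meant to be concatenated with the lists produced by other component nodes.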

Important Notes

  • Chain IDs must be unique across the final YAML. If you add multiple chains here (A,B,C), ensure no other node reuses those IDs.
  • FASTA header parsing: if the sequence starts with '>name', that name is used; otherwise a default name is assigned.
  • MSA handling: if an A3M is provided it is attached; if omitted or blank, 'empty' is recorded to signal no MSA.
  • Modifications must use 'position:ccd_code' per line (e.g., '10:SEP'). Invalid lines are ignored.
  • Multiple chains create identical copies of this protein associated with the listed chain IDs.
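
Because invalid modification lines are ignored rather than reported, it can help to check the text before passing it to the node. The following Python sketch splits the input into accepted entries and rejected lines with a reason for each. The function name check_modifications and the exact rejection rules are assumptions; the node may apply different validation.

```python
def check_modifications(text):
    """Hypothetical pre-check for 'position:ccd_code' lines.

    Returns (accepted, rejected), where rejected holds
    (line_number, line, reason) tuples for lines that would likely be ignored.
    """
    accepted, rejected = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are neither accepted nor reported
        pos, sep, ccd = (part.strip() for part in line.partition(":"))
        if not sep:
            rejected.append((lineno, line, "missing ':' separator"))
        elif not pos.isdecimal() or int(pos) < 1:
            rejected.append((lineno, line, "position must be an integer >= 1"))
        elif not ccd:
            rejected.append((lineno, line, "missing CCD code"))
        else:
            accepted.append({"position": int(pos), "ccd": ccd})
    return accepted, rejected


accepted, rejected = check_modifications("10:SEP\nfoo\n0:MSE\n25:MSE\n7:")
for lineno, line, reason in rejected:
    print(f"line {lineno}: {line!r} rejected ({reason})")
```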

Troubleshooting

  • Protein sequence is required: Ensure the 'sequence' input contains FASTA text. Empty input will cause an error.
  • Invalid modification format: Use 'position:ccd_code' with a 1-based integer position, one per line.
  • Duplicate chain ID error downstream: If the YAML combiner reports duplicate IDs, change 'chain_id' or 'multiple_chains' here (and in other nodes) so that every ID is unique.
  • MSA not applied: Verify that the A3M input is connected and contains a valid A3M resource. If it is blank or malformed, the node records an 'empty' MSA.
  • Unexpected sequence name: If no FASTA header is present, a default name is used. Add a '>name' header to control the sequence name.
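
Duplicate chain IDs can be caught before the combiner runs by scanning the entry lists from all component nodes. The sketch below is one way to do that in Python. It assumes each entry is a single-key dict such as {'protein': {...}} whose value carries an 'id' that is either a string or a list of strings; the function name find_duplicate_chain_ids and the ligand entry shape used in the example are illustrative assumptions, not part of this node's documented interface.

```python
from collections import Counter


def find_duplicate_chain_ids(*entry_lists):
    """Hypothetical check: return the sorted chain IDs used more than once."""
    counts = Counter()
    for entries in entry_lists:
        for entry in entries:
            for component in entry.values():
                if not isinstance(component, dict):
                    continue  # skip anything that is not a component mapping
                ids = component.get("id", [])
                if isinstance(ids, str):
                    ids = [ids]  # normalise a single ID to a list
                counts.update(ids)
    return sorted(cid for cid, n in counts.items() if n > 1)


proteins = [{"protein": {"id": ["A", "B"], "sequence": "ACDEFGHIKLMNPQRSTVWY",
                         "msa": "empty"}}]
# ASSUMPTION: illustrative ligand entry; its exact shape is not documented here.
ligands = [{"ligand": {"id": "B", "smiles": "CCO"}}]

duplicates = find_duplicate_chain_ids(proteins, ligands)
print(duplicates)
```

An empty result means every ID is unique across the supplied lists; any ID returned should be changed in 'chain_id' or 'multiple_chains' on one of the conflicting nodes.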