
Boltz Sequence Builder

Creates a single molecular sequence entry formatted for Boltz YAML. Supports protein, DNA, RNA, and ligand entities, including chain assignment, optional identical copies on additional chains, residue modifications, and polymer-specific options such as an MSA (proteins only) and cyclic topology. Returns a list containing one sequence mapping, ready to be combined with other entries into a full Boltz YAML configuration.

Usage

Use this node to define each chain or entity in a Boltz job. For proteins, DNA, and RNA, provide a sequence (plain text or FASTA). For ligands, specify either a SMILES string or a CCD code. If you need identical copies of the same chain, list the additional chain IDs. Connect the output to a list combiner, then pass the combined list to the Boltz YAML combiner to build the prediction input.
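As a rough illustration of how these inputs could map onto one Boltz entry, here is a minimal sketch. The key names (`protein`/`dna`/`rna`/`ligand`, `id`, `sequence`, `msa`, `smiles`, `ccd`) follow Boltz's documented YAML schema, where identical copies are expressed by giving `id` a list of chain IDs. The function `build_entry` and its signature are hypothetical, not the node's actual code, and the error messages simply reuse the ones listed under Troubleshooting.

```python
from __future__ import annotations

from typing import Any, Optional

POLYMERS = {"protein", "dna", "rna"}


def build_entry(
    entity_type: str,
    chain_id: str,
    sequence: Optional[str] = None,
    ligand_smiles: Optional[str] = None,
    ligand_ccd: Optional[str] = None,
    multiple_chains: str = "",
) -> list[dict[str, Any]]:
    """Hypothetical sketch: return a one-element list with a Boltz-style entry."""
    extra = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    # One chain keeps a plain string ID; identical copies become a list of IDs.
    ids: Any = [chain_id, *extra] if extra else chain_id
    body: dict[str, Any] = {"id": ids}

    if entity_type in POLYMERS:
        if not sequence:
            raise ValueError("Sequence is required for protein/dna/rna")
        body["sequence"] = sequence
        if entity_type == "protein":
            # Assumption: no MSA supplied, so fall back to the "empty" MSA.
            body["msa"] = "empty"
    elif entity_type == "ligand":
        if not ligand_smiles and not ligand_ccd:
            raise ValueError("SMILES or CCD code is required for ligands")
        if ligand_smiles and ligand_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")
        if ligand_smiles:
            body["smiles"] = ligand_smiles
        else:
            body["ccd"] = ligand_ccd
    else:
        raise ValueError(f"Unknown entity type: {entity_type}")

    return [{entity_type: body}]


print(build_entry("protein", "A", "MKTAYIAK", multiple_chains="B,C"))
# [{'protein': {'id': ['A', 'B', 'C'], 'sequence': 'MKTAYIAK', 'msa': 'empty'}}]
```

Returning a one-element list, rather than a bare mapping, mirrors the node's output so several results can be concatenated by a list combiner.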

Inputs

  • entity_type (required; CHOICE): Type of molecular entity to define. Determines which other inputs are required and how the sequence is interpreted. Example: protein
  • chain_id (required; STRING): Primary chain identifier for this entity. Must be unique across all chains in the final YAML. Example: A
  • sequence (optional; STRING): Amino acid or nucleotide sequence for protein/DNA/RNA. Accepts plain sequence characters or FASTA-formatted text. If FASTA-formatted, the first header is used to name the sequence and the content after the header is extracted as the sequence. Example: MKTAYIAKQRQISFVKSHFSRQ
  • sequence_fasta (optional; FASTA): Sequence in FASTA format for protein/DNA/RNA, as an alternative to 'sequence'. If provided, it takes precedence and its header is used to name the sequence. Example: '>my_protein' on one line, followed by MKTAYIAKQRQISFVKSHFSRQ on the next
  • msa (optional; MSA): Multiple sequence alignment for proteins, in A3M or CSV format. Leave empty to use an 'empty' MSA. Example (A3M): '>seq1', 'MKTAYIAK-', '>seq2', 'MKTAFIAKQ' on separate lines
  • ligand_smiles (optional; STRING): SMILES string for ligand entities. Provide this or 'ligand_ccd', not both. Example: CCO
  • ligand_ccd (optional; STRING): Chemical Component Dictionary (CCD) code for ligand entities, typically three characters. Provide this or 'ligand_smiles', not both. Example: ATP
  • modifications (optional; STRING): Residue-level modifications for polymers, one per line in the format 'position:ccd_code'. Example: '5:MSE' and '23:SEP' on separate lines
  • cyclic (optional; BOOLEAN): Marks the polymer as cyclic when true. Example: false
  • multiple_chains (optional; STRING): Comma-separated list of additional chain IDs, each of which receives an identical copy of this entity. Example: B,C
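The FASTA handling described for 'sequence' and 'sequence_fasta' can be approximated as follows. This sketch assumes a single-record FASTA and that plain input only needs its whitespace removed; `parse_sequence_input` is a hypothetical name, and the node's own parsing may differ in edge cases.

```python
from __future__ import annotations

from typing import Optional, Tuple


def parse_sequence_input(text: str) -> Tuple[Optional[str], str]:
    """Hypothetical sketch: split FASTA or plain text into (name, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        # Plain sequence: drop all whitespace; no name is available.
        return None, "".join(text.split())
    header_tokens = lines[0][1:].split()
    # The first header token, with commas stripped, becomes the sequence name.
    name = header_tokens[0].replace(",", "") if header_tokens else None
    # Keep only non-header lines; assumes a single-record FASTA.
    sequence = "".join(line for line in lines[1:] if not line.startswith(">"))
    return name, sequence


print(parse_sequence_input(">my_protein,v2 demo\nMKTAYIAK\nQRQISF"))
# ('my_proteinv2', 'MKTAYIAKQRQISF')
```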

Outputs

  • sequences (type: *): A list containing a single Boltz-ready sequence mapping for the specified entity, designed to be merged with other sequences using a list combiner before building the final YAML. Example: [{"protein": {"id": "A", "sequence": "MKTAYIAK...", "_sequence_name": "my_protein", "msa": "empty"}}]
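A list combiner followed by the YAML combiner could be approximated like this. The top-level `version` and `sequences` keys come from Boltz's YAML input format; `combine_sequence_lists` is hypothetical, and dropping underscore-prefixed keys such as `_sequence_name` rests on the assumption that they are internal bookkeeping rather than part of Boltz's schema.

```python
from __future__ import annotations

from typing import Any


def combine_sequence_lists(*outputs: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical sketch: merge several node outputs into one Boltz-style mapping."""
    sequences: list[dict[str, Any]] = []
    for output in outputs:
        for entry in output:
            # Assumption: keys starting with "_" are internal and not sent to Boltz.
            cleaned = {
                entity: {key: value for key, value in body.items() if not key.startswith("_")}
                for entity, body in entry.items()
            }
            sequences.append(cleaned)
    return {"version": 1, "sequences": sequences}


protein = [{"protein": {"id": "A", "sequence": "MKTAYIAK", "_sequence_name": "my_protein", "msa": "empty"}}]
ligand = [{"ligand": {"id": "B", "ccd": "ATP"}}]
config = combine_sequence_lists(protein, ligand)
print(config["sequences"][0])  # {'protein': {'id': 'A', 'sequence': 'MKTAYIAK', 'msa': 'empty'}}
```

Serializing `config` to YAML, for example with PyYAML's `yaml.safe_dump`, is left out so the sketch needs only the standard library.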

Important Notes

  • For polymers (protein/dna/rna), a sequence is required: provide either 'sequence' (plain or FASTA text) or 'sequence_fasta'.
  • For ligands, exactly one identifier is required: provide 'ligand_smiles' or 'ligand_ccd', but not both.
  • Multiple chains: Use 'multiple_chains' to generate identical copies; ensure all chain IDs are unique across the final YAML.
  • MSA handling (proteins only): Accepts A3M or CSV. If omitted, an 'empty' MSA will be used downstream.
  • Sequence names: When a FASTA header is present, the first header token (commas removed) is used as the sequence name.
  • Modifications: Use integer residue positions with CCD codes in 'position:ccd_code' format, one per line.
  • Cyclic polymers: Enable 'cyclic' to mark the chain as cyclic in the YAML.
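The modification and cyclic options could be turned into Boltz's per-polymer `modifications` list (entries with `position` and `ccd`, where Boltz's schema counts positions from 1) and `cyclic` flag roughly as below. Both function names are hypothetical, and skipping an invalid line after emitting the warning mentioned under Troubleshooting is an assumption about the node's behavior.

```python
from __future__ import annotations

import warnings
from typing import Any


def parse_modifications(text: str) -> list[dict[str, Any]]:
    """Hypothetical sketch: turn 'position:ccd_code' lines into modification entries."""
    mods: list[dict[str, Any]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        pos_text, sep, ccd = line.partition(":")
        try:
            position = int(pos_text)
        except ValueError:
            position = 0  # marks the line as invalid below
        if not sep or position < 1 or not ccd.strip():
            # Assumption: invalid lines are skipped after a warning.
            warnings.warn(f"Invalid modification format: {line!r}")
            continue
        mods.append({"position": position, "ccd": ccd.strip()})
    return mods


def add_polymer_options(body: dict[str, Any], modifications: str, cyclic: bool) -> dict[str, Any]:
    """Attach optional modifications and the cyclic flag to a polymer entry body, in place."""
    mods = parse_modifications(modifications)
    if mods:
        body["modifications"] = mods
    if cyclic:
        body["cyclic"] = True
    return body


print(parse_modifications("5:MSE\n23:SEP"))
# [{'position': 5, 'ccd': 'MSE'}, {'position': 23, 'ccd': 'SEP'}]
```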

Troubleshooting

  • Error: 'Sequence is required for protein/dna/rna': Provide a valid sequence via 'sequence' or 'sequence_fasta'. If using FASTA, ensure the text starts with '>' and that the header is followed by at least one sequence line.
  • Error: 'SMILES or CCD code is required for ligands': For ligand entities, set either 'ligand_smiles' or 'ligand_ccd'.
  • Error: 'Provide either SMILES or CCD code, not both': Remove one of the two ligand identifiers.
  • Invalid modification format warning: Ensure each line is 'position:ccd_code' with an integer position (e.g., '23:SEP').
  • Unexpected characters in the sequence name: Commas are stripped from the first FASTA header token, which becomes the sequence name. Simplify the header if the resulting name is not what you expect.
  • Duplicate chain IDs when combining YAML: If later nodes report duplicate chain IDs, revise 'chain_id' and 'multiple_chains' to ensure global uniqueness.
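Duplicate chain IDs can be caught before running a job with a small check over the combined `sequences` list. This hypothetical helper assumes each entry's `id` is either a string or a list of strings, as in Boltz's YAML format.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def find_duplicate_chain_ids(sequences: list[dict[str, Any]]) -> list[str]:
    """Hypothetical check: return chain IDs that appear on more than one chain."""
    counts: Counter = Counter()
    for entry in sequences:
        for body in entry.values():
            ids = body["id"]
            # "id" is a single string, or a list when identical copies were requested.
            counts.update([ids] if isinstance(ids, str) else ids)
    return sorted(chain for chain, n in counts.items() if n > 1)


combined = [
    {"protein": {"id": ["A", "B"], "sequence": "MKTAYIAK", "msa": "empty"}},
    {"ligand": {"id": "B", "ccd": "ATP"}},
]
print(find_duplicate_chain_ids(combined))  # ['B']
```

Counting IDs inside each list as well as across entries also flags a 'multiple_chains' value that repeats the primary 'chain_id'.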