Boltz Sequence Builder

Creates a single molecular sequence entry formatted for Boltz YAML. Supports protein, DNA, RNA, and ligand entities, with chain assignment, optional identical copies of a chain, residue modifications, and polymer-specific options such as an MSA and cyclic polymers. Returns a list containing one sequence mapping, ready to be combined into a full Boltz YAML configuration.

Usage

Use this node to define each chain or entity in a Boltz job. For protein, DNA, or RNA, provide a sequence (plain text or FASTA). For ligands, specify either a SMILES string or a CCD code, but not both. To create identical copies of the same chain, list the additional chain IDs in 'multiple_chains'. Connect the output to a list combiner and then to the Boltz YAML combiner for prediction.
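The handling of plain versus FASTA-formatted sequence text can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name parse_sequence_text is hypothetical, and concatenating all non-header lines (even across several FASTA records) is an assumption about how "the content after headers" is extracted.

```python
def parse_sequence_text(text):
    """Return (name, sequence) from plain or FASTA-formatted text.

    Hypothetical helper: the name comes from the first token of the
    first '>' header with commas removed; plain text yields name=None.
    """
    name = None
    seen_header = False
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if not seen_header:
                seen_header = True
                tokens = line[1:].replace(",", "").split()
                name = tokens[0] if tokens else None
            continue  # header lines never contribute residues
        chunks.append(line)
    # Assumption: all sequence lines are joined into one sequence.
    return name, "".join(chunks)


fasta_name, fasta_seq = parse_sequence_text(">my_protein, chain A\nMKTAYIAK\nQRQISFVK\n")
plain_name, plain_seq = parse_sequence_text("MKTAYIAKQRQISFVKSHFSRQ")
```

With these inputs, the FASTA text yields the name "my_protein" and the joined sequence, while the plain text yields no name and the sequence unchanged.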

Inputs

Field	Required	Type	Description	Example
entity_type	True	CHOICE	Type of molecular entity to define (protein, DNA, RNA, or ligand). Determines which other inputs are required and how the sequence is interpreted.	protein
chain_id	True	STRING	Primary chain identifier for this entity. Must be unique across all chains in the final YAML.	A
sequence	False	STRING	Amino acid or nucleotide sequence for protein/DNA/RNA. Can be plain sequence characters or FASTA-formatted text. If FASTA-formatted, the first header is used to name the sequence and the content after headers is extracted.	MKTAYIAKQRQISFVKSHFSRQ
sequence_fasta	False	FASTA	Sequence in FASTA format for protein/DNA/RNA as an alternative to 'sequence'. If provided, it takes precedence and the header is used to name the sequence.	>my_protein MKTAYIAKQRQISFVKSHFSRQ
msa	False	MSA	Multiple Sequence Alignment for proteins in A3M or CSV format. Leave empty to use an 'empty' MSA.	>seq1 MKTAYIAK- >seq2 MKTAFIAKQ
ligand_smiles	False	STRING	SMILES string for ligand entities. Provide this OR 'ligand_ccd' (not both).	CCO
ligand_ccd	False	STRING	Three-letter CCD code for ligand entities. Provide this OR 'ligand_smiles' (not both).	ATP
modifications	False	STRING	Residue-level modifications for polymers. One per line in the format 'position:ccd_code'.	5:MSE 23:SEP
cyclic	False	BOOLEAN	Marks the polymer as cyclic when true.	false
multiple_chains	False	STRING	Comma-separated list of additional chain IDs to create identical copies of this entity.	B,C
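As a rough sketch of how 'chain_id' and 'multiple_chains' might be merged into one list of chain IDs, consider the helper below. The name collect_chain_ids is hypothetical, and silently skipping blank or repeated IDs is an assumption; the node itself may behave differently.

```python
def collect_chain_ids(chain_id, multiple_chains=""):
    """Return the primary chain ID followed by any additional IDs.

    Hypothetical helper: splits the comma-separated 'multiple_chains'
    value, trims whitespace, and skips blanks and repeats.
    """
    ids = [chain_id.strip()]
    for part in multiple_chains.split(","):
        part = part.strip()
        if part and part not in ids:
            ids.append(part)
    return ids


chains = collect_chain_ids("A", "B,C")  # ['A', 'B', 'C']
```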

Outputs

Field	Type	Description	Example
sequences	*	A list containing a single Boltz-ready sequence mapping for the specified entity. Designed to be merged with other sequences using a list combiner before building the final YAML.	[{"protein": {"id": "A", "sequence": "MKTAYIAK...", "_sequence_name": "my_protein", "msa": "empty"}}]
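To illustrate why the output is a one-element list, the sketch below concatenates two such outputs and wraps them in a top-level mapping. The function combine_sequences is a hypothetical stand-in for the list combiner and YAML combiner nodes. The "version" key and the ligand mapping with a "smiles" key are assumptions based on the general Boltz YAML layout, not on this node's documented output, and the sequences are illustrative. JSON is used for printing only to keep the example free of third-party dependencies.

```python
import json


def combine_sequences(*outputs):
    """Concatenate several one-element 'sequences' lists into one config.

    Hypothetical stand-in for the combiner nodes; the 'version' key is
    an assumption about the final YAML layout.
    """
    combined = []
    for out in outputs:
        combined.extend(out)
    return {"version": 1, "sequences": combined}


# Illustrative outputs from two Sequence Builder nodes.
protein = [{"protein": {"id": "A", "sequence": "MKTAYIAK", "msa": "empty"}}]
ligand = [{"ligand": {"id": "L", "smiles": "CCO"}}]  # assumed ligand shape

config = combine_sequences(protein, ligand)
print(json.dumps(config, indent=2))
```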

Important Notes

For polymers (protein/dna/rna), a sequence is required: provide either 'sequence' (plain or FASTA text) or 'sequence_fasta'.
For ligands, exactly one identifier is required: provide 'ligand_smiles' or 'ligand_ccd', but not both.
Multiple chains: Use 'multiple_chains' to generate identical copies; ensure all chain IDs are unique across the final YAML.
MSA handling (proteins only): Accepts A3M or CSV. If omitted, the entry's 'msa' field is set to 'empty'.
Sequence names: When a FASTA header is present, the first header token (commas removed) is used as the sequence name.
Modifications: Use integer residue positions with CCD codes in 'position:ccd' format, one per line.
Cyclic polymers: Enable 'cyclic' to mark the chain as cyclic in the YAML.
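The 'position:ccd' format for modifications can be parsed along these lines. This is a sketch under stated assumptions: parse_modifications is a hypothetical name, the output keys "position" and "ccd" are assumed to mirror the Boltz YAML modification fields, and collecting warnings for malformed lines instead of raising is a guess at the node's behavior.

```python
def parse_modifications(text):
    """Parse 'position:ccd_code' lines into a list of mappings.

    Hypothetical helper: returns (modifications, warnings). Malformed
    lines are skipped and reported rather than raising an error.
    """
    mods, warnings = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        position, sep, ccd = line.partition(":")
        position, ccd = position.strip(), ccd.strip()
        is_int = position.isascii() and position.isdigit()
        if not sep or not is_int or not ccd:
            warnings.append(f"Invalid modification format: {line!r}")
            continue
        mods.append({"position": int(position), "ccd": ccd})
    return mods, warnings


mods, problems = parse_modifications("5:MSE\n23:SEP")
```

Here mods holds two entries (position 5 with MSE, position 23 with SEP) and problems is empty.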

Troubleshooting

Error: 'Sequence is required for protein/dna/rna': Provide a valid sequence via 'sequence' or 'sequence_fasta'. If using FASTA, ensure it starts with '>' and contains sequence lines.
Error: 'SMILES or CCD code is required for ligands': For ligand entities, set either 'ligand_smiles' or 'ligand_ccd'.
Error: 'Provide either SMILES or CCD code, not both': Remove one of the two ligand identifiers.
Invalid modification format warning: Ensure each line is 'position:ccd_code' with an integer position (e.g., '23:SEP').
Unexpected characters in FASTA name: Commas are stripped from the FASTA header and only the first header token is used as the sequence name. Use a short header without commas or spaces.
Duplicate chain IDs when combining YAML: If later nodes report duplicate chain IDs, revise 'chain_id' and 'multiple_chains' to ensure global uniqueness.
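The checks behind these messages can be approximated before running a job. The sketch below is not the node's code: validate_entity and find_duplicate_chain_ids are hypothetical names, the error strings are copied from the messages above, and the assumption that each entry's "id" is either a string or a list of strings is based on the output example and the general Boltz YAML layout.

```python
POLYMER_TYPES = {"protein", "dna", "rna"}


def validate_entity(entity_type, sequence="", ligand_smiles="", ligand_ccd=""):
    """Raise ValueError for the input combinations described above."""
    if entity_type in POLYMER_TYPES:
        if not sequence.strip():
            raise ValueError("Sequence is required for protein/dna/rna")
    elif entity_type == "ligand":
        has_smiles = bool(ligand_smiles.strip())
        has_ccd = bool(ligand_ccd.strip())
        if not has_smiles and not has_ccd:
            raise ValueError("SMILES or CCD code is required for ligands")
        if has_smiles and has_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")


def find_duplicate_chain_ids(sequences):
    """Return chain IDs that occur more than once across all entries."""
    seen, duplicates = set(), []
    for entry in sequences:
        for fields in entry.values():
            ids = fields["id"]
            if isinstance(ids, str):
                ids = [ids]  # a single chain is stored as a plain string
            for chain in ids:
                if chain in seen and chain not in duplicates:
                    duplicates.append(chain)
                seen.add(chain)
    return duplicates


dupes = find_duplicate_chain_ids(
    [{"protein": {"id": ["A", "B"]}}, {"ligand": {"id": "B"}}]
)
```

In this example dupes contains only "B", which points at the 'chain_id' or 'multiple_chains' value to rename.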