Boltz Sequence Builder
Creates a single molecular sequence entry formatted for Boltz YAML. Supports protein, DNA, RNA, and ligand entities, including chain assignment, optional multiple identical chains, modifications, and polymer-specific options like MSA and cyclic polymers. Returns a list containing one sequence mapping ready to be combined into a full Boltz YAML configuration.

Usage
Use this node to define each chain/entity in a Boltz job. For proteins/DNA/RNA, provide a sequence (plain text or FASTA). For ligands, specify either a SMILES string or a CCD code. If you need identical copies of the same chain, list the additional chain IDs in 'multiple_chains'. Connect the output to a list combiner and then to the Boltz YAML combiner for prediction.
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| entity_type | True | CHOICE | Type of molecular entity to define. Determines which other inputs are required and how the sequence is interpreted. | protein |
| chain_id | True | STRING | Primary chain identifier for this entity. Must be unique across all chains in the final YAML. | A |
| sequence | False | STRING | Amino acid or nucleotide sequence for protein/DNA/RNA. Can be plain sequence characters or FASTA-formatted text. If FASTA-formatted, the first header is used to name the sequence and the content after headers is extracted. | MKTAYIAKQRQISFVKSHFSRQ |
| sequence_fasta | False | FASTA | Sequence in FASTA format for protein/DNA/RNA as an alternative to 'sequence'. If provided, it takes precedence and the header is used to name the sequence. | >my_protein MKTAYIAKQRQISFVKSHFSRQ |
| msa | False | MSA | Multiple Sequence Alignment for proteins in A3M or CSV format. Leave empty to use an 'empty' MSA. | >seq1 MKTAYIAK- >seq2 MKTAFIAKQ |
| ligand_smiles | False | STRING | SMILES string for ligand entities. Provide this OR 'ligand_ccd' (not both). | CCO |
| ligand_ccd | False | STRING | Three-letter CCD code for ligand entities. Provide this OR 'ligand_smiles' (not both). | ATP |
| modifications | False | STRING | Residue-level modifications for polymers. One per line in the format 'position:ccd_code'. | 5:MSE 23:SEP |
| cyclic | False | BOOLEAN | Marks the polymer as cyclic when true. | false |
| multiple_chains | False | STRING | Comma-separated list of additional chain IDs to create identical copies of this entity. | B,C |
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| sequences | * | A list containing a single Boltz-ready sequence mapping for the specified entity. Designed to be merged with other sequences using a list combiner before building the final YAML. | [{"protein": {"id": "A", "sequence": "MKTAYIAK...", "_sequence_name": "my_protein", "msa": "empty"}}] |
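
As a rough illustration of the output shape, the sketch below builds one-item lists like the example above and concatenates them the way a list combiner would. `make_entry` is a hypothetical helper, not the node's code, and representing identical copies as a list under `id` is an assumption based on the Boltz YAML schema rather than something this page states.

```python
import json


def make_entry(entity_type, chain_id, multiple_chains="", **fields):
    """Hypothetical helper that mirrors the output shape shown above."""
    extra = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    # Assumption: identical copies are expressed as a list of chain IDs.
    body = {"id": [chain_id, *extra] if extra else chain_id}
    body.update(fields)
    return [{entity_type: body}]


protein = make_entry(
    "protein", "A", multiple_chains="B,C",
    sequence="MKTAYIAKQRQISFVKSHFSRQ", msa="empty",
)
ligand = make_entry("ligand", "L", ccd="ATP")

# A list combiner would concatenate the single-item lists.
sequences = protein + ligand
print(json.dumps(sequences, indent=2))
```

Each call returns a list with exactly one mapping, so combining entities is plain list concatenation.
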
Important Notes
- For polymers (protein/dna/rna), a sequence is required: provide either 'sequence' (plain or FASTA text) or 'sequence_fasta'.
- For ligands, exactly one identifier is required: provide 'ligand_smiles' or 'ligand_ccd', but not both.
- Multiple chains: Use 'multiple_chains' to generate identical copies; ensure all chain IDs are unique across the final YAML.
- MSA handling (proteins only): Accepts A3M or CSV. If omitted, the entry is written with 'msa' set to 'empty', which Boltz treats as single-sequence mode (no alignment).
- Sequence names: When a FASTA header is present, the first header token (commas removed) is used as the sequence name.
- Modifications: Use integer residue positions with CCD codes in 'position:ccd' format, one per line.
- Cyclic polymers: Enable 'cyclic' to mark the chain as cyclic in the YAML.
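
The sequence-name and modification rules above can be sketched as two small parsers. This is a minimal sketch under stated assumptions, not the node's implementation: `split_fasta` and `parse_modifications` are hypothetical names, header tokens are assumed to be whitespace-delimited, and all non-header lines are assumed to be joined into one sequence.

```python
def split_fasta(text):
    """Return (name, sequence) from plain or FASTA-formatted text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        # Plain sequence text: no name available.
        return None, "".join(lines)
    # Assumption: the name is the first whitespace-delimited header token,
    # with commas removed.
    tokens = lines[0][1:].split()
    name = tokens[0].replace(",", "") if tokens else None
    # Assumption: every non-header line contributes to the sequence.
    sequence = "".join(ln for ln in lines if not ln.startswith(">"))
    return name, sequence


def parse_modifications(text):
    """Parse 'position:ccd_code' lines into mappings; collect bad lines."""
    mods, bad = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        pos, sep, code = line.partition(":")
        try:
            position = int(pos)
        except ValueError:
            position = None
        if sep and position is not None and code.strip():
            mods.append({"position": position, "ccd": code.strip()})
        else:
            bad.append(line)  # would correspond to the format warning
    return mods, bad


name, seq = split_fasta(">my_protein, chain A\nMKTAY\nIAKQR\n")
mods, bad = parse_modifications("5:MSE\n23:SEP\nnot a modification\n")
```

Returning the rejected lines alongside the parsed ones makes it easy to report which entries triggered an invalid-format warning.
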
Troubleshooting
- Error: 'Sequence is required for protein/dna/rna': Provide a valid sequence via 'sequence' or 'sequence_fasta'. If using FASTA, ensure it starts with '>' and contains sequence lines.
- Error: 'SMILES or CCD code is required for ligands': For ligand entities, set either 'ligand_smiles' or 'ligand_ccd'.
- Error: 'Provide either SMILES or CCD code, not both': Remove one of the two ligand identifiers.
- Invalid modification format warning: Ensure each line is 'position:ccd_code' with an integer position (e.g., '23:SEP').
- Unexpected sequence name from FASTA: Only the first header token is used as the name, and commas are removed from it. Use a short, simple header if the resulting name is not what you expect.
- Duplicate chain IDs when combining YAML: If later nodes report duplicate chain IDs, revise 'chain_id' and 'multiple_chains' to ensure global uniqueness.
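
To catch the duplicate chain ID case before running a prediction, a combined list can be checked up front. The sketch below is hypothetical (`find_duplicate_chain_ids` is not part of the node set) and assumes each entry is a one-key mapping whose `id` is either a single string or a list of strings.

```python
from collections import Counter


def chain_ids(entry):
    """Return the chain IDs declared by one sequence entry."""
    (body,) = entry.values()  # each entry has exactly one entity-type key
    ids = body["id"]
    # Assumption: 'id' is a single string or a list of strings.
    return list(ids) if isinstance(ids, (list, tuple)) else [ids]


def find_duplicate_chain_ids(sequences):
    """Return the sorted chain IDs that appear more than once."""
    counts = Counter(cid for entry in sequences for cid in chain_ids(entry))
    return sorted(cid for cid, n in counts.items() if n > 1)


combined = [
    {"protein": {"id": ["A", "B"], "sequence": "MKTAYIAK", "msa": "empty"}},
    {"ligand": {"id": "B", "ccd": "ATP"}},
]
duplicates = find_duplicate_chain_ids(combined)
if duplicates:
    print("Duplicate chain IDs:", ", ".join(duplicates))
```

Here the ligand reuses chain ID 'B', so the fix would be to change its 'chain_id' or remove 'B' from the protein's 'multiple_chains'.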