Boltz Sequence Builder
Creates a single molecular sequence entry for use in Boltz YAML workflows. Supports proteins, DNA, RNA, and ligands, including chain IDs, multiple identical chains, optional MSA for proteins, residue modifications, and cyclic polymers. Outputs a list containing one standardized sequence mapping ready to be combined into a full Boltz configuration.

Usage
Use this node to define each molecular entity in your system before assembling the final Boltz YAML. For proteins, DNA, and RNA, provide the sequence as a raw string or as FASTA. For ligands, provide either a SMILES string or a CCD code. If the entity occurs as several identical chains (e.g., A and B), set chain_id to the first ID and list the remaining IDs in multiple_chains. For proteins, you can also attach MSA content as an A3M file or a raw string. Feed the resulting 'sequences' output into a list combiner and then into the Boltz YAML combiner.
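
To make the data flow concrete, here is a minimal Python sketch of what combining several 'sequences' outputs could look like. It is not the combiner nodes' actual code. The function name `combine_sequences`, the sample sequence, the ligand key `ccd`, and the top-level `version` and `sequences` keys are assumptions based on the general shape of Boltz input files.

```python
# Hypothetical stand-ins for the 'sequences' outputs of two builder nodes.
# The sequence string and the ligand key name are illustrative placeholders.
protein_out = [{"protein": {"id": ["A", "B"], "sequence": "MKTAYIAKQR", "msa": "empty"}}]
ligand_out = [{"ligand": {"id": ["C"], "ccd": "ATP"}}]


def combine_sequences(*outputs):
    """Flatten several one-item 'sequences' lists into a single list."""
    combined = []
    for out in outputs:
        combined.extend(out)
    return combined


# Assumed top-level layout of the final configuration.
config = {"version": 1, "sequences": combine_sequences(protein_out, ligand_out)}
print(config)
```

Each builder contributes exactly one mapping, so the combined list has one item per entity rather than one per chain.
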
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| entity_type | True | one of: protein \| dna \| rna \| ligand | Type of molecular entity to define. | protein |
| chain_id | True | STRING | Primary chain identifier for this entity. If multiple copies are needed, add them via multiple_chains. | A |
| sequence | False | STRING | Sequence string for protein/DNA/RNA. Can be a raw sequence or FASTA text. If FASTA text is provided here, the header name will be used and the sequence extracted automatically. | MKTAYIAKQRQISFVKSHFSRQDILD... |
| sequence_fasta | False | FASTA | Sequence provided as a FASTA file/object (alternative to 'sequence'). The header name is stored, and the sequence is extracted. | {'seq_proteinA': '>seq_proteinA\nMKTAYI...\n'} |
| msa_content | False | STRING | MSA content as a raw string (protein only). If left empty, the MSA is set to the literal value 'empty'. If 'msa_a3m' is provided, it takes precedence. | >seq1/100-200\nACDE...\n>seq2/101-201\nAC-E... |
| msa_a3m | False | A3M | MSA in A3M format (protein only). If provided, overrides 'msa_content'. | {'msaA': '>seq1\nACDE...\n>seq2\nAC-E...\n'} |
| ligand_smiles | False | STRING | SMILES string for ligand entities. Provide this or 'ligand_ccd', not both. | CC(=O)OC1=CC=CC=C1C(=O)O |
| ligand_ccd | False | STRING | CCD (Chemical Component Dictionary) code for ligand entities. Provide this or 'ligand_smiles', not both. | ATP |
| modifications | False | STRING | Residue-level modifications for protein/DNA/RNA. One per line in the form position:ccd_code. | 5:MSE and 42:SEP, each on its own line |
| cyclic | False | BOOLEAN | Mark the polymer as cyclic (applies to protein/DNA/RNA). | false |
| multiple_chains | False | STRING | Comma-separated list of additional chain IDs to create identical copies of this entity. | B,C |
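
The text inputs above follow simple formats, and the sketch below shows one way to parse them. It is an illustration under stated assumptions, not the node's implementation: the helper names `split_fasta`, `collect_chain_ids`, and `parse_modifications` are hypothetical, and the choice to strip whitespace around chain IDs is a convenience of this sketch that the real node may not share.

```python
def split_fasta(text):
    """Return (header_name, sequence); header_name is None for a raw sequence."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines and lines[0].startswith(">"):
        return lines[0][1:].strip(), "".join(lines[1:])
    return None, "".join(lines)


def collect_chain_ids(chain_id, multiple_chains=""):
    """Primary chain ID first, then any extra comma-separated IDs."""
    ids = [chain_id.strip()]
    for extra in multiple_chains.split(","):
        extra = extra.strip()
        if extra and extra not in ids:
            ids.append(extra)
    return ids


def parse_modifications(text):
    """Parse 'position:ccd_code' lines; return (parsed, skipped_lines)."""
    mods, skipped = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        position, sep, ccd = line.partition(":")
        if sep and position.strip().isdigit() and ccd.strip():
            mods.append({"position": int(position.strip()), "ccd": ccd.strip()})
        else:
            skipped.append(line)  # malformed line, e.g. missing ':' or non-numeric position
    return mods, skipped


print(split_fasta(">seq_proteinA\nMKTAYI\n"))
print(collect_chain_ids("A", "B,C"))
print(parse_modifications("5:MSE\n42:SEP"))
```

Returning the skipped lines separately mirrors the idea of warning about invalid modification lines instead of failing the whole entry.
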
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| sequences | * | A list containing one standardized sequence mapping for the specified entity. Includes chain id(s) and the appropriate fields (sequence or ligand spec, MSA if protein, modifications, cyclic). | [{'protein': {'id': ['A', 'B'], 'sequence': 'MKTAYIAKQRQISFVKSHFSRQDILD...', '_sequence_name': 'protein_sequence', 'msa': 'empty', 'modifications': [{'position': 5, 'ccd': 'MSE'}], 'cyclic': False}}] |
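
Because the output is a list holding a single mapping keyed by entity type, downstream code has to unwrap two levels to reach the fields. The following sketch shows one way to do that, using the example value from the table verbatim. The helper name `describe_entry` is hypothetical, and accepting a bare string for 'id' is a defensive assumption of this sketch.

```python
def describe_entry(entry):
    """Return (entity_type, chain_ids) for one item of a 'sequences' list."""
    # Each entry is expected to have exactly one key: the entity type.
    (entity_type, fields), = entry.items()
    chain_ids = fields["id"]
    if isinstance(chain_ids, str):
        chain_ids = [chain_ids]
    return entity_type, list(chain_ids)


# Example output copied from the table above (sequence left truncated as shown).
sequences = [{"protein": {"id": ["A", "B"],
                          "sequence": "MKTAYIAKQRQISFVKSHFSRQDILD...",
                          "_sequence_name": "protein_sequence",
                          "msa": "empty",
                          "modifications": [{"position": 5, "ccd": "MSE"}],
                          "cyclic": False}}]

for entry in sequences:
    print(describe_entry(entry))
```
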
Important Notes
- Sequence requirement: For protein/DNA/RNA, you must provide a sequence via 'sequence' or 'sequence_fasta'.
- Ligand exclusivity: For ligands, provide exactly one of 'ligand_smiles' or 'ligand_ccd'—not both.
- Multiple chains: Use 'multiple_chains' for identical copies (e.g., A with additional B,C). All listed IDs will be included under a single entity.
- MSA handling: If no MSA is provided for proteins, the node sets MSA to the literal value 'empty'.
- FASTA headers: When a FASTA is provided, the header name, with any commas removed, is stored as '_sequence_name'.
- Modifications format: Each modification line must be 'position:ccd_code' with a valid integer position.
- Downstream uniqueness: Final YAML requires unique chain IDs across all sequences; ensure IDs don’t conflict when combining multiple entities.
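
The sequence, ligand, and MSA rules above can be summarized in a few lines of Python. This is a sketch of the documented rules, not the node's source: the function names `check_inputs` and `resolve_msa` are hypothetical, the MSA inputs are treated as plain strings for simplicity, and the error messages are the ones listed under Troubleshooting.

```python
POLYMER_TYPES = {"protein", "dna", "rna"}


def check_inputs(entity_type, sequence="", ligand_smiles="", ligand_ccd=""):
    """Raise ValueError if the required fields for the entity type are missing."""
    if entity_type in POLYMER_TYPES:
        if not sequence.strip():
            raise ValueError("Sequence is required for protein/dna/rna")
    elif entity_type == "ligand":
        has_smiles = bool(ligand_smiles.strip())
        has_ccd = bool(ligand_ccd.strip())
        if has_smiles and has_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")
        if not (has_smiles or has_ccd):
            raise ValueError("SMILES or CCD code is required for ligands")
    else:
        raise ValueError(f"Unknown entity_type: {entity_type!r}")


def resolve_msa(entity_type, msa_content="", msa_a3m=""):
    """A3M input wins over raw content; proteins without either get 'empty'."""
    if entity_type != "protein":
        return None  # MSA applies to proteins only
    return msa_a3m or msa_content or "empty"


check_inputs("ligand", ligand_ccd="ATP")  # passes silently
print(resolve_msa("protein"))
```
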
Troubleshooting
- Error: 'Sequence is required for protein/dna/rna': Provide a valid sequence string or FASTA via 'sequence' or 'sequence_fasta'.
- Error: 'SMILES or CCD code is required for ligands': Set 'ligand_smiles' or 'ligand_ccd' (exactly one).
- Error: 'Provide either SMILES or CCD code, not both': Remove one of the ligand fields so only a single specification remains.
- Invalid modification format warning: Ensure each line in 'modifications' follows 'position:ccd_code' with a numeric position (e.g., '42:SEP').
- Unexpected FASTA in 'sequence': If pasting FASTA into 'sequence', the node will extract header and sequence automatically. Verify the header is correct and the lines contain only sequence characters.
- Multiple chains not applied: Confirm 'multiple_chains' uses comma-separated IDs without spaces (e.g., 'B,C').
- Downstream duplicate chain ID error: When later combining into YAML, ensure this node’s chain IDs do not overlap with other entities’ chain IDs.
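
To catch chain ID collisions before the YAML step, you can scan the combined list yourself. The sketch below assumes each entry has the shape shown in the Outputs example, with an 'id' field holding a list (or a single string). The helper name `find_duplicate_chain_ids` is hypothetical.

```python
from collections import Counter


def find_duplicate_chain_ids(sequences):
    """Return a sorted list of chain IDs used more than once across all entries."""
    counts = Counter()
    for entry in sequences:
        for fields in entry.values():
            ids = fields["id"]
            counts.update([ids] if isinstance(ids, str) else ids)
    return sorted(cid for cid, n in counts.items() if n > 1)


combined = [
    {"protein": {"id": ["A", "B"]}},
    {"ligand": {"id": ["B"]}},  # 'B' clashes with the protein's second chain
]
print(find_duplicate_chain_ids(combined))
```

An empty result means every chain ID is unique and the entities can be combined safely.
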