Boltz Sequence Builder
Constructs a single molecular sequence entry for Boltz YAML configurations. Supports protein, DNA, RNA, and ligand entities, with optional MSA, cyclic flags, chain duplication, and residue modifications. FASTA input is parsed to extract both the sequence and an optional sequence name.

Usage
Use this node to define one chain or a set of identical chains for a molecular entity before assembling a full Boltz YAML. For proteins/DNA/RNA, provide a sequence (plain or FASTA); for ligands, specify either a SMILES string or a CCD code. Connect the output to a list combiner and then to the Boltz YAML combiner node to build the complete configuration.
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| entity_type | True | CHOICE | Type of molecular entity to define. | protein |
| chain_id | True | STRING | Primary chain identifier for this entity. If multiple copies are needed, use multiple_chains to add more IDs. | A |
| sequence | False | STRING | Amino acid or nucleotide sequence for protein/DNA/RNA, as plain text or FASTA (header lines start with '>'). Required for these entity types unless 'sequence_fasta' is provided; ignored for ligands. | >my_protein\nMKTLLILAVVAAASA... |
| sequence_fasta | False | FASTA | Sequence provided as a FASTA file-like input (alternative to 'sequence'). The header will be used to set an internal sequence name. | {'my_seq.fasta': '>protA\\nMKTLLILAVVAAASA...'} |
| msa_content | False | STRING | MSA content in string form for proteins. If neither this nor 'msa_a3m' is provided, 'msa' is set to 'empty'. Ignored for non-proteins. | >seq1\nMKTLLI-AVV\n>seq2\nMKTL-LVAVV |
| msa_a3m | False | A3M | MSA in A3M format for proteins (alternative to 'msa_content'). If provided, it takes precedence over 'msa_content'. | {'alignment.a3m': '>seq1\\nMKTLLI-AVV\\n>seq2\\nMKTL-LVAVV'} |
| ligand_smiles | False | STRING | SMILES string for ligand entities. Required if entity_type is ligand and 'ligand_ccd' is not provided. | CC(=O)Oc1ccccc1C(=O)O |
| ligand_ccd | False | STRING | CCD code for ligand entities. Required if entity_type is ligand and 'ligand_smiles' is not provided. | HEM |
| modifications | False | STRING | Residue-level modifications in the format 'position:ccd_code', one per line. Applies to polymers (protein/DNA/RNA). | 5:PTR\n12:MSE |
| cyclic | False | BOOLEAN | Mark the polymer as cyclic (applies to protein/DNA/RNA). | false |
| multiple_chains | False | STRING | Comma-separated additional chain IDs to represent identical copies of this entity. | B,C |
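
The 'modifications' and 'multiple_chains' fields are both small text formats, so a short sketch may help show how they could be interpreted. The helpers below (`parse_modifications`, `collect_chain_ids`) are hypothetical names and not the node's actual code; they only assume the behavior described in the table, plus the documented rule that invalid modification lines are skipped with a warning.

```python
import warnings


def parse_modifications(text):
    """Parse 'position:ccd_code' lines into dicts (hypothetical helper)."""
    mods = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        position, sep, ccd = line.partition(":")
        try:
            if not sep or not ccd.strip():
                raise ValueError("missing ':' or CCD code")
            # int() raises ValueError for non-integer positions
            mods.append({"position": int(position), "ccd": ccd.strip()})
        except ValueError:
            # Documented behavior: invalid lines are ignored with a warning.
            warnings.warn(f"Ignoring invalid modification line: {line!r}")
    return mods


def collect_chain_ids(chain_id, multiple_chains=""):
    """Combine the primary chain ID with comma-separated extras (hypothetical helper)."""
    extras = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    return [chain_id] + extras


print(parse_modifications("5:PTR\n12:MSE"))
print(collect_chain_ids("A", "B,C"))
```

Whether the node trims whitespace around chain IDs or CCD codes is an assumption here; the sketch does so because it makes hand-typed input more forgiving.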
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| sequences | * | A list containing one sequence mapping for Boltz YAML. The mapping has a single entity type key ('protein', 'dna', 'rna', or 'ligand') whose value holds 'id', then 'sequence' (polymers) or 'smiles'/'ccd' (ligands), plus optional 'msa', 'modifications', and 'cyclic'. When a FASTA header was parsed, the internal name appears as '_sequence_name'. | [{'protein': {'id': ['A', 'B'], 'sequence': 'MKTLLILAVVAAASA...', '_sequence_name': 'protA', 'msa': 'empty', 'modifications': [{'position': 5, 'ccd': 'PTR'}], 'cyclic': False}}] |
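
To make the output shape concrete, here is a minimal sketch that builds mappings like the example above for a polymer and for a ligand. The function names `polymer_entry` and `ligand_entry` are hypothetical, and several details are assumptions rather than documented behavior: 'id' is always emitted as a list, 'modifications' is omitted when empty, and combining entries is modeled as plain list concatenation.

```python
POLYMER_TYPES = ("protein", "dna", "rna")


def polymer_entry(entity_type, chain_ids, sequence, msa="empty",
                  modifications=None, cyclic=False):
    """Build a one-item list shaped like the documented output (illustrative)."""
    if entity_type not in POLYMER_TYPES:
        raise ValueError(f"not a polymer type: {entity_type!r}")
    body = {"id": list(chain_ids), "sequence": sequence}
    if entity_type == "protein":
        body["msa"] = msa  # MSA fields apply to proteins only
    if modifications:
        body["modifications"] = list(modifications)
    body["cyclic"] = cyclic
    return [{entity_type: body}]


def ligand_entry(chain_ids, smiles=None, ccd=None):
    """Build a one-item ligand list from either a SMILES string or a CCD code."""
    key, value = ("smiles", smiles) if smiles is not None else ("ccd", ccd)
    return [{"ligand": {"id": list(chain_ids), key: value}}]


protein = polymer_entry("protein", ["A", "B"], "MKTLLILAVVAAASA",
                        modifications=[{"position": 5, "ccd": "PTR"}])
ligand = ligand_entry(["L"], ccd="HEM")

# Assumption: a list combiner simply concatenates the per-entity lists.
sequences = protein + ligand
print(sequences)
```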
Important Notes
- Sequence requirement: For protein/DNA/RNA, a sequence must be provided via 'sequence' or 'sequence_fasta'; otherwise the node raises an error.
- Ligand specification: For ligands, provide exactly one of 'ligand_smiles' or 'ligand_ccd'; setting both raises an error.
- MSA handling: For proteins, if neither 'msa_a3m' nor 'msa_content' is given, the node sets 'msa' to 'empty'.
- FASTA parsing: If input begins with a FASTA header ('>'), the header’s first token becomes the internal sequence name and the sequence lines are concatenated.
- Multiple chains: 'multiple_chains' duplicates the same entity across additional chain IDs; the 'id' field becomes a list of chain IDs.
- Modifications format: Each line must be 'position:ccd_code' with a valid integer position; invalid lines are ignored with a warning.
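
The FASTA and MSA rules in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: `parse_fasta` and `pick_msa_text` are hypothetical names, the input is assumed to hold a single FASTA record, and the keyed A3M object is assumed to map a filename to the alignment text. How the node stores the chosen MSA in the final YAML is not covered here.

```python
def parse_fasta(text):
    """Return (name, sequence) from plain or single-record FASTA text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    name = None
    if lines and lines[0].startswith(">"):
        # The header's first whitespace-separated token becomes the name.
        tokens = lines[0][1:].split()
        name = tokens[0] if tokens else None
        lines = lines[1:]
    # Sequence lines are concatenated into one string.
    return name, "".join(lines)


def pick_msa_text(msa_a3m=None, msa_content=None):
    """Choose the MSA source: A3M first, then plain content, else 'empty'."""
    if msa_a3m:
        # Assumed shape: {'filename.a3m': '<alignment text>'}
        return next(iter(msa_a3m.values()))
    if msa_content and msa_content.strip():
        return msa_content
    return "empty"


print(parse_fasta(">protA example protein\nMKTLL\nILAVV"))
print(pick_msa_text(None, None))
```

Plain input without a header passes through `parse_fasta` unchanged apart from whitespace removal, and the returned name is `None` in that case.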
Troubleshooting
- Error "Sequence is required for protein/dna/rna": Provide a non-empty 'sequence' or 'sequence_fasta'.
- Error "Provide either SMILES or CCD code, not both": For ligand entities, remove one of 'ligand_smiles' or 'ligand_ccd' so only one is set.
- Unexpected empty sequence after FASTA input: Ensure the FASTA includes sequence lines below the header and that line breaks are preserved.
- Modifications not applied: Check each line is in 'position:ccd_code' format and that positions are integers.
- MSA not recognized: When supplying A3M as a file-like input, ensure it is provided as a keyed object mapping a filename to the A3M text (e.g., {'alignment.a3m': '>seq1\\nMKTLLI-AVV\\n>seq2\\nMKTL-LVAVV'}); otherwise use 'msa_content' as plain text.
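
The two documented errors can be reproduced ahead of time with a small pre-flight check like the one below. This is a sketch only: `validate_inputs` is a hypothetical helper, the use of `ValueError` is an assumption, and the messages for "no ligand identifier" and "unknown entity type" are invented for illustration since the documentation does not state them.

```python
POLYMERS = {"protein", "dna", "rna"}


def validate_inputs(entity_type, sequence="", ligand_smiles="", ligand_ccd=""):
    """Raise ValueError for input combinations the node is documented to reject."""
    if entity_type in POLYMERS:
        if not sequence.strip():
            raise ValueError("Sequence is required for protein/dna/rna")
    elif entity_type == "ligand":
        has_smiles = bool(ligand_smiles.strip())
        has_ccd = bool(ligand_ccd.strip())
        if has_smiles and has_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")
        if not (has_smiles or has_ccd):
            # Hypothetical wording; the node's actual message is not documented.
            raise ValueError("Ligand requires a SMILES string or a CCD code")
    else:
        # Hypothetical wording for an unsupported entity type.
        raise ValueError(f"Unknown entity_type: {entity_type!r}")


# A valid ligand definition passes silently.
validate_inputs("ligand", ligand_ccd="HEM")

try:
    validate_inputs("ligand", ligand_smiles="CC(=O)Oc1ccccc1C(=O)O", ligand_ccd="HEM")
except ValueError as err:
    print(f"Rejected: {err}")
```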