Boltz Sequence Builder
Creates a single Boltz YAML-ready sequence entry for proteins, nucleic acids, or ligands. Supports FASTA or raw sequence text, optional MSA (A3M or text) for proteins, chemical specifications for ligands, chain duplication, cyclic polymers, and residue-level modifications.

Usage
Use this node to define one molecular entity (protein, DNA, RNA, or ligand) that will go into a Boltz YAML configuration. Typically, you create one or more sequences with this node, optionally add constraints/templates/properties, combine them with a list combiner, and then feed them to the Boltz YAML combiner and prediction nodes.
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| entity_type | True | CHOICE | Type of molecular entity to define: protein, DNA, RNA, or ligand. | protein |
| chain_id | True | STRING | Primary chain identifier for this entity. If multiple_chains is provided, this becomes the first chain in the list. | A |
| sequence | False | STRING | Raw sequence text for protein/DNA/RNA, given either as a plain sequence or as FASTA-formatted text. If the text is FASTA, the header name is extracted automatically. | >my_protein\nACDEFGHIKLMNPQRSTVWY (\n marks a line break) |
| sequence_fasta | False | FASTA | Sequence in FASTA input type (alternative to sequence). The first entry’s header becomes the sequence name; sequence lines are concatenated. | {'my_protein': '>my_protein\nACDEFGHIKLMNPQRSTVWY'} |
| msa_content | False | STRING | Protein MSA content as plain text in A3M format. If omitted and no msa_a3m is provided, the MSA is set to 'empty'. | >seq1\nACD-\n>seq2\nACDE (\n marks a line break) |
| msa_a3m | False | A3M | Protein MSA provided via A3M input type (alternative to msa_content). The first key’s content is used. | {'msa_1.a3m': '>seq1\nACD-\n>seq2\nACDE'} |
| ligand_smiles | False | STRING | Ligand specification via SMILES string (for entity_type=ligand). Provide either this or ligand_ccd, not both. | CCO |
| ligand_ccd | False | STRING | Ligand specification via PDB CCD code (for entity_type=ligand). Provide either this or ligand_smiles, not both. | HEM |
| modifications | False | STRING | Residue-level modifications for polymer entities. One per line in the form 'position:ccd_code'. | 5:SEP\n12:MSE (\n marks a line break) |
| cyclic | False | BOOLEAN | Whether the polymer is cyclic. Applicable to protein/DNA/RNA. | false |
| multiple_chains | False | STRING | Additional chain IDs for identical copies of this entity, comma-separated. When provided, the output 'id' is a list that starts with chain_id. | B,C |
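
The FASTA handling described for 'sequence' and 'sequence_fasta' can be sketched roughly as follows. This is a minimal illustration, not the node's actual code: the helper name `parse_sequence_text` is hypothetical, and returning `None` as the name for plain sequences is an assumption, since the page does not say what name a header-less sequence gets.

```python
def parse_sequence_text(text):
    """Return (name, sequence) from plain or FASTA-formatted text.

    Hypothetical helper that mirrors the documented behavior:
    the first header's first token is the name, and the sequence
    lines of the first record are concatenated.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("Sequence is required for protein/dna/rna")

    name = None  # assumption: plain sequences carry no name
    if lines[0].startswith(">"):
        header_tokens = lines[0][1:].split()
        name = header_tokens[0] if header_tokens else None
        body = []
        for ln in lines[1:]:
            if ln.startswith(">"):
                break  # only the first FASTA record is used
            body.append(ln)
        lines = body

    sequence = "".join(lines)
    if not sequence:
        raise ValueError("Sequence is required for protein/dna/rna")
    return name, sequence
```

Calling `parse_sequence_text(">my_protein\nACDEFGHIKL\nMNPQRSTVWY")` under these assumptions yields the name `my_protein` and the joined sequence, while a header with no sequence lines raises the same error the node reports for a missing sequence.
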
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| sequences | * | A list containing one Boltz-ready sequence mapping for the specified entity. This should be combined with other sequences using a list combiner before building the final YAML. | [{'protein': {'id': ['A', 'B'], 'sequence': 'ACDEFGHIKLMNPQRSTVWY', '_sequence_name': 'my_protein', 'msa': 'empty', 'modifications': [{'position': 5, 'ccd': 'SEP'}], 'cyclic': False}}] |
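
Because each node emits a one-item list, combining several entities amounts to list concatenation. The sketch below shows the idea in plain Python; `combine_sequences` and `chain_ids` are hypothetical helpers rather than the list combiner's real implementation, and the ligand field name `smiles` is an assumption, since this page shows only a protein output.

```python
def combine_sequences(*node_outputs):
    """Concatenate the one-item lists emitted by several builder nodes."""
    combined = []
    for output in node_outputs:
        combined.extend(output)
    return combined


def chain_ids(entries):
    """Collect chain IDs, accepting 'id' as either a string or a list."""
    ids = []
    for entry in entries:
        for fields in entry.values():
            value = fields["id"]
            ids.extend(value if isinstance(value, list) else [value])
    return ids


# Example outputs from two builder nodes (values are illustrative).
protein_out = [{"protein": {"id": ["A", "B"],
                            "sequence": "ACDEFGHIKLMNPQRSTVWY",
                            "msa": "empty"}}]
# Assumption: a ligand entry stores its SMILES under the key "smiles".
ligand_out = [{"ligand": {"id": "C", "smiles": "CCO"}}]

sequences = combine_sequences(protein_out, ligand_out)
all_ids = chain_ids(sequences)
```

Collecting the chain IDs this way makes it easy to check for duplicates across entities before building the final YAML.
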
Important Notes
- Protein/DNA/RNA sequences: You must provide a sequence via 'sequence' or 'sequence_fasta'. If 'sequence' starts with '>' it is treated as FASTA and the header name is extracted.
- Ligands: Provide exactly one of 'ligand_smiles' or 'ligand_ccd' when entity_type is 'ligand'. Supplying both will raise an error.
- MSA handling (proteins): If neither 'msa_a3m' nor 'msa_content' is provided, the MSA is set to 'empty'.
- Chain IDs: If 'multiple_chains' is set, the 'id' field becomes a list of chain IDs; otherwise it is a single string.
- Modifications format: Each line must be 'position:ccd_code' with an integer position; invalid lines are ignored with a warning.
- Cyclic polymers: Setting 'cyclic' to true will add a 'cyclic' flag to the entity (polymer only).
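
The chain ID and modification rules above can be illustrated with a short sketch. The helper names `parse_modifications` and `build_chain_ids` are hypothetical, and trimming whitespace around chain IDs is an assumption made here for robustness; the node itself may not do so.

```python
import warnings


def parse_modifications(text):
    """Parse 'position:ccd_code' lines; skip invalid lines with a warning."""
    mods = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        position, sep, ccd = line.partition(":")
        ccd = ccd.strip()
        try:
            if not sep or not ccd:
                raise ValueError("missing ':' or CCD code")
            pos = int(position.strip())
        except ValueError:
            warnings.warn(f"Ignoring invalid modification line: {line!r}")
            continue
        mods.append({"position": pos, "ccd": ccd})
    return mods


def build_chain_ids(chain_id, multiple_chains=""):
    """Return a single string, or a list when extra chains are given."""
    # Assumption: surrounding spaces are trimmed from each chain ID.
    extras = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    return [chain_id] + extras if extras else chain_id
```

Under these assumptions, a 'modifications' value of two lines, `5:SEP` and `12:MSE`, produces two mappings in the shape shown in the Outputs example, and `build_chain_ids("A", "B,C")` produces a three-element list.
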
Troubleshooting
- Error: 'Sequence is required for protein/dna/rna': Provide a non-empty 'sequence' or 'sequence_fasta'. If using FASTA, ensure it contains a header and sequence lines.
- Error: 'SMILES or CCD code is required for ligands': When entity_type='ligand', supply either 'ligand_smiles' or 'ligand_ccd'.
- Error: 'Provide either SMILES or CCD code, not both': Remove one of the ligand specifications so only one remains.
- Unexpected sequence name: If using 'sequence' with a FASTA header, the header’s first token becomes the sequence name. Adjust the header or use 'sequence_fasta' for clarity.
- Modifications ignored: Ensure each line follows 'position:ccd' and that 'position' is an integer (e.g., '12:MSE').
- Multiple chains not applied: Provide the chain IDs comma-separated with no spaces around them (e.g., 'B,C' rather than 'B, C').
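
The two ligand errors listed above follow from a simple exactly-one-of check. The sketch below reproduces that logic with the quoted messages; `build_ligand_entry` is a hypothetical name, and the output keys `smiles` and `ccd` are assumptions, since this page does not show a ligand output.

```python
def build_ligand_entry(chain_id, ligand_smiles="", ligand_ccd=""):
    """Build a ligand mapping, requiring exactly one specification."""
    smiles = ligand_smiles.strip()
    ccd = ligand_ccd.strip()
    if smiles and ccd:
        raise ValueError("Provide either SMILES or CCD code, not both")
    if not smiles and not ccd:
        raise ValueError("SMILES or CCD code is required for ligands")
    # Assumption: the entry stores the value under "smiles" or "ccd".
    key, value = ("smiles", smiles) if smiles else ("ccd", ccd)
    return {"ligand": {"id": chain_id, key: value}}
```

Running a check like this on your inputs before wiring them into the node can help pin down which of the two fields triggers the error.
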