Boltz YAML Combiner

This node assembles Boltz-ready configuration data by merging sequences, constraints, templates, and properties into a single Boltz YAML object alongside its auxiliary file bundle. It validates chain IDs, enforces the expected Boltz schema per entity type, deduplicates MSAs for identical protein sequences, and converts inlined MSA and template content into file references with generated filenames. The output is designed for direct consumption by Boltz prediction nodes.

Usage

Place BoltzYAMLCombinerNode after all sequence and configuration-building nodes in a Boltz workflow. Upstream, you typically create sequences via BoltzSequenceNode or BoltzProteinSequenceNode (for proteins/nucleic acids) and BoltzLigandSequenceNode (for ligands), then optionally define structural rules with BoltzConstraintNode, structural templates with BoltzTemplateNode, and prediction targets with BoltzPropertyNode. You can use BoltzListCombinerNode to aggregate multiple items of each category into a list and feed these lists into the sequences, constraints, templates, and properties inputs.

BoltzYAMLCombinerNode verifies that every chain ID appears only once across all sequences, that mandatory fields such as sequence (for polymers) and exactly one of smiles or ccd (for ligands) are set, and that the overall YAML meets Boltz's expectations. It then generates filenames like msa_1.a3m or template_1.pdb, stores the corresponding contents in boltz_files, and replaces the inlined data in boltz_yaml with these filenames.

Finally, connect both outputs (boltz_yaml and boltz_files) to BoltzPredictNode or an equivalent execution node to run Boltz.
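
The filename substitution described above can be sketched roughly as follows. This is a minimal illustration, not the node's actual implementation: the function name `externalize_files` is hypothetical, and the use of one counter shared by MSAs and templates is an assumption chosen only because it reproduces the `msa_1.a3m` / `template_2.pdb` pairing shown in the Outputs example. Falling back to `a3m` when `_msa_format` is absent is likewise an assumption.

```python
import copy


def externalize_files(sequences, templates):
    """Replace inline MSA/template text with generated filenames.

    Hypothetical helper. Returns (sequences, templates, files), where
    files maps each generated filename to the original inline content.
    """
    # Deep-copy so the caller's (possibly cached) objects are not mutated.
    sequences = copy.deepcopy(sequences)
    templates = copy.deepcopy(templates)
    files = {}
    counter = 0  # assumption: numbering is shared by MSAs and templates

    for entry in sequences:
        protein = entry.get("protein")
        if not protein:
            continue
        # The helper key is dropped so it never reaches the YAML.
        ext = protein.pop("_msa_format", "a3m")  # assumption: a3m default
        if protein.get("msa"):
            counter += 1
            name = f"msa_{counter}.{ext}"
            files[name] = protein["msa"]
            protein["msa"] = name

    for template in templates:
        for key in ("pdb", "cif"):
            if key in template:
                counter += 1
                name = f"template_{counter}.{key}"
                files[name] = template[key]
                template[key] = name
                break  # a template carries either pdb or cif, not both

    return sequences, templates, files
```

Calling `externalize_files(sequences, templates)` leaves the inputs untouched and returns rewritten copies plus the filename-to-content mapping, which mirrors how boltz_yaml and boltz_files relate to each other.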

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| sequences | True | * | List (or single object) describing all molecular entities in the Boltz run. Each list element is a dict mapping one entity type ("protein", "dna", "rna", or "ligand") to its data dict. Polymer entities must include an `id` (either a single chain ID like "A" or a list like ["A", "B"]) and a `sequence`. Ligands must include an `id` plus exactly one of `smiles` or `ccd`. For protein entries, optional `msa` (alignment content) and `_msa_format` (e.g. "a3m" or "csv") can be provided. All chain IDs across all entities must be unique. | `[{"protein": {"id": "A", "sequence": "MGSDKIYQ...", "msa": "raw-a3m-content", "_msa_format": "a3m"}}, {"ligand": {"id": "L", "smiles": "CC(=O)Oc1ccccc1C(=O)O"}}]` |
| constraints | False | * | Optional single constraint object or list of constraint objects describing structural constraints for Boltz. These should be generated by BoltzConstraintNode and typically look like {"bond": {...}}, {"pocket": {...}}, or {"contact": {...}}. You can pass them directly or via BoltzListCombinerNode. If omitted, no constraints are included in the YAML. | `[{"pocket": {"binder": "L", "contacts": [["A", 45], ["A", 50]], "max_distance": 6.0, "force": true}}]` |
| templates | False | * | Optional single template object or list of template objects, usually from BoltzTemplateNode. Each object must include either a `pdb` key or a `cif` key whose value is the full text content of the structure file. Optional fields like `chain_id`, `template_id`, `force`, and `threshold` may also be present. The node will replace the raw structure content with generated filenames (for example `template_1.pdb`) and store the content in `boltz_files`. | `[{"pdb": "ATOM 1 N MET A 1 ...", "chain_id": ["A"], "template_id": ["T1"], "force": true, "threshold": 2.5}]` |
| properties | False | * | Optional single property object or list of property objects, typically from BoltzPropertyNode. These describe which properties Boltz should predict, for example an affinity specification like {"affinity": {"binder": "L"}}. If omitted, the YAML will not include a `properties` section. | `[{"affinity": {"binder": "L"}}]` |
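
The rules stated for the sequences input can be expressed as a small validator. The sketch below is an approximation written from the descriptions on this page, not the node's source: `validate_sequences` is a hypothetical name, the error texts are modelled on the messages listed under Troubleshooting, and numbering entries from 1 in those messages is an assumption.

```python
POLYMER_TYPES = {"protein", "dna", "rna"}
ALLOWED_TYPES = POLYMER_TYPES | {"ligand"}


def validate_sequences(sequences):
    """Check the documented rules and return the input as a list."""
    # A single object is wrapped into a one-element list.
    if isinstance(sequences, dict):
        sequences = [sequences]
    if not sequences:
        raise ValueError("At least one sequence is required.")

    seen_ids = set()
    for n, entry in enumerate(sequences, start=1):  # assumption: 1-based N
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(
                f"Sequence {n} must be a mapping of entity_type -> entity_data"
            )
        ((entity_type, data),) = entry.items()
        if entity_type not in ALLOWED_TYPES:
            raise ValueError(f"Invalid entity type '{entity_type}'.")
        if not isinstance(data, dict):
            raise ValueError(
                f"Sequence {n} must be a mapping of entity_type -> entity_data"
            )

        if entity_type in POLYMER_TYPES and not data.get("sequence"):
            raise ValueError(
                f"Missing 'sequence' field in {entity_type} sequence {n}."
            )
        if entity_type == "ligand":
            has_smiles = bool(data.get("smiles"))
            has_ccd = bool(data.get("ccd"))
            if has_smiles == has_ccd:  # both set, or neither set
                raise ValueError(
                    f"Ligand sequence {n} needs exactly one of 'smiles' or 'ccd'."
                )

        # `id` may be a single chain ID or a list of chain IDs.
        ids = data.get("id")
        if not ids:
            raise ValueError(f"Sequence {n} is missing its 'id' field.")
        for chain_id in ids if isinstance(ids, list) else [ids]:
            if chain_id in seen_ids:
                raise ValueError(
                    f"Duplicate chain ID found: '{chain_id}'. "
                    "All chain IDs must be unique."
                )
            seen_ids.add(chain_id)
    return sequences
```

Running a check like this on the data you feed into the sequences input can surface schema problems before the workflow reaches the combiner node.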

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| boltz_yaml | BOLTZ_YAML | A structured Boltz YAML configuration object containing at least `version` and `sequences`, and optionally `constraints`, `templates`, and `properties`. All chains and entities are validated, ligands are checked for proper `smiles` or `ccd` usage, and protein entries reference MSA filenames instead of raw alignment content. Template entries reference generated PDB or CIF filenames instead of inline structure text. | `{ "version": 1, "sequences": [ {"protein": {"id": "A", "sequence": "MGSDKIYQ...", "msa": "msa_1.a3m"}}, {"ligand": {"id": "L", "smiles": "CC(=O)Oc1ccccc1C(=O)O"}} ], "constraints": [ {"pocket": {"binder": "L", "contacts": [["A", 45], ["A", 50]], "max_distance": 6.0, "force": true}} ], "templates": [ {"pdb": "template_2.pdb", "chain_id": ["A"], "template_id": ["T1"], "force": true, "threshold": 2.5} ], "properties": [ {"affinity": {"binder": "L"}} ] }` |
| boltz_files | BOLTZ_FILES | A mapping of generated auxiliary filenames to their actual file contents. This includes MSA files for protein sequences (for example `msa_1.a3m` or `msa_2.csv`) and template structure files (for example `template_1.pdb` or `template_2.cif`). These files must be supplied along with `boltz_yaml` to any Boltz execution node so that the filenames referenced in the YAML can be resolved. | `{ "msa_1.a3m": "\n>seqA\nMGSDKIYQ...\n>homolog1\nMGADKIYQ...\n", "template_2.pdb": "ATOM 1 N MET A 1 ...\nTER\nEND\n" }` |

Important Notes

  • Performance: The node deep-copies all sequence, constraint, template, and property inputs to avoid mutating cached outputs, which can incur overhead when working with very large MSAs or many templates.
  • Limitations: All chain IDs across all entities must be globally unique; reusing a chain ID (such as "A") in more than one sequence entry will cause a validation error and block YAML generation.
  • Behavior: When multiple protein chains share exactly the same amino acid sequence, their MSAs are automatically deduplicated so that all of those chains point to one shared MSA file; this satisfies Boltz's input requirements while reducing redundant data.
  • Behavior: Raw MSA and template contents are replaced with generated filenames in boltz_yaml, with the actual contents stored only in boltz_files; any downstream consumer must use both outputs together.
  • Limitations: Ligand entities must contain exactly one of smiles or ccd; providing both, or neither, for a ligand will fail validation in this node.
  • Behavior: If sequences is provided as a single object rather than a list, it is automatically wrapped into a one-element list internally but must still conform to all structural and validation rules.
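
The MSA deduplication noted above can be pictured with a short sketch. It assumes, purely for illustration, that the first MSA seen for a given amino acid sequence is the one kept and that later chains with the same sequence reuse its filename; the name `dedupe_msas` and the per-MSA numbering are hypothetical and may differ from what the node does.

```python
import copy


def dedupe_msas(sequences):
    """Point proteins with identical sequences at one shared MSA file."""
    sequences = copy.deepcopy(sequences)  # leave the caller's data intact
    files = {}        # generated filename -> MSA content
    by_sequence = {}  # amino acid sequence -> generated filename

    for entry in sequences:
        protein = entry.get("protein")
        if not protein or not protein.get("msa"):
            continue
        ext = protein.pop("_msa_format", "a3m")  # assumption: a3m default
        seq = protein["sequence"]
        if seq not in by_sequence:
            # First occurrence of this sequence: store its MSA once.
            name = f"msa_{len(files) + 1}.{ext}"
            files[name] = protein["msa"]
            by_sequence[seq] = name
        # Every chain with this sequence references the same file.
        protein["msa"] = by_sequence[seq]

    return sequences, files
```

With two protein entries that carry the same sequence, both end up referencing the same filename and only one MSA is stored, which is the effect the note describes.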

Troubleshooting

  • Error: `Duplicate chain ID found: 'A'. All chain IDs must be unique.` At least two sequences share the same chain ID. Adjust upstream sequence nodes so each chain (e.g. A, B, C, L) is used only once across all protein, nucleic acid, and ligand entities.
  • Error: `Ligand sequence N must have either 'smiles' or 'ccd' field` / `cannot have both.` A ligand entry is missing its chemical specification or defines both forms. In the upstream ligand node, provide exactly one of ligand_smiles or ligand_ccd and clear the other field.
  • Error: `At least one sequence is required.` The sequences input is empty or not connected. Ensure you connect the output from a sequence-building node (or from BoltzListCombinerNode) into the sequences input and that it actually contains at least one entity.
  • Error: `Sequence N must be a mapping of entity_type -> entity_data` / `Invalid entity type 'X'.` The structure of the sequence object is wrong or contains unsupported keys. Check that each sequence element is a dict with exactly one of the allowed keys: protein, dna, rna, or ligand, and that custom nodes upstream preserve this schema.
  • Error: `Missing 'sequence' field in protein/dna/rna sequence N.` A polymer entry lacks its primary sequence. Confirm that upstream sequence nodes received valid data (FASTA or plain sequence) and that no custom processing removed the sequence field.
  • Downstream error: MSA or template file not found when running Boltz. This usually indicates that boltz_files was not forwarded to the execution node or that filenames in the YAML were altered. Always pass both boltz_yaml and boltz_files outputs together and avoid modifying the generated filenames.
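
For the last item, a quick consistency check between the two outputs can help locate the problem. The sketch below assumes that boltz_yaml can be read like a plain dict and boltz_files like a plain mapping of filename to content, as the Outputs examples suggest; the actual BOLTZ_YAML and BOLTZ_FILES types may wrap this data differently, and `missing_file_references` is a hypothetical helper.

```python
def missing_file_references(boltz_yaml, boltz_files):
    """List filenames referenced in the YAML that boltz_files lacks."""
    missing = []

    # Protein entries reference their MSA by filename.
    for entry in boltz_yaml.get("sequences", []):
        msa = entry.get("protein", {}).get("msa")
        if msa and msa not in boltz_files:
            missing.append(msa)

    # Template entries reference a PDB or CIF file by filename.
    for template in boltz_yaml.get("templates", []):
        for key in ("pdb", "cif"):
            name = template.get(key)
            if name and name not in boltz_files:
                missing.append(name)

    return missing
```

An empty result means every filename in the YAML resolves; any names returned point at a file that was dropped or renamed between the combiner and the execution node.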