
Boltz Sequence Builder

Creates a single molecular sequence entry for use in Boltz YAML workflows. Supports proteins, DNA, RNA, and ligands, with options for chain IDs, multiple identical chains, an optional MSA (proteins only), residue modifications, and cyclic polymers. Outputs a list containing one standardized sequence mapping, ready to be combined into a full Boltz configuration.

Usage

Use this node to define each molecular entity in your system before assembling the final Boltz YAML. For proteins, DNA, and RNA, provide the sequence as a raw string or as FASTA. For ligands, provide either a SMILES string or a CCD code. For multiple identical chains (e.g., A and B), set chain_id to the first ID and list the remaining IDs in the multiple_chains input. For proteins, you can attach MSA content (an A3M file or a raw string). Feed the resulting 'sequences' output into a list combiner and then into the Boltz YAML combiner.
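The combining step can be pictured as flattening several one-element lists and placing the result under a top-level `sequences` key, as in the Boltz input YAML schema (`version: 1`, `sequences:`). The sketch below is a hypothetical stand-in for the list combiner and Boltz YAML combiner nodes; the function name is an assumption, and how the real combiner treats internal keys such as `_sequence_name` is not documented here, so those are left out of the sample data.

```python
from itertools import chain
from typing import Any


def combine_sequence_lists(*node_outputs: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical stand-in for the list combiner + Boltz YAML combiner."""
    # Each Sequence Builder output is a one-element list; flatten them in order.
    sequences = list(chain.from_iterable(node_outputs))
    # Boltz input files place all entities under a top-level 'sequences' key.
    return {"version": 1, "sequences": sequences}


protein_out = [{"protein": {"id": ["A", "B"], "sequence": "MKTAYIAKQRQISFVKSHFSRQDILD", "msa": "empty"}}]
ligand_out = [{"ligand": {"id": ["C"], "ccd": "ATP"}}]
config = combine_sequence_lists(protein_out, ligand_out)
print([next(iter(entry)) for entry in config["sequences"]])
```

Serializing `config` with a YAML library would then give a file in the shape Boltz expects, provided chain IDs are unique across entries.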

Inputs

| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| entity_type | True | one of: protein \| dna \| rna \| ligand | Type of molecular entity to define. | `protein` |
| chain_id | True | STRING | Primary chain identifier for this entity. If multiple copies are needed, add them via multiple_chains. | `A` |
| sequence | False | STRING | Sequence string for protein/DNA/RNA. Can be a raw sequence or FASTA text. If FASTA text is provided here, the header name is used and the sequence is extracted automatically. | `MKTAYIAKQRQISFVKSHFSRQDILD...` |
| sequence_fasta | False | FASTA | Sequence provided as a FASTA file/object (alternative to 'sequence'). The header name is stored, and the sequence is extracted. | `{'seq_proteinA': '>seq_proteinA\nMKTAYI...\n'}` |
| msa_content | False | STRING | MSA content as a raw string (protein only). Leave empty to mark the MSA as 'empty'. If 'msa_a3m' is provided, it takes precedence. | `>seq1/100-200\nACDE...\n>seq2/101-201\nAC-E...` |
| msa_a3m | False | A3M | MSA in A3M format (protein only). If provided, overrides 'msa_content'. | `{'msaA': '>seq1\nACDE...\n>seq2\nAC-E...\n'}` |
| ligand_smiles | False | STRING | SMILES string for ligand entities. Provide this or 'ligand_ccd', not both. | `CC(=O)OC1=CC=CC=C1C(=O)O` |
| ligand_ccd | False | STRING | CCD (Chemical Component Dictionary) code for ligand entities. Provide this or 'ligand_smiles', not both. | `ATP` |
| modifications | False | STRING | Residue-level modifications for protein/DNA/RNA. One per line in the form position:ccd_code. | `5:MSE` and `42:SEP` on separate lines |
| cyclic | False | BOOLEAN | Mark the polymer as cyclic (applies to protein/DNA/RNA). | `false` |
| multiple_chains | False | STRING | Comma-separated list of additional chain IDs to create identical copies of this entity. | `B,C` |
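The 'sequence' and 'sequence_fasta' fields both imply a small parsing step: detect a FASTA header, keep its name, and join the sequence lines. A minimal sketch of that step follows, using a hypothetical `split_fasta` helper. It assumes only the first FASTA record is used and that raw input falls back to `<entity_type>_sequence` as its name; the fallback is inferred from the `'protein_sequence'` value in the example output, not confirmed by the node's source.

```python
def split_fasta(text: str, entity_type: str = "protein") -> tuple[str, str]:
    """Return (sequence_name, sequence) from raw or FASTA text.

    Assumptions: only the first FASTA record is kept, and raw input is named
    '<entity_type>_sequence' (inferred from the documented example output).
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if lines and lines[0].startswith(">"):
        # Header name without the '>' marker and with commas removed.
        name = lines[0][1:].strip().replace(",", "")
        seq_lines = []
        for line in lines[1:]:
            if line.startswith(">"):  # stop at the next record
                break
            seq_lines.append(line)
        return name, "".join(seq_lines)
    # Raw sequence: drop all whitespace, including line wraps.
    return f"{entity_type}_sequence", "".join(text.split())


print(split_fasta(">seq_proteinA\nMKTAYI\nAKQRQ\n"))
```

With the FASTA input above, the header `seq_proteinA` becomes the stored name and the two wrapped lines are joined into one sequence.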

Outputs

| Field | Type | Description | Example |
|---|---|---|---|
| sequences | `*` | A list containing one standardized sequence mapping for the specified entity. Includes the chain ID(s) and the appropriate fields (sequence or ligand spec, MSA if protein, modifications, cyclic). | `[{'protein': {'id': ['A', 'B'], 'sequence': 'MKTAYIAKQRQISFVKSHFSRQDILD...', '_sequence_name': 'protein_sequence', 'msa': 'empty', 'modifications': [{'position': 5, 'ccd': 'MSE'}], 'cyclic': False}}]` |
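To make the output shape concrete, the sketch below rebuilds the example mapping from the node's inputs. It is a hypothetical re-creation, not the node's implementation: `build_sequence_entry` and `parse_modifications` are illustrative names, `msa` stands for the already-resolved MSA text (with 'msa_a3m' taking precedence over 'msa_content'), and always emitting `modifications` and `cyclic` for polymers is an assumption based on the example. Ligand specs use the `smiles` / `ccd` keys from the Boltz input schema.

```python
import warnings
from typing import Any


def parse_modifications(text: str) -> list[dict[str, Any]]:
    """Parse 'position:ccd_code' lines; warn about and skip malformed lines."""
    mods: list[dict[str, Any]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        pos, sep, ccd = line.partition(":")
        if not sep or not pos.strip().isdigit() or not ccd.strip():
            warnings.warn(f"Invalid modification format: {line!r}")
            continue
        mods.append({"position": int(pos), "ccd": ccd.strip()})
    return mods


def build_sequence_entry(
    entity_type: str,
    chain_id: str,
    sequence: str = "",
    sequence_name: str = "",
    msa: str = "",
    modifications: str = "",
    cyclic: bool = False,
    multiple_chains: str = "",
    ligand_smiles: str = "",
    ligand_ccd: str = "",
) -> list[dict[str, Any]]:
    """Hypothetical re-creation of the node's 'sequences' output (one entry)."""
    # Primary chain first, then any extra identical copies.
    # (This sketch tolerates spaces in multiple_chains; the node may not.)
    ids = [chain_id] + [c.strip() for c in multiple_chains.split(",") if c.strip()]
    spec: dict[str, Any] = {"id": ids}
    if entity_type == "ligand":
        # Assumes exactly one of SMILES / CCD was supplied (see Important Notes).
        if ligand_smiles:
            spec["smiles"] = ligand_smiles
        else:
            spec["ccd"] = ligand_ccd
    else:
        spec["sequence"] = sequence
        spec["_sequence_name"] = sequence_name or f"{entity_type}_sequence"
        if entity_type == "protein":
            # No MSA supplied: use the literal string 'empty'.
            spec["msa"] = msa or "empty"
        spec["modifications"] = parse_modifications(modifications)
        spec["cyclic"] = cyclic
    return [{entity_type: spec}]


print(build_sequence_entry("protein", "A", sequence="MKTAYIAKQRQISFVKSHFSRQDILD",
                           modifications="5:MSE", multiple_chains="B"))
```

The printed list has the same keys, in the same order, as the Outputs example above; a ligand call such as `build_sequence_entry("ligand", "C", ligand_ccd="ATP")` yields only `id` and `ccd`.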

Important Notes

  • Sequence requirement: For protein/DNA/RNA, you must provide a sequence via 'sequence' or 'sequence_fasta'.
  • Ligand exclusivity: For ligands, provide exactly one of 'ligand_smiles' or 'ligand_ccd', not both.
  • Multiple chains: Use 'multiple_chains' for identical copies. For example, chain_id A with multiple_chains B,C produces a single entity whose IDs are A, B, and C.
  • MSA handling: If no MSA is provided for proteins, the node sets MSA to the literal value 'empty'.
  • FASTA headers: When a FASTA is provided, the header name (with any commas removed) is stored as '_sequence_name'.
  • Modifications format: Each modification line must be 'position:ccd_code' with a valid integer position.
  • Downstream uniqueness: Final YAML requires unique chain IDs across all sequences; ensure IDs don’t conflict when combining multiple entities.
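Checking uniqueness before the YAML step amounts to counting every chain ID across all entries. The sketch below uses a hypothetical `duplicate_chain_ids` helper and accepts `id` either as a list (as in this node's output) or as a single string (also allowed in Boltz YAML).

```python
from collections import Counter
from typing import Any


def duplicate_chain_ids(sequences: list[dict[str, Any]]) -> list[str]:
    """Return chain IDs that occur more than once across all entries."""
    counts = Counter()
    for entry in sequences:
        for spec in entry.values():  # each entry looks like {entity_type: spec}
            ids = spec["id"]
            counts.update([ids] if isinstance(ids, str) else ids)
    return sorted(cid for cid, n in counts.items() if n > 1)


combined = [
    {"protein": {"id": ["A", "B"], "sequence": "MKTAYIAKQRQISFVKSHFSRQDILD"}},
    {"ligand": {"id": ["B"], "ccd": "ATP"}},
]
print(duplicate_chain_ids(combined))
```

Here the ligand reuses chain B from the protein copies, so `B` is reported; renaming the ligand's chain resolves the conflict.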

Troubleshooting

  • Error: 'Sequence is required for protein/dna/rna': Provide a valid sequence string or FASTA via 'sequence' or 'sequence_fasta'.
  • Error: 'SMILES or CCD code is required for ligands': Set 'ligand_smiles' or 'ligand_ccd' (exactly one).
  • Error: 'Provide either SMILES or CCD code, not both': Remove one of the ligand fields so only a single specification remains.
  • Invalid modification format warning: Ensure each line in 'modifications' follows 'position:ccd_code' with a numeric position (e.g., '42:SEP').
  • Unexpected FASTA in 'sequence': If you paste FASTA into 'sequence', the node extracts the header and sequence automatically. Verify that the header is correct and that the sequence lines contain only residue characters.
  • Multiple chains not applied: Confirm 'multiple_chains' uses comma-separated IDs without spaces (e.g., 'B,C').
  • Downstream duplicate chain ID error: When later combining into YAML, ensure this node’s chain IDs do not overlap with other entities’ chain IDs.
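The first three errors above correspond to simple input checks. The sketch below reproduces them with a hypothetical `validate_inputs` function. The message strings are copied from this page; the function name, the use of `ValueError`, and the extra unknown-entity check are assumptions, and `sequence` stands for whichever of 'sequence' or 'sequence_fasta' was supplied.

```python
def validate_inputs(entity_type: str, sequence: str = "",
                    ligand_smiles: str = "", ligand_ccd: str = "") -> None:
    """Raise ValueError with the documented messages for invalid inputs."""
    if entity_type in ("protein", "dna", "rna"):
        if not sequence.strip():
            raise ValueError("Sequence is required for protein/dna/rna")
    elif entity_type == "ligand":
        if not ligand_smiles and not ligand_ccd:
            raise ValueError("SMILES or CCD code is required for ligands")
        if ligand_smiles and ligand_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")
    else:
        # Not a documented error; added only to make the sketch complete.
        raise ValueError(f"Unknown entity_type: {entity_type!r}")


try:
    validate_inputs("ligand", ligand_smiles="CC(=O)OC1=CC=CC=C1C(=O)O", ligand_ccd="ATP")
except ValueError as exc:
    print(exc)
```

Removing either `ligand_smiles` or `ligand_ccd` from the call makes the check pass, which matches the fix suggested for that error.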