Boltz Sequence Builder

Constructs a single molecular sequence entry for Boltz YAML configurations. Supports protein, DNA, RNA, and ligand entities, with optional MSA, cyclic flags, chain duplication, and residue modifications. FASTA input is parsed to extract both the sequence and an optional sequence name.

Usage

Use this node to define one chain or a set of identical chains for a molecular entity before assembling a full Boltz YAML. For proteins/DNA/RNA, provide a sequence (plain or FASTA); for ligands, specify either a SMILES string or a CCD code. Connect the output to a list combiner and then to the Boltz YAML combiner node to build the complete configuration.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| entity_type | True | CHOICE | Type of molecular entity to define. | protein |
| chain_id | True | STRING | Primary chain identifier for this entity. If multiple copies are needed, use multiple_chains to add more IDs. | A |
| sequence | False | STRING | Amino acid or nucleotide sequence for protein/DNA/RNA. Supports plain text or FASTA format (header lines starting with '>'). Ignored for ligands. | >my_protein\nMKTLLILAVVAAASA... |
| sequence_fasta | False | FASTA | Sequence provided as a FASTA file-like input (alternative to 'sequence'). The header will be used to set an internal sequence name. | {'my_seq.fasta': '>protA\\nMKTLLILAVVAAASA...'} |
| msa_content | False | STRING | MSA content in string form for proteins. If not provided and no A3M is given, an 'empty' MSA is set. Ignored for non-proteins. | >seq1\nMKTLLI-AVV\n>seq2\nMKTL-LVAVV |
| msa_a3m | False | A3M | MSA in A3M format for proteins (alternative to 'msa_content'). If provided, it takes precedence over 'msa_content'. | {'alignment.a3m': '>seq1\\nMKTLLI-AVV\\n>seq2\\nMKTL-LVAVV'} |
| ligand_smiles | False | STRING | SMILES string for ligand entities. Required if entity_type is ligand and 'ligand_ccd' is not provided. | CC(=O)Oc1ccccc1C(=O)O |
| ligand_ccd | False | STRING | CCD code for ligand entities. Required if entity_type is ligand and 'ligand_smiles' is not provided. | HEM |
| modifications | False | STRING | Residue-level modifications in the format 'position:ccd_code', one per line. Applies to polymers (protein/DNA/RNA). | 5:PTR\n12:MSE |
| cyclic | False | BOOLEAN | Mark the polymer as cyclic (applies to protein/DNA/RNA). | false |
| multiple_chains | False | STRING | Comma-separated additional chain IDs to represent identical copies of this entity. | B,C |

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| sequences | * | A list containing one sequence mapping for Boltz YAML. The mapping includes entity type keys ('protein', 'dna', 'rna', or 'ligand') with fields such as 'id', 'sequence' or 'smiles/ccd', optional 'msa', 'modifications', and 'cyclic'. | [{'protein': {'id': ['A', 'B'], 'sequence': 'MKTLLILAVVAAASA...', '_sequence_name': 'protA', 'msa': 'empty', 'modifications': [{'position': 5, 'ccd': 'PTR'}], 'cyclic': False}}] |
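
To make the shape of the 'sequences' output concrete, here is a minimal Python sketch of how the polymer inputs could be folded into that one-item list. It is not the node's actual implementation: the function name build_polymer_entry is hypothetical, the sketch stores MSA text directly under 'msa' (the real node may store something else, such as a file reference), and it assumes 'id' stays a plain string when only one chain is given.

```python
from typing import Any, Dict, List, Optional


def build_polymer_entry(
    entity_type: str,
    chain_id: str,
    sequence: str,
    multiple_chains: str = "",
    msa_content: Optional[str] = None,
    msa_a3m: Optional[str] = None,
    cyclic: bool = False,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: assemble a one-item 'sequences' list for a polymer."""
    # Primary chain ID first, then any comma-separated extras (blank items skipped).
    extra_ids = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    chain_ids = [chain_id.strip()] + extra_ids

    entry: Dict[str, Any] = {
        # Assumption: a single chain keeps a string 'id'; several chains use a list.
        "id": chain_ids if len(chain_ids) > 1 else chain_ids[0],
        "sequence": sequence,
        "cyclic": cyclic,
    }
    if entity_type == "protein":
        # A3M takes precedence over plain MSA text; with neither, fall back to 'empty'.
        entry["msa"] = msa_a3m or msa_content or "empty"
    return [{entity_type: entry}]


if __name__ == "__main__":
    print(build_polymer_entry("protein", "A", "MKTLLILAVVAAASA", multiple_chains="B,C"))
```

Non-protein polymers get no 'msa' key in this sketch, matching the note that MSA inputs are ignored for non-proteins.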

Important Notes

  • Sequence requirement: For protein/DNA/RNA, a sequence must be provided via 'sequence' or 'sequence_fasta'; otherwise the node raises an error.
  • Ligand specification: For ligands, provide exactly one of 'ligand_smiles' or 'ligand_ccd'; supplying both raises an error.
  • MSA handling: For proteins, if neither 'msa_a3m' nor 'msa_content' is given, the node sets 'msa' to 'empty'.
  • FASTA parsing: If input begins with a FASTA header ('>'), the header’s first token becomes the internal sequence name and the sequence lines are concatenated.
  • Multiple chains: 'multiple_chains' duplicates the same entity across additional chain IDs; the 'id' field becomes a list of chain IDs.
  • Modifications format: Each line must be 'position:ccd_code' with a valid integer position; invalid lines are ignored with a warning.
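
The FASTA and modifications rules in the notes above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the node's source: parse_fasta and parse_modifications are hypothetical names, only the first FASTA record is read, and the warning text is invented.

```python
import warnings
from typing import Dict, List, Optional, Tuple, Union


def parse_fasta(text: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (sequence_name, sequence) from plain or FASTA text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    name = None
    if lines and lines[0].startswith(">"):
        # The first whitespace-separated token of the header becomes the name.
        header_tokens = lines[0][1:].split()
        name = header_tokens[0] if header_tokens else None
        lines = lines[1:]
    seq_lines = []
    for ln in lines:
        if ln.startswith(">"):
            break  # Assumption: ignore any records after the first one.
        seq_lines.append(ln)
    # Sequence lines are concatenated into one string.
    return name, "".join(seq_lines)


def parse_modifications(text: str) -> List[Dict[str, Union[int, str]]]:
    """Hypothetical helper: parse 'position:ccd_code' lines, warning on bad ones."""
    mods: List[Dict[str, Union[int, str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        position, sep, ccd = line.partition(":")
        try:
            if not sep or not ccd.strip():
                raise ValueError("missing ':' or CCD code")
            mods.append({"position": int(position), "ccd": ccd.strip()})
        except ValueError:
            # Invalid lines are skipped with a warning rather than failing the node.
            warnings.warn(f"Ignoring invalid modification line: {line!r}")
    return mods
```

If the header is present but no sequence lines follow it, parse_fasta returns an empty sequence, which is the situation the sequence requirement guards against.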

Troubleshooting

  • Error: Sequence is required for protein/dna/rna: Provide a non-empty 'sequence' or 'sequence_fasta'.
  • Error: Provide either SMILES or CCD code, not both: For ligand entities, remove one of 'ligand_smiles' or 'ligand_ccd' so only one is set.
  • Unexpected empty sequence after FASTA input: Ensure the FASTA includes sequence lines below the header and that line breaks are preserved.
  • Modifications not applied: Check each line is in 'position:ccd_code' format and that positions are integers.
  • MSA not recognized: When supplying A3M as a file-like input, ensure it is provided as a keyed object that maps the file name to the A3M text (e.g., {'alignment.a3m': ''}, with the alignment as the value); otherwise pass the alignment as plain text via 'msa_content'.
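
When debugging the first two errors, it can help to check the inputs before running the node. The following Python sketch mirrors the documented conditions. The function name validate_inputs, the use of ValueError, and the wording of the "neither provided" and "unknown type" messages are assumptions; only the first two messages come from this page.

```python
def validate_inputs(
    entity_type: str,
    sequence: str = "",
    ligand_smiles: str = "",
    ligand_ccd: str = "",
) -> None:
    """Hypothetical pre-flight check mirroring the documented error conditions."""
    if entity_type in ("protein", "dna", "rna"):
        # Polymers need a non-empty sequence (plain or FASTA text).
        if not sequence.strip():
            raise ValueError("Sequence is required for protein/dna/rna")
    elif entity_type == "ligand":
        has_smiles = bool(ligand_smiles.strip())
        has_ccd = bool(ligand_ccd.strip())
        if has_smiles and has_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")
        if not (has_smiles or has_ccd):
            # Assumed wording: the page does not quote this message.
            raise ValueError("Ligand requires a SMILES string or a CCD code")
    else:
        # Assumed behavior for values outside the documented choices.
        raise ValueError(f"Unknown entity_type: {entity_type!r}")


if __name__ == "__main__":
    validate_inputs("ligand", ligand_ccd="HEM")  # passes silently
    try:
        validate_inputs("ligand", ligand_smiles="CC(=O)Oc1ccccc1C(=O)O", ligand_ccd="HEM")
    except ValueError as err:
        print(err)
```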