
Boltz Sequence Builder

Creates a single molecular sequence specification entry for use in Boltz YAML pipelines. Supports proteins, nucleic acids (DNA/RNA), and ligands, with options for chain IDs, multiple identical chains, MSAs, modifications, and cyclic polymers. Produces a structured sequence object encapsulated in a list, ready to be combined with other Boltz components.

Usage

Use this node to define one entity (protein, DNA, RNA, or ligand) in a Boltz workflow. For proteins/nucleic acids, provide the sequence as plain text or FASTA; for proteins, optionally include an MSA. For ligands, specify either a SMILES string or a CCD code. The output can be merged with other sequences or constraints and passed to downstream Boltz nodes that assemble YAML and run predictions.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| entity_type | True | CHOICE | Type of molecular entity to create: protein, dna, rna, or ligand. | protein |
| chain_id | True | STRING | Primary chain identifier for this entity. | A |
| sequence | False | STRING | Sequence for protein/dna/rna. Accepts plain sequence or FASTA-formatted text; if FASTA, the header is used to derive a sequence name. | >MyProt\nMSTNPKPQRIT... |
| sequence_fasta | False | FASTA | Sequence provided as a FASTA file/object for protein/dna/rna. Alternative to the sequence text input. | { "MyProt": ">MyProt\nMSTNPKPQRIT..." } |
| msa_content | False | STRING | Multiple sequence alignment (MSA) content as a plain string for proteins. Leave empty to use an explicit 'empty' MSA. | >seq1\nMSTN-...\n>seq2\nMSAN-... |
| msa_a3m | False | A3M | MSA in A3M format for proteins. Alternative to msa_content. | { "alignment.a3m": ">seq1\nMSTN-...\n>seq2\nMSAN-..." } |
| ligand_smiles | False | STRING | SMILES string for ligand entities. Provide this or ligand_ccd, not both. | CC(=O)Oc1ccccc1C(=O)O |
| ligand_ccd | False | STRING | Ligand CCD (Chemical Component Dictionary) code. Provide this or ligand_smiles, not both. | ATP |
| modifications | False | STRING | Post-translational or chemical modifications for protein/dna/rna. One per line in the form position:ccd_code. | 5:SEP\n12:MLZ |
| cyclic | False | BOOLEAN | Marks the polymer as cyclic for protein/dna/rna. | false |
| multiple_chains | False | STRING | Additional chain IDs for identical copies of this entity, comma-separated. The entity will be duplicated across these chain IDs. | B,C |
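
The modifications input is line-oriented text in the form position:ccd_code. As a rough illustration of that format only (not the node's actual code), a parser could look like the sketch below. The helper name parse_modifications is hypothetical, and skipping bad lines with a warning is an assumption modeled on the behavior this page describes for invalid lines.

```python
import warnings


def parse_modifications(text):
    """Parse 'position:ccd_code' lines into dicts (hypothetical helper)."""
    mods = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no modification
        position, sep, ccd = line.partition(":")
        try:
            if not sep or not ccd.strip():
                raise ValueError("missing ':' or CCD code")
            # int() raises ValueError for non-integer positions such as 'x' or '5.0'
            mods.append({"position": int(position.strip()), "ccd": ccd.strip()})
        except ValueError:
            warnings.warn(f"Ignoring invalid modification line: {line!r}")
    return mods


print(parse_modifications("5:SEP\n12:MLZ"))
# [{'position': 5, 'ccd': 'SEP'}, {'position': 12, 'ccd': 'MLZ'}]
```

The resulting list has the same shape as the modifications value shown in the output example on this page.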

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| sequences | * | A list containing a single sequence specification dict for the chosen entity type, ready to be combined into Boltz YAML by downstream nodes. | [{ "protein": { "id": ["A","B"], "sequence": "MSTN...", "_sequence_name": "MyProt", "msa": "empty", "modifications": [{"position":5,"ccd":"SEP"}], "cyclic": false } }] |
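
Because each node emits a one-element list, combining entities amounts to list concatenation. The sketch below shows one way a downstream step might merge several such lists while rejecting duplicate chain IDs. The function merge_sequences and the duplicate check are hypothetical, and the ligand entry's "ccd" key is an assumption, since this page only shows a protein example.

```python
def merge_sequences(*sequence_lists):
    """Concatenate sequence lists, rejecting repeated chain IDs (hypothetical helper)."""
    merged = []
    seen = set()
    for entries in sequence_lists:
        for entry in entries:
            # Each entry is a single-key dict: {entity_type: spec}
            (entity_type, spec), = entry.items()
            ids = spec["id"] if isinstance(spec["id"], list) else [spec["id"]]
            for chain_id in ids:
                if chain_id in seen:
                    raise ValueError(f"Duplicate chain ID {chain_id!r} in {entity_type} entry")
                seen.add(chain_id)
            merged.append(entry)
    return merged


protein = [{"protein": {"id": ["A", "B"], "sequence": "MSTN", "msa": "empty"}}]
ligand = [{"ligand": {"id": "C", "ccd": "ATP"}}]  # key name "ccd" is assumed

sequences = merge_sequences(protein, ligand)
print([next(iter(entry)) for entry in sequences])  # ['protein', 'ligand']
```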

Important Notes

  • Entity type drives required fields: protein/dna/rna require a sequence; ligand requires either ligand_smiles or ligand_ccd.
  • FASTA support: If the sequence begins with '>', it is parsed as FASTA. The header (without commas) becomes _sequence_name and the sequence lines are concatenated.
  • Protein MSAs: Provide msa_a3m or msa_content; if neither is provided, an explicit "empty" MSA is set.
  • Ligand exclusivity: Do not supply both ligand_smiles and ligand_ccd; exactly one must be provided.
  • Multiple chains: The id field becomes an array when multiple_chains is provided; otherwise it is a single chain ID string.
  • Modifications format: Each line must be position:ccd_code with an integer position; invalid lines are ignored with a warning.
  • Cyclic flag: Applies only to polymer entities (protein/dna/rna).
  • Validation errors: Missing required values or conflicting ligand inputs will raise errors.
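
The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the node's implementation: the names parse_sequence_text and build_entry are hypothetical, "without commas" is read as "commas removed from the header", the raw msa_content string is stored directly in the msa field, and the ligand keys "smiles" and "ccd" are assumed.

```python
POLYMERS = ("protein", "dna", "rna")


def parse_sequence_text(text):
    """Return (name, sequence); name is None for a plain sequence."""
    text = text.strip()
    if not text.startswith(">"):
        return None, "".join(text.split())
    lines = text.splitlines()
    name = lines[0][1:].strip().replace(",", "")  # assumed comma handling
    sequence = "".join(line.strip() for line in lines[1:])
    return name, sequence


def build_entry(entity_type, chain_id, sequence="", msa_content="",
                ligand_smiles="", ligand_ccd=""):
    """Build a one-element sequences list following the notes above."""
    if entity_type in POLYMERS:
        name, seq = parse_sequence_text(sequence)
        if not seq:
            raise ValueError("Sequence is required for protein/dna/rna")
        spec = {"id": chain_id, "sequence": seq}
        if name:
            spec["_sequence_name"] = name
        if entity_type == "protein":
            # No MSA supplied: fall back to the explicit "empty" marker
            spec["msa"] = msa_content or "empty"
    elif entity_type == "ligand":
        if ligand_smiles and ligand_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")
        if not (ligand_smiles or ligand_ccd):
            raise ValueError("SMILES or CCD code is required for ligands")
        spec = {"id": chain_id}
        spec["smiles" if ligand_smiles else "ccd"] = ligand_smiles or ligand_ccd
    else:
        raise ValueError(f"Unknown entity type: {entity_type!r}")
    return [{entity_type: spec}]


print(build_entry("protein", "A", ">MyProt\nMSTN\nPKPQ"))
```

Keeping the ligand checks in this order means that conflicting inputs and missing inputs produce distinct error messages, matching the two ligand errors listed under Troubleshooting.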

Troubleshooting

  • Error: Sequence is required for protein/dna/rna: Provide the sequence through either the sequence field or the sequence_fasta input.
  • Error: SMILES or CCD code is required for ligands: Set ligand_smiles or ligand_ccd (but not both).
  • Error: Provide either SMILES or CCD code, not both: Remove one of the two ligand inputs.
  • Protein MSA not appearing: Ensure msa_a3m is a valid A3M object or msa_content is non-empty; if neither is set, the msa field defaults to "empty".
  • Modification ignored: Check the format is position:ccd_code and that position is an integer (e.g., 12:MLZ).
  • Unexpected chain IDs: Trim spaces and separate multiple chains with commas only (e.g., B,C).
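
For the chain ID issue in particular, it can help to clean the multiple_chains string before pasting it into the node. The helper below is a hypothetical pre-processing sketch; whether the node trims whitespace itself is not stated on this page, so the cleanup is done up front.

```python
def normalize_chain_list(text):
    """Strip whitespace and empty items from a comma-separated chain list."""
    parts = [part.strip() for part in text.split(",")]
    return ",".join(part for part in parts if part)


print(normalize_chain_list(" B , C ,"))  # B,C
```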