Boltz Sequence Builder
Creates a single molecular sequence specification entry for use in Boltz YAML pipelines. Supports proteins, nucleic acids (DNA/RNA), and ligands, with options for chain IDs, multiple identical chains, MSAs, modifications, and cyclic polymers. Produces a structured sequence object wrapped in a single-element list, ready to be combined with other Boltz components.

Usage
Use this node to define one entity (protein, DNA, RNA, or ligand) in a Boltz workflow. For proteins and nucleic acids, provide the sequence as plain text or FASTA; for proteins, you can also supply an MSA. For ligands, specify either a SMILES string or a CCD code. The output can be merged with other sequences or constraints and passed to downstream Boltz nodes that assemble the YAML and run predictions.
Inputs
| Field | Required | Type | Description | Example | 
|---|---|---|---|---|
| entity_type | True | CHOICE | Type of molecular entity to create: protein, dna, rna, or ligand. | protein | 
| chain_id | True | STRING | Primary chain identifier for this entity. | A | 
| sequence | False | STRING | Sequence for protein/dna/rna. Accepts a plain sequence or FASTA-formatted text; for FASTA input, the header is used to derive a sequence name. | >MyProt\nMSTNPKPQRIT... | 
| sequence_fasta | False | FASTA | Sequence provided as a FASTA file/object for protein/dna/rna. Alternative to the sequence text input. | { "MyProt": ">MyProt\nMSTNPKPQRIT..." } | 
| msa_content | False | STRING | Multiple sequence alignment (MSA) content as a plain string, for proteins only. If both this field and msa_a3m are empty, the MSA is set to 'empty'. | >seq1\nMSTN-...\n>seq2\nMSAN-... | 
| msa_a3m | False | A3M | MSA in A3M format for proteins. Alternative to msa_content. | { "alignment.a3m": ">seq1\nMSTN-...\n>seq2\nMSAN-..." } | 
| ligand_smiles | False | STRING | SMILES string for ligand entities. Provide this or ligand_ccd, not both. | CC(=O)Oc1ccccc1C(=O)O | 
| ligand_ccd | False | STRING | Ligand CCD (Chemical Component Dictionary) code. Provide this or ligand_smiles, not both. | ATP | 
| modifications | False | STRING | Post-translational or chemical modifications for protein/dna/rna, one per line in the form position:ccd_code. | 5:SEP\n12:MLZ | 
| cyclic | False | BOOLEAN | Marks the polymer as cyclic for protein/dna/rna. | false | 
| multiple_chains | False | STRING | Additional chain IDs for identical copies of this entity, comma-separated. The entity will be duplicated across these chain IDs. | B,C | 
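The two free-text inputs, modifications and multiple_chains, use simple line-based and comma-based formats. The sketch below shows one way to parse them under stated assumptions: the helper names `parse_modifications` and `resolve_chain_ids` are hypothetical (not the node's actual functions), and the primary chain_id is assumed to come first in the resulting id array, as the output example suggests.

```python
from __future__ import annotations

import warnings


def parse_modifications(text: str) -> list[dict]:
    """Parse 'position:ccd_code' lines into modification dicts (hypothetical helper)."""
    mods = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no modification
        position, sep, ccd = line.partition(":")
        ccd = ccd.strip()
        if not sep or not ccd:
            warnings.warn(f"Ignoring malformed modification line: {line!r}")
            continue
        try:
            pos = int(position.strip())
        except ValueError:
            warnings.warn(f"Ignoring modification with non-integer position: {line!r}")
            continue
        mods.append({"position": pos, "ccd": ccd})
    return mods


def resolve_chain_ids(chain_id: str, multiple_chains: str = "") -> str | list[str]:
    """Return one chain ID, or a list when extra comma-separated IDs are given.

    Assumption: the primary chain_id comes first, followed by the extra IDs.
    """
    extra = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    if not extra:
        return chain_id.strip()
    return [chain_id.strip(), *extra]


print(parse_modifications("5:SEP\n12:MLZ"))
# [{'position': 5, 'ccd': 'SEP'}, {'position': 12, 'ccd': 'MLZ'}]
print(resolve_chain_ids("A", "B, C"))
# ['A', 'B', 'C']
```

Skipping bad lines with a warning, rather than raising, mirrors the documented behavior for modifications; stripping each comma-separated piece means "B, C" and "B,C" resolve to the same IDs in this sketch.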
Outputs
| Field | Type | Description | Example | 
|---|---|---|---|
| sequences | * | A list containing a single sequence specification dict for the chosen entity type, ready to be combined into Boltz YAML by downstream nodes. | [{ "protein": { "id": ["A","B"], "sequence": "MSTN...", "_sequence_name": "MyProt", "msa": "empty", "modifications": [{"position":5,"ccd":"SEP"}], "cyclic": false } }] | 
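To illustrate how this output is meant to be combined, here is a minimal sketch that builds the documented sequences structure and concatenates two node outputs into a Boltz-style document. `build_sequence_output` is a hypothetical helper, the `version`/`sequences` top-level keys and the ligand `smiles`/`ccd` keys are assumed to follow the Boltz YAML input layout, and the protein sequence is a short toy string. The sketch keeps `_sequence_name` as-is; how downstream nodes treat that field is not specified here.

```python
import json


def build_sequence_output(entity_type, chain_ids, *, sequence=None, name=None,
                          msa=None, modifications=None, cyclic=False,
                          smiles=None, ccd=None):
    """Return a one-element list shaped like the documented `sequences` output."""
    if entity_type == "ligand":
        # Assumption: a ligand carries exactly one of `smiles` or `ccd`.
        body = {"id": chain_ids}
        if smiles:
            body["smiles"] = smiles
        else:
            body["ccd"] = ccd
        return [{"ligand": body}]

    body = {"id": chain_ids, "sequence": sequence}
    if name:
        body["_sequence_name"] = name  # taken from the FASTA header, if any
    if entity_type == "protein":
        body["msa"] = msa or "empty"  # explicit "empty" MSA when none is supplied
    if modifications:
        body["modifications"] = modifications
    body["cyclic"] = cyclic
    return [{entity_type: body}]


# Each node output is a plain list, so several outputs concatenate directly.
protein = build_sequence_output(
    "protein", ["A", "B"], sequence="MSTNPKPQRIT", name="MyProt",
    modifications=[{"position": 5, "ccd": "SEP"}],
)
ligand = build_sequence_output("ligand", "C", ccd="ATP")
document = {"version": 1, "sequences": protein + ligand}
print(json.dumps(document, indent=2))
```

Printing as JSON here is only for inspection; writing the actual YAML file is left to the downstream Boltz nodes.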
Important Notes
- Entity type drives required fields: protein/dna/rna require a sequence; ligand requires either ligand_smiles or ligand_ccd.
- FASTA support: If the sequence begins with '>', it is parsed as FASTA. The header (without commas) becomes _sequence_name and the sequence lines are concatenated.
- Protein MSAs: Provide msa_a3m or msa_content; if neither is provided, an explicit "empty" MSA is set.
- Ligand exclusivity: Do not supply both ligand_smiles and ligand_ccd; exactly one must be provided.
- Multiple chains: The id field becomes an array when multiple_chains is provided; otherwise it is a single chain ID string.
- Modifications format: Each line must be position:ccd_code with an integer position; invalid lines are ignored with a warning.
- Cyclic flag: Applies only to polymer entities (protein/dna/rna).
- Validation errors: Missing required values or conflicting ligand inputs raise an error.
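The FASTA rule in these notes can be sketched as follows. This is an illustration under assumptions, not the node's source: `parse_sequence_text` is a hypothetical name, "without commas" is read as "commas removed from the header", only the first record is used if several are present, and whitespace inside a plain sequence is dropped.

```python
from __future__ import annotations


def parse_sequence_text(text: str) -> tuple[str, str | None]:
    """Return (sequence, name); name is None for plain (non-FASTA) input."""
    text = text.strip()
    if not text.startswith(">"):
        # Plain sequence: assumption that internal whitespace/newlines are dropped.
        return "".join(text.split()), None

    lines = text.splitlines()
    # Header minus the leading '>', with commas removed (assumed reading of the note).
    name = lines[0][1:].strip().replace(",", "")
    parts = []
    for line in lines[1:]:
        if line.startswith(">"):
            break  # assumption: only the first FASTA record is used
        parts.append(line.strip())
    return "".join(parts), name or None


print(parse_sequence_text(">MyProt\nMSTNPK\nPQRIT"))
# ('MSTNPKPQRIT', 'MyProt')
```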
Troubleshooting
- Error "Sequence is required for protein/dna/rna": provide a sequence in either the sequence field or sequence_fasta.
- Error "SMILES or CCD code is required for ligands": set ligand_smiles or ligand_ccd (but not both).
- Error "Provide either SMILES or CCD code, not both": remove one of the two ligand inputs.
- Protein MSA not appearing: Ensure msa_a3m is a valid A3M object or msa_content is non-empty; otherwise it defaults to "empty".
- Modification ignored: Check the format is position:ccd_code and that position is an integer (e.g., 12:MLZ).
- Unexpected chain IDs: Trim spaces and separate multiple chains with commas only (e.g., B,C).
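The three error conditions listed above correspond to checks like the ones in this sketch. The message texts come from this page; the function name `validate_entity`, the use of `ValueError`, and the final unknown-type guard are assumptions added for illustration (the CHOICE input should normally rule that last case out).

```python
POLYMERS = {"protein", "dna", "rna"}


def validate_entity(entity_type: str, sequence: str = "", smiles: str = "", ccd: str = "") -> None:
    """Raise ValueError for the documented missing or conflicting inputs."""
    if entity_type in POLYMERS:
        if not sequence.strip():
            raise ValueError("Sequence is required for protein/dna/rna")
    elif entity_type == "ligand":
        has_smiles, has_ccd = bool(smiles.strip()), bool(ccd.strip())
        if has_smiles and has_ccd:
            raise ValueError("Provide either SMILES or CCD code, not both")
        if not (has_smiles or has_ccd):
            raise ValueError("SMILES or CCD code is required for ligands")
    else:
        # Sketch-only guard; not a documented node error.
        raise ValueError(f"Unknown entity_type: {entity_type!r}")


try:
    validate_entity("ligand", smiles="CC(=O)Oc1ccccc1C(=O)O", ccd="ATP")
except ValueError as err:
    print(err)  # Provide either SMILES or CCD code, not both
```

Checking the "both supplied" case before the "neither supplied" case keeps each message specific to the actual mistake.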