
Boltz (Deprecated)

Runs legacy Boltz structure prediction on protein sequences using MSA inputs. Generates multiple sampled structures and corresponding confidence/quality metrics. This node is deprecated; use Boltz-2 nodes for enhanced features.

Usage

Use this node only for legacy workflows that still depend on the original Boltz service. Provide MSA search results (A3M) for one or more sequences. The node runs structure prediction with configurable recycling and diffusion sampling and returns ranked PDB structures with per-structure confidence scores. For new projects, migrate to Boltz2StructurePredictionNode.

Inputs

Field | Required | Type | Description | Example
a3m | Yes | A3M | MSA search results as a dictionary mapping sequence IDs to A3M-formatted strings. Each entry is processed separately to predict structures. | {'seq1': '>seq1\nAAAA...\n>seq1/2-100\n-AAA...', 'seq2': '>seq2\nMKTW...\n>seq2/align\n-MKT...'}
recycling_steps | Yes | INT | Number of recycling iterations during prediction. Higher values can improve accuracy but increase runtime. | 10
diffusion_samples | Yes | INT | Number of structure samples generated via diffusion for each input sequence. More samples yield more diverse predictions. | 5
seed | Yes | INT | Base random seed. If multiple sequences are processed, each uses an incremented seed (seed + index). | 42
mode | Yes | STRING | Execution mode: MOCK (returns predefined mock outputs), PROD (runs the live service), TEST (forces recycling_steps=1 and diffusion_samples=1 for quick checks). | PROD
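The input expectations for a3m (a non-empty dict whose values each start with a FASTA header and contain aligned sequences) can be checked before a job is submitted. The sketch below is a hypothetical pre-flight helper, not part of the node: `validate_a3m_input` is an illustrative name, it assumes the a3m value is a plain Python dict, and it checks only basic structure rather than full A3M syntax. Skipping lines that start with '#' is an assumption for A3M files that carry a leading comment line.

```python
from typing import Dict, List


def validate_a3m_input(a3m: Dict[str, str]) -> List[str]:
    """Return a list of problems in an a3m input dict; an empty list means it looks usable.

    Hypothetical pre-flight helper: it checks only basic structure, not full A3M syntax.
    """
    if not isinstance(a3m, dict) or not a3m:
        return ["a3m must be a non-empty dict of {sequence_id: A3M string}"]

    problems: List[str] = []
    for seq_id, text in a3m.items():
        if not isinstance(text, str) or not text.strip():
            problems.append(f"{seq_id}: A3M text is empty")
            continue
        # Assumption: lines starting with '#' are treated as comments and skipped.
        lines = [
            line.strip()
            for line in text.splitlines()
            if line.strip() and not line.startswith("#")
        ]
        if not lines:
            problems.append(f"{seq_id}: no FASTA records found")
            continue
        # The first record must open with a FASTA header line ('>').
        if not lines[0].startswith(">"):
            problems.append(f"{seq_id}: first line is not a FASTA header")
        # Every header must be followed by at least one sequence line.
        expecting_sequence = False
        for line in lines:
            if line.startswith(">"):
                if expecting_sequence:
                    problems.append(f"{seq_id}: header with no sequence before {line!r}")
                expecting_sequence = True
            else:
                expecting_sequence = False
        if expecting_sequence:
            problems.append(f"{seq_id}: last header has no sequence")
    return problems


example = {
    "seq1": ">seq1\nMKTWAAL\n>hit1\n-KTWAAL\n",
    "seq2": "MKTW\n",  # missing its FASTA header
}
print(validate_a3m_input(example))  # → ['seq2: first line is not a FASTA header']
```

An empty result only means the input passes these basic checks; the service may still reject alignments for reasons this sketch does not model.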

Outputs

Field | Type | Description | Example
folding.pdb | PDB | Ranked predicted structures as a dictionary mapping '<sequence_id>_rank_<NNN>' keys to PDB content strings. | {'seq1_rank_001': 'MODEL 1\nATOM ...\nENDMDL\n', 'seq1_rank_002': 'MODEL 1\nATOM ...\nENDMDL\n'}
confidence_scores.json | JSON | Confidence and related quality metrics for each predicted structure, keyed by the same '<sequence_id>_rank_<NNN>' IDs as folding.pdb. | {'seq1_rank_001': {'pLDDT_mean': 78.4, 'PAE': [[0.0, 1.2], [1.1, 0.0]]}, 'seq1_rank_002': {'pLDDT_mean': 74.2, 'PAE': [[0.0, 2.0], [2.1, 0.0]]}}
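Because both outputs share the same keys, they can be joined per sequence. The sketch below picks, for each input sequence, the result with the highest pLDDT_mean. It assumes the key shape '<sequence_id>_rank_<NNN>' and the 'pLDDT_mean' metric name seen in the examples above; the real payload may differ, and the node's own ranking criterion is not documented here. `split_result_key` and `best_by_plddt` are hypothetical helpers.

```python
import re
from typing import Dict, Tuple

# Assumed key shape, inferred from the examples: "<sequence_id>_rank_<NNN>".
# The greedy ".+" makes the split happen at the LAST "_rank_" in the key.
KEY_PATTERN = re.compile(r"^(?P<seq_id>.+)_rank_(?P<rank>\d+)$")


def split_result_key(key: str) -> Tuple[str, int]:
    """Split 'seq1_rank_001' into ('seq1', 1)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected result key: {key!r}")
    return match.group("seq_id"), int(match.group("rank"))


def best_by_plddt(confidence: Dict[str, dict]) -> Dict[str, str]:
    """Map each sequence ID to the result key with the highest pLDDT_mean."""
    best: Dict[str, Tuple[float, str]] = {}
    for key, metrics in confidence.items():
        seq_id, _rank = split_result_key(key)
        # Missing metric is treated as the worst possible score.
        score = metrics.get("pLDDT_mean", float("-inf"))
        if seq_id not in best or score > best[seq_id][0]:
            best[seq_id] = (score, key)
    return {seq_id: key for seq_id, (_score, key) in best.items()}


confidence_scores = {
    "seq1_rank_001": {"pLDDT_mean": 78.4},
    "seq1_rank_002": {"pLDDT_mean": 74.2},
}
print(best_by_plddt(confidence_scores))  # → {'seq1': 'seq1_rank_001'}
```

The returned key can then index folding.pdb directly to fetch the corresponding PDB string.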

Important Notes

  • Deprecated: This node is deprecated. Prefer Boltz2StructurePredictionNode for YAML input support, ligands, DNA/RNA, potentials, and affinity prediction.
  • Mode behavior: TEST mode forces recycling_steps=1 and diffusion_samples=1 to reduce runtime. MOCK mode returns predefined mock results.
  • Multiple inputs: When multiple sequences are provided, the node processes each and increments the seed per sequence.
  • Performance: Increasing recycling_steps and diffusion_samples increases runtime and cost.
  • Input expectations: a3m must be a dict of A3M-formatted strings keyed by sequence IDs; each must contain a valid FASTA header and aligned sequences.
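The mode override and per-sequence seeding described above can be modeled as a small function that computes the parameters each sequence is run with. This is a sketch under stated assumptions, not the node's implementation: `RunParams` and `effective_params` are hypothetical names, and it assumes sequences are processed in dict insertion order with a 0-based index, so the first sequence uses the base seed unchanged.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RunParams:
    sequence_id: str
    seed: int
    recycling_steps: int
    diffusion_samples: int


def effective_params(a3m: Dict[str, str], recycling_steps: int,
                     diffusion_samples: int, seed: int, mode: str) -> List[RunParams]:
    """Return the parameters each sequence would run with (hypothetical model)."""
    if mode == "TEST":
        # TEST mode forces both settings to 1 to keep runs short.
        recycling_steps, diffusion_samples = 1, 1
    # In MOCK mode these values are irrelevant: outputs come from fixed mock data.
    # Assumption: 0-based index in dict insertion order, so seeds are seed, seed + 1, ...
    return [
        RunParams(seq_id, seed + index, recycling_steps, diffusion_samples)
        for index, seq_id in enumerate(a3m)
    ]


runs = effective_params({"seq1": "...", "seq2": "..."}, 10, 5, 42, "PROD")
print([run.seed for run in runs])  # → [42, 43]
```

Keeping the computed seeds alongside the results makes it easier to rerun a single sequence reproducibly later.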

Troubleshooting

  • Empty or missing outputs: Ensure 'a3m' is a non-empty dict and each value is valid A3M text with a FASTA header line.
  • Service timeout or long runtime: Reduce diffusion_samples and recycling_steps, or test with mode='TEST' to validate the pipeline quickly.
  • Inconsistent IDs in results: Output keys are derived from input sequence IDs; verify your 'a3m' dict keys are unique and meaningful.
  • Unexpectedly few structures: Check diffusion_samples; in TEST mode it is forced to 1.
  • Reproducibility issues: Confirm the base seed and remember seeds are incremented per sequence (seed + index).
  • Mock data confusion: If using mode='MOCK', outputs are fixed from mock data and will not reflect your inputs.
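Several of these symptoms (missing outputs, unexpectedly few structures, IDs that do not line up) can be spotted by comparing the returned folding.pdb keys against the input IDs. The sketch below assumes one returned structure per diffusion sample and the '<sequence_id>_rank_<NNN>' key shape from the output examples; both are assumptions, and `missing_structures` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def missing_structures(a3m: Dict[str, str], folding_pdb: Dict[str, str],
                       diffusion_samples: int, mode: str) -> List[str]:
    """Report input sequences that returned fewer structures than expected."""
    if mode == "MOCK":
        # Mock outputs are fixed and do not reflect the inputs, so skip the check.
        return []
    # TEST mode forces diffusion_samples to 1.
    expected = 1 if mode == "TEST" else diffusion_samples
    # Assumed key shape "<sequence_id>_rank_<NNN>": split on the last "_rank_".
    counts = Counter(key.rpartition("_rank_")[0] for key in folding_pdb)
    return [
        f"{seq_id}: got {counts[seq_id]} structure(s), expected {expected}"
        for seq_id in a3m
        if counts[seq_id] < expected
    ]


pdb = {"seq1_rank_001": "MODEL 1\n...", "seq1_rank_002": "MODEL 1\n..."}
print(missing_structures({"seq1": "", "seq2": ""}, pdb, 2, "PROD"))
# → ['seq2: got 0 structure(s), expected 2']
```

A sequence reported with 0 structures often points to an ID mismatch between the a3m keys and the output keys rather than a failed prediction.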