Boltz (Deprecated)

Runs legacy Boltz structure prediction on protein sequences using MSA inputs. Generates multiple sampled structures and corresponding confidence/quality metrics. This node is deprecated; use Boltz-2 nodes for enhanced features.

Usage

Use this node only for legacy workflows that still depend on the original Boltz service. Provide MSA search results (A3M) for one or more sequences; the node will perform structure prediction with configurable recycling and diffusion sampling, returning ranked PDBs and confidence scores. For new projects, migrate to Boltz2StructurePredictionNode.

Inputs

Field	Required	Type	Description	Example
a3m	True	A3M	MSA search results as a dictionary mapping sequence IDs to A3M-formatted strings. Each entry will be processed to predict structures.	{'seq1': '>seq1\nAAAA...\n>seq1/2-100\n-AAA...', 'seq2': '>seq2\nMKTW...\n>seq2/align\n-MKT...'}
recycling_steps	True	INT	Number of recycling iterations during prediction. Higher values can improve accuracy but increase runtime.	10
diffusion_samples	True	INT	Number of structure samples to generate via diffusion for each input sequence. More samples yield more diverse predictions.	5
seed	True	INT	Base random seed. If multiple sequences are processed, each uses an incremented seed (seed + index).	42
mode	True	STRING	Execution mode: MOCK (uses predefined mock outputs), PROD (runs the live service), TEST (overrides key params for quick checks).	PROD
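
As a minimal sketch of how the inputs above fit together, the snippet below builds an input dictionary and computes the per-sequence seeds described for the seed field. The helper name per_sequence_seeds and the toy alignments are hypothetical. It assumes the index in "seed + index" follows the insertion order of the a3m dict, which the documentation does not state explicitly.

```python
from typing import Dict


def per_sequence_seeds(a3m: Dict[str, str], base_seed: int) -> Dict[str, int]:
    """Map each sequence ID to base_seed + index.

    Assumption: index is the position of the entry in the a3m dict
    (insertion order); the node's actual ordering is not documented.
    """
    return {seq_id: base_seed + index for index, seq_id in enumerate(a3m)}


# Toy A3M strings for illustration only (not real alignments).
a3m = {
    "seq1": ">seq1\nMKTWAAAA\n>seq1/2-8\n-KTWAAAA\n",
    "seq2": ">seq2\nMKTWLLLL\n>seq2/align\n-KTWLLLL\n",
}

# Field names match the Inputs table; the values are examples.
inputs = {
    "a3m": a3m,
    "recycling_steps": 10,
    "diffusion_samples": 5,
    "seed": 42,
    "mode": "PROD",
}

seeds = per_sequence_seeds(inputs["a3m"], inputs["seed"])
print(seeds)  # {'seq1': 42, 'seq2': 43}
```

Recording the derived seed next to each sequence ID makes it easier to rerun a single sequence later with the same seed.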

Outputs

Field	Type	Description	Example
folding.pdb	PDB	Ranked predicted structures as a dictionary mapping keys of the form '<sequence ID>_rank_<NNN>' (as in the example) to PDB content strings.	{'seq1_rank_001': 'MODEL 1\nATOM ...\nENDMDL\n', 'seq1_rank_002': 'MODEL 1\nATOM ...\nENDMDL\n'}
confidence_scores.json	JSON	Confidence and related quality metrics for each predicted structure, keyed by the same '<sequence ID>_rank_<NNN>' keys as folding.pdb.	{'seq1_rank_001': {'pLDDT_mean': 78.4, 'PAE': [[0.0, 1.2], [1.1, 0.0]]}, 'seq1_rank_002': {'pLDDT_mean': 74.2, 'PAE': [[0.0, 2.0], [2.1, 0.0]]}}
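
The following sketch shows one way to pick the highest-confidence structure per sequence from the confidence output. It assumes the key shape seen in the example ('seq1_rank_001', that is, a sequence ID followed by '_rank_' and a number) and the metric name 'pLDDT_mean' from the example; the real output may carry different or additional metrics. The function name best_by_sequence is hypothetical.

```python
from typing import Dict, Tuple


def best_by_sequence(scores: Dict[str, dict]) -> Dict[str, Tuple[str, float]]:
    """Return {sequence_id: (output_key, pLDDT_mean)} for the top-scoring model.

    Assumes keys look like '<sequence ID>_rank_<NNN>' and that each entry
    has a 'pLDDT_mean' value, as in the documented example.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for key, metrics in scores.items():
        seq_id, separator, _rank = key.rpartition("_rank_")
        if not separator:
            continue  # key does not follow the assumed pattern
        plddt = metrics.get("pLDDT_mean")
        if plddt is None:
            continue  # metric missing; skip rather than guess
        if seq_id not in best or plddt > best[seq_id][1]:
            best[seq_id] = (key, plddt)
    return best


# Example values copied from the Outputs table.
confidence_scores = {
    "seq1_rank_001": {"pLDDT_mean": 78.4, "PAE": [[0.0, 1.2], [1.1, 0.0]]},
    "seq1_rank_002": {"pLDDT_mean": 74.2, "PAE": [[0.0, 2.0], [2.1, 0.0]]},
}

best = best_by_sequence(confidence_scores)
print(best)  # {'seq1': ('seq1_rank_001', 78.4)}
```

The returned output key can then be used to look up the matching PDB string in folding.pdb.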

Important Notes

Deprecated: This node is deprecated. Prefer Boltz2StructurePredictionNode for YAML input support, ligands, DNA/RNA, potentials, and affinity prediction.
Mode behavior: TEST mode forces recycling_steps=1 and diffusion_samples=1 to reduce runtime. MOCK mode returns predefined mock results.
Multiple inputs: When multiple sequences are provided, the node processes each and increments the seed per sequence.
Performance: Increasing recycling_steps and diffusion_samples increases runtime and cost.
Input expectations: a3m must be a dict of A3M-formatted strings keyed by sequence IDs; each string must begin with a FASTA-style header line (starting with '>') followed by the aligned sequences.
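
The mode behavior noted above can be sketched as a small function that returns the parameters actually in effect. This is an illustration of the documented rules, not the node's implementation; effective_params and expected_structure_count are hypothetical names. The structure count assumes one output per diffusion sample per sequence, and in MOCK mode the outputs come from fixed mock data, so the count may not apply there.

```python
from typing import Tuple


def effective_params(
    mode: str, recycling_steps: int, diffusion_samples: int
) -> Tuple[int, int]:
    """Return (recycling_steps, diffusion_samples) after mode overrides."""
    if mode == "TEST":
        # Documented behavior: TEST forces both values to 1.
        return 1, 1
    if mode in ("PROD", "MOCK"):
        return recycling_steps, diffusion_samples
    raise ValueError(f"Unknown mode: {mode!r}")


def expected_structure_count(num_sequences: int, diffusion_samples: int) -> int:
    """Assumed count: one structure per diffusion sample per sequence."""
    return num_sequences * diffusion_samples


steps, samples = effective_params("TEST", recycling_steps=10, diffusion_samples=5)
print(steps, samples)  # 1 1
print(expected_structure_count(2, samples))  # 2
```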

Troubleshooting

Empty or missing outputs: Ensure 'a3m' is a non-empty dict and each value is valid A3M text with a FASTA header line.
Service timeout or long runtime: Reduce diffusion_samples and recycling_steps, or test with mode='TEST' to validate the pipeline quickly.
Inconsistent IDs in results: Output keys are derived from input sequence IDs; verify your 'a3m' dict keys are unique and meaningful.
Unexpectedly few structures: Check diffusion_samples; in TEST mode it is forced to 1.
Reproducibility issues: Confirm the base seed and remember seeds are incremented per sequence (seed + index).
Mock data confusion: If using mode='MOCK', outputs are fixed from mock data and will not reflect your inputs.
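
Several of the issues above come down to malformed input, so a pre-flight check before calling the node can help. The sketch below is a hypothetical helper (check_a3m is not part of the node) that applies only the expectations stated in this page: a non-empty dict, string values, a header line starting with '>', and at least one sequence line after it. It does not validate alignment content.

```python
from typing import Any, List


def check_a3m(a3m: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    if not isinstance(a3m, dict) or not a3m:
        return ["a3m must be a non-empty dict"]

    problems: List[str] = []
    for seq_id, text in a3m.items():
        if not isinstance(text, str) or not text.strip():
            problems.append(f"{seq_id}: value is empty or not a string")
            continue
        lines = [line for line in text.splitlines() if line.strip()]
        if not lines[0].startswith(">"):
            problems.append(f"{seq_id}: first line is not a '>' header")
        elif len(lines) < 2 or lines[1].startswith(">"):
            problems.append(f"{seq_id}: no sequence line after the header")
    return problems


print(check_a3m({"seq1": ">seq1\nMKTW\n", "seq2": "MKTW"}))
# ["seq2: first line is not a '>' header"]
```

Because dict keys are unique by construction, duplicate sequence IDs have to be caught earlier, when the dict is assembled from the MSA search results.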