MSA Search

Runs Multiple Sequence Alignment (MSA) searches for one or more input protein sequences. It splits a multi-record FASTA into individual sequences, performs database searches, and returns alignments in A3M format keyed by sequence IDs. Supports MOCK (precomputed), PROD (full service), and TEST (toy dataset) modes.

Usage

Use this node to prepare A3M alignments before structure prediction nodes like Alphafold or Boltz. Provide one or more sequences in FASTA format. Choose a database preset based on speed vs. coverage needs. In TEST mode, it automatically uses a toy dataset for quick checks. The output maps each input sequence ID to its corresponding A3M alignment.
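The splitting step can be pictured with a minimal sketch. This is not the node's actual implementation: `split_fasta` is a hypothetical helper, and it assumes the record ID is the first whitespace-delimited token after `>`. The two short sequences are illustrative fragments only.

```python
def split_fasta(text: str) -> dict[str, str]:
    """Split multi-record FASTA text into {record_id: sequence}."""
    records: dict[str, str] = {}
    current_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without an ID")
            current_id = header[0]
            if current_id in records:
                raise ValueError(f"duplicate FASTA ID: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            # Sequence lines may wrap; join them into one string.
            records[current_id] += line
    return records


fasta = """>seq1
MKTAYIAKQR
>seq2
GHHHHHHSSG
"""
records = split_fasta(fasta)
print(list(records))  # ['seq1', 'seq2']
```

Raising on duplicate IDs here mirrors the requirement that outputs are keyed by record ID; the real node may handle duplicates differently.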

Inputs

Field	Required	Type	Description	Example
fasta	True	FASTA	Protein sequence(s) in FASTA format. If multiple sequences are provided, each record's ID will be used to key outputs.	>seq1 MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ... >seq2 GHHHHHHSSGLVPRGSHMASMTGGQQMGRGSM...
db_preset	True	CHOICE	Database configuration for the MSA search. Choose based on desired speed and coverage.	alphafold_reduced_dataset
mode	True	CHOICE	Execution mode. MOCK uses precomputed data, PROD runs the actual search, TEST forces a toy dataset for quick validation.	PROD
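How mode and db_preset interact can be sketched as follows. `effective_db_preset` is a hypothetical function written for illustration, not part of the node's API. The preset name `alphafold_toy_dataset` is the one mentioned under Troubleshooting, and the sketch assumes MOCK leaves db_preset untouched because it returns precomputed data.

```python
VALID_MODES = {"MOCK", "PROD", "TEST"}
TOY_PRESET = "alphafold_toy_dataset"


def effective_db_preset(db_preset: str, mode: str) -> str:
    """Return the preset that would actually be searched for a given mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "TEST":
        # TEST forces the toy dataset regardless of the requested preset.
        return TOY_PRESET
    # Assumption: PROD uses the preset as given; MOCK ignores it downstream.
    return db_preset


print(effective_db_preset("alphafold_reduced_dataset", "TEST"))  # alphafold_toy_dataset
print(effective_db_preset("alphafold_reduced_dataset", "PROD"))  # alphafold_reduced_dataset
```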

Outputs

Field	Type	Description	Example
msa.a3m	A3M	MSA results in A3M format as a dictionary mapping input sequence IDs to their A3M content.	{"seq1": "...A3M content...", "seq2": "...A3M content..."}
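A downstream step might inspect the output dictionary before passing it on. The sketch below counts the entries in each alignment, assuming the output arrives as a plain Python dict of strings and that each A3M entry starts with a `>` header line. The helper names and the tiny alignments are invented for illustration.

```python
def a3m_depth(a3m_text: str) -> int:
    """Count alignment entries by counting '>' header lines."""
    return sum(1 for line in a3m_text.splitlines() if line.startswith(">"))


def summarize(msa: dict[str, str]) -> dict[str, int]:
    """Map each input sequence ID to the number of entries in its A3M."""
    return {seq_id: a3m_depth(content) for seq_id, content in msa.items()}


# Made-up output: seq1 has the query plus one hit, seq2 only the query.
msa = {
    "seq1": ">seq1\nMKTAYIAK\n>hit1\nMKT-YIAK\n",
    "seq2": ">seq2\nGHHHHHHS\n",
}
print(summarize(msa))  # {'seq1': 2, 'seq2': 1}
```

A depth of 1 means only the query sequence is present, which can be a useful signal that the search found no homologs.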

Important Notes

Mode behavior: TEST mode overrides db_preset to the toy dataset regardless of input; MOCK returns predefined results for quick demos.
Timeout scaling: Runtime and timeouts scale with the number of sequences and the chosen database. Approximate timeouts per sequence: toy ≈ 300 s, reduced ≈ 3000 s, full ≈ 12000 s.
Output keys: The output dictionary keys match the FASTA record IDs; ensure your IDs are unique and meaningful.
Workflow pairing: The resulting A3M is typically consumed by structure prediction nodes (e.g., Alphafold) that expect A3M per sequence ID.
Data handling: Multi-record FASTA inputs are split automatically and processed in batch.
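The timeout figures above lend themselves to a rough budget estimate. The sketch below assumes the total timeout is simply the per-sequence figure multiplied by the number of sequences; the notes say runtime scales with both, but the exact formula is not stated. The keys `toy`, `reduced`, and `full` are labels chosen for this example, not preset names.

```python
# Approximate per-sequence timeouts in seconds, from the notes above.
PER_SEQUENCE_TIMEOUT_S = {"toy": 300, "reduced": 3000, "full": 12000}


def estimated_timeout_s(num_sequences: int, dataset: str) -> int:
    """Rough total timeout, assuming linear scaling with sequence count."""
    if num_sequences < 1:
        raise ValueError("num_sequences must be at least 1")
    return num_sequences * PER_SEQUENCE_TIMEOUT_S[dataset]


print(estimated_timeout_s(4, "reduced"))  # 12000
```

Under this assumption, four sequences against the reduced dataset get the same budget as a single sequence against the full one.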

Troubleshooting

Long runtimes or timeouts: Use alphafold_toy_dataset for quick checks or alphafold_reduced_dataset for faster runs; split large batches into smaller ones.
Empty or missing outputs: Verify FASTA formatting and that each record includes an ID and valid amino acid sequence.
Unexpected TEST results: Remember TEST mode forces the toy dataset; switch to PROD for real searches.
Inconsistent IDs: Ensure each FASTA record has a unique ID; outputs are keyed by these IDs.
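Several of these problems can be caught before submitting a job. The following pre-flight check is a sketch under stated assumptions: `check_fasta` is a hypothetical helper, only the 20 standard amino acid letters are accepted (the node itself may allow others), and `>` is assumed to appear only at the start of header lines.

```python
from collections import Counter

# Assumption: only the 20 standard residues count as valid.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def check_fasta(text: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    records: list[tuple[str, str]] = []  # (id, sequence) in input order
    # Everything before the first '>' is ignored by this sketch.
    for block in text.split(">")[1:]:
        lines = block.splitlines()
        header = lines[0].split() if lines else []
        seq = "".join(part.strip() for part in lines[1:])
        records.append((header[0] if header else "", seq))
    if not records:
        problems.append("no FASTA records found (no '>' header)")
    for record_id, seq in records:
        label = record_id or "<missing ID>"
        if not record_id:
            problems.append("record with an empty ID")
        if not seq:
            problems.append(f"{label}: empty sequence")
        bad = sorted(set(seq.upper()) - STANDARD_RESIDUES)
        if bad:
            problems.append(f"{label}: unexpected characters {bad}")
    # Duplicate IDs would collide as output dictionary keys.
    counts = Counter(record_id for record_id, _ in records if record_id)
    for record_id, n in counts.items():
        if n > 1:
            problems.append(f"{record_id}: ID appears {n} times")
    return problems


print(check_fasta(">a\nMKT\n>a\nGHH\n"))  # ['a: ID appears 2 times']
```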