
MSA Search

Runs multiple sequence alignment (MSA) searches for one or more protein sequences. The node queries managed databases, merges results from multiple sources, and returns one alignment per sequence in A3M format. Format conversion and result consolidation are handled automatically. Supports MOCK, PROD, and TEST modes.

Usage

Use this node at the start of a protein structure prediction workflow to generate MSAs from input FASTA sequences. Select a database preset based on speed vs. coverage needs (toy, reduced, or full). Feed the resulting A3M output into downstream folding/prediction nodes.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| fasta | True | FASTA | One or more protein sequences in FASTA format. Each sequence will be processed independently and returned as a separate A3M entry. | >seq1 MKTAYIAKQRQISFVKSHFSRQDILDLWQ >seq2 GHHHHHHSSGVDLGTENLYFQSMASMTGGQQ |
| db_preset | True | STRING (enum) | Which database set to search. Allowed values: alphafold_toy_dataset (fast, minimal), alphafold_reduced_dataset (balanced), alphafold_full_dataset (maximum coverage). | alphafold_reduced_dataset |
| mode | True | STRING (enum) | Execution mode. MOCK uses bundled mock data, PROD calls the live service, TEST forces minimal settings for quick checks. | PROD |
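
A quick pre-flight check of the three inputs can catch problems before a long search is submitted. The following is a minimal sketch, not part of the node itself: the helper names `fasta_ids` and `check_inputs` are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
# Allowed values as listed in the Inputs table.
ALLOWED_PRESETS = {
    "alphafold_toy_dataset",
    "alphafold_reduced_dataset",
    "alphafold_full_dataset",
}
ALLOWED_MODES = {"MOCK", "PROD", "TEST"}


def fasta_ids(fasta: str) -> list[str]:
    """Return the ID of every record (assumed: first token after '>')."""
    ids = []
    for line in fasta.splitlines():
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def check_inputs(fasta: str, db_preset: str, mode: str) -> list[str]:
    """Collect readable problems; an empty list means the inputs look fine."""
    problems = []
    ids = fasta_ids(fasta)
    if not ids:
        problems.append("no FASTA records found")
    if "" in ids:
        problems.append("a FASTA header has no ID")
    duplicates = sorted({i for i in ids if i and ids.count(i) > 1})
    if duplicates:
        problems.append(f"duplicate sequence IDs: {duplicates}")
    if db_preset not in ALLOWED_PRESETS:
        problems.append(f"unknown db_preset: {db_preset!r}")
    if mode not in ALLOWED_MODES:
        problems.append(f"unknown mode: {mode!r}")
    return problems


fasta = (
    ">seq1\nMKTAYIAKQRQISFVKSHFSRQDILDLWQ\n"
    ">seq2\nGHHHHHHSSGVDLGTENLYFQSMASMTGGQQ\n"
)
print(check_inputs(fasta, "alphafold_reduced_dataset", "PROD"))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.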

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| msa.a3m | MSA | Dictionary mapping each input sequence ID to its merged A3M alignment text. | {"seq1": "# A3M content...\n>seq1\nMKTAYI-...", "seq2": ">seq2\nG-HHHS..."} |
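
To inspect the output before passing it downstream, the dictionary can be summarized by counting alignment rows per sequence ID. This is a sketch under stated assumptions: the helper names are hypothetical, each A3M record is assumed to start with a `>` header line, and an entry with at most one row (empty, or the query alone) is treated as having no hits.

```python
def count_a3m_rows(a3m_text: str) -> int:
    """Count A3M records: one per line that starts with '>'."""
    return sum(1 for line in a3m_text.splitlines() if line.startswith(">"))


def summarize_msa(msa: dict[str, str]) -> dict[str, int]:
    """Map each sequence ID to the number of rows in its alignment."""
    return {seq_id: count_a3m_rows(text) for seq_id, text in msa.items()}


def sequences_without_hits(msa: dict[str, str]) -> list[str]:
    """IDs whose alignment is empty or holds only the query row (assumption)."""
    return [seq_id for seq_id, n in summarize_msa(msa).items() if n <= 1]


# Illustrative data only, not real search output.
msa = {
    "seq1": "# A3M content\n>seq1\nMKTAYI\n>hit1\nMKTAYI\n",
    "seq2": "",
}
print(summarize_msa(msa))
print(sequences_without_hits(msa))
```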

Important Notes

  • In TEST mode, the database preset is automatically overridden to alphafold_toy_dataset for faster turnaround.
  • Timeouts scale with the number of input sequences and the selected database preset (toy < reduced < full). Very large batches can take a long time.
  • Results from multiple sources are merged: Stockholm-formatted MSAs are aggregated and converted to A3M per sequence.
  • If a source returns A3M, it is internally converted to Stockholm before merging to ensure consistent processing.
  • MOCK mode serves precomputed responses from bundled mock data and does not perform real searches.
  • Ensure unique FASTA headers; output keys mirror the input sequence IDs.
  • The node returns an A3M dictionary even if some sequences have no hits; such entries may be empty.
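
The A3M-to-Stockholm step mentioned above is needed because A3M rows can differ in length: lowercase letters mark insertions relative to the query and are not padded in other rows. The node's actual converter is not documented here; the sketch below only illustrates the general idea, with hypothetical function names, by padding insertion columns with `.` so that every row has the same width.

```python
def a3m_to_aligned(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pad A3M rows so that all rows share the same columns."""
    if not records:
        return []
    parsed = []
    for name, seq in records:
        columns = []  # one (insertion_before, match_char) pair per match column
        pending = ""  # lowercase insertion residues seen since the last match
        for ch in seq:
            if ch.islower():
                pending += ch
            elif ch != ".":  # uppercase residue or '-' is a match column
                columns.append((pending, ch))
                pending = ""
        parsed.append((name, columns, pending))  # 'pending' is the trailing insert

    n_cols = len(parsed[0][1])
    if any(len(cols) != n_cols for _, cols, _ in parsed):
        raise ValueError("A3M rows disagree on the number of match columns")

    # Widest insertion before each match column, and after the last one.
    widths = [max(len(cols[i][0]) for _, cols, _ in parsed) for i in range(n_cols)]
    tail_width = max(len(tail) for _, _, tail in parsed)

    aligned = []
    for name, cols, tail in parsed:
        row = "".join(ins.ljust(w, ".") + ch for (ins, ch), w in zip(cols, widths))
        aligned.append((name, row + tail.ljust(tail_width, ".")))
    return aligned


def to_stockholm(aligned: list[tuple[str, str]]) -> str:
    """Write aligned rows as a minimal Stockholm block."""
    width = max((len(name) for name, _ in aligned), default=0)
    lines = ["# STOCKHOLM 1.0"]
    lines += [f"{name.ljust(width)} {row}" for name, row in aligned]
    lines.append("//")
    return "\n".join(lines) + "\n"


rows = a3m_to_aligned([("q", "ACD"), ("h", "A-aD")])
print(to_stockholm(rows))
```

Once every source is in the same column-aligned form, rows from different sources can be concatenated for a given query before converting back to A3M.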

Troubleshooting

  • Empty or missing A3M for a sequence: Verify the sequence ID is unique and valid. Try a larger database preset (reduced or full) for better coverage.
  • Timeouts or long-running jobs: Reduce batch size (fewer sequences per run), use alphafold_toy_dataset or alphafold_reduced_dataset, or run sequences in separate executions.
  • Unexpected MOCK-like results: Confirm mode is set to PROD (not MOCK or TEST).
  • Downstream node rejects output: Ensure you're passing the MSA (A3M) dictionary directly; do not modify keys or formats.
  • Mismatched sequence IDs: Check FASTA headers for duplicates or unusual characters; headers become dictionary keys in the output.
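
When timeouts are the problem, one option is to split a large FASTA input into smaller batches and submit each batch as its own execution. The sketch below shows only the splitting step; `split_fasta` and `batch_fasta` are hypothetical helpers, and how each batch is submitted depends on your workflow.

```python
def split_fasta(fasta: str) -> list[str]:
    """Split FASTA text into one string per record."""
    records, current = [], []
    for line in fasta.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def batch_fasta(fasta: str, batch_size: int) -> list[str]:
    """Group records into FASTA strings of at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records = split_fasta(fasta)
    return [
        "".join(records[i:i + batch_size])
        for i in range(0, len(records), batch_size)
    ]


# Illustrative input only.
batches = batch_fasta(">a\nAC\n>b\nGG\n>c\nTT\n", batch_size=2)
for batch in batches:
    print(batch)
```

Because output keys mirror the input sequence IDs, the per-batch result dictionaries can be combined afterwards, provided the IDs are unique across all batches.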