MSA Search

Runs multiple sequence alignment (MSA) searches for one or more protein sequences. Supports different database presets (toy, reduced, full) to balance speed and coverage, and modes for production, mock results, and quick testing. Returns results in A3M format keyed by input sequence IDs.
Usage

Use this node as the first step in a protein structure prediction workflow to generate A3M MSAs from input FASTA sequences. Typical flow: provide FASTA(s) -> run MSA Search -> feed the A3M output into folding nodes (e.g., AlphaFold). Choose db_preset according to how much coverage you need and how long you can wait, and choose mode according to whether the run is for development or production.

Inputs

| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| fasta | True | FASTA | One or more protein sequences in FASTA format. If multiple records are provided, each record ID must be unique; outputs will be keyed by these IDs. | >seq1 MKTAYIAKQRQISFVKSHFSRQ >seq2 GAVLIPFYWSTCMNQDEKHR |
| db_preset | True | STRING | MSA database preset to search. Options: alphafold_toy_dataset (small, fast), alphafold_reduced_dataset (balanced), alphafold_full_dataset (comprehensive, slow). | alphafold_reduced_dataset |
| mode | True | STRING | Execution mode. MOCK: returns predefined sample results. PROD: runs actual MSA search service. TEST: overrides settings for quick runs on a toy dataset. | PROD |
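
The interaction between db_preset and mode described above can be sketched as a small pre-flight check. This is only an illustration: resolve_settings is a hypothetical helper, not part of the node, and it assumes the option strings must match exactly as listed in the table.

```python
# Hypothetical pre-flight check; names are illustrative, not the node's API.
DB_PRESETS = (
    "alphafold_toy_dataset",
    "alphafold_reduced_dataset",
    "alphafold_full_dataset",
)
MODES = ("MOCK", "PROD", "TEST")


def resolve_settings(db_preset: str, mode: str) -> dict:
    """Validate the two string inputs and return the values that would apply."""
    if db_preset not in DB_PRESETS:
        raise ValueError(f"unknown db_preset: {db_preset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Per the field description, TEST overrides the preset with the toy dataset.
    effective = "alphafold_toy_dataset" if mode == "TEST" else db_preset
    return {"db_preset": effective, "mode": mode}


if __name__ == "__main__":
    print(resolve_settings("alphafold_full_dataset", "TEST"))
```

Running the check before submitting a job makes it obvious when a TEST run will not use the preset you selected.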

Outputs

| Field | Type | Description | Example |
|---|---|---|---|
| msa.a3m | A3M | MSA results as a dictionary mapping each input sequence ID to its A3M-formatted alignment. | {"seq1": "# A3M content...\n>seq1\nMKTAYI-...", "seq2": "# A3M content..."} |
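
To inspect the output before passing it to a folding node, the dictionary can be unpacked with a few lines of Python. The sketch below assumes the A3M text follows the shape shown in the example (optional lines starting with '#', then '>' headers followed by sequence lines); parse_a3m and alignment_depths are hypothetical helper names.

```python
def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Split A3M text into (header, aligned_sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comment lines (assumed from the example output).
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_depths(msa: dict[str, str]) -> dict[str, int]:
    """Count the aligned sequences returned for each input sequence ID."""
    return {seq_id: len(parse_a3m(a3m)) for seq_id, a3m in msa.items()}


if __name__ == "__main__":
    # Made-up sample data in the documented shape, not real search output.
    sample = {"seq1": "# comment\n>seq1\nMKTAYI\n>hit1\nMK-AYi\n", "seq2": ">seq2\nGAVL\n"}
    print(alignment_depths(sample))
```

A depth of 1 means only the query itself came back, which is a quick signal that a search found no homologs in the chosen database.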

Important Notes

  • In TEST mode, db_preset is automatically set to alphafold_toy_dataset to speed up runs.
  • When multiple sequences are provided in FASTA, the output is a dict keyed by the original FASTA record IDs; ensure IDs are unique and meaningful.
  • Runtime scales with both the number of sequences and the chosen database: toy (fast), reduced (moderate), full (slow).
  • MOCK mode returns precomputed mock data and does not perform a real search.
  • Large database presets can lead to long execution times; plan timeouts and batching accordingly.
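
One way to plan batching is to split a multi-record FASTA into smaller FASTA strings and submit them as separate runs. The sketch below is a minimal example under that assumption; split_fasta and batch_fasta are hypothetical helpers, and a suitable batch size depends on your timeout budget.

```python
def split_fasta(fasta: str) -> list[str]:
    """Split FASTA text into one string per record (header plus sequence lines)."""
    records = []
    current = []
    for raw in fasta.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A new header closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def batch_fasta(fasta: str, batch_size: int) -> list[str]:
    """Group records into FASTA strings of at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records = split_fasta(fasta)
    return [
        "\n".join(records[i:i + batch_size])
        for i in range(0, len(records), batch_size)
    ]


if __name__ == "__main__":
    for batch in batch_fasta(">a\nMK\n>b\nGA\n>c\nLL\n", 2):
        print(batch, end="\n\n")
```

Because outputs are keyed by record ID, the per-batch result dictionaries can be merged afterwards as long as IDs are unique across all batches.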

Troubleshooting

  • MSA search takes too long: use alphafold_toy_dataset or alphafold_reduced_dataset, or reduce the number/length of input sequences.
  • Empty or missing results for a sequence: verify the FASTA formatting and that the sequence ID is unique and non-empty.
  • Service timeout: try fewer sequences, switch to a smaller db_preset, or rerun in TEST/MOCK for quick validation.
  • Invalid FASTA input: ensure sequences use standard FASTA format with header lines starting with '>' and valid amino acid characters.
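
Several of the problems above can be caught before a run with a simple FASTA check. This is a rough sketch, not the node's own validation: validate_fasta is a hypothetical helper, and it assumes that only the 20 standard amino acid letters are valid (the service may accept additional codes such as X).

```python
# Assumption: only the 20 standard amino acid letters count as valid.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def validate_fasta(fasta: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    seen_ids = set()
    current_id = None
    current_len = 0
    for lineno, raw in enumerate(fasta.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None and current_len == 0:
                problems.append(f"record '{current_id}' has no sequence")
            # The record ID is taken as the first token after '>'.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            current_len = 0
            if not current_id:
                problems.append(f"line {lineno}: empty record ID")
            elif current_id in seen_ids:
                problems.append(f"line {lineno}: duplicate record ID '{current_id}'")
            seen_ids.add(current_id)
        elif current_id is None:
            problems.append(f"line {lineno}: sequence data before any '>' header")
        else:
            bad = sorted(set(line.upper()) - VALID_RESIDUES)
            if bad:
                problems.append(f"line {lineno}: invalid characters {''.join(bad)}")
            current_len += len(line)
    if current_id is not None and current_len == 0:
        problems.append(f"record '{current_id}' has no sequence")
    if current_id is None and not problems:
        problems.append("no FASTA records found")
    return problems


if __name__ == "__main__":
    print(validate_fasta(">seq1\nMKTAYIAKQRQISFVKSHFSRQ\n>seq1\nGAVL1\n"))
```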