MSA Search

Runs a multiple sequence alignment (MSA) search for one or more protein sequences. It accepts FASTA-formatted input, queries the selected database preset, and returns alignments in A3M format keyed by sequence ID. The node supports production runs, a lightweight test mode, and a mock mode that returns predefined data.

Usage

Use this node to generate MSA results as a precursor to protein structure prediction or other downstream analyses that require A3M alignments. Provide one or more sequences in FASTA format, choose the database preset based on accuracy/speed needs, and select the appropriate mode (PROD for real runs, TEST for quick checks, MOCK for demos). Feed the resulting A3M output into folding or feature-combination nodes.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| fasta | True | FASTA | FASTA-formatted sequences to run the MSA search on. Multiple sequences are supported and will be processed individually. | >protein_1 MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDW >protein_2 MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRGS |
| db_preset | True | ENUM | Selects the MSA database size and coverage. 'alphafold_toy_dataset' is small and fast for debugging, 'alphafold_reduced_dataset' balances speed and coverage, 'alphafold_full_dataset' offers maximum coverage but is slowest. | alphafold_reduced_dataset |
| mode | True | ENUM | Execution mode. 'PROD' runs the actual service, 'TEST' uses a lightweight configuration for quick checks, and 'MOCK' returns predefined sample results. | PROD |
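
The three inputs can be sanity-checked before they reach the node. The following is a minimal sketch, not the node's own API: the function name `build_msa_inputs` and the returned dict layout are hypothetical and only mirror the field names listed above. The allowed enum values are the ones documented in the input table.

```python
# Allowed values as documented for the db_preset and mode inputs.
DB_PRESETS = {
    "alphafold_toy_dataset",
    "alphafold_reduced_dataset",
    "alphafold_full_dataset",
}
MODES = {"PROD", "TEST", "MOCK"}


def build_msa_inputs(fasta: str, db_preset: str, mode: str) -> dict:
    """Return a plain dict of node inputs after basic sanity checks.

    Hypothetical helper: the dict layout only mirrors the documented
    field names and is not an official payload format.
    """
    if not fasta.lstrip().startswith(">"):
        raise ValueError("fasta must start with a '>' header line")
    if db_preset not in DB_PRESETS:
        raise ValueError(f"unknown db_preset: {db_preset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"fasta": fasta, "db_preset": db_preset, "mode": mode}


inputs = build_msa_inputs(
    ">protein_1\nMSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDW\n",
    db_preset="alphafold_reduced_dataset",
    mode="PROD",
)
```

Rejecting an unknown preset or mode up front is cheaper than discovering the typo after a long search has been queued.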

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| msa.a3m | A3M | MSA search results in A3M format, returned as a dictionary mapping each input sequence ID to its A3M alignment. | {'protein_1': '', 'protein_2': ''} |
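
To inspect the returned dictionary, each A3M string can be split into its rows. The sketch below assumes the usual A3M conventions (FASTA-like `>` headers, lowercase letters for insertions relative to the query, `-` for deletions); skipping lines that start with `#` is an additional assumption, and the tiny alignment is made-up sample data rather than real node output.

```python
def parse_a3m(a3m_text: str) -> list[tuple[str, str]]:
    """Split A3M text into (header, sequence) pairs."""
    entries = []
    header, chunks = None, []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and '#' lines carry no alignment rows
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def aligned_columns(sequence: str) -> str:
    """Drop lowercase insertion states so every row has the query's width."""
    return "".join(c for c in sequence if not c.islower())


# Made-up example in the documented shape: sequence ID -> A3M string.
msa = {"protein_1": ">protein_1\nMSEQ\n>hit_1\nMSaEQ\n>hit_2\nM-EQ\n"}

for seq_id, a3m in msa.items():
    rows = parse_a3m(a3m)
    depth = len(rows)  # number of aligned sequences, query included
    widths = {len(aligned_columns(seq)) for _, seq in rows}
    print(seq_id, "depth:", depth, "consistent width:", len(widths) == 1)
```

An alignment depth of one (only the query) or inconsistent row widths is a quick signal that the search returned little or that the A3M text was truncated.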

Important Notes

  • Database preset affects runtime: Larger presets provide better coverage but take significantly longer.
  • Test mode constraints: In TEST mode, parameters are adjusted for speed (e.g., smaller datasets), which may reduce coverage.
  • Mock mode: MOCK mode returns predefined data suitable for demos or pipeline wiring, not for scientific results.
  • Multiple sequences: If you provide multiple sequences in the FASTA input, each will be processed and returned separately, keyed by its FASTA header ID.
  • Downstream compatibility: Output is in A3M format and suitable for use in folding/prediction nodes.
  • Input formatting: Ensure each sequence has a unique FASTA header; malformed FASTA input may lead to empty or failed results.
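
The input-formatting note above can be checked before submission. This is a minimal sketch under two stated assumptions: the sequence ID is taken to be the first whitespace-delimited token after `>`, and only the 20 standard amino-acid letters are treated as valid (the service may accept more). The helper names are illustrative.

```python
from collections import Counter

# Assumption: only the 20 standard residues count as valid here.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (sequence_id, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Assumption: the ID is the first token of the header line.
            records.append([parts[0] if parts else "", []])
        elif records:
            records[-1][1].append(line.upper())
        else:
            raise ValueError("sequence data found before the first '>' header")
    return [(seq_id, "".join(chunks)) for seq_id, chunks in records]


def find_fasta_problems(text: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    records = read_fasta(text)
    problems = []
    if not records:
        problems.append("no FASTA records found")
    counts = Counter(seq_id for seq_id, _ in records)
    for seq_id, n in counts.items():
        if not seq_id:
            problems.append("header with no ID")
        elif n > 1:
            problems.append(f"duplicate ID {seq_id!r} appears {n} times")
    for seq_id, seq in records:
        if not seq:
            problems.append(f"{seq_id!r} has no sequence")
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            problems.append(f"{seq_id!r} contains non-standard characters: {''.join(bad)}")
    return problems


print(find_fasta_problems(">a\nMSEQ\n>a\nMGSS\n"))
```

Because the output is keyed by ID, two records sharing a header would presumably collide on one key, which is why the duplicate check matters most.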

Troubleshooting

  • Empty or missing alignments: Verify your FASTA formatting (headers begin with '>' and sequences are valid amino acid strings). Ensure each header is unique.
  • Long runtimes or timeouts: Use 'alphafold_toy_dataset' or 'alphafold_reduced_dataset', reduce the number of sequences, or run in TEST mode for quick validation.
  • Unexpected test-like results: Confirm that 'mode' is set to 'PROD' rather than 'TEST' or 'MOCK'.
  • Downstream node rejects output: Ensure you pass the entire A3M dictionary output and that downstream nodes expect A3M format.
  • Inconsistent IDs: Check that FASTA headers (sequence IDs) are correctly set; output keys mirror these IDs.
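
Several of the checks above come down to comparing the FASTA header IDs with the keys of the returned dictionary. The sketch below does that comparison; it assumes the ID is the first whitespace-delimited token of each header, which the node may or may not follow exactly, and the function names are illustrative.

```python
def fasta_ids(fasta: str) -> list[str]:
    """Collect sequence IDs from the '>' header lines of a FASTA string."""
    ids = []
    for line in fasta.splitlines():
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            # Assumption: the ID is the first token after '>'.
            ids.append(parts[0] if parts else "")
    return ids


def diagnose_msa_output(fasta: str, msa: dict) -> dict:
    """Compare input IDs with output keys and flag empty alignments."""
    expected = fasta_ids(fasta)
    return {
        # IDs submitted but absent from the output dictionary.
        "missing": [i for i in expected if i not in msa],
        # IDs present in the output but with an empty A3M string.
        "empty": [i for i in expected if i in msa and not msa[i].strip()],
        # Output keys that do not match any submitted ID.
        "unexpected": [k for k in msa if k not in expected],
    }


fasta = ">protein_1\nMSEQ\n>protein_2\nMGSS\n"
msa = {"protein_1": ">protein_1\nMSEQ\n", "protein_2": ""}
print(diagnose_msa_output(fasta, msa))
```

A non-empty "unexpected" list usually points at a header-parsing mismatch, while "empty" entries are worth retrying with a larger preset or in PROD mode.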