
MSA Search

Runs Multiple Sequence Alignment (MSA) searches for one or more input protein sequences. It splits a multi-record FASTA into individual sequences, performs database searches, and returns alignments in A3M format keyed by sequence IDs. Supports MOCK (precomputed), PROD (full service), and TEST (toy dataset) modes.
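The splitting step can be pictured with the short Python sketch below. It is not the node's actual implementation. `split_fasta` is a hypothetical helper, and it assumes the record ID is the first whitespace-delimited token after `>`, which is the common FASTA convention. It rejects duplicate IDs because the outputs are keyed by record ID.

```python
from typing import Optional


def split_fasta(text: str) -> dict[str, str]:
    """Split multi-record FASTA text into {record_id: sequence} (hypothetical helper).

    Assumption: the record ID is the first whitespace-delimited token after '>'.
    """
    records: dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without an ID")
            current_id = tokens[0]
            if current_id in records:
                # Outputs are keyed by ID, so duplicates would collide.
                raise ValueError(f"duplicate record ID: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            # Sequence lines may be wrapped; join them and normalize case.
            records[current_id] += line.upper()
    return records


fasta = ">seq1 first chain\nMKTAYIAKQR\nQISFVKSHFS\n>seq2\nGHHHHHHSSG\n"
print(split_fasta(fasta))
# {'seq1': 'MKTAYIAKQRQISFVKSHFS', 'seq2': 'GHHHHHHSSG'}
```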

Usage

Use this node to prepare A3M alignments before structure prediction nodes like Alphafold or Boltz. Provide one or more sequences in FASTA format. Choose a database preset based on speed vs. coverage needs. In TEST mode, it automatically uses a toy dataset for quick checks. The output maps each input sequence ID to its corresponding A3M alignment.

Inputs

  • fasta (required, type FASTA): Protein sequence(s) in FASTA format. If multiple sequences are provided, each record's ID is used to key the outputs.
    Example:
    >seq1
    MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ...
    >seq2
    GHHHHHHSSGLVPRGSHMASMTGGQQMGRGSM...
  • db_preset (required, type CHOICE): Database configuration for the MSA search. Choose it based on the speed and coverage you need. Example: alphafold_reduced_dataset
  • mode (required, type CHOICE): Execution mode. MOCK uses precomputed data, PROD runs the actual search, and TEST forces the toy dataset for quick validation. Example: PROD
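The interaction between mode and db_preset can be expressed as a small rule. This is a minimal Python sketch of that rule, not the node's code. `effective_preset` and the constant names are hypothetical. Only the preset name `alphafold_toy_dataset` and the three mode names come from this page.

```python
MODES = {"MOCK", "PROD", "TEST"}
TOY_PRESET = "alphafold_toy_dataset"  # preset name taken from this page


def effective_preset(mode: str, db_preset: str) -> str:
    """Return the preset a run would use under the documented rules (hypothetical)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    if mode == "TEST":
        # TEST overrides db_preset with the toy dataset regardless of input.
        return TOY_PRESET
    # PROD searches db_preset as given. MOCK serves precomputed results,
    # so the preset is passed through here, but no real search would run.
    return db_preset


print(effective_preset("TEST", "alphafold_reduced_dataset"))  # alphafold_toy_dataset
```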

Outputs

  • msa.a3m (type A3M): MSA results in A3M format, as a dictionary mapping each input sequence ID to its A3M content. Example: {"seq1": "...A3M content...", "seq2": "...A3M content..."}
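The following Python sketch shows how the output might be inspected before handing it to a downstream node. It is hypothetical and assumes each dictionary value is plain A3M text. In A3M, the first record is the query. Lowercase letters mark insertions relative to the query, so removing them (and any '.' characters) should leave every row as long as the query. The sketch also skips '#' lines, which some tools emit at the top of A3M files. `parse_a3m` and the sample alignment are illustrative only.

```python
def parse_a3m(a3m: str) -> list[tuple[str, str]]:
    """Parse A3M text into (header, aligned_row) pairs (hypothetical helper).

    Lowercase letters (insertions) and '.' are dropped, so every returned row
    should have the same length as the query (the first record).
    """
    entries: list[tuple[str, list[str]]] = []
    for raw in a3m.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        if line.startswith(">"):
            entries.append((line[1:], []))
        elif entries:
            entries[-1][1].append(line)
    rows = []
    for header, parts in entries:
        seq = "".join(parts)
        aligned = "".join(c for c in seq if not (c.islower() or c == "."))
        rows.append((header, aligned))
    return rows


# Illustrative output shape: {sequence_id: A3M text}.
msa = {"seq1": ">seq1\nMKTAYIAK\n>hit1\nMKTAaYIAK\n>hit2\n--TAYLAK\n"}
for seq_id, text in msa.items():
    rows = parse_a3m(text)
    query_len = len(rows[0][1])
    assert all(len(row) == query_len for _, row in rows), "ragged alignment"
    print(seq_id, len(rows), "rows; query length", query_len)
# seq1 3 rows; query length 8
```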

Important Notes

  • Mode behavior: TEST mode overrides db_preset to the toy dataset regardless of input; MOCK returns predefined results for quick demos.
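  • Timeout scaling: The timeout grows with the number of sequences and depends on the chosen database. It is roughly 300 s per sequence for the toy dataset, 3000 s per sequence for the reduced dataset, and 12000 s per sequence for the full dataset.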
  • Output keys: The output dictionary keys match the FASTA record IDs; ensure your IDs are unique and meaningful.
  • Workflow pairing: The resulting A3M is typically consumed by structure prediction nodes (e.g., Alphafold) that expect A3M per sequence ID.
  • Data handling: Multi-record FASTA inputs are split automatically and processed in batch.
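For batch planning, the per-sequence timeouts can be turned into a rough total. This Python sketch assumes the total grows linearly with the number of sequences, which is a reading of the timeout note rather than a confirmed formula. The keys "toy", "reduced", and "full" are informal labels, not preset identifiers. `estimated_timeout_s` is hypothetical.

```python
# Per-sequence timeouts from the timeout note, in seconds.
# Keys are informal labels, not db_preset values.
PER_SEQ_TIMEOUT_S = {"toy": 300, "reduced": 3000, "full": 12000}


def estimated_timeout_s(n_sequences: int, database: str) -> int:
    """Rough total timeout, assuming linear scaling with sequence count (hypothetical)."""
    if n_sequences < 1:
        raise ValueError("need at least one sequence")
    try:
        per_seq = PER_SEQ_TIMEOUT_S[database]
    except KeyError:
        raise ValueError(f"unknown database {database!r}") from None
    return n_sequences * per_seq


total = estimated_timeout_s(4, "reduced")
hours, rem = divmod(total, 3600)
print(f"{total} s = {hours} h {rem // 60} min")  # 12000 s = 3 h 20 min
```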

Troubleshooting

  • Long runtimes or timeouts: Use alphafold_toy_dataset for quick checks or alphafold_reduced_dataset for faster runs; split large batches into smaller ones.
  • Empty or missing outputs: Verify FASTA formatting and that each record includes an ID and valid amino acid sequence.
  • Unexpected TEST results: Remember TEST mode forces the toy dataset; switch to PROD for real searches.
  • Inconsistent IDs: Ensure each FASTA record has a unique ID; outputs are keyed by these IDs.
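To split a large batch into smaller runs, a dictionary of records can be rendered back into several multi-record FASTA inputs. The Python sketch below does this. `batch_fasta`, its parameters, and the 60-character line width are illustrative choices, not settings of this node.

```python
def batch_fasta(records: dict[str, str], batch_size: int, line_width: int = 60) -> list[str]:
    """Render {id: sequence} as FASTA strings of at most batch_size records each (hypothetical)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    if line_width < 1:
        raise ValueError("line_width must be >= 1")
    items = list(records.items())
    batches = []
    for start in range(0, len(items), batch_size):
        lines = []
        for seq_id, seq in items[start:start + batch_size]:
            lines.append(f">{seq_id}")
            # Wrap long sequences; FASTA readers accept wrapped sequence lines.
            lines.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
        batches.append("\n".join(lines) + "\n")
    return batches


print(batch_fasta({"a": "MKT", "b": "GHH", "c": "AAA"}, 2))
# ['>a\nMKT\n>b\nGHH\n', '>c\nAAA\n']
```

Each returned string can be supplied as the fasta input of a separate run. Because IDs are preserved, the per-run outputs can be merged back into one dictionary.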

Example Pipelines

[Figures: two example pipelines]