MSA Search¶
Runs a multiple sequence alignment (MSA) search for one or more protein sequences. It accepts FASTA-formatted input, queries a selected database preset, and returns alignments in A3M format keyed by sequence IDs. Supports production runs, a lightweight test mode, and a mock mode with predefined data.

Usage¶
Use this node to generate MSA results as a precursor to protein structure prediction or other downstream analyses that require A3M alignments. Provide one or more sequences in FASTA format, choose the database preset based on accuracy/speed needs, and select the appropriate mode (PROD for real runs, TEST for quick checks, MOCK for demos). Feed the resulting A3M output into folding or feature-combination nodes.
Inputs¶
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| fasta | True | FASTA | FASTA-formatted sequences to run the MSA search on. Multiple sequences are supported and will be processed individually. | >protein_1 MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDW >protein_2 MGSSHHHHHHSSGLVPRGSHMASMTGGQQMGRGS |
| db_preset | True | ENUM | Selects the MSA database size and coverage. 'alphafold_toy_dataset' is small and fast for debugging, 'alphafold_reduced_dataset' balances speed and coverage, 'alphafold_full_dataset' offers maximum coverage but is slowest. | alphafold_reduced_dataset |
| mode | True | ENUM | Execution mode. 'PROD' runs the actual service, 'TEST' uses a lightweight configuration for quick checks, and 'MOCK' returns predefined sample results. | PROD |
Outputs¶
| Field | Type | Description | Example |
|---|---|---|---|
| msa.a3m | A3M | MSA search results in A3M format, returned as a dictionary mapping each input sequence ID to its A3M alignment. | {'protein_1': ' |
Important Notes¶
- Database preset affects runtime: Larger presets provide better coverage but take significantly longer.
- Test mode constraints: In TEST mode, parameters are adjusted for speed (e.g., smaller datasets), which may reduce coverage.
- Mock mode: MOCK mode returns predefined data suitable for demos or pipeline wiring, not for scientific results.
- Multiple sequences: If you provide multiple sequences in the FASTA input, each will be processed and returned separately, keyed by its FASTA header ID.
- Downstream compatibility: Output is in A3M format and suitable for use in folding/prediction nodes.
- Input formatting: Ensure each sequence has a unique FASTA header; malformed FASTA input may lead to empty or failed results.
Troubleshooting¶
- Empty or missing alignments: Verify your FASTA formatting (headers begin with '>' and sequences are valid amino acid strings). Ensure each header is unique.
- Long runtimes or timeouts: Use 'alphafold_toy_dataset' or 'alphafold_reduced_dataset', reduce the number of sequences, or run in TEST mode for quick validation.
- Unexpected test-like results: Confirm that 'mode' is set to 'PROD' rather than 'TEST' or 'MOCK'.
- Downstream node rejects output: Ensure you pass the entire A3M dictionary output and that downstream nodes expect A3M format.
- Inconsistent IDs: Check that FASTA headers (sequence IDs) are correctly set; output keys mirror these IDs.