Boltz Protein Sequence¶
Creates a protein sequence object formatted for Boltz YAML workflows. The node parses the provided protein FASTA, keeps the sequence, and uses the header (if present) as the sequence name. It then attaches the optional MSA, residue modifications, cyclic flag, and any additional chain IDs. The output is a sequence payload ready to be combined with other sequences during downstream Boltz YAML assembly and prediction.

Usage¶
Use this node when you need to define a protein sequence for Boltz-based structure prediction or affinity workflows. Typically, you will:

1. Provide a protein FASTA (with or without a header).
2. Optionally attach an MSA and modifications.
3. Configure chain IDs (and duplicates if needed).
4. Feed the resulting output into Boltz List Combiner -> Boltz YAML Combiner -> Boltz Predict.

Inputs¶
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| chain_id | True | STRING | Primary chain identifier for this protein (typically a single letter, but any string is accepted). | A |
| sequence | True | FASTA | Protein sequence in FASTA format. If a header is present (e.g., '>my_protein'), it will be used as the internal sequence name; only the sequence lines are kept. | >my_protein_A MKTLLILAVAAALAAGASA... (AA sequence lines) |
| msa | False | MSA | Optional multiple sequence alignment content in A3M or CSV format. If left empty, the MSA is recorded as 'empty'. | >seq1 MKTLLI---AVAAALA >seq2 MKTLFIVAAAVAAALA |
| modifications | False | STRING | Optional residue-level modifications, one per line, using 'position:ccd_code' format. Positions are 1-based indices. | 5:CSO 12:MSE |
| cyclic | False | BOOLEAN | Set true if the protein is cyclic. | false |
| multiple_chains | False | STRING | Comma-separated list of additional chain IDs to create identical copies of this protein (e.g., for homomers). | B,C |
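
As an illustration of how the `sequence` input described above might be handled, the following is a minimal sketch of FASTA parsing with an optional header. The function name `parse_fasta` and the default name `"protein"` are assumptions made for this example; the node's actual implementation and default name are not shown in this document.

```python
def parse_fasta(text, default_name="protein"):
    """Return (name, sequence) from FASTA text with an optional header.

    Hypothetical helper: default_name is an assumed placeholder.
    """
    name = default_name
    seq_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_lines:
                # A second record starts; this sketch reads only the first.
                break
            name = line[1:].strip() or default_name
            continue
        seq_lines.append(line)
    sequence = "".join(seq_lines)
    if not sequence:
        raise ValueError("Protein sequence is required")
    return name, sequence


name, sequence = parse_fasta(">my_protein_A\nMKTLL\nILAVA\n")
print(name, sequence)
```

Plain sequence text without a header goes through the same path and receives the default name.
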
Outputs¶
| Field | Type | Description | Example |
|---|---|---|---|
| protein_sequence | * | A protein sequence object (as a list with one item) containing fields like id (single or list of chain IDs), sequence, optional _sequence_name, optional msa and _msa_format, optional modifications, and cyclic. | [{'protein': {'id': ['A', 'B'], 'sequence': 'MKTLLILAVAAALAAGASA...', '_sequence_name': 'my_protein_A', 'msa': 'empty', '_msa_format': 'a3m', 'modifications': [{'position': 5, 'ccd': 'CSO'}], 'cyclic': False}}] |
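
To make the shape of the output concrete, here is a minimal sketch of how the inputs could be assembled into the payload shown in the example column. `build_protein_payload` is a hypothetical helper, not the node's code; it assumes the `id` field is a plain string for a single chain and a list when `multiple_chains` is set, and it leaves out `_msa_format` because the detection logic is not described here.

```python
def build_protein_payload(chain_id, sequence, name=None, msa="",
                          modifications=None, cyclic=False,
                          multiple_chains=""):
    """Assemble a one-item list holding a protein entry (illustrative only)."""
    # Additional chain IDs arrive as a comma-separated string, e.g. "B,C".
    extra = [c.strip() for c in multiple_chains.split(",") if c.strip()]
    ids = [chain_id.strip()] + extra

    protein = {
        "id": ids if len(ids) > 1 else ids[0],
        "sequence": sequence,
        # An empty MSA input is stored as the literal string 'empty'.
        "msa": msa.strip() or "empty",
        "cyclic": bool(cyclic),
    }
    if name:
        protein["_sequence_name"] = name
    if modifications:
        protein["modifications"] = modifications
    return [{"protein": protein}]


payload = build_protein_payload(
    "A", "MKTLLILAVA", name="my_protein_A",
    modifications=[{"position": 5, "ccd": "CSO"}],
    multiple_chains="B",
)
print(payload)
```
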
Important Notes¶
- Sequence header handling: If the FASTA header is present, it will be used as the internal sequence name; otherwise a default name is assigned.
- MSA handling: The node determines and stores the MSA format (e.g., 'a3m' or 'csv'). If no MSA is provided, the MSA is recorded as 'empty'.
- Multiple chains: Provide comma-separated chain IDs to create identical copies of this protein (e.g., homomers).
- Modifications format: Use 'position:ccd_code' per line (e.g., '5:CSO'); invalid lines are ignored with warnings.
- Cyclic flag: Set 'cyclic' to true if the protein is a cyclic polymer.
- Validation: The protein sequence is required and cannot be empty; FASTA content is parsed to extract only sequence lines.
- Combining later: When assembling the final YAML, ensure all chain IDs across all sequences are unique to avoid validation errors in downstream nodes.
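
The modifications format noted above ('position:ccd_code', one per line, invalid lines ignored with warnings) can be sketched as follows. The name `parse_modifications` and the use of Python's `logging` module for the warnings are assumptions for this example; the node may report warnings differently.

```python
import logging

logger = logging.getLogger(__name__)


def parse_modifications(text):
    """Parse 'position:ccd_code' lines into a list of dicts (illustrative)."""
    mods = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        position, sep, ccd = line.partition(":")
        ccd = ccd.strip()
        try:
            pos = int(position.strip())
        except ValueError:
            pos = 0  # treated as invalid below
        # Positions are 1-based, so anything below 1 is rejected.
        if not sep or not ccd or pos < 1:
            logger.warning("Ignoring invalid modification on line %d: %r",
                           lineno, raw)
            continue
        mods.append({"position": pos, "ccd": ccd})
    return mods


print(parse_modifications("5:CSO\n12:MSE\nnot-a-modification\n"))
```
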
Troubleshooting¶
- Protein sequence is required: Ensure 'sequence' contains a valid FASTA. A header is optional, so plain sequence text can be supplied as headerless FASTA.
- Empty or whitespace sequence: Remove extra whitespace and verify the FASTA has sequence lines after any header.
- Invalid modifications: Confirm each line follows 'position:ccd_code' and positions are integers (1-based).
- Unexpected MSA errors: Provide A3M or CSV content. If no alignment is available, leave 'msa' empty so that the 'empty' MSA is used.
- Duplicate chain IDs in later steps: If YAML combination fails due to duplicate IDs, adjust 'chain_id' or 'multiple_chains' here so all chains are unique in the final workflow.
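
When tracking down duplicate chain IDs, it can help to check the combined sequence entries before they reach the downstream nodes. The sketch below assumes each entry has the shape shown in the Outputs example (a dict with one key such as 'protein' whose value carries an `id` that is either a string or a list); `find_duplicate_chain_ids` is a hypothetical helper and not part of the node.

```python
from collections import Counter


def find_duplicate_chain_ids(sequences):
    """Return the sorted chain IDs that appear more than once (illustrative)."""
    ids = []
    for entry in sequences:
        for body in entry.values():
            chain = body["id"]
            # 'id' may be a single string or a list of chain IDs.
            ids.extend(chain if isinstance(chain, list) else [chain])
    counts = Counter(ids)
    return sorted(cid for cid, n in counts.items() if n > 1)


combined = [
    {"protein": {"id": ["A", "B"], "sequence": "MKTLLILAVA"}},
    {"protein": {"id": "B", "sequence": "GSHMKT"}},
]
print(find_duplicate_chain_ids(combined))
```

An empty result means every chain ID is unique across the combined entries.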