Load PDB
Ingests a Protein Data Bank (PDB) structure as a raw text string and packages it into a structured PDB output for downstream nodes. The node labels the structure with a user-provided ID and outputs a dictionary mapping that ID to the PDB content. It is designed to start workflows from PDB text that is pasted directly or generated programmatically.

Usage
Use this node when you have a PDB file's content as text (e.g., pasted from a file or fetched via an API) and want to introduce it into a workflow. Set a unique pdb_id to distinguish it from other structures, and use the same ID in related sequence/MSA nodes when downstream components require it.
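
Because the node takes the file's content rather than a path, a file on disk has to be read into a string first. The snippet below is a minimal sketch of that step in plain Python; `read_pdb_text` and the demo file name are hypothetical and not part of the node, and the single ATOM line is illustrative only.

```python
import tempfile
from pathlib import Path


def read_pdb_text(path):
    """Return the whole PDB file as one multiline string (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")


# Demo: write a tiny illustrative file, then read it back as text.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "my_protein.pdb"
    demo_path.write_text(
        "HEADER    PROTEIN STRUCTURE\n"
        "ATOM      1  N   MET A   1      11.104  13.207  10.234  1.00 20.00           N\n"
        "END\n",
        encoding="utf-8",
    )
    # This string is what would be supplied as pdb_string.
    pdb_string = read_pdb_text(demo_path)
```

The resulting string keeps its line breaks, which is what the multiline pdb_string field expects.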
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| pdb_string | True | STRING | The full PDB file content as a text string. Supports multiline input. Provide the entire PDB text (ATOM/HETATM records, headers as applicable). | HEADER PROTEIN STRUCTURE\nATOM 1 N MET A 1 11.104 13.207 10.234 1.00 20.00 N\n... |
| pdb_id | True | STRING | Identifier to assign to this PDB entry. Must be unique if multiple PDBs are used in one workflow. If a matching FASTA is used elsewhere in the workflow, its sequence ID should match this value. | my_protein |
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| structure.pdb | PDB | A structured PDB output represented as a dictionary mapping pdb_id to pdb_string. This format is expected by other biotech nodes and batch combiners. | {'my_protein': 'HEADER PROTEIN STRUCTURE\nATOM ...'} |
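
The output shape described in the table can be pictured with a few lines of Python. This is only a sketch of the documented mapping, not the node's actual implementation; `load_pdb` is a hypothetical stand-in name.

```python
def load_pdb(pdb_string, pdb_id):
    """Hypothetical stand-in for the node: map the ID to the raw PDB text."""
    return {pdb_id: pdb_string}


pdb_text = "HEADER PROTEIN STRUCTURE\nATOM ..."  # abbreviated, as in the table
structure = load_pdb(pdb_text, "my_protein")
print(list(structure))  # the only key is the pdb_id
```

The PDB text is stored unchanged as the value, so any truncation in the input carries through to downstream nodes.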
Important Notes
- IDs must be unique: When using multiple PDBs in one workflow or batching, ensure each pdb_id is unique to avoid conflicts.
- ID matching with sequences: If you provide a related FASTA or other sequence input, its sequence ID should match the pdb_id for proper alignment in downstream steps.
- Input format: This node accepts raw PDB text, not a file path or URL.
- Output structure: The output is a dictionary of {pdb_id: pdb_string}, which downstream nodes rely on.
- Multiline input: Provide the full, multiline PDB content; partial content may cause downstream parsing failures.
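
To see why unique IDs matter, consider what happens when several {pdb_id: pdb_string} dictionaries are merged: two entries with the same key cannot coexist. The sketch below assumes a merge of this kind; `combine_structures` is a hypothetical helper for illustration and is not the batch combiner's real code.

```python
def combine_structures(*structures):
    """Merge {pdb_id: pdb_string} dicts, refusing duplicate IDs (hypothetical)."""
    combined = {}
    for structure in structures:
        for pdb_id, pdb_string in structure.items():
            if pdb_id in combined:
                raise ValueError(f"Duplicate pdb_id: {pdb_id!r}")
            combined[pdb_id] = pdb_string
    return combined


first = {"protein_a": "ATOM ..."}   # abbreviated PDB text
second = {"protein_b": "ATOM ..."}
combined = combine_structures(first, second)

# Reusing an ID is rejected instead of silently overwriting an entry.
try:
    combine_structures(first, {"protein_a": "ATOM ..."})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

A plain dictionary update would silently keep only the last structure for a repeated ID, which is why the sketch raises an error instead.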
Troubleshooting
- Duplicate ID error in batching: If later nodes report non-unique IDs, change pdb_id here to a unique value.
- Downstream parsing failures: If visualization or processing fails, verify that pdb_string is valid PDB format and not truncated.
- Mismatched IDs with FASTA/A3M: If alignment or pairing steps fail, ensure the pdb_id matches the sequence ID used in related nodes.
- Empty or minimal PDB content: Ensure pdb_string contains complete ATOM/HETATM records; empty or header-only inputs can break downstream tools.
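
A quick check before running the workflow can catch empty or header-only input. The following is a rough sketch, not a full PDB validator: it only counts lines that begin with ATOM or HETATM, and both function names are hypothetical.

```python
def count_coordinate_records(pdb_string):
    """Count lines that start with an ATOM or HETATM record name."""
    return sum(
        1
        for line in pdb_string.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    )


def check_pdb_string(pdb_string):
    """Raise ValueError for empty text or text without coordinate records."""
    if not pdb_string.strip():
        raise ValueError("pdb_string is empty")
    if count_coordinate_records(pdb_string) == 0:
        raise ValueError("pdb_string has no ATOM/HETATM records")


header_only = "HEADER PROTEIN STRUCTURE\n"
with_atom = header_only + "ATOM 1 N MET A 1 11.104 13.207 10.234 1.00 20.00 N\n"

n_header_only = count_coordinate_records(header_only)
n_with_atom = count_coordinate_records(with_atom)
check_pdb_string(with_atom)  # passes without raising
```

Passing this check does not guarantee that the text is well-formed PDB; it only rules out the empty and header-only cases listed above.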
Example Pipelines