Load PDB

Takes a Protein Data Bank (PDB) structure as a raw text string and packages it as a structured PDB output for downstream nodes. The node labels the structure with a user-provided ID and outputs a dictionary that maps that ID to the PDB content. Use it to start a workflow from pasted or programmatically generated PDB text.

Usage

Use this node when you have a PDB file's content as text (e.g., pasted from a file or fetched via an API) and want to introduce it into a workflow. Set a unique pdb_id to distinguish the structure from others, and use the same ID in related sequence/MSA nodes when downstream components require it.
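Because the node takes text rather than a file path, a file on disk has to be read into a string first. The following is a minimal sketch of that step in Python; the helper name read_pdb_text is hypothetical and is not part of the node.

```python
from pathlib import Path


def read_pdb_text(path: str) -> str:
    """Return the entire contents of a PDB file as one multiline string."""
    text = Path(path).read_text(encoding="utf-8")
    if not text.strip():
        # An empty file would produce an empty pdb_string.
        raise ValueError(f"{path} is empty")
    return text
```

The returned string can then be supplied as pdb_string, alongside a pdb_id of your choice.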

Inputs

Field	Required	Type	Description	Example
pdb_string	True	STRING	The full PDB file content as a text string. Supports multiline input. Provide the entire PDB text (ATOM/HETATM records, headers as applicable).	HEADER PROTEIN STRUCTURE\nATOM 1 N MET A 1 11.104 13.207 10.234 1.00 20.00 N\n...
pdb_id	True	STRING	Identifier to assign to this PDB entry. Must be unique if multiple PDBs are used in one workflow. If a matching FASTA is used elsewhere in the workflow, its sequence ID should match this value.	my_protein

Outputs

Field	Type	Description	Example
structure.pdb	PDB	A structured PDB output represented as a dictionary mapping pdb_id to pdb_string. This format is expected by other biotech nodes and batch combiners.	{'my_protein': 'HEADER PROTEIN STRUCTURE\nATOM ...'}
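The output shape can be illustrated with a short Python sketch. The function load_pdb below is a hypothetical stand-in written for illustration, not the node's actual implementation; it only reproduces the documented {pdb_id: pdb_string} mapping.

```python
from typing import Dict


def load_pdb(pdb_string: str, pdb_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for the node: map the ID to the raw PDB text."""
    return {pdb_id: pdb_string}


structure = load_pdb("HEADER PROTEIN STRUCTURE\nEND\n", "my_protein")
# The single key is the pdb_id; the value is the unmodified PDB text.
pdb_text = structure["my_protein"]
```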

Important Notes

IDs must be unique: When using multiple PDBs in one workflow or batching, ensure each pdb_id is unique to avoid conflicts.
ID matching with sequences: If you provide a related FASTA or other sequence input, its sequence ID should match the pdb_id for proper alignment in downstream steps.
Input format: This node accepts raw PDB text, not a file path or URL.
Output structure: The output is a dictionary of {pdb_id: pdb_string}, which downstream nodes rely on.
Multiline input: Provide the full, multiline PDB content; partial content may cause downstream parsing failures.
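The uniqueness and ID-matching rules above can be checked before a workflow runs. The sketch below is illustrative only: combine_structures, fasta_ids, and unmatched_ids are hypothetical helpers, and it assumes the sequence ID is the first whitespace-delimited token after ">" in a FASTA header, which may differ from how a given downstream node derives IDs.

```python
from typing import Dict, Iterable, List


def combine_structures(structures: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge several {pdb_id: pdb_string} dicts, rejecting repeated IDs."""
    combined: Dict[str, str] = {}
    for structure in structures:
        for pdb_id, pdb_string in structure.items():
            if pdb_id in combined:
                raise ValueError(f"duplicate pdb_id: {pdb_id!r}")
            combined[pdb_id] = pdb_string
    return combined


def fasta_ids(fasta_text: str) -> List[str]:
    """Collect sequence IDs, assumed to be the first token after '>'."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def unmatched_ids(structures: Dict[str, str], fasta_text: str) -> List[str]:
    """Return pdb_ids that have no FASTA record with the same ID."""
    known = set(fasta_ids(fasta_text))
    return [pdb_id for pdb_id in structures if pdb_id not in known]
```

An empty list from unmatched_ids means every structure has a sequence record with the same ID under the stated assumption.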

Troubleshooting

Duplicate ID error in batching: If later nodes report non-unique IDs, change pdb_id here to a unique value.
Downstream parsing failures: If visualization or processing fails, verify that pdb_string is valid PDB format and not truncated.
Mismatched IDs with FASTA/A3M: If alignment or pairing steps fail, ensure the pdb_id matches the sequence ID used in related nodes.
Empty or minimal PDB content: Ensure pdb_string contains complete ATOM/HETATM records; empty or header-only inputs can break downstream tools.
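Several of the problems above can be caught by inspecting pdb_string before it reaches the node. The following is a rough pre-flight check, not a PDB validator: diagnose_pdb_string is a hypothetical helper, and it matches record names leniently by prefix rather than by the fixed columns of the PDB format.

```python
from typing import List


def diagnose_pdb_string(pdb_string: str) -> List[str]:
    """Return a list of likely problems; an empty list means none were found."""
    problems: List[str] = []
    if not pdb_string.strip():
        problems.append("pdb_string is empty")
        return problems

    lines = pdb_string.splitlines()
    # Count coordinate records (lenient prefix match on the record name).
    n_coords = sum(1 for line in lines if line.startswith(("ATOM", "HETATM")))
    if n_coords == 0:
        problems.append("no ATOM/HETATM records found (header-only input?)")

    # A single line containing backslash-n suggests escaped newlines.
    if len(lines) == 1 and "\\n" in pdb_string:
        problems.append("literal '\\n' sequences found; newlines may have been escaped")
    return problems
```

This check does not detect a structure that was cut off partway through its coordinate records, so a clean result is not proof that the input is complete.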