Load PDB

Ingests a Protein Data Bank (PDB) structure as a raw text string and packages it into a structured PDB output for downstream nodes. The node labels the structure with a user-provided ID and outputs a dictionary mapping that ID to the PDB content. It is designed to start workflows from pasted or programmatically generated PDB text.

Usage

Use this node when you have a PDB file's content as text (e.g., pasted from a file or fetched via an API) and want to introduce it into a workflow. Set a unique pdb_id to distinguish it from other structures, and match this ID to related sequence/MSA nodes when required by downstream components.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| pdb_string | True | STRING | The full PDB file content as a text string. Supports multiline input. Provide the entire PDB text (ATOM/HETATM records, headers as applicable). | HEADER PROTEIN STRUCTURE\nATOM 1 N MET A 1 11.104 13.207 10.234 1.00 20.00 N\n... |
| pdb_id | True | STRING | Identifier to assign to this PDB entry. Must be unique if multiple PDBs are used in one workflow. If a matching FASTA is used elsewhere in the workflow, its sequence ID should match this value. | my_protein |

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| structure.pdb | PDB | A structured PDB output represented as a dictionary mapping pdb_id to pdb_string. This format is expected by other biotech nodes and batch combiners. | {'my_protein': 'HEADER PROTEIN STRUCTURE\nATOM ...'} |
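
The mapping described above can be pictured with a few lines of Python. This is only an illustrative sketch: `load_pdb` is a hypothetical stand-in written for this example, not the node's actual implementation, and the PDB text is truncated sample data.

```python
from typing import Dict


def load_pdb(pdb_string: str, pdb_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for the node: label raw PDB text with an ID."""
    # The output shape documented above: one key (the ID), one value (the text).
    return {pdb_id: pdb_string}


# Truncated sample text, for illustration only.
pdb_text = "HEADER PROTEIN STRUCTURE\nATOM      1  N   MET A   1 ..."
structure = load_pdb(pdb_text, "my_protein")
```

Downstream code that receives such a dictionary would look up the PDB text by the same ID that was set in pdb_id.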

Important Notes

  • IDs must be unique: When using multiple PDBs in one workflow or batching, ensure each pdb_id is unique to avoid conflicts.
  • ID matching with sequences: If you provide a related FASTA or other sequence input, its sequence ID should match the pdb_id for proper alignment in downstream steps.
  • Input format: This node accepts raw PDB text, not a file path or URL.
  • Output structure: The output is a dictionary of {pdb_id: pdb_string}, which downstream nodes rely on.
  • Multiline input: Provide the full, multiline PDB content; partial content may cause downstream parsing failures.
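
Because each output is a dictionary keyed by pdb_id, combining several structures amounts to merging dictionaries, which is where duplicate IDs would collide. The sketch below shows one way such a merge could guard against that. `combine_structures` is a hypothetical helper for illustration; it is an assumption about how a batch combiner might behave, not the workflow's actual code.

```python
from typing import Dict, Iterable


def combine_structures(outputs: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge several {pdb_id: pdb_string} outputs, rejecting repeated IDs."""
    combined: Dict[str, str] = {}
    for output in outputs:
        for pdb_id, pdb_string in output.items():
            if pdb_id in combined:
                # A silent overwrite would lose a structure, so fail loudly.
                raise ValueError(f"Duplicate pdb_id: {pdb_id!r}")
            combined[pdb_id] = pdb_string
    return combined


# Placeholder PDB text; real inputs would be full multiline records.
batch = combine_structures([
    {"protein_a": "ATOM ..."},
    {"protein_b": "ATOM ..."},
])
```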

Troubleshooting

  • Duplicate ID error in batching: If later nodes report non-unique IDs, change pdb_id here to a unique value.
  • Downstream parsing failures: If visualization or processing fails, verify that pdb_string is valid PDB format and not truncated.
  • Mismatched IDs with FASTA/A3M: If alignment or pairing steps fail, ensure the pdb_id matches the sequence ID used in related nodes.
  • Empty or minimal PDB content: Ensure pdb_string contains complete ATOM/HETATM records; empty or header-only inputs can break downstream tools.
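
Some of these problems can be caught before the text reaches the node. The following is a minimal pre-flight check, assuming standard fixed-column PDB records in which the x, y, z coordinates occupy columns 31 to 54. `check_pdb_text` is a hypothetical helper for this example and is not part of the node; it only flags obvious problems and does not validate the format fully.

```python
from typing import List


def check_pdb_text(pdb_string: str) -> List[str]:
    """Return a list of obvious problems; an empty list means none were found."""
    if not pdb_string.strip():
        return ["pdb_string is empty"]

    problems: List[str] = []
    coord_lines = 0
    for number, line in enumerate(pdb_string.splitlines(), start=1):
        if line.startswith(("ATOM", "HETATM")):
            coord_lines += 1
            # Coordinates end at column 54; a shorter record is likely truncated.
            if len(line.rstrip()) < 54:
                problems.append(
                    f"line {number}: coordinate record shorter than 54 columns"
                )
    if coord_lines == 0:
        problems.append("no ATOM/HETATM records found")
    return problems


sample = (
    "HEADER    PROTEIN STRUCTURE\n"
    "ATOM      1  N   MET A   1      11.104  13.207  10.234  1.00 20.00           N\n"
    "END\n"
)
problems = check_pdb_text(sample)
```

An empty `problems` list only means these basic checks passed; downstream tools may still reject text that is malformed in other ways.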
