PDB To CIF
Converts one or more PDB structures to CIF format using BioPython. The node parses each PDB, writes an mmCIF representation, and applies a light post-processing step to improve atom-to-entity mappings based on detected protein chains.

Usage
Use this node when you need CIF (mmCIF) files from PDB inputs for downstream tools that prefer or require CIF format. Provide a dictionary mapping names to PDB content; the node returns a dictionary with the same keys mapped to CIF strings. It is commonly placed after nodes that load or generate PDB structures and before nodes that require CIF.
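
The dictionary-in, dictionary-out conversion described above can be sketched roughly as follows, assuming BioPython is installed. The function name `pdb_dict_to_cif` is hypothetical, in-memory buffers stand in for the node's temporary files, and the entity-mapping post-processing step is not reproduced here.

```python
import io


def pdb_dict_to_cif(pdb_by_name):
    """Convert {name: pdb_text} into {name: cif_text} (illustrative sketch)."""
    # Imported lazily so this module can be loaded where BioPython is absent.
    from Bio.PDB import MMCIFIO, PDBParser

    parser = PDBParser(QUIET=True)  # QUIET suppresses parser warnings
    writer = MMCIFIO()
    cif_by_name = {}
    for name, pdb_text in pdb_by_name.items():
        # PDBParser accepts a file-like object as well as a path.
        structure = parser.get_structure(name, io.StringIO(pdb_text))
        writer.set_structure(structure)
        buffer = io.StringIO()
        writer.save(buffer)  # MMCIFIO.save also accepts an open handle
        cif_by_name[name] = buffer.getvalue()
    return cif_by_name
```

Calling `pdb_dict_to_cif({"1ABC": pdb_text})` would return a dictionary with the single key `"1ABC"` whose value is the mmCIF text produced by BioPython.
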
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| pdb | True | PDB | Dictionary of PDB structure content to convert. Keys are structure names (identifiers), values are PDB file contents as strings. | {'1ABC': 'HEADER ...\nATOM 1 N MET A 1 ...\nTER\nEND\n', 'design_model': 'ATOM 1 N GLY B 1 ...\nEND\n'} |
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| structure.cif | CIF | Dictionary of converted CIF structures. Keys mirror the input names; values are CIF (mmCIF) contents as strings. | {'1ABC': 'data_template\nloop_\n_atom_site.group_PDB ...', 'design_model': 'data_template\nloop_\n_atom_site.group_PDB ...'} |
Important Notes
- Input format: The input must be a non-empty dictionary {name: pdb_string}. Empty or whitespace-only entries are skipped.
- Dependencies: Requires BioPython's PDB modules (e.g., PDBParser, MMCIFIO) to be available.
- Entity mapping fix: After conversion, the node attempts to fix atom-to-entity label mappings for protein chains. This adjustment is heuristic and may not cover all edge cases.
- Protein chains only: Chain detection for mapping focuses on protein residues (standard amino acids). Non-protein chains may not receive adjusted entity IDs.
- Temporary files: The node writes temporary .pdb and .cif files during conversion and cleans them up afterward.
- Error handling: If any conversion fails, the node raises a ValueError with a message indicating which structure failed.
- Multi-structure support: Processes multiple structures in a single run and returns a dictionary of outputs keyed by the same names. Entries skipped as empty or whitespace-only do not appear in the output.
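
The input checks and error handling in the notes above can be sketched as a small wrapper. This is an illustration under assumptions, not the node's code: `convert_all` and `convert_one` are hypothetical names, the exact error messages are invented, and raising `ValueError` for a missing or empty dictionary is a guess at how the "must be a non-empty dictionary" rule is enforced.

```python
def convert_all(pdb_by_name, convert_one):
    """Apply convert_one(pdb_text) to every usable entry of pdb_by_name."""
    # Assumption: an invalid or empty input dictionary is rejected up front.
    if not isinstance(pdb_by_name, dict) or not pdb_by_name:
        raise ValueError("Input must be a non-empty dictionary {name: pdb_string}")

    results = {}
    for name, pdb_text in pdb_by_name.items():
        # Empty or whitespace-only entries are skipped, so their keys
        # are absent from the returned dictionary.
        if not isinstance(pdb_text, str) or not pdb_text.strip():
            continue
        try:
            results[name] = convert_one(pdb_text)
        except Exception as exc:
            # Re-raise as ValueError, naming the structure that failed.
            raise ValueError(f"Failed to convert structure '{name}': {exc}") from exc
    return results
```

Passing the converter in as `convert_one` keeps the skip-and-report logic separate from the BioPython call, which makes it easy to check with a stub function.
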
Troubleshooting
- Conversion failed for a structure: Ensure the PDB content is valid and complete. Malformed headers or atom lines can cause BioPython parsing errors.
- Module not found (Bio.PDB): Install BioPython in the environment and verify the relevant PDB modules are available.
- Empty output or missing keys: Check that input values are non-empty strings. Empty or whitespace-only entries are skipped, so their keys are absent from the output.
- Incorrect entity IDs in CIF: The heuristic mapping might not fit specialized molecules or modified residues. Consider post-processing the CIF or adjusting inputs.
- Chain not detected: If a protein chain contains only non-standard residues, the protein-only filter may miss it, and that chain's entity mapping is then left unadjusted.
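
To check whether a chain is likely to be missed, the PDB text can be inspected before conversion. The sketch below is a diagnostic aid only: `classify_chains` is a hypothetical helper, and treating "protein" as "contains at least one of the 20 standard amino acids" is an assumption about the filter, not a description of the node's actual logic. It reads the residue name and chain ID from the fixed columns of `ATOM` and `HETATM` records.

```python
STANDARD_AA = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL".split()
)


def classify_chains(pdb_text):
    """Return (protein_chains, other_chains) as sorted lists of chain IDs."""
    residues_by_chain = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 22:
            res_name = line[17:20].strip()  # PDB columns 18-20
            chain_id = line[21]             # PDB column 22
            residues_by_chain.setdefault(chain_id, set()).add(res_name)

    # Assumed rule: a chain counts as protein if it has any standard residue.
    protein = sorted(c for c, names in residues_by_chain.items() if names & STANDARD_AA)
    other = sorted(c for c in residues_by_chain if c not in protein)
    return protein, other
```

Chains that land in the second list (for example, a chain made up entirely of modified residues such as `MSE`) are the ones whose entity IDs are worth inspecting in the resulting CIF.
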