PDB Fixer

Replaces the residue names of a specific chain in one or more PDB structures with those encoded by a provided FASTA sequence. The node keeps atom coordinates and residue numbering and modifies only the 3-letter residue names. If the FASTA length differs from the chain length, it replaces residues up to the shorter of the two lengths and logs a warning.
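
The behavior described above amounts to a single pass over the target chain's records. This is a minimal sketch, not the node's actual implementation. It assumes standard fixed-column PDB lines and that a new residue starts whenever the residue number changes. The names rename_chain_residues and THREE_LETTER are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

# 1-letter -> 3-letter codes as listed under Important Notes
# (20 standard residues plus selenocysteine, pyrrolysine and unknown).
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    "U": "SEC", "O": "PYL", "X": "UNK",
}


def rename_chain_residues(pdb_text: str, sequence: str, chain: str) -> str:
    """Hypothetical sketch: rename residues of `chain` after `sequence`,
    leaving every other column and record untouched."""
    out = []
    index = -1          # position of the current residue within the chain
    prev_resnum = None  # residue number (columns 23-26) of the previous atom
    for line in pdb_text.splitlines(keepends=True):
        if line.startswith(("ATOM", "HETATM")) and line[21:22] == chain:
            resnum = line[22:26]
            if resnum != prev_resnum:  # residue number changed: next residue
                index += 1
                prev_resnum = resnum
            if index < len(sequence):
                new_name = THREE_LETTER.get(sequence[index])
                if new_name is not None:  # unknown letters keep the old name
                    # Residue name occupies columns 18-20 (slice 17:20).
                    line = line[:17] + new_name + line[20:]
        out.append(line)
    n_residues = index + 1
    if n_residues == 0:
        raise ValueError(f"Chain '{chain}' not found")
    if n_residues != len(sequence):
        logger.warning(
            "FASTA length %d differs from chain %s length %d; replaced %d residues",
            len(sequence), chain, n_residues, min(n_residues, len(sequence)),
        )
    return "".join(out)
```

For a batch, the same function could be applied to each entry independently, for example {name: rename_chain_residues(text, seq, "B") for name, text in pdbs.items()}.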

Usage

Use this node when you need to replace a placeholder chain (for example, a poly-Gly chain produced by design tools) with a target amino acid sequence while preserving the original PDB geometry. Typical workflow: load or generate a PDB, provide a FASTA sequence for the desired chain, set the target chain ID (e.g., 'B'), and feed the fixed PDB into downstream modeling or design nodes.
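
Before you set the target chain ID, it helps to see which chain IDs each structure contains. This is a small sketch that assumes the {pdb_name: pdb_string} input described under Inputs. chain_ids and chains_per_structure are hypothetical helpers, not part of the node.

```python
def chain_ids(pdb_text: str) -> list[str]:
    """Hypothetical helper: chain IDs (column 22) of ATOM/HETATM records,
    in the order they first appear."""
    seen: list[str] = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


def chains_per_structure(pdbs: dict[str, str]) -> dict[str, list[str]]:
    """Apply chain_ids to every entry of a {pdb_name: pdb_string} dict."""
    return {name: chain_ids(text) for name, text in pdbs.items()}
```

Any structure whose list lacks the intended ID would trigger the "Chain not found" error described under Troubleshooting.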

Inputs

  • pdb (required, type PDB): PDB structure(s) to modify, given as a dictionary mapping PDB IDs to PDB text content. Example: {'template_1': 'ATOM 1 N GLY B 1 ...', 'template_2': 'ATOM 1 N GLY B 1 ...'}
  • fasta (required, type FASTA): FASTA sequence to apply to the target chain. Headers are ignored; only sequence lines are used, with whitespace stripped. Example: a header line >target_chain_B followed by the sequence line MKTFFVAGL
  • target_chain (required, type STRING): Single-character chain ID to replace (e.g., 'A', 'B'). Example: B
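
The input constraints above, together with the error messages quoted under Troubleshooting, can be checked up front. The sketch also mirrors the FASTA handling from Important Notes: header lines are dropped and whitespace is removed from sequence lines. validate_inputs is a hypothetical name, not the node's API. Which checks the node actually performs, and in what order, is an assumption.

```python
def validate_inputs(pdb: dict, fasta: str, target_chain: str) -> str:
    """Hypothetical pre-check; returns the cleaned FASTA sequence.

    Assumption: the checks and their order only approximate the node's own.
    """
    if not isinstance(pdb, dict) or not pdb:
        raise ValueError("PDB input must be a non-empty dict {pdb_name: pdb_string}")
    if not (isinstance(target_chain, str) and len(target_chain) == 1
            and target_chain.isascii() and target_chain.isalnum()):
        raise ValueError(f"target_chain must be one alphanumeric character, got {target_chain!r}")
    if not fasta or not fasta.strip():
        raise ValueError("FASTA content cannot be empty")
    # Skip '>' header lines; join sequence lines with all whitespace removed.
    sequence = "".join(
        "".join(line.split())
        for line in fasta.splitlines()
        if not line.startswith(">")
    )
    if not sequence:
        raise ValueError("No sequence found in FASTA data")
    return sequence
```

For example, validate_inputs({"template_1": "..."}, ">target_chain_B\nMKTF FVAGL\n", "B") returns "MKTFFVAGL".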

Outputs

  • fixed_pdb (type PDB): Dictionary of PDB structures in which the target chain's residue names are replaced according to the FASTA sequence, while coordinates and numbering are retained. Example: {'template_1': 'ATOM 1 N MET B 1 ...', 'template_2': 'ATOM 1 N MET B 1 ...'}
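
Only residue names should change, so a before/after comparison is a quick sanity check on fixed_pdb. This sketch assumes the output keeps the input's keys and line order. changed_only_residue_names and check_batch are hypothetical helpers.

```python
def changed_only_residue_names(original: str, fixed: str, chain: str) -> bool:
    """Hypothetical check: every differing line is an ATOM/HETATM record of
    `chain` that differs only in the residue-name field (columns 18-20)."""
    a, b = original.splitlines(), fixed.splitlines()
    if len(a) != len(b):
        return False
    for old, new in zip(a, b):
        if old == new:
            continue
        if not old.startswith(("ATOM", "HETATM")) or old[21:22] != chain:
            return False
        # Everything before column 18 and after column 20 must be identical.
        if old[:17] != new[:17] or old[20:] != new[20:]:
            return False
    return True


def check_batch(inputs: dict[str, str], outputs: dict[str, str], chain: str) -> dict[str, bool]:
    """Run the check per structure; assumes output keys match input keys."""
    if inputs.keys() != outputs.keys():
        raise ValueError("output keys differ from input keys")
    return {name: changed_only_residue_names(inputs[name], outputs[name], chain)
            for name in inputs}
```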

Important Notes

  • Input format: The PDB input must be a non-empty dictionary {pdb_name: pdb_string}.
  • Target chain: Must be a single alphanumeric character matching a chain ID present in the PDB (column 22 in standard PDB lines).
  • FASTA parsing: Headers (lines starting with '>') are ignored; sequence lines are concatenated with whitespace removed.
  • Residue mapping: The node maps 1-letter amino acid codes to 3-letter codes (A→ALA, R→ARG, ..., U→SEC, O→PYL, X→UNK). Residues whose FASTA letter is not in this table keep their original residue names.
  • Length mismatch: If the FASTA length differs from the number of residues detected in the target chain, it replaces residues up to the shorter length and logs a warning.
  • Scope of changes: Only the residue names in ATOM/HETATM records of the target chain are changed; coordinates, atom names, residue numbering, and all other records remain unchanged.
  • Per-residue logic: Residues are identified in file order by changes in residue number, so each distinct residue number counts as one residue and consumes one FASTA letter. All atoms of the same residue are renamed consistently.
  • Multiple PDBs: Supports batch operation; each PDB entry in the input dictionary is processed independently.
  • Requirements: PDB lines must follow standard fixed-column formatting for chain ID (col 22) and residue number (cols 23–26).
  • Failure cases: The node raises an error if the PDB dict is empty or invalid, the target chain is not found, or the FASTA contains no sequence.
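
The per-residue rule above can be made concrete with a helper that lists one entry per residue of a chain. This is an illustrative sketch. chain_residues is a hypothetical name, and the sketch assumes residue numbers are plain integers in columns 23-26. It also ignores insertion codes (column 27), which the notes do not mention.

```python
def chain_residues(pdb_text: str, chain: str) -> list[tuple[int, str]]:
    """Hypothetical helper: (residue_number, residue_name) per residue.

    A new residue begins whenever the residue number differs from that of the
    previous ATOM/HETATM record of the same chain, so all atoms of a residue
    collapse into one entry. Insertion codes are ignored (assumption).
    """
    residues: list[tuple[int, str]] = []
    prev = None
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or line[21:22] != chain:
            continue
        resnum = int(line[22:26])
        if resnum != prev:
            residues.append((resnum, line[17:20]))
            prev = resnum
    return residues
```

Counting residues per structure with {name: len(chain_residues(text, "B")) for name, text in pdbs.items()} gives the chain length to compare against the FASTA length.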

Troubleshooting

  • Chain not found: An error such as "Chain 'B' not found" means the target_chain does not exist in the input PDB. Verify the chain ID and check that your PDB places standard chain identifiers in column 22.
  • Empty or invalid FASTA: If you see "FASTA content cannot be empty" or "No sequence found in FASTA data", ensure your FASTA includes sequence lines beneath any header.
  • Length mismatch warning: A warning about differing lengths means the FASTA length does not match the number of residues detected in the target chain. The node still replaces residues up to the shorter length; correct the FASTA if the whole chain must be replaced.
  • Unknown amino acids: If warnings mention unknown letters, replace the nonstandard characters in the FASTA, or accept that those residues keep their original 3-letter names.
  • Unexpected output structure: If the output is not usable downstream, ensure the input was a dict of {id: pdb_content} and that your PDB lines maintain correct column formatting.
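
Several of these problems can be caught before running the node. The sketch below is a pre-flight check. It assumes the residue-counting rule from Important Notes and the 23-letter alphabet listed there. preflight and KNOWN_LETTERS are hypothetical names, and the report wording is illustrative rather than the node's log output.

```python
# Letters covered by the mapping in Important Notes (assumption: no others).
KNOWN_LETTERS = set("ARNDCQEGHILKMFPSTWYVUOX")


def preflight(pdbs: dict[str, str], sequence: str, chain: str) -> dict[str, list[str]]:
    """Hypothetical diagnostic: list likely problems for each structure."""
    unknown = sorted(set(sequence) - KNOWN_LETTERS)
    report: dict[str, list[str]] = {}
    for name, text in pdbs.items():
        problems: list[str] = []
        # Count residues of the chain: a new residue starts when the number changes.
        n_residues, prev = 0, None
        for line in text.splitlines():
            if line.startswith(("ATOM", "HETATM")) and line[21:22] == chain:
                resnum = line[22:26]
                if resnum != prev:
                    n_residues += 1
                    prev = resnum
        if n_residues == 0:
            problems.append(f"Chain '{chain}' not found")
        elif n_residues != len(sequence):
            problems.append(
                f"length mismatch: chain has {n_residues} residues, FASTA has {len(sequence)}"
            )
        if unknown:
            problems.append(f"unknown letters keep original names: {''.join(unknown)}")
        report[name] = problems
    return report
```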