OpenMM PDB Fixer

Usage
Use this node early in a molecular simulation workflow whenever your starting structure comes from an experimental PDB file or any other source that may contain incomplete or non-standard topology. It is especially useful for multi-chain proteins, antibodies, and structures with chain breaks, missing terminal atoms, alternate conformations, or modified residues such as MSE; these issues often cause downstream force-field setup to fail.
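To see whether a structure shows two of the issues listed above before you wire in the node, a quick scan of the raw PDB text can help. The sketch below is a hypothetical pre-check, not part of the node: it relies only on the fixed-column PDB format (alternate-location flag in column 17, residue name in columns 18-20) and looks only for alternate conformations and MSE residues.

```python
from typing import Dict, List, Tuple


def find_common_issues(pdb_text: str) -> Dict[str, object]:
    """Count alternate-location atoms and list MSE residues in PDB-format text.

    Hypothetical helper; it uses fixed PDB columns (1-based): altLoc = 17,
    resName = 18-20, chainID = 22, resSeq = 23-26.
    """
    altloc_atoms = 0
    mse_residues = set()  # (chain ID, residue number) pairs
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, TER, END, and other record types
        if line[16:17].strip():
            altloc_atoms += 1  # a non-blank altLoc marks an alternate conformation
        if line[17:20].strip() == "MSE":
            mse_residues.add((line[21:22], line[22:26].strip()))
    mse_list: List[Tuple[str, str]] = sorted(mse_residues)
    return {"altloc_atoms": altloc_atoms, "mse_residues": mse_list}
```

A non-zero `altloc_atoms` count or any MSE entry is a sign the structure falls into the cases this node is meant for.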
Typical workflow position: connect a structure-producing node, such as a PDB loader, to OpenMM PDB Fixer, then pass its `fixed.pdb` output to OpenMM Solvate. The overall sequence is usually OpenMM ForceField Config → OpenMM Solvate → OpenMM Energy Minimize → OpenMM Simulate. If your structure contains ligands or other non-water heterogens that must be kept, set `heterogen_mode` to `preserve` and pair this node with OpenMM Ligand Parameters so downstream OpenMM nodes can parameterize those residues correctly.
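The ordering rules in the paragraph above can be written down as a small check. This is a sketch under assumptions: the workflow is represented as a plain, linear Python list of the node display names used on this page (the platform's real workflow format is not shown here), and the check expects the full chain through OpenMM Simulate. When `heterogen_mode` is `preserve`, it also requires OpenMM Ligand Parameters ahead of OpenMM Solvate.

```python
from typing import List, Tuple

# (earlier, later) node pairs taken from the workflow described above.
BASE_ORDER: List[Tuple[str, str]] = [
    ("OpenMM PDB Fixer", "OpenMM Solvate"),
    ("OpenMM ForceField Config", "OpenMM Solvate"),
    ("OpenMM Solvate", "OpenMM Energy Minimize"),
    ("OpenMM Energy Minimize", "OpenMM Simulate"),
]
LIGAND_ORDER = ("OpenMM Ligand Parameters", "OpenMM Solvate")


def order_problems(pipeline: List[str], heterogen_mode: str = "error_if_present") -> List[str]:
    """Report missing nodes and out-of-order pairs in a linear list of node names."""
    constraints = list(BASE_ORDER)
    if heterogen_mode == "preserve":
        constraints.append(LIGAND_ORDER)  # preserved ligands need parameters first
    required = {node for pair in constraints for node in pair}
    problems = [f"missing node: {name}" for name in sorted(required - set(pipeline))]
    for earlier, later in constraints:
        if earlier in pipeline and later in pipeline:
            if pipeline.index(earlier) > pipeline.index(later):
                problems.append(f"{earlier} should run before {later}")
    return problems
```

For example, `["PDB Loader", "OpenMM PDB Fixer", "OpenMM ForceField Config", "OpenMM Solvate", "OpenMM Energy Minimize", "OpenMM Simulate"]` produces an empty list, while the same list checked with `heterogen_mode="preserve"` reports the missing OpenMM Ligand Parameters node.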
Best practices: keep the default `error_if_present` mode unless you are certain you want to discard heterogens or have already prepared ligand parameterization. Set `replace_nonstandard_residues=true` for most crystal structures to improve compatibility with standard protein force fields. Leave `keep_water=false` in most cases: explicit solvent is usually added later by OpenMM Solvate, and crystallographic waters are not typically needed in general-purpose simulation pipelines.
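These best practices can be turned into a few advisory warnings. In this sketch the settings are a plain dictionary using the input names from the Inputs table; a missing `heterogen_mode` falls back to `error_if_present` (the documented default), while the fallbacks `keep_water=false` and `replace_nonstandard_residues=true` are assumptions, not documented defaults.

```python
from typing import Dict, List


def settings_warnings(settings: Dict[str, object]) -> List[str]:
    """Return advisory warnings for settings that depart from the best practices."""
    mode = settings.get("heterogen_mode", "error_if_present")  # documented default
    keep_water = bool(settings.get("keep_water", False))  # assumed default
    replace = bool(settings.get("replace_nonstandard_residues", True))  # assumed default
    warnings: List[str] = []
    if mode == "remove_nonwater":
        warnings.append("non-water heterogens (ligands, cofactors, metals) will be discarded")
        if keep_water:
            warnings.append("crystallographic waters kept; OpenMM Solvate usually adds fresh solvent")
    elif keep_water:
        # keep_water is only consulted on the remove_nonwater path.
        warnings.append("keep_water has no effect unless heterogen_mode is remove_nonwater")
    if not replace:
        warnings.append("non-standard residues such as MSE may break standard force-field setup")
    return warnings
```

An empty dictionary, which corresponds to the recommended configuration, yields no warnings.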
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| pdb | True | PDB | One or more PDB structures to repair. The node expects a PDB collection keyed by structure ID and processes each entry individually. | {"7KVE":"HEADER ANTIBODY COMPLEX ...\nATOM ...\nEND"} |
| pH | False | FLOAT | pH value used when assigning protonation states during hydrogen addition. Valid range is 0.0 to 14.0; physiological workflows typically use 7.4. | 7.4 |
| heterogen_mode | False | STRING | Controls treatment of non-water HETATM residues. `error_if_present` stops execution if ligands, cofactors, or metals are present; `remove_nonwater` strips non-water heterogens; `preserve` keeps them for downstream ligand-aware workflows. | preserve |
| keep_water | False | BOOLEAN | Whether to retain water molecules from the input structure. This setting only matters when `heterogen_mode` is `remove_nonwater`. In most explicit-solvent workflows, input waters are not needed because solvent is added later. | false |
| replace_nonstandard_residues | False | BOOLEAN | Whether to convert non-standard residues to their canonical equivalents where possible, such as MSE to MET. This improves compatibility with standard force fields. | true |
| timeout | False | INT | Maximum wait time in seconds for repairing each input PDB. Valid range is 60 to 3600 seconds; applied per structure in a batch. | 600 |
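A client-side check of the ranges and allowed values in this table can catch mistakes before a run is submitted. This sketch assumes the inputs are gathered into a plain dictionary keyed by the field names above; it checks only the fields it finds and is not the node's own validation.

```python
from typing import Dict, List

VALID_MODES = {"error_if_present", "remove_nonwater", "preserve"}


def validate_inputs(inputs: Dict[str, object]) -> List[str]:
    """Return a list of problems with the node inputs (empty means OK)."""
    errors: List[str] = []
    pdb = inputs.get("pdb")
    if not isinstance(pdb, dict) or not pdb:
        errors.append("pdb must be a non-empty collection keyed by structure ID")
    else:
        for pdb_id, text in pdb.items():
            if not isinstance(text, str) or not text.strip():
                errors.append(f"{pdb_id}: empty PDB text")
    if "pH" in inputs and not 0.0 <= float(inputs["pH"]) <= 14.0:
        errors.append("pH must be between 0.0 and 14.0")
    mode = inputs.get("heterogen_mode", "error_if_present")  # documented default
    if mode not in VALID_MODES:
        errors.append(f"unknown heterogen_mode: {mode}")
    if "timeout" in inputs and not 60 <= int(inputs["timeout"]) <= 3600:
        errors.append("timeout must be between 60 and 3600 seconds")
    return errors
```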
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| fixed.pdb | PDB | Repaired PDB structure collection ready for downstream solvation, minimization, or simulation. Each output entry contains cleaned coordinates and topology text for the corresponding input structure. | {"7KVE":"HEADER REPAIRED STRUCTURE ...\nATOM ...\nEND"} |
| statistics | DATAFRAME | Tabular summary of repaired structures with one row per input PDB. Includes at least `pdb_id` and `num_atoms` so downstream steps can verify successful repair and structure size. | [{"pdb_id":"7KVE","num_atoms":18432}] |
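To confirm that every structure was repaired, you can cross-check `statistics` against `fixed.pdb`. The sketch assumes `statistics` has been converted to a list of row dictionaries (for a pandas DataFrame, `df.to_dict("records")` does this) and that `num_atoms` equals the number of `ATOM` plus `HETATM` records in the repaired text. That second point is an assumption, so treat a mismatch as a prompt to inspect the output rather than proof of failure.

```python
from typing import Dict, List


def count_atom_records(pdb_text: str) -> int:
    """Count ATOM and HETATM records in PDB-format text."""
    return sum(1 for line in pdb_text.splitlines() if line.startswith(("ATOM", "HETATM")))


def check_statistics(fixed: Dict[str, str], statistics: List[Dict[str, object]]) -> List[str]:
    """Compare each fixed.pdb entry with its statistics row, matched by pdb_id."""
    rows = {row["pdb_id"]: row for row in statistics}
    problems: List[str] = []
    for pdb_id, text in fixed.items():
        row = rows.get(pdb_id)
        if row is None:
            problems.append(f"{pdb_id}: no statistics row")
            continue
        expected = count_atom_records(text)
        if row["num_atoms"] != expected:
            problems.append(f"{pdb_id}: num_atoms is {row['num_atoms']}, found {expected} atom records")
    return problems
```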
Important Notes
- Behavior: The default `heterogen_mode` is `error_if_present`, which intentionally prevents silent removal of ligands, cofactors, or metals. This protects simulation workflows from accidentally discarding chemically important molecules.
- Integration: If you choose `preserve`, you should typically use OpenMM Ligand Parameters before OpenMM Solvate, OpenMM Energy Minimize, or OpenMM Simulate so preserved ligand residues can be parameterized downstream.
- Workflow: `keep_water` only affects the `remove_nonwater` path. It does not change behavior when heterogens are preserved or when execution stops on heterogen detection.
- Performance: Repair is generally fast for protein-only structures, but batch inputs are processed per PDB and the timeout applies to each one individually, so large batches can take proportionally longer.
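The per-structure timeout gives a simple upper bound on batch run time. This sketch assumes structures are repaired one after another, which matches the note that large batches take proportionally longer but is not stated outright; the function name is hypothetical.

```python
from typing import Dict


def worst_case_batch_seconds(pdb: Dict[str, str], timeout: int) -> int:
    """Upper bound on batch wall-clock time if every structure hits its timeout.

    Assumes structures are repaired one after another; the timeout applies to
    each structure individually and must be 60-3600 seconds.
    """
    if not 60 <= timeout <= 3600:
        raise ValueError("timeout must be between 60 and 3600 seconds")
    return len(pdb) * timeout
```

For example, 20 structures at `timeout=600` give a worst case of 20 × 600 = 12,000 seconds, about 3.3 hours.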
Troubleshooting
- Force-field setup fails later with template or atom-mismatch errors: Run the structure through this node before solvation or minimization, and keep `replace_nonstandard_residues` enabled so modified residues are normalized where possible.
- Node stops because heterogens are present: This is expected when `heterogen_mode` is `error_if_present`. Switch to `preserve` if the ligand or cofactor must remain, then provide matching residue definitions using OpenMM Ligand Parameters; or use `remove_nonwater` for protein-only simulations.
- Unexpected waters remain or disappear: Check the combination of `heterogen_mode` and `keep_water`. `keep_water` only applies when using `remove_nonwater`, and most workflows should still add a fresh solvent box later with OpenMM Solvate.
- Repair times out on a large or unusual structure: Increase the `timeout` value (up to 3600 seconds), especially for very large complexes or problematic input files processed in batches.
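Before running with the default mode, you can preview whether the heterogen check is likely to trigger. This hypothetical pre-flight sketch lists residue names on `HETATM` records other than water. It assumes waters are named `HOH` or `WAT` and, as a further assumption, skips MSE, because modified residues like it are often written as `HETATM` yet are handled by `replace_nonstandard_residues`. The node's own heterogen detection may classify residues differently.

```python
from typing import List

WATER_NAMES = {"HOH", "WAT"}  # assumption: common water residue names
MODIFIED_RESIDUES = {"MSE"}  # assumption: left to replace_nonstandard_residues


def nonwater_heterogens(pdb_text: str) -> List[str]:
    """Return sorted residue names (columns 18-20) of non-water HETATM records."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            res_name = line[17:20].strip()
            if res_name not in WATER_NAMES and res_name not in MODIFIED_RESIDUES:
                names.add(res_name)
    return sorted(names)


def suggest_heterogen_mode(pdb_text: str, keep_ligands: bool) -> str:
    """Pick a heterogen_mode following the troubleshooting advice above."""
    if not nonwater_heterogens(pdb_text):
        return "error_if_present"  # nothing for the default check to stop on
    # Ligands that must remain need preserve plus OpenMM Ligand Parameters.
    return "preserve" if keep_ligands else "remove_nonwater"
```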