Algorithm & Calculation Logic#
1. Algorithm Overview#
HBAT uses a geometric approach to identify hydrogen bonds by analyzing distance and angular criteria between donor-hydrogen-acceptor triplets. The main calculation is performed by the NPMolecularInteractionAnalyzer
class in hbat/core/np_analyzer.py
, which provides enhanced performance through NumPy vectorization.
Module Structure (Updated):
hbat/core/analyzer.py
: Main analysis engine interfacehbat/core/np_analyzer.py
: High-performance NumPy-based implementationhbat/core/interactions.py
: Interaction data classes and structureshbat/core/parameters.py
: Analysis parameters and constantshbat/ccd/ccd_analyzer.py
: CCD data management and BinaryCIF parsing
2. Core Calculation Steps#
Step 1: Donor-Acceptor Identification#
Donors: Heavy atoms (N, O, S) bonded to hydrogen atoms (
_get_hydrogen_bond_donors()
)Acceptors: Electronegative atoms (N, O, S, F, Cl) (
_get_hydrogen_bond_acceptors()
)
Step 2: Distance Criteria#
Two distance checks are performed:
H…A Distance: Hydrogen to acceptor distance
Cutoff: 3.5 Å (default from
ParametersDefault.HB_DISTANCE_CUTOFF
)Calculated: Using 3D Euclidean distance via
Vec3D.distance_to()
D…A Distance: Donor to acceptor distance
Cutoff: 4.0 Å (default from
ParametersDefault.HB_DA_DISTANCE
)Purpose: Ensures realistic hydrogen bond geometry
Step 3: Angular Criteria#
Angle: D-H…A angle using
angle_between_vectors()
fromhbat/core/vector.py
Cutoff: 120° minimum (default from
ParametersDefault.HB_ANGLE_CUTOFF
)Calculation: Uses vector dot product formula:
cos(θ) = (BA·BC)/(|BA||BC|)
3. Geometric Validation Process#
def _check_hydrogen_bond(donor, hydrogen, acceptor):
# Distance validation
h_a_distance = hydrogen.coords.distance_to(acceptor.coords)
if h_a_distance > 3.5: # Distance cutoff
return None
d_a_distance = donor.coords.distance_to(acceptor.coords)
if d_a_distance > 4.0: # Donor-acceptor cutoff
return None
# Angular validation
angle = angle_between_vectors(donor.coords, hydrogen.coords, acceptor.coords)
if math.degrees(angle) < 120.0: # Angle cutoff
return None
# Bond classification and creation
return HydrogenBond(...)
4. Key Parameters and Defaults#
From hbat/constants/parameters
(ParametersDefault
class):
Parameter |
Default Value |
Description |
---|---|---|
|
3.5 Å |
Maximum H…A distance |
|
120.0° |
Minimum D-H…A angle |
|
4.0 Å |
Maximum D…A distance |
|
0.6 |
Van der Waals to covalent bond factor |
|
2.5 Å |
Maximum covalent bond distance |
|
0.5 Å |
Minimum realistic bond distance |
5. PDB Structure Fixing and Preprocessing#
Missing Hydrogen Atom Detection#
HBAT now includes automatic PDB structure fixing capabilities to handle structures with missing hydrogen atoms:
PDB Fixing Methods:
OpenBabel Method (default): - Uses OpenBabel’s built-in hydrogen addition functionality - Fast and efficient for standard residues - Preserves original atom coordinates
PDBFixer Method: - Uses PDBFixer library with OpenMM - More comprehensive fixing capabilities - Can add missing heavy atoms and replace nonstandard residues
PDB Fixing Parameters:
From ParametersDefault
class:
Parameter |
Default Value |
Description |
---|---|---|
|
True |
Enable/disable PDB structure fixing |
|
“openbabel” |
Choose fixing method (“openbabel” or “pdbfixer”) |
|
True |
Add missing hydrogen atoms |
|
False |
Add missing heavy atoms (PDBFixer only) |
|
False |
Replace nonstandard residues |
|
False |
Remove heterogens |
|
True |
Keep water when removing heterogens |
Workflow Process:
Input validation: Check if PDB fixing is enabled and needed
Method selection: Choose between OpenBabel or PDBFixer
Structure fixing: Add missing atoms and fix structural issues
Output generation: Create fixed PDB file (e.g.,
structure_fixed.pdb
)Analysis continuation: Use fixed structure for interaction analysis
6. CCD Data Integration and Bond Detection#
Chemical Component Dictionary (CCD) Integration#
HBAT now integrates with the RCSB Chemical Component Dictionary (CCD) for accurate bond information:
CCD Data Manager:
Automatically downloads CCD BinaryCIF files from RCSB
Atom data:
cca.bcif
containing atomic propertiesBond data:
ccb.bcif
containing bond connectivity informationStorage location:
~/.hbat/ccd-data/
directoryAuto-download: Files are downloaded on first use and cached locally
Bond Detection Priority (Updated)#
The enhanced bond detection follows this priority:
RESIDUE_LOOKUP (new, preferred):
Uses pre-defined bond information from CCD for standard residues
Provides chemically accurate bond connectivity
Includes bond order (single/double) and aromaticity information
Covers all standard amino acids and nucleotides
CONECT Records (if available):
Parses explicit bond information from CONECT records in the PDB file
Creates bonds with
bond_type="explicit"
Preserves author-specified connectivity
Distance-based Detection (fallback):
Only used when no CONECT records are present or no bonds were found
Uses optimized spatial grid algorithm for large structures
Implements
_are_atoms_bonded_with_distance()
method
Distance-based Bond Criteria#
When detecting bonds by distance:
Van der Waals radii from
AtomicData.VDW_RADII
Distance criteria:
MIN_BOND_DISTANCE ≤ distance ≤ min(vdw_cutoff, MAX_BOND_DISTANCE)
VdW cutoff formula:
vdw_cutoff = (vdw1 + vdw2) × COVALENT_CUTOFF_FACTOR
Example: C-C bond = (1.70 + 1.70) × 0.6 = 2.04 Å maximum (but limited to 2.5 Å by MAX_BOND_DISTANCE)
Bond Types#
"residue_lookup"
: Bonds from CCD residue definitions"explicit"
: Bonds from CONECT records"covalent"
: Bonds detected by distance criteria
7. Performance Optimization and Vectorization#
NumPy-based High-Performance Analyzer#
HBAT now uses a high-performance NumPy-based analyzer (NPMolecularInteractionAnalyzer
) for enhanced computational efficiency:
Key Optimizations:
Vectorized Distance Calculations: - Uses
compute_distance_matrix()
for batch distance calculations - Replaces nested loops with NumPy array operations - Reduces computational complexity from O(n²) to O(n) for many operationsSpatial Indexing: - Pre-computed atom indices by type (hydrogen, donor, acceptor) - Optimized residue indexing for fast same-residue filtering - Grid-based spatial partitioning for bond detection
Batch Processing: - Vectorized angle calculations using NumPy operations - Simultaneous processing of multiple atom pairs - Optimized memory access patterns
Performance Benefits:
Large structures: Significant speedup for structures with >1000 atoms
Memory efficiency: Reduced memory allocation overhead
Scalability: Better performance scaling with structure size
Spatial Grid Algorithm#
For distance-based bond detection, HBAT uses a spatial grid algorithm:
Grid Setup:
- Grid cell size based on MAX_BOND_DISTANCE
(2.5 Å)
- Atoms are assigned to grid cells based on coordinates
- Only neighboring cells are checked for potential bonds
Benefits: - Reduces bond detection complexity from O(n²) to approximately O(n) - Particularly effective for large molecular systems - Maintains accuracy while improving performance
8. Vector Mathematics#
The NPVec3D
class (hbat/core/np_vector.py
) provides NumPy-based vector operations:
3D coordinates:
NPVec3D(x, y, z)
orNPVec3D(np.array([x, y, z]))
Batch operations: Support for multiple vectors simultaneously
NPVec3D(np.array([[x1,y1,z1], [x2,y2,z2]]))
Distance calculation:
√[(x₂-x₁)² + (y₂-y₁)² + (z₂-z₁)²]
with vectorized operationsAngle calculation:
arccos(dot_product / (mag1 × mag2))
using NumPy for efficiencyPerformance: Leverages NumPy’s optimized C implementations for mathematical operations
9. Enhanced Analysis Flow#
Updated Analysis Process:
Structure preprocessing → PDB fixing if enabled (add missing H atoms)
CCD data loading → Download/load chemical component dictionary
Parse PDB file → Extract atomic coordinates from fixed structure
Bond detection → Apply RESIDUE_LOOKUP → CONECT → Distance-based priority
Identify donors → Find N/O/S atoms bonded to H
Identify acceptors → Find N/O/S/F/Cl atoms
Distance screening → Apply H…A and D…A cutoffs (vectorized)
Angular validation → Check D-H…A geometry (batch processing)
Bond classification → Determine bond type (e.g., “N-H…O”)
Cooperativity analysis → Identify interaction chains
10. Output Structure and Analysis Summary#
Enhanced Analysis Summary:
The analysis now provides comprehensive summary information including:
Structure Information: Original vs. fixed structure statistics
PDB Fixing Details: Atoms added, bonds created, method used
Bond Detection Statistics: Counts by detection method (residue_lookup, explicit, covalent)
Performance Metrics: Analysis timing information
Interaction Counts: Detailed breakdown by interaction type
Interaction Data Classes:
Each detected interaction is stored with enhanced information:
HydrogenBond: Donor, hydrogen, acceptor atoms with geometric parameters
HalogenBond: Halogen, carbon, acceptor atoms with X-bond specifics
PiInteraction: Donor, hydrogen, aromatic ring center coordinates
CooperativityChain: Linked interaction sequences
11. Additional Features#
Halogen Bonds#
HBAT also detects halogen bonds (X-bonds) using similar geometric criteria:
Distance: X…A ≤ 4.0 Å
Angle: C-X…A ≥ 120°
Halogens: F, Cl, Br, I
π Interactions#
X-H…π interactions are detected using the aromatic ring center as a pseudo-acceptor:
Aromatic Ring Center Calculation (_calculate_aromatic_center()
)#
The center of aromatic rings is calculated as the geometric centroid of specific ring atoms:
Phenylalanine (PHE): - Ring atoms: CG, CD1, CD2, CE1, CE2, CZ (6-membered benzene ring) - Forms a planar hexagonal structure
Tyrosine (TYR): - Ring atoms: CG, CD1, CD2, CE1, CE2, CZ (6-membered benzene ring) - Same as PHE but with hydroxyl group at CZ
Tryptophan (TRP): - Ring atoms: CG, CD1, CD2, NE1, CE2, CE3, CZ2, CZ3, CH2 (9-atom indole system) - Includes both pyrrole and benzene rings
Histidine (HIS): - Ring atoms: CG, ND1, CD2, CE1, NE2 (5-membered imidazole ring) - Contains two nitrogen atoms in the ring
Centroid Calculation Process#
# For each aromatic residue:
center = Vec3D(0, 0, 0)
for atom_coord in ring_atoms_coords:
center = center + atom_coord
center = center / len(ring_atoms_coords) # Average position
π Interaction Geometry Validation (_check_pi_interaction()
)#
Once the aromatic center is calculated:
Distance Check: H…π center distance
Cutoff: ≤ 4.5 Å (from
ParametersDefault.PI_DISTANCE_CUTOFF
)Calculation: 3D Euclidean distance from hydrogen to ring centroid
Angular Check: D-H…π angle
Cutoff: ≥ 90° (from
ParametersDefault.PI_ANGLE_CUTOFF
)Calculation: Angle between donor-hydrogen vector and hydrogen-π_center vector
Uses same
angle_between_vectors()
function as regular hydrogen bonds
Geometric Interpretation#
The aromatic ring center acts as a “virtual acceptor” representing the π-electron cloud
Distance measures how close the hydrogen approaches the aromatic system
Angle ensures the hydrogen is positioned to interact with the π-electron density above/below the ring plane
Cooperativity Chains#
HBAT identifies cooperative interaction chains where molecular interactions are linked through shared atoms. This occurs when an acceptor atom in one interaction simultaneously acts as a donor in another interaction.
Chain Detection Algorithm (_find_cooperativity_chains()
)#
Step 1: Interaction Collection - Combines all detected interactions: hydrogen bonds, halogen bonds, and π interactions - Requires minimum of 2 interactions to form chains
Step 2: Atom-to-Interaction Mapping Creates two lookup dictionaries:
donor_to_interactions
: Maps each donor atom to interactions where it participatesacceptor_to_interactions
: Maps each acceptor atom to interactions where it participates
Atom keys are tuples: (chain_id, residue_sequence, atom_name)
Step 3: Chain Building Process (_build_cooperativity_chain_unified()
)
Starting from each unvisited interaction:
Initialize: Begin with starting interaction in chain
Follow Forward: Look for next interaction where current acceptor acts as donor
Validation: Ensure same atom serves dual role (acceptor → donor)
Iteration: Continue until no more connections found
Termination: π interactions cannot chain further as acceptors (no single acceptor atom)
Chain Building Logic#
# Simplified chain building process:
chain = [start_interaction]
current_interaction = start_interaction
while True:
current_acceptor = current_interaction.get_acceptor_atom()
if not current_acceptor:
break # No acceptor atom (π interactions)
# Find interaction where this acceptor acts as donor
acceptor_key = (acceptor.chain_id, acceptor.res_seq, acceptor.name)
next_interaction = None
for candidate in donor_to_interactions[acceptor_key]:
candidate_donor = candidate.get_donor_atom()
if candidate_donor matches current_acceptor:
next_interaction = candidate
break
if next_interaction is None:
break # Chain ends
chain.append(next_interaction)
current_interaction = next_interaction
Cooperativity Examples#
Example 1: Sequential H-bonds
Residue A (Donor) --H-bond--> Residue B (Acceptor/Donor) --H-bond--> Residue C (Acceptor)
Example 2: Mixed interactions
Residue A (N-H) --H-bond--> Residue B (O) --X-bond--> Residue C (halogen)