Atom Utilities
Contains utility functions for working with PDB atoms and chemical elements.
Module Overview
This module contains utility functions for working with PDB atoms and elements.
Functions
Element Mapping Functions
These functions map PDB atom names to chemical element symbols.
- hbat.utilities.atom_utils.get_element_from_pdb_atom(atom_name: str) -> str
Map PDB atom name to chemical element using regex patterns.
This function uses regular expressions to identify the element type from PDB atom naming conventions, handling complex cases such as:
- Greek-letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)
- Numbered variants (C1', H2'', OP1, etc.)
- Ion charges (CA2+, MG2+, etc.)
- IUPAC hydrogen naming conventions
- Parameters:
atom_name (str) – PDB atom name (e.g., 'CA', 'OP1', "H2'", 'CA2+')
- Returns:
Chemical element symbol in uppercase (e.g., 'C', 'O', 'H', or 'CA' for calcium)
- Return type:
str
Examples
>>> get_element_from_pdb_atom('CA')
'C'
>>> get_element_from_pdb_atom('OP1')
'O'
>>> get_element_from_pdb_atom('CA2+')
'CA'
>>> get_element_from_pdb_atom("H2'")
'H'
Comprehensive regex-based mapping of PDB atom names to chemical elements.
Key Features:
Handles complex PDB naming conventions
Supports Greek letter remoteness indicators (CA, CB, CG, etc.)
Processes numbered variants (C1', H2'', OP1, etc.)
Recognizes ion charges (CA2+, MG2+, etc.)
Follows IUPAC hydrogen naming conventions
Usage Examples:
    from hbat.utilities.atom_utils import get_element_from_pdb_atom

    # Standard protein atoms
    get_element_from_pdb_atom('CA')     # Returns 'C'
    get_element_from_pdb_atom('N')      # Returns 'N'

    # Nucleic acid atoms
    get_element_from_pdb_atom('OP1')    # Returns 'O'
    get_element_from_pdb_atom('C1\'')   # Returns 'C'

    # Metal ions
    get_element_from_pdb_atom('CA2+')   # Returns 'CA'
    get_element_from_pdb_atom('MG2+')   # Returns 'MG'

    # Hydrogen atoms
    get_element_from_pdb_atom('H2\'')   # Returns 'H'
    get_element_from_pdb_atom('HA')     # Returns 'H'
- hbat.utilities.atom_utils.pdb_atom_to_element(atom_name: str) -> str
High-performance mapping of PDB atom name to chemical element.
Uses a pre-computed dictionary for common atoms and falls back to regex-based pattern matching for less common cases.
High-performance PDB atom name to element mapping with optimized lookup.
Performance Features:
Uses a pre-computed dictionary for common atoms (average-case O(1) lookup)
Falls back to regex-based pattern matching for uncommon atoms
Covers 99%+ of typical PDB atoms with direct lookup
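The dictionary-first strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not hbat's code: `COMMON_ATOMS` holds a hypothetical handful of entries, `lookup_element` is a hypothetical name, and the regex fallback is reduced to the charged-ion pattern from Implementation Details, returning unmatched names unchanged.

```python
import re
from typing import Dict

# Hypothetical subset of a pre-computed table; a real table would be much larger.
COMMON_ATOMS: Dict[str, str] = {
    "N": "N", "CA": "C", "C": "C", "O": "O",      # protein backbone
    "CB": "C", "CG": "C", "SG": "S",              # side chains
    "P": "P", "OP1": "O", "OP2": "O", "C1'": "C", # nucleic acids
    "H": "H", "HA": "H",                          # hydrogens
    "OH2": "O",                                   # water oxygen
    "CL": "CL", "BR": "BR",                       # halogens
}

# Simplified fallback (assumption): only the charged-ion pattern is tried here.
_ION = re.compile(r"^([A-Z]{1,2})[0-9]*[+-]$")


def lookup_element(atom_name: str) -> str:
    """Hypothetical dictionary-first lookup with a regex fallback."""
    name = atom_name.strip().upper()
    element = COMMON_ATOMS.get(name)  # average-case O(1) hash lookup
    if element is not None:
        return element
    ion = _ION.match(name)
    if ion:
        return ion.group(1)  # e.g. 'MG2+' -> 'MG'
    return name  # unrecognized: return the normalized name unchanged
```

For example, `lookup_element(" op1 ")` hits the table after trimming and upper-casing, while `lookup_element("MG2+")` misses the table and is resolved by the fallback pattern.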
Usage Examples:
# Fast lookup for common atoms pdb_atom_to_element('CA') # Returns 'C' (dictionary lookup) pdb_atom_to_element('N') # Returns 'N' (dictionary lookup) # Fallback for uncommon atoms pdb_atom_to_element('XYZ123') # Falls back to regex matching
Performance Notes:
Recommended for high-throughput PDB processing
Maintains full compatibility with get_element_from_pdb_atom()
Uses same underlying logic but with performance optimization
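Whether the dictionary path actually pays off depends on the interpreter and hardware, so a rough micro-benchmark is a reasonable way to check it locally. The sketch below compares a dictionary hit with a single compiled-regex match using the standard `timeit` module; `TABLE`, `by_dict`, and `by_regex` are hypothetical stand-ins rather than hbat internals, and no timing figures are claimed here.

```python
import re
import timeit

# Hypothetical stand-ins: a tiny lookup table and the carbon pattern
# from the Implementation Details list.
TABLE = {"N": "N", "CA": "C", "C": "C", "O": "O", "CB": "C"}
CARBON = re.compile(r"^C[A-Z0-9'\"]*$")


def by_dict(name: str) -> str:
    """Direct dictionary lookup (average-case O(1))."""
    return TABLE[name]


def by_regex(name: str) -> str:
    """Regex match; returns the name unchanged when it is not a carbon."""
    return "C" if CARBON.match(name) else name


if __name__ == "__main__":
    n = 200_000
    t_dict = timeit.timeit(lambda: by_dict("CA"), number=n)
    t_regex = timeit.timeit(lambda: by_regex("CA"), number=n)
    print(f"dict lookup: {t_dict:.4f} s for {n} calls")
    print(f"regex match: {t_regex:.4f} s for {n} calls")
```

Both helpers return the same element for 'CA'; only the cost per call differs, which is the trade-off the performance notes describe.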
Implementation Details
Regex Pattern Matching:
Both functions rely on the following regular expressions to handle PDB atom naming (pdb_atom_to_element uses them only in its fallback path):
Metal ions with charges:
^([A-Z]{1,2})[0-9]*[+-]$
Hydrogen atoms:
^H[A-Z0-9\'\"]*$
Carbon atoms (with exceptions for metals):
^C[A-Z0-9\'\"]*$
Nitrogen atoms:
^N[A-Z0-9\'\"]*$
Oxygen atoms:
^O[A-Z0-9\'\"]*$
Sulfur atoms:
^S[A-Z0-9\'\"]*$
Phosphorus atoms:
^P[0-9]*$
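Taken together, the patterns above can be applied as a short cascade. The sketch below is an assumption about how they combine, not hbat's source: `match_element`, `ION_PATTERN`, and `ELEMENT_PATTERNS` are hypothetical names. Because the carbon rule's metal exceptions are not spelled out, the sketch omits them, so a bare metal name such as 'CU' would come back as 'C'. It returns None where the documented function would instead fall back to returning the name as-is. Since every pattern is anchored and only the ion pattern accepts a trailing '+' or '-', the classes do not overlap and the checking order does not change the result.

```python
import re
from typing import Optional

# Suffix allowed after the element letter: letters, digits, and the prime
# characters ' and " used in nucleic-acid names (C1', H5'').
_SUFFIX = r"[A-Z0-9'\"]*"

# Metal ion with a trailing charge, e.g. CA2+ or MG2+; group 1 is the symbol.
ION_PATTERN = re.compile(r"^([A-Z]{1,2})[0-9]*[+-]$")

# One pattern per element class, as listed above (metal exceptions omitted).
ELEMENT_PATTERNS = [
    (re.compile(r"^H" + _SUFFIX + r"$"), "H"),
    (re.compile(r"^C" + _SUFFIX + r"$"), "C"),
    (re.compile(r"^N" + _SUFFIX + r"$"), "N"),
    (re.compile(r"^O" + _SUFFIX + r"$"), "O"),
    (re.compile(r"^S" + _SUFFIX + r"$"), "S"),
    (re.compile(r"^P[0-9]*$"), "P"),
]


def match_element(atom_name: str) -> Optional[str]:
    """Return the element for atom_name, or None when no pattern matches."""
    ion = ION_PATTERN.match(atom_name)
    if ion:
        # The charge suffix separates calcium (CA2+) from an alpha carbon (CA).
        return ion.group(1)
    for pattern, element in ELEMENT_PATTERNS:
        if pattern.match(atom_name):
            return element
    return None
```

With this cascade, 'CA' resolves to 'C' through the carbon pattern while 'CA2+' resolves to 'CA' through the ion pattern, and an uncharged name like 'ZN' matches nothing.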
Common Atom Dictionary:
The pre-computed dictionary includes:
Protein backbone atoms (N, CA, C, O)
Common side chain atoms (CB, CG, CD, etc.)
DNA/RNA backbone atoms (P, OP1, OP2, O5', C5', etc.)
Nucleotide base atoms (N1, C2, N3, etc.)
Standard hydrogen atoms (H, HA, HB, etc.)
Water oxygen atoms (OH2)
Common heteroatoms (F, CL, BR, I) and deuterium (D)
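One way to assemble a table like this is to list atom names per element and invert the mapping, which also catches a name accidentally filed under two elements. The category lists below are a hypothetical subset drawn from the groups above, and `build_atom_table` is an illustrative helper, not part of hbat.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical subset of the categories listed above (not hbat's actual table).
ATOMS_BY_ELEMENT: Mapping[str, Iterable[str]] = {
    "N": ["N", "N1", "N3"],                                  # backbone, bases
    "C": ["CA", "C", "CB", "CG", "CD", "C2", "C5'"],         # backbone, side chains
    "O": ["O", "OP1", "OP2", "O5'", "OH2"],                  # incl. water oxygen
    "P": ["P"],
    "H": ["H", "HA", "HB"],
    "F": ["F"], "CL": ["CL"], "BR": ["BR"], "I": ["I"],
}


def build_atom_table(by_element: Mapping[str, Iterable[str]]) -> Dict[str, str]:
    """Invert element -> names into name -> element, rejecting conflicts."""
    table: Dict[str, str] = {}
    for element, names in by_element.items():
        for name in names:
            previous = table.setdefault(name, element)
            if previous != element:
                raise ValueError(f"{name!r} listed under {previous} and {element}")
    return table


COMMON_ATOMS = build_atom_table(ATOMS_BY_ELEMENT)
```

Keeping the source lists grouped by element makes additions easy to review, while the inverted dictionary gives the constant-time lookup the fast path needs.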
Error Handling:
Graceful fallback for unrecognized patterns
Whitespace trimming and case normalization
Returns atom name as-is if no pattern matches