PDB Structure Constants


Contains PDB-specific constants including residue mappings, atom classifications, and molecular recognition patterns.

Module Overview

This module contains constants specifically related to PDB file processing, including residue mappings, atom classifications, and molecular recognition patterns used throughout HBAT’s structure analysis components.

hbat.constants.pdb_constants.PROTEIN_SUBSTITUTIONS: Dict[str, str] = {'2AS': 'ASP', '3AH': 'HIS', '5HP': 'GLU', '5OW': 'LYS', 'ACL': 'ARG', 'AGM': 'ARG', 'AIB': 'ALA', 'ALM': 'ALA', 'ALO': 'THR', 'ALY': 'LYS', 'ARM': 'ARG', 'ASA': 'ASP', 'ASB': 'ASP', 'ASK': 'ASP', 'ASL': 'ASP', 'ASQ': 'ASP', 'AYA': 'ALA', 'BCS': 'CYS', 'BHD': 'ASP', 'BMT': 'THR', 'BNN': 'ALA', 'BUC': 'CYS', 'BUG': 'LEU', 'C5C': 'CYS', 'C6C': 'CYS', 'CAS': 'CYS', 'CCS': 'CYS', 'CEA': 'CYS', 'CGU': 'GLU', 'CHG': 'ALA', 'CLE': 'LEU', 'CME': 'CYS', 'CSD': 'ALA', 'CSO': 'CYS', 'CSP': 'CYS', 'CSS': 'CYS', 'CSW': 'CYS', 'CSX': 'CYS', 'CXM': 'MET', 'CY1': 'CYS', 'CY3': 'CYS', 'CYG': 'CYS', 'CYM': 'CYS', 'CYQ': 'CYS', 'DAH': 'PHE', 'DAL': 'ALA', 'DAR': 'ARG', 'DAS': 'ASP', 'DCY': 'CYS', 'DGL': 'GLU', 'DGN': 'GLN', 'DHA': 'ALA', 'DHI': 'HIS', 'DIL': 'ILE', 'DIV': 'VAL', 'DLE': 'LEU', 'DLY': 'LYS', 'DNP': 'ALA', 'DPN': 'PHE', 'DPR': 'PRO', 'DSN': 'SER', 'DSP': 'ASP', 'DTH': 'THR', 'DTR': 'TRP', 'DTY': 'TYR', 'DVA': 'VAL', 'EFC': 'CYS', 'FLA': 'ALA', 'FME': 'MET', 'GGL': 'GLU', 'GL3': 'GLY', 'GLZ': 'GLY', 'GMA': 'GLU', 'GSC': 'GLY', 'HAC': 'ALA', 'HAR': 'ARG', 'HIC': 'HIS', 'HIP': 'HIS', 'HMR': 'ARG', 'HPQ': 'PHE', 'HTR': 'TRP', 'HYP': 'PRO', 'IAS': 'ASP', 'IIL': 'ILE', 'IYR': 'TYR', 'KCX': 'LYS', 'LLP': 'LYS', 'LLY': 'LYS', 'LTR': 'TRP', 'LYM': 'LYS', 'LYZ': 'LYS', 'MAA': 'ALA', 'MEN': 'ASN', 'MHS': 'HIS', 'MIS': 'SER', 'MK8': 'LEU', 'MLE': 'LEU', 'MPQ': 'GLY', 'MSA': 'GLY', 'MSE': 'MET', 'MVA': 'VAL', 'NEM': 'HIS', 'NEP': 'HIS', 'NLE': 'LEU', 'NLN': 'LEU', 'NLP': 'LEU', 'NMC': 'GLY', 'OAS': 'SER', 'OCS': 'CYS', 'OMT': 'MET', 'PAQ': 'TYR', 'PCA': 'GLU', 'PEC': 'CYS', 'PHI': 'PHE', 'PHL': 'PHE', 'PR3': 'CYS', 'PRR': 'ALA', 'PTR': 'TYR', 'PYX': 'CYS', 'SAC': 'SER', 'SAR': 'GLY', 'SCH': 'CYS', 'SCS': 'CYS', 'SCY': 'CYS', 'SEL': 'SER', 'SEP': 'SER', 'SET': 'SER', 'SHC': 'CYS', 'SHR': 'LYS', 'SMC': 'CYS', 'SOC': 'CYS', 'STY': 'TYR', 'SVA': 'SER', 'TIH': 'ALA', 'TPL': 'TRP', 'TPO': 'THR', 'TPQ': 'ALA', 'TRG': 'LYS', 'TRO': 'TRP', 'TYB': 'TYR', 'TYI': 'TYR', 'TYQ': 'TYR', 'TYS': 'TYR', 'TYY': 'TYR'}#

Mapping of non-standard protein residue codes to their standard amino acid equivalents.

This comprehensive dictionary provides substitutions for modified, methylated, phosphorylated, and other chemically altered amino acid residues commonly found in PDB structures. Used by PDB fixing operations to standardize protein residue names for consistent analysis.

Examples

MSE (selenomethionine) → MET (methionine)
CSO (cysteine sulfenic acid) → CYS (cysteine)
HYP (hydroxyproline) → PRO (proline)
PCA (pyroglutamic acid) → GLU (glutamic acid)

Note: This dictionary contains only protein residue substitutions. Nucleotide modifications are handled separately.

Type:: Dict[str, str]
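
A minimal sketch of how a mapping like this might be applied when standardizing residue names. The excerpt copies four entries from the dictionary above so the snippet runs without HBAT installed; `standardize_residue` is a hypothetical helper for illustration, not part of HBAT's API.

```python
from typing import Dict

# Excerpt of PROTEIN_SUBSTITUTIONS (entries copied from the mapping above).
SUBSTITUTIONS_EXCERPT: Dict[str, str] = {
    "MSE": "MET",
    "CSO": "CYS",
    "HYP": "PRO",
    "PCA": "GLU",
}


def standardize_residue(name: str, table: Dict[str, str]) -> str:
    """Return the mapped standard name, or the normalized input if unmapped."""
    key = name.strip().upper()
    return table.get(key, key)


print(standardize_residue("MSE", SUBSTITUTIONS_EXCERPT))  # MET
print(standardize_residue("ALA", SUBSTITUTIONS_EXCERPT))  # ALA
```

Names that are absent from the table pass through unchanged, so standard residues and ligands are left alone.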

hbat.constants.pdb_constants.PROTEIN_RESIDUES: List[str] = ['ALA', 'ASN', 'CYS', 'GLU', 'HIS', 'LEU', 'MET', 'PRO', 'THR', 'TYR', 'ARG', 'ASP', 'GLN', 'GLY', 'ILE', 'LYS', 'PHE', 'SER', 'TRP', 'VAL']#

Standard three-letter codes for the 20 canonical amino acid residues.

This list contains all naturally occurring protein amino acids in their standard three-letter abbreviation format as used in PDB files. Used for residue type validation, protein chain identification, and analysis scope determination.

The 20 amino acids are:

Alanine (ALA), Arginine (ARG), Asparagine (ASN), Aspartic acid (ASP)
Cysteine (CYS), Glutamic acid (GLU), Glutamine (GLN), Glycine (GLY)
Histidine (HIS), Isoleucine (ILE), Leucine (LEU), Lysine (LYS)
Methionine (MET), Phenylalanine (PHE), Proline (PRO), Serine (SER)
Threonine (THR), Tryptophan (TRP), Tyrosine (TYR), Valine (VAL)

Type:: List[str]

hbat.constants.pdb_constants.RNA_RESIDUES: List[str] = ['A', 'G', 'C', 'U', 'I']#

Standard single-letter codes for RNA nucleotide residues.

Contains the five RNA nucleotides commonly found in PDB structures:

A (Adenine): Purine base forming A-U base pairs
G (Guanine): Purine base forming G-C base pairs
C (Cytosine): Pyrimidine base forming C-G base pairs
U (Uracil): Pyrimidine base forming U-A base pairs
I (Inosine): Modified nucleotide, wobble base pairing

Used for nucleic acid chain identification and RNA structure analysis.

Type:: List[str]

hbat.constants.pdb_constants.DNA_RESIDUES: List[str] = ['DA', 'DG', 'DC', 'DT', 'DI']#

Standard two-letter codes for DNA nucleotide residues.

Contains the five DNA nucleotides commonly found in PDB structures:

DA (Deoxyadenosine): Purine base forming A-T base pairs
DG (Deoxyguanosine): Purine base forming G-C base pairs
DC (Deoxycytidine): Pyrimidine base forming C-G base pairs
DT (Deoxythymidine): Pyrimidine base forming T-A base pairs
DI (Deoxyinosine): Modified nucleotide, wobble base pairing

Used for nucleic acid chain identification and DNA structure analysis. The ‘D’ prefix distinguishes DNA nucleotides from RNA nucleotides.

Type:: List[str]

hbat.constants.pdb_constants.PDB_ATOM_TO_ELEMENT: Dict[str, str] = {'BR': 'BR', 'C': 'C', "C1'": 'C', 'C2': 'C', "C2'": 'C', "C3'": 'C', 'C4': 'C', "C4'": 'C', 'C5': 'C', "C5'": 'C', 'C5M': 'C', 'C6': 'C', 'C8': 'C', 'CA': 'C', 'CB': 'C', 'CD': 'C', 'CE': 'C', 'CG': 'C', 'CL': 'CL', 'CZ': 'C', 'D': 'D', 'F': 'F', 'H': 'H', 'HA': 'H', 'HB': 'H', 'HD': 'H', 'HE': 'H', 'HG': 'H', 'HH': 'H', 'HN': 'H', 'HO': 'H', 'HOH': 'H', 'HS': 'H', 'HZ': 'H', 'I': 'I', 'N': 'N', 'N1': 'N', 'N2': 'N', 'N3': 'N', 'N4': 'N', 'N6': 'N', 'N7': 'N', 'N9': 'N', 'ND1': 'N', 'ND2': 'N', 'NE': 'N', 'NE1': 'N', 'NE2': 'N', 'NH1': 'N', 'NH2': 'N', 'NZ': 'N', 'O': 'O', 'O2': 'O', "O2'": 'O', "O3'": 'O', 'O4': 'O', "O4'": 'O', "O5'": 'O', 'O6': 'O', 'OD1': 'O', 'OD2': 'O', 'OE1': 'O', 'OE2': 'O', 'OG': 'O', 'OG1': 'O', 'OH': 'O', 'OH2': 'O', 'OP1': 'O', 'OP2': 'O', 'P': 'P', 'SD': 'S', 'SG': 'S'}#

Pre-computed mapping of common PDB atom names to their element types.

This dictionary provides fast lookup for the most frequently encountered PDB atoms. For comprehensive coverage including unusual atoms, use the pdb_atom_to_element() function, which uses regex-based pattern matching.

Coverage includes:

Protein backbone and common side chain atoms
DNA/RNA backbone and nucleotide base atoms
Standard hydrogen atoms
Water molecules

For full pattern-based mapping, use the pdb_atom_to_element() function instead. It handles:

Greek letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)
Numbered variants (C1’, H2’’, OP1, etc.)
Ion charges (CA2+, MG2+, etc.)
IUPAC hydrogen naming conventions
Uncommon PDB atom names

Used for:

Looking up atomic properties (radius, mass, electronegativity)
Covalent bond detection
Van der Waals calculations
Molecular mass calculations

Type:: Dict[str, str]
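
The lookup-then-fallback pattern described above can be sketched as follows. The table is a small excerpt of PDB_ATOM_TO_ELEMENT, and `fallback_element` is a deliberately naive, hypothetical stand-in for a pattern-based resolver. It is not HBAT's pdb_atom_to_element(), whose signature and matching rules are not shown in this section.

```python
import re
from typing import Dict

# Excerpt of PDB_ATOM_TO_ELEMENT (entries copied from the mapping above).
ATOM_TO_ELEMENT_EXCERPT: Dict[str, str] = {
    "CA": "C", "CB": "C", "N": "N", "O": "O", "SG": "S",
    "OP1": "O", "C1'": "C", "HOH": "H", "CL": "CL", "BR": "BR",
}


def fallback_element(atom_name: str) -> str:
    """Naive guess (assumption of this sketch): first alphabetic character."""
    match = re.search(r"[A-Za-z]", atom_name)
    if match is None:
        raise ValueError(f"no letters in atom name: {atom_name!r}")
    return match.group(0).upper()


def element_for(atom_name: str) -> str:
    """Try the fast table first, then fall back to the naive guess."""
    name = atom_name.strip().upper()
    return ATOM_TO_ELEMENT_EXCERPT.get(name) or fallback_element(name)
```

Note that the table resolves CA to carbon (the alpha carbon), so telling it apart from a calcium ion requires residue context that a name-only lookup does not have.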

hbat.constants.pdb_constants.PROTEIN_BACKBONE_ATOMS: List[str] = ['N', 'CA', 'C', 'O']#

Standard protein backbone atom names in PDB format.

Defines the four atoms that form the protein backbone (main chain):

N: Amino nitrogen atom
CA: Alpha carbon atom (central carbon)
C: Carbonyl carbon atom
O: Carbonyl oxygen atom

These atoms are present in every amino acid residue and form the peptide bonds that connect residues. Proline is a partial exception: its backbone N is part of the pyrrolidine ring and carries no amide hydrogen.

Type:: List[str]

hbat.constants.pdb_constants.DNA_RNA_BACKBONE_ATOMS: List[str] = ['P', 'OP1', 'OP2', "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]#

Standard DNA/RNA backbone atom names in PDB format.

Sugar-phosphate backbone atoms:

P: Phosphorus atom
OP1, OP2: Non-bridging phosphate oxygens
O5’: 5’ phosphate oxygen (bridging)
C5’: 5’ carbon of ribose/deoxyribose
C4’: 4’ carbon of ribose/deoxyribose
O4’: 4’ oxygen of ribose/deoxyribose (ring oxygen)
C3’: 3’ carbon of ribose/deoxyribose
O3’: 3’ phosphate oxygen (bridging)
C2’: 2’ carbon of ribose/deoxyribose
O2’: 2’ hydroxyl oxygen (RNA only, absent in DNA)
C1’: 1’ carbon of ribose/deoxyribose (anomeric carbon)

Note: O2’ is present in RNA but absent in DNA (deoxyribose lacks 2’ hydroxyl).

Type:: List[str]

hbat.constants.pdb_constants.BACKBONE_ATOMS: List[str] = ['N', 'CA', 'C', 'O', 'P', 'OP1', 'OP2', "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]#

Combined backbone atom names for proteins, DNA, and RNA in PDB format.

This list is the combination of PROTEIN_BACKBONE_ATOMS and DNA_RNA_BACKBONE_ATOMS, providing a comprehensive set of backbone atoms for all major biomolecule types.

Used for:

Backbone hydrogen bond identification across all molecule types
Secondary structure analysis
Main chain vs side chain/base classification
Nucleic acid backbone conformation analysis

Type:: List[str]
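
As a small illustration, the combined list can be reproduced by concatenating the two component lists. The values are copied from the listings above; `is_backbone` is a hypothetical helper, and whether HBAT builds BACKBONE_ATOMS by concatenation is an assumption that merely matches the values shown.

```python
from typing import List

PROTEIN_BACKBONE_ATOMS: List[str] = ["N", "CA", "C", "O"]
DNA_RNA_BACKBONE_ATOMS: List[str] = [
    "P", "OP1", "OP2", "O5'", "C5'", "C4'",
    "O4'", "C3'", "O3'", "C2'", "O2'", "C1'",
]

# Concatenation yields the same 16 names, in the same order, as listed above.
BACKBONE_ATOMS: List[str] = PROTEIN_BACKBONE_ATOMS + DNA_RNA_BACKBONE_ATOMS

# A frozenset gives constant-time membership tests when classifying many atoms.
_BACKBONE_SET = frozenset(BACKBONE_ATOMS)


def is_backbone(atom_name: str) -> bool:
    """True if the atom name belongs to a protein or nucleic acid backbone."""
    return atom_name.strip() in _BACKBONE_SET
```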

hbat.constants.pdb_constants.PROTEIN_SIDECHAIN_ATOMS: List[str] = ['CB', 'CG', 'CD', 'NE', 'CZ', 'NH1', 'NH2', 'OD1', 'ND2', 'OD2', 'SG', 'OE1', 'NE2', 'OE2', 'CD2', 'ND1', 'CE1', 'CG1', 'CG2', 'CD1', 'CE', 'NZ', 'SD', 'CE2', 'OG', 'OG1', 'NE1', 'CE3', 'CZ2', 'CZ3', 'CH2', 'OH']#

Common protein side chain atom names in PDB format.

Comprehensive list of side chain (R-group) atoms found in the 20 standard amino acids:

Aliphatic carbons: CB, CG, CD, CE, CZ (branching from CA)
Aromatic carbons: CD1/CD2, CE1/CE2/CE3, CZ2/CZ3, CH2 (ring systems)
Nitrogen atoms: NE, NH1, NH2, ND1, ND2, NE1, NE2, NZ (basic groups)
Oxygen atoms: OD1, OD2, OE1, OE2, OG, OG1, OH (acidic/hydroxyl groups)
Sulfur atoms: SG, SD (cysteine, methionine)

Used for:

Side chain interaction analysis
Functional group identification
Hydrogen bond donor/acceptor classification

Type:: List[str]

hbat.constants.pdb_constants.DNA_RNA_BASE_ATOMS: List[str] = ['N1', 'C2', 'N3', 'C4', 'C5', 'C6', 'N6', 'N7', 'C8', 'N9', 'O6', 'N2', 'O2', 'N4', 'O4', 'C5M']#

Common DNA/RNA base atom names in PDB format.

Base atoms found in nucleotides:

Purine bases (Adenine, Guanine):

N1, C2, N3, C4, C5, C6: Six-membered ring atoms
N7, C8, N9: Five-membered ring atoms
N6: Amino group on adenine
O6, N2: Functional groups on guanine

Pyrimidine bases (Cytosine, Thymine, Uracil):

N1, C2, N3, C4, C5, C6: Six-membered ring atoms
O2: Carbonyl oxygen at position 2
N4: Amino group on cytosine
O4: Carbonyl oxygen at position 4 (thymine/uracil)
C5M: Methyl group on thymine (also called C7)

Used for:

Base-base interactions (hydrogen bonding, stacking)
Protein-nucleic acid recognition
Base functional group identification

Type:: List[str]

hbat.constants.pdb_constants.SIDECHAIN_ATOMS: List[str] = ['CB', 'CG', 'CD', 'NE', 'CZ', 'NH1', 'NH2', 'OD1', 'ND2', 'OD2', 'SG', 'OE1', 'NE2', 'OE2', 'CD2', 'ND1', 'CE1', 'CG1', 'CG2', 'CD1', 'CE', 'NZ', 'SD', 'CE2', 'OG', 'OG1', 'NE1', 'CE3', 'CZ2', 'CZ3', 'CH2', 'OH', 'N1', 'C2', 'N3', 'C4', 'C5', 'C6', 'N6', 'N7', 'C8', 'N9', 'O6', 'N2', 'O2', 'N4', 'O4', 'C5M']#

Combined side chain and base atoms for proteins and nucleic acids.

This list is the combination of PROTEIN_SIDECHAIN_ATOMS and DNA_RNA_BASE_ATOMS, providing a comprehensive set of non-backbone atoms for all major biomolecule types.

Used for:

Side chain/base interaction analysis
Distinguishing backbone from functional groups
Molecular recognition studies

Type:: List[str]

hbat.constants.pdb_constants.WATER_MOLECULES: List[str] = ['HOH', 'WAT', 'DOD', 'TIP3', 'TIP4', 'TIP5', 'W']#

Standard water molecule residue names in PDB files.

Recognition patterns for different water representations:

HOH: Standard PDB water molecule designation
WAT: Alternative water molecule name
DOD: Deuterated water (heavy water)
TIP3: TIP3P water model (3-point)
TIP4: TIP4P water model (4-point)
TIP5: TIP5P water model (5-point)
W: Abbreviated water designation

Used for:

Water molecule identification in PDB structures
Solvent exclusion during analysis
Water-mediated interaction detection
Hydration shell analysis

Type:: List[str]
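
A minimal sketch of solvent exclusion using this list. The `(name, number)` residue tuples and the `split_solvent` helper are illustrative assumptions, not HBAT data structures.

```python
from typing import Iterable, List, Tuple

WATER_MOLECULES: List[str] = ["HOH", "WAT", "DOD", "TIP3", "TIP4", "TIP5", "W"]

Residue = Tuple[str, int]  # (residue name, residue number); illustrative only


def split_solvent(residues: Iterable[Residue]) -> Tuple[List[Residue], List[Residue]]:
    """Partition residues into (non-water, water) by residue name."""
    solute: List[Residue] = []
    water: List[Residue] = []
    for res in residues:
        is_water = res[0].strip().upper() in WATER_MOLECULES
        (water if is_water else solute).append(res)
    return solute, water


solute, water = split_solvent([("ALA", 1), ("HOH", 201), ("TIP3", 202), ("MSE", 2)])
```

Keeping the water residues in a separate list, rather than discarding them, leaves them available for water-mediated interaction analysis.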

hbat.constants.pdb_constants.RESIDUES: List[str] = ['ALA', 'ASN', 'CYS', 'GLU', 'HIS', 'LEU', 'MET', 'PRO', 'THR', 'TYR', 'ARG', 'ASP', 'GLN', 'GLY', 'ILE', 'LYS', 'PHE', 'SER', 'TRP', 'VAL', 'DA', 'DG', 'DC', 'DT', 'DI', 'A', 'G', 'C', 'U', 'I', 'HOH', 'WAT', 'DOD', 'TIP3', 'TIP4', 'TIP5', 'W']#

Combined list of all standard residue codes for proteins, DNA, RNA, and water.

This list is the combination of PROTEIN_RESIDUES, DNA_RESIDUES, RNA_RESIDUES, and WATER_MOLECULES, in that order, providing a comprehensive set of standard residues found in biomolecular structures.

Used for:

General residue type validation
Distinguishing standard residues from heterogens
Biomolecule type identification

Type:: List[str]

hbat.constants.pdb_constants.RESIDUES_WITH_AROMATIC_RINGS: List[str] = ['PHE', 'TYR', 'TRP', 'HIS', 'HID', 'HIE', 'HIP', 'TYI', 'TYQ', 'TYB', 'DA', 'DG', 'DC', 'DT', 'A', 'G', 'C', 'U']#

Residues containing aromatic rings in their structures. This list includes:

Protein residues:

PHE: Phenylalanine (benzene ring)
TYR: Tyrosine (phenolic ring)
TRP: Tryptophan (indole ring)
HIS: Histidine (imidazole ring)
HID, HIE, HIP: Different protonation states of histidine
TYI, TYQ, TYB: Variants of tyrosine with modifications

DNA nucleotides:

DA: Deoxyadenosine (purine ring: adenine)
DG: Deoxyguanosine (purine ring: guanine)
DC: Deoxycytidine (pyrimidine ring: cytosine)
DT: Deoxythymidine (pyrimidine ring: thymine)

RNA nucleotides:

A: Adenine (purine ring)
G: Guanine (purine ring)
C: Cytosine (pyrimidine ring)
U: Uracil (pyrimidine ring)

Used for:

Aromatic interaction analysis
π-π stacking detection between proteins and nucleic acids
DNA/RNA-protein interface studies

Type:: List[str]

hbat.constants.pdb_constants.HYDROGEN_ELEMENTS: List[str] = ['H', 'D']#

Hydrogen element types including isotopes.

Contains the hydrogen element symbols commonly found in PDB structures:

H: Standard hydrogen (protium)
D: Deuterium (heavy hydrogen isotope)

Used for:

Hydrogen bond donor/acceptor detection
Identifying hydrogen atoms in molecular interactions
Mass calculations and isotope effects
NMR-related structural analysis

Type:: List[str]

hbat.constants.pdb_constants.HALOGEN_ELEMENTS: List[str] = ['F', 'CL', 'BR', 'I']#

Elements that can participate in halogen bonding as donors.

These halogens can act as electron acceptors in halogen bonds when covalently bonded to carbon (C-X…Y geometry). The halogen forms a σ-hole that can interact with electron-rich regions on acceptor atoms.

F: Fluorine (weakest halogen bond donor due to high electronegativity)
CL: Chlorine (common in drug design, moderate halogen bonding)
BR: Bromine (strong halogen bond donor, commonly studied)
I: Iodine (strongest halogen bond donor due to large, polarizable electron cloud)

Type:: List[str]

hbat.constants.pdb_constants.HYDROGEN_BOND_DONOR_ELEMENTS: List[str] = ['N', 'O', 'S', 'F']#

Elements that can act as hydrogen bond donors.

These elements can form hydrogen bonds when covalently bonded to hydrogen atoms (D-H…A geometry). They are electronegative enough to polarize the D-H bond, creating a partial positive charge on the hydrogen that can interact with electron-rich acceptor atoms.

N: Nitrogen (amino groups, ring nitrogens, strong donors)
O: Oxygen (hydroxyl groups, moderate to strong donors)
S: Sulfur (thiol groups, weak donors due to lower electronegativity)
F: Fluorine (included in the list; H-F donors are rare in biomolecular structures)

Type:: List[str]

hbat.constants.pdb_constants.HYDROGEN_BOND_ACCEPTOR_ELEMENTS: List[str] = ['N', 'O', 'S', 'F', 'CL']#

Elements that can act as hydrogen bond acceptors.

These electronegative elements have lone pairs of electrons that can accept hydrogen bonds from donor atoms (D-H…A geometry). They can form favorable electrostatic interactions with the partial positive charge on hydrogen.

N: Nitrogen (lone pairs on amino groups, ring nitrogens)
O: Oxygen (lone pairs on carbonyl, hydroxyl, ether groups - strongest acceptors)
S: Sulfur (lone pairs on thiol, sulfide groups - weaker acceptors)
F: Fluorine (highest electronegativity, though covalently bound fluorine is generally a weak acceptor; rare in proteins)
CL: Chlorine (moderate acceptor, sometimes found in modified residues)

Type:: List[str]
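
The D-H…A geometry mentioned above can be made concrete with a short sketch. The element lists are copied from this page; `dha_angle` and `is_candidate_pair` are hypothetical helpers. No distance or angle cutoffs are given in this section, so none are assumed here.

```python
import math
from typing import Sequence

HYDROGEN_BOND_DONOR_ELEMENTS = ["N", "O", "S", "F"]
HYDROGEN_BOND_ACCEPTOR_ELEMENTS = ["N", "O", "S", "F", "CL"]

Vec = Sequence[float]


def dha_angle(donor: Vec, hydrogen: Vec, acceptor: Vec) -> float:
    """Angle D-H...A in degrees, measured at the hydrogen atom."""
    hd = [d - h for d, h in zip(donor, hydrogen)]
    ha = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(hd, ha))
    norm = math.hypot(*hd) * math.hypot(*ha)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_candidate_pair(donor_element: str, acceptor_element: str) -> bool:
    """Element-level pre-filter applied before any geometric test."""
    return (
        donor_element.upper() in HYDROGEN_BOND_DONOR_ELEMENTS
        and acceptor_element.upper() in HYDROGEN_BOND_ACCEPTOR_ELEMENTS
    )
```

A perfectly linear arrangement gives 180 degrees; filtering by element first avoids computing angles for pairs that cannot form a hydrogen bond at all.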

hbat.constants.pdb_constants.HALOGEN_BOND_ACCEPTOR_ELEMENTS: List[str] = ['N', 'O', 'S']#

Elements that can act as halogen bond acceptors.

These electronegative atoms can donate electron density to the σ-hole of halogen atoms in halogen bonds. They typically have lone pairs of electrons that can interact with the positive electrostatic potential of the halogen.

N: Nitrogen (lone pairs on amino groups, ring nitrogens)
O: Oxygen (lone pairs on carbonyl, hydroxyl, ether groups)
S: Sulfur (lone pairs on thiol, sulfide groups, weaker than N/O)

Type:: List[str]

hbat.constants.pdb_constants.PI_INTERACTION_DONOR: List[str] = ['C']#

Elements that can act as π-interaction donors.

These atoms can participate in π-interactions when part of π-systems. Currently includes:

C: Carbon atoms

Type:: List[str]

hbat.constants.pdb_constants.PI_INTERACTION_ATOMS: List[str] = ['H', 'F', 'CL']#

Elements that can participate in π-interactions.

Type:: List[str]

hbat.constants.pdb_constants.RING_ATOMS_FOR_RESIDUES_WITH_AROMATIC_RINGS: Dict[str, List[str]] = {'A': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'C': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'DA': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'DC': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'DG': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'DT': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'G': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'HID': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIE': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIP': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIS': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'PHE': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TRP': ['CG', 'CD1', 'CD2', 'NE1', 'CE2', 'CE3', 'CZ2', 'CZ3', 'CH2'], 'TYB': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYI': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYQ': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYR': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'U': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6']}#

Mapping of aromatic residues to their ring atom names.

This dictionary provides the specific atom names that form aromatic ring systems for each residue type containing aromatic groups:

Protein residues:

Phenylalanine (PHE) and variants:

6-membered benzene ring: CG-CD1-CE1-CZ-CE2-CD2

Tyrosine (TYR, TYI, TYQ, TYB) and variants:

6-membered phenolic ring: CG-CD1-CE1-CZ-CE2-CD2
TYI: Ionized tyrosine (deprotonated hydroxyl)
TYQ: Quinone form of tyrosine
TYB: Brominated tyrosine

Tryptophan (TRP):

5-membered pyrrole ring: CG-CD1-NE1-CE2-CD2
6-membered benzene ring: CD2-CE2-CZ2-CH2-CZ3-CE3
Forms bicyclic indole system

Histidine (HIS, HID, HIE, HIP):

5-membered imidazole ring: CG-ND1-CE1-NE2-CD2
HID: Delta protonated (H on ND1)
HIE: Epsilon protonated (H on NE2)
HIP: Both nitrogens protonated (positive charge)

DNA nucleotides:

Adenine (DA) and Guanine (DG) - Purine bases:

5-membered ring: N9-C8-N7-C5-C4
6-membered ring: C5-C6-N1-C2-N3-C4
Forms bicyclic purine system

Cytosine (DC) and Thymine (DT) - Pyrimidine bases:

6-membered ring: N1-C2-N3-C4-C5-C6

RNA nucleotides:

Adenine (A) and Guanine (G) - Purine bases:

Same purine ring system as DNA counterparts

Cytosine (C) and Uracil (U) - Pyrimidine bases:

Same pyrimidine ring system as DNA counterparts

Used for:

Calculating aromatic ring centroids for π interactions
Identifying atoms involved in π-π stacking
Determining ring plane orientations
X-H…π interaction analysis where these atoms form the π system
DNA/RNA-protein interface interactions
Nucleotide base stacking analysis

Type:: Dict[str, List[str]]
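
As an illustration of the centroid use case listed above, the sketch below averages the coordinates of a residue's ring atoms. The PHE atom names are copied from the mapping; the `coords` dictionary layout and the `ring_centroid` helper are assumptions made for this example, not HBAT's internal representation.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# Ring atom names for PHE, copied from the mapping above.
PHE_RING_ATOMS: List[str] = ["CG", "CD1", "CD2", "CE1", "CE2", "CZ"]


def ring_centroid(coords: Dict[str, Coord], ring_atoms: List[str]) -> Coord:
    """Mean position of the ring atoms; raises KeyError if an atom is missing."""
    points = [coords[name] for name in ring_atoms]
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )
```

The centroid does not depend on the order of the atom names, so the list order in the mapping need not follow ring connectivity. Raising on a missing atom is a design choice of this sketch; incomplete residues could instead be skipped.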

hbat.constants.pdb_constants.HYDROPHOBIC_RESIDUES: List[str] = ['VAL', 'LEU', 'ILE', 'MET', 'PHE', 'TRP', 'PRO', 'ALA']#

Hydrophobic amino acid residues with nonpolar side chains.

These amino acids have side chains that are predominantly nonpolar and hydrophobic:

VAL (Valine): Branched aliphatic chain
LEU (Leucine): Branched aliphatic chain
ILE (Isoleucine): Branched aliphatic chain
MET (Methionine): Sulfur-containing nonpolar chain
PHE (Phenylalanine): Aromatic benzyl group
TRP (Tryptophan): Aromatic indole group
PRO (Proline): Cyclic imino acid structure
ALA (Alanine): Simple methyl group

Used for:

Hydrophobic interaction analysis
Protein folding studies
Membrane protein analysis
Hydrophobic patch identification

Type:: List[str]

hbat.constants.pdb_constants.CHARGED_RESIDUES: List[str] = ['ARG', 'LYS', 'ASP', 'GLU', 'HIS']#

Charged amino acid residues with ionizable side chains.

These amino acids carry, or can carry, formal charges at physiological pH:

ARG (Arginine): Positively charged guanidinium group (+1)
LYS (Lysine): Positively charged amino group (+1)
ASP (Aspartic acid): Negatively charged carboxylate group (-1)
GLU (Glutamic acid): Negatively charged carboxylate group (-1)
HIS (Histidine): Can be positively charged imidazolium group (pKa ~6)

Used for:

Electrostatic interaction analysis
Salt bridge identification
pH-dependent behavior studies
Ion binding site analysis

Type:: List[str]
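
A minimal sketch of a residue-level pre-filter for salt bridge identification, using the nominal charges described above. Treating HIS as neutral is an assumption of this example, since its protonation depends on the local environment; `could_form_salt_bridge` is a hypothetical helper and applies no distance criterion.

```python
from typing import Dict

# Nominal side chain charges taken from the descriptions above.
# HIS is set to 0 here as a simplifying assumption (pKa ~6).
NOMINAL_CHARGE: Dict[str, int] = {
    "ARG": 1, "LYS": 1, "ASP": -1, "GLU": -1, "HIS": 0,
}


def could_form_salt_bridge(res_a: str, res_b: str) -> bool:
    """True when the two residues carry opposite nominal charges."""
    qa = NOMINAL_CHARGE.get(res_a.strip().upper(), 0)
    qb = NOMINAL_CHARGE.get(res_b.strip().upper(), 0)
    return qa * qb < 0
```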

hbat.constants.pdb_constants.RESIDUE_TYPES: List[str] = ['DNA', 'RNA', 'PROTEIN', 'LIGAND']#

Standard residue type classifications for molecular analysis.

Classification categories for different types of molecular residues:

DNA: Deoxyribonucleotide residues (DA, DG, DC, DT, DI)
RNA: Ribonucleotide residues (A, G, C, U, I)
PROTEIN: Amino acid residues (20 standard amino acids and variants)
LIGAND: Ligands, cofactors, metals, and other heteroatom residues

Used for:

Residue type identification and classification
Molecular component analysis
Structure validation and processing
Interaction type determination

Type:: List[str]

hbat.constants.pdb_constants.RESIDUE_TYPE_CODES: Dict[str, str] = {'DNA': 'D', 'LIGAND': 'L', 'PROTEIN': 'P', 'RNA': 'R'}#

Single letter codes for residue types.

Mapping of full residue type names to compact single letter codes:

“DNA” → “D”: Deoxyribonucleotide residues
“RNA” → “R”: Ribonucleotide residues
“PROTEIN” → “P”: Amino acid residues
“LIGAND” → “L”: Ligands, cofactors, metals, and other heteroatom residues

Used for compact representation in hydrogen bond descriptions and atom records.

Type:: Dict[str, str]
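
The residue lists, type names, and type codes can be tied together as in the sketch below. All values are copied from this page; `residue_type` and `residue_type_code` are hypothetical helpers, and sending every unrecognized name (including water and modified residues) to LIGAND is an assumption of this example rather than documented HBAT behavior.

```python
from typing import Dict, List

PROTEIN_RESIDUES: List[str] = [
    "ALA", "ASN", "CYS", "GLU", "HIS", "LEU", "MET", "PRO", "THR", "TYR",
    "ARG", "ASP", "GLN", "GLY", "ILE", "LYS", "PHE", "SER", "TRP", "VAL",
]
DNA_RESIDUES: List[str] = ["DA", "DG", "DC", "DT", "DI"]
RNA_RESIDUES: List[str] = ["A", "G", "C", "U", "I"]
RESIDUE_TYPE_CODES: Dict[str, str] = {
    "DNA": "D", "LIGAND": "L", "PROTEIN": "P", "RNA": "R",
}


def residue_type(name: str) -> str:
    """Return 'PROTEIN', 'DNA', 'RNA', or 'LIGAND' for a residue name."""
    key = name.strip().upper()
    if key in PROTEIN_RESIDUES:
        return "PROTEIN"
    if key in DNA_RESIDUES:
        return "DNA"
    if key in RNA_RESIDUES:
        return "RNA"
    return "LIGAND"  # fallback used by this sketch for everything else


def residue_type_code(name: str) -> str:
    """Single-letter code for the residue's type."""
    return RESIDUE_TYPE_CODES[residue_type(name)]
```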

hbat.constants.pdb_constants.BACKBONE_SIDECHAIN_CODES: Dict[str, str] = {'BACKBONE': 'B', 'NOT_APPLICABLE': 'N', 'SIDECHAIN': 'S'}#

Single letter codes for backbone vs sidechain classification.

Mapping of atom structural classification to compact single letter codes:

“BACKBONE” → “B”: Main chain atoms (protein backbone, DNA/RNA sugar-phosphate)
“SIDECHAIN” → “S”: Side chain atoms (protein R-groups, nucleotide bases)
“NOT_APPLICABLE” → “N”: Atoms for which the backbone/side chain distinction does not apply

Used for describing hydrogen bond donor-acceptor relationships (e.g., S-S, S-B, B-B).

Type:: Dict[str, str]
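
A short sketch of how labels such as S-B might be composed from these codes. The dictionary is copied from above; `pair_label` is a hypothetical helper, and placing the donor code first is an assumption of this example.

```python
from typing import Dict

BACKBONE_SIDECHAIN_CODES: Dict[str, str] = {
    "BACKBONE": "B", "NOT_APPLICABLE": "N", "SIDECHAIN": "S",
}


def pair_label(donor_class: str, acceptor_class: str) -> str:
    """Compose a donor-acceptor label, e.g. SIDECHAIN + BACKBONE -> 'S-B'."""
    donor_code = BACKBONE_SIDECHAIN_CODES[donor_class]
    acceptor_code = BACKBONE_SIDECHAIN_CODES[acceptor_class]
    return f"{donor_code}-{acceptor_code}"
```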

hbat.constants.pdb_constants.AROMATIC_CODES: Dict[str, str] = {'AROMATIC': 'A', 'NON-AROMATIC': 'N'}#

Single letter codes for aromatic classification.

Mapping of aromatic property classification to compact single letter codes:

“AROMATIC” → “A”: Atoms that are part of aromatic ring systems
“NON-AROMATIC” → “N”: Atoms that are not part of aromatic ring systems

Used for identifying atoms involved in π-interactions and aromatic stacking.

Type:: Dict[str, str]
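
One way the aromatic code could be derived for an atom is by checking its name against the ring atoms of its residue. The ring atom entries below are an excerpt of RING_ATOMS_FOR_RESIDUES_WITH_AROMATIC_RINGS; `aromatic_code` is a hypothetical helper, not a documented HBAT function.

```python
from typing import Dict, List

AROMATIC_CODES: Dict[str, str] = {"AROMATIC": "A", "NON-AROMATIC": "N"}

# Excerpt of the ring atom mapping documented earlier on this page.
RING_ATOMS_EXCERPT: Dict[str, List[str]] = {
    "PHE": ["CG", "CD1", "CD2", "CE1", "CE2", "CZ"],
    "HIS": ["CG", "ND1", "CD2", "CE1", "NE2"],
    "DC": ["N1", "C2", "N3", "C4", "C5", "C6"],
}


def aromatic_code(residue_name: str, atom_name: str) -> str:
    """Return 'A' if the atom is a ring atom of its residue, else 'N'."""
    ring = RING_ATOMS_EXCERPT.get(residue_name.strip().upper(), [])
    key = "AROMATIC" if atom_name.strip().upper() in ring else "NON-AROMATIC"
    return AROMATIC_CODES[key]
```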

Constants

Residue Definitions

Standard protein, DNA, and RNA residue names and their substitutions.

This group comprises PROTEIN_RESIDUES, DNA_RESIDUES, RNA_RESIDUES, RESIDUES, and PROTEIN_SUBSTITUTIONS, each documented in full under Module Overview.

Atom Classifications

Atom groupings for different molecular components.

This group comprises PROTEIN_BACKBONE_ATOMS, DNA_RNA_BACKBONE_ATOMS, BACKBONE_ATOMS, PROTEIN_SIDECHAIN_ATOMS, DNA_RNA_BASE_ATOMS, and SIDECHAIN_ATOMS, each documented in full under Module Overview.

Molecular Interaction Elements

Element lists for detecting different types of molecular interactions.

This group comprises HYDROGEN_ELEMENTS, HALOGEN_ELEMENTS, HYDROGEN_BOND_DONOR_ELEMENTS, HYDROGEN_BOND_ACCEPTOR_ELEMENTS, and HALOGEN_BOND_ACCEPTOR_ELEMENTS, each documented in full under Module Overview.

Aromatic Ring Systems

Residues and atoms involved in aromatic interactions.

hbat.constants.pdb_constants.RESIDUES_WITH_AROMATIC_RINGS: List[str] = ['PHE', 'TYR', 'TRP', 'HIS', 'HID', 'HIE', 'HIP', 'TYI', 'TYQ', 'TYB', 'DA', 'DG', 'DC', 'DT', 'A', 'G', 'C', 'U']#

Residues containing aromatic rings in their structures. This list includes:

Protein residues:

PHE: Phenylalanine (benzene ring)
TYR: Tyrosine (phenolic ring)
TRP: Tryptophan (indole ring)
HIS: Histidine (imidazole ring)
HID, HIE, HIP: Different protonation states of histidine
TYI, TYQ, TYB: Variants of tyrosine with modifications

DNA nucleotides:

DA: Deoxyadenosine (purine ring: adenine)
DG: Deoxyguanosine (purine ring: guanine)
DC: Deoxycytidine (pyrimidine ring: cytosine)
DT: Deoxythymidine (pyrimidine ring: thymine)

RNA nucleotides:

A: Adenosine (purine ring: adenine)
G: Guanosine (purine ring: guanine)
C: Cytidine (pyrimidine ring: cytosine)
U: Uridine (pyrimidine ring: uracil)

Used for:

Aromatic interaction analysis
π-π stacking detection between proteins and nucleic acids
DNA/RNA-protein interface studies

Type:: List[str]
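As a rough illustration of how this list might be used, the sketch below picks out the residues that carry an aromatic ring from a sequence of residue names. The list is copied locally from the value shown above, and aromatic_residues is a hypothetical helper, not part of HBAT.

```python
from typing import Iterable, List

# Local copy of the documented value (assumption: real code would import it
# from hbat.constants.pdb_constants).
RESIDUES_WITH_AROMATIC_RINGS: List[str] = [
    "PHE", "TYR", "TRP", "HIS", "HID", "HIE", "HIP", "TYI", "TYQ", "TYB",
    "DA", "DG", "DC", "DT", "A", "G", "C", "U",
]


def aromatic_residues(residue_names: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep only residue names that contain an aromatic ring."""
    aromatic = set(RESIDUES_WITH_AROMATIC_RINGS)  # set gives constant-time lookup
    return [name for name in residue_names if name.strip().upper() in aromatic]
```

Because protein, DNA, and RNA residue names share one list, the same filter works on mixed protein and nucleic acid structures.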

Mapping of aromatic residues to their ring atom names.

This dictionary provides the specific atom names that form aromatic ring systems for each residue type containing aromatic groups:

Protein residues:

Phenylalanine (PHE) and variants:

6-membered benzene ring: CG-CD1-CE1-CZ-CE2-CD2

Tyrosine (TYR, TYI, TYQ, TYB) and variants:

6-membered phenolic ring: CG-CD1-CE1-CZ-CE2-CD2
TYI: Ionized tyrosine (deprotonated hydroxyl)
TYQ: Quinone form of tyrosine
TYB: Brominated tyrosine

Tryptophan (TRP):

5-membered pyrrole ring: CG-CD1-NE1-CE2-CD2
6-membered benzene ring: CD2-CE2-CZ2-CH2-CZ3-CE3
Forms bicyclic indole system

Histidine (HIS, HID, HIE, HIP):

5-membered imidazole ring: CG-ND1-CE1-NE2-CD2
HID: Delta protonated (H on ND1)
HIE: Epsilon protonated (H on NE2)
HIP: Both nitrogens protonated (positive charge)

DNA nucleotides:

Adenine (DA) and Guanine (DG) - Purine bases:

5-membered ring: N9-C8-N7-C5-C4
6-membered ring: C5-C6-N1-C2-N3-C4
Forms bicyclic purine system

Cytosine (DC) and Thymine (DT) - Pyrimidine bases:

6-membered ring: N1-C2-N3-C4-C5-C6

RNA nucleotides:

Adenine (A) and Guanine (G) - Purine bases:

Same purine ring system as DNA counterparts

Cytosine (C) and Uracil (U) - Pyrimidine bases:

Same pyrimidine ring system as DNA counterparts

Used for:

Calculating aromatic ring centroids for π interactions
Identifying atoms involved in π-π stacking
Determining ring plane orientations
X-H…π interaction analysis where these atoms form the π system
DNA/RNA-protein interface interactions
Nucleotide base stacking analysis

Type:: Dict[str, List[str]]
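The ring centroid and ring plane orientation mentioned above can be computed directly from the ring atom names. The sketch below is one simple way to do it, under stated assumptions: RING_ATOMS is a placeholder dictionary holding two entries copied from the ring definitions above, the function names are hypothetical, and the normal is taken from only two centroid-to-atom vectors, which is adequate for a near-planar ring but is not necessarily how HBAT computes it.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Placeholder dictionary with two entries copied from the ring definitions above.
RING_ATOMS: Dict[str, List[str]] = {
    "PHE": ["CG", "CD1", "CE1", "CZ", "CE2", "CD2"],
    "HIS": ["CG", "ND1", "CE1", "NE2", "CD2"],
}


def ring_centroid(residue_name: str, atom_coords: Dict[str, Vec3]) -> Vec3:
    """Average the coordinates of the ring atoms (raises KeyError if one is missing)."""
    points = [atom_coords[name] for name in RING_ATOMS[residue_name]]
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def ring_normal(residue_name: str, atom_coords: Dict[str, Vec3]) -> Vec3:
    """Unit normal from the cross product of two centroid-to-atom vectors."""
    cx, cy, cz = ring_centroid(residue_name, atom_coords)
    first, second = RING_ATOMS[residue_name][:2]
    a, b = atom_coords[first], atom_coords[second]
    u = (a[0] - cx, a[1] - cy, a[2] - cz)
    v = (b[0] - cx, b[1] - cy, b[2] - cz)
    nx = u[1] * v[2] - u[2] * v[1]
    ny = u[2] * v[0] - u[0] * v[2]
    nz = u[0] * v[1] - u[1] * v[0]
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)
```

For distorted rings, a least-squares plane fit over all ring atoms would be more robust than the two-vector cross product used here.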

Water and Solvent#

Water molecule recognition patterns.

hbat.constants.pdb_constants.WATER_MOLECULES: List[str] = ['HOH', 'WAT', 'DOD', 'TIP3', 'TIP4', 'TIP5', 'W']#

Standard water molecule residue names in PDB files.

Recognition patterns for different water representations:

HOH: Standard PDB water molecule designation
WAT: Alternative water molecule name
DOD: Deuterated water (heavy water)
TIP3: TIP3P water model (3-point)
TIP4: TIP4P water model (4-point)
TIP5: TIP5P water model (5-point)
W: Abbreviated water designation

Used for:

Water molecule identification in PDB structures
Solvent exclusion during analysis
Water-mediated interaction detection
Hydration shell analysis

Type:: List[str]
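Solvent exclusion, as listed above, amounts to a membership test on the residue name. The following is a minimal sketch with a local copy of the documented list; is_water and split_solvent are hypothetical helper names, not HBAT functions.

```python
from typing import Iterable, List, Tuple

# Local copy of the documented value (assumption: real code would import it
# from hbat.constants.pdb_constants).
WATER_MOLECULES: List[str] = ["HOH", "WAT", "DOD", "TIP3", "TIP4", "TIP5", "W"]


def is_water(residue_name: str) -> bool:
    """Return True if the residue name is one of the recognised water names."""
    return residue_name.strip().upper() in WATER_MOLECULES


def split_solvent(residue_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: separate residue names into (solute, waters)."""
    solute: List[str] = []
    waters: List[str] = []
    for name in residue_names:
        (waters if is_water(name) else solute).append(name)
    return solute, waters
```

Keeping the waters in a separate list, rather than discarding them, leaves them available for water-mediated interaction detection later.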

Atom Name Mapping#

PDB atom name to element conversion utilities.

Pre-computed mapping of common PDB atom names to their element types.

Coverage includes:

Protein backbone and common side chain atoms
DNA/RNA backbone and nucleotide base atoms
Standard hydrogen atoms
Water molecules

For full pattern-based mapping, use the pdb_atom_to_element() function instead. It handles:

Greek letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)
Numbered variants (C1', H2'', OP1, etc.)
Ion charges (CA2+, MG2+, etc.)
IUPAC hydrogen naming conventions
Uncommon PDB atom names

Used for:

Looking up atomic properties (radius, mass, electronegativity)
Covalent bond detection
Van der Waals calculations
Molecular mass calculations

Type:: Dict[str, str]

Functions#

hbat.utilities.atom_utils.get_element_from_pdb_atom(atom_name: str) → str[source]#

Map PDB atom name to chemical element using regex patterns.

This function uses regular expressions to identify the element type from PDB atom naming conventions, handling complex cases like:

Greek letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)
Numbered variants (C1', H2'', OP1, etc.)
Ion charges (CA2+, MG2+, etc.)
IUPAC hydrogen naming conventions

Parameters:: atom_name (str) – PDB atom name (e.g., 'CA', 'OP1', "H2''", 'CA2+')
Returns:: Chemical element symbol (e.g., 'C', 'O', 'H', 'CA')
Return type:: str

Examples

>>> get_element_from_pdb_atom('CA')
'C'
>>> get_element_from_pdb_atom('OP1')
'O'
>>> get_element_from_pdb_atom('CA2+')
'CA'
>>> get_element_from_pdb_atom("H2''")
'H'

hbat.utilities.atom_utils.pdb_atom_to_element(atom_name: str) → str[source]#

High-performance mapping of PDB atom name to chemical element.

Uses a pre-computed dictionary for common atoms and falls back to regex-based pattern matching for less common cases.

Parameters:: atom_name (str) – PDB atom name
Returns:: Chemical element symbol
Return type:: str
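The dictionary-first, regex-fallback strategy described above can be illustrated with a small stand-alone sketch. This is not HBAT's implementation: COMMON_ATOMS is a hypothetical, heavily abbreviated table, the two patterns are simplified assumptions, and atom_to_element_sketch is an illustrative name.

```python
import re
from typing import Dict

# Hypothetical, abbreviated lookup table; a real table would cover many more names.
COMMON_ATOMS: Dict[str, str] = {
    "N": "N", "CA": "C", "C": "C", "O": "O", "CB": "C", "OP1": "O",
}

# Ion names such as 'CA2+' or 'MG2+' end with an explicit charge sign.
ION_PATTERN = re.compile(r"^([A-Z]{1,2})\d*[+-]$")
# Otherwise assume the first letter (after optional leading digits) is the element.
# Limitation: uncharged two-letter elements such as 'CL' are not handled here.
LEADING_LETTER = re.compile(r"^\d*([A-Z])")


def atom_to_element_sketch(atom_name: str) -> str:
    """Look up common names first, then fall back to pattern matching."""
    name = atom_name.strip().upper()
    if name in COMMON_ATOMS:           # fast path: pre-computed dictionary
        return COMMON_ATOMS[name]
    ion = ION_PATTERN.match(name)      # slow path 1: charged ions
    if ion:
        return ion.group(1)
    match = LEADING_LETTER.match(name) # slow path 2: leading element letter
    if match:
        return match.group(1)
    return name                        # nothing matched: return the input unchanged
```

Checking the dictionary first keeps the common case to a single hash lookup, so the regular expressions only run for names the table does not cover.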
