CCD Analyzer
This module provides efficient parsing and lookup functionality for CCD BinaryCIF files, with automatic download capabilities and in-memory data structures optimized for fast atom and bond lookups by residue and atom IDs.
Classes
- class hbat.ccd.ccd_analyzer.CCDDataManager(ccd_folder: str | None = None)
Bases: object
Manages Chemical Component Dictionary data with efficient lookup capabilities.
This class handles automatic download of CCD BinaryCIF files and provides optimized in-memory data structures for fast lookups of atoms and bonds by component ID and atom ID.
Methods
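- ensure_files_exist(): Ensure CCD BinaryCIF files exist, downloading if necessary.
- load_atoms_data(): Load and parse atom data from CCD BinaryCIF file into memory.
- load_bonds_data(): Load and parse bond data from CCD BinaryCIF file into memory.
- get_component_atoms(comp_id): Get all atoms for a specific component.
- get_component_bonds(comp_id): Get all bonds for a specific component.
- get_atom_by_id(comp_id, atom_id): Get a specific atom by component and atom ID.
- get_bonds_involving_atom(comp_id, atom_id): Get all bonds involving a specific atom.
- get_available_components(): Get set of all available component IDs.
- get_component_summary(comp_id): Get summary information for a component.
- extract_residue_bonds_data(residue_list): Extract bond information for a list of residues in a format suitable for constants generation.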
Attributes
- atom_url: str
URL for downloading CCD atom data (https://models.rcsb.org/cca.bcif)
- bond_url: str
URL for downloading CCD bond data (https://models.rcsb.org/ccb.bcif)
- __init__(ccd_folder: str | None = None)
Initialize the CCD data manager.
- Parameters:
ccd_folder – Path to folder for storing CCD BinaryCIF files. If None, uses the user’s ~/.hbat/ccd-data directory.
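The default-folder rule above can be sketched in a few lines. This is a minimal illustration rather than HBAT's actual code: the helper name resolve_ccd_folder is hypothetical, and expanding "~" in a user-supplied path is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_ccd_folder(ccd_folder: Optional[str] = None) -> Path:
    """Return the folder that would hold the CCD BinaryCIF files.

    Hypothetical helper: mirrors the documented rule that None maps to
    the user's ~/.hbat/ccd-data directory.
    """
    if ccd_folder is None:
        return Path.home() / ".hbat" / "ccd-data"
    # Assumption: a leading "~" in a custom path is expanded as well.
    return Path(ccd_folder).expanduser()
```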
- ensure_files_exist() -> bool
Ensure CCD BinaryCIF files exist, downloading if necessary.
- Returns:
True if files are available, False if download failed
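A download-if-missing check of this kind could look like the sketch below. It is not the library's implementation: the function names ensure_files and _download are hypothetical, the local file names cca.bcif and ccb.bcif are assumed from the documented URLs, and the fetch parameter exists only so the logic can be exercised without network access.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve

# URLs as documented for atom_url and bond_url; local names are an assumption.
CCD_URLS = {
    "cca.bcif": "https://models.rcsb.org/cca.bcif",
    "ccb.bcif": "https://models.rcsb.org/ccb.bcif",
}


def _download(url: str, target: Path) -> None:
    """Fetch url into target using only the standard library."""
    urlretrieve(url, str(target))


def ensure_files(folder: Path, fetch: Callable[[str, Path], None] = _download) -> bool:
    """Return True if both files are present after the call, False on a failed download."""
    folder.mkdir(parents=True, exist_ok=True)
    for name, url in CCD_URLS.items():
        target = folder / name
        if target.exists():
            continue  # already cached, so no download is attempted
        try:
            fetch(url, target)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            return False
    return True
```

Passing the fetch function as a parameter keeps the caching decision separate from the transport, which makes the behaviour easy to check offline.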
- load_atoms_data() -> bool
Load and parse atom data from CCD BinaryCIF file into memory.
- Returns:
True if successful, False otherwise
- load_bonds_data() -> bool
Load and parse bond data from CCD BinaryCIF file into memory.
- Returns:
True if successful, False otherwise
- get_component_atoms(comp_id: str) -> List[Dict]
Get all atoms for a specific component.
- Parameters:
comp_id – Component identifier (e.g., ‘ALA’, ‘GLY’)
- Returns:
List of atom dictionaries for the component
- get_component_bonds(comp_id: str) -> List[Dict]
Get all bonds for a specific component.
- Parameters:
comp_id – Component identifier (e.g., ‘ALA’, ‘GLY’)
- Returns:
List of bond dictionaries for the component
- get_atom_by_id(comp_id: str, atom_id: str) -> Dict | None
Get a specific atom by component and atom ID.
- Parameters:
comp_id – Component identifier
atom_id – Atom identifier
- Returns:
Atom dictionary if found, None otherwise
- get_bonds_involving_atom(comp_id: str, atom_id: str) -> List[Dict]
Get all bonds involving a specific atom.
- Parameters:
comp_id – Component identifier
atom_id – Atom identifier
- Returns:
List of bond dictionaries involving the atom
- get_available_components() -> Set[str]
Get set of all available component IDs.
- Returns:
Set of component identifiers
- get_component_summary(comp_id: str) -> Dict
Get summary information for a component.
- Parameters:
comp_id – Component identifier
- Returns:
Dictionary with component summary
- extract_residue_bonds_data(residue_list: List[str]) -> Dict[str, Dict]
Extract bond information for a list of residues in a format suitable for constants generation.
- Parameters:
residue_list – List of residue codes to extract data for
- Returns:
Dictionary mapping residue codes to their bond information
Examples
Basic usage of CCDDataManager:
from hbat.ccd.ccd_analyzer import CCDDataManager

# Initialize with default directory
manager = CCDDataManager()

# Or specify custom directory
manager = CCDDataManager("/path/to/ccd/data")

# Ensure files are downloaded
if manager.ensure_files_exist():
    print("CCD files ready")

# Get all atoms for a residue
ala_atoms = manager.get_component_atoms("ALA")
for atom in ala_atoms:
    print(f"{atom['atom_id']}: {atom['type_symbol']}")

# Get bonds for a residue
ala_bonds = manager.get_component_bonds("ALA")
for bond in ala_bonds:
    print(f"{bond['atom_id_1']} - {bond['atom_id_2']}: {bond['value_order']}")

# Get specific atom
ca_atom = manager.get_atom_by_id("ALA", "CA")
if ca_atom:
    print(f"CA atom type: {ca_atom['type_symbol']}")

# Get bonds involving specific atom
ca_bonds = manager.get_bonds_involving_atom("ALA", "CA")
print(f"CA participates in {len(ca_bonds)} bonds")

# Get component summary
summary = manager.get_component_summary("ALA")
print(f"Alanine has {summary['atom_count']} atoms and {summary['bond_count']} bonds")
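The calls above can also be combined into a small helper. The sketch below uses only the documented get_available_components and get_component_summary methods and the atom_count and bond_count keys shown in the example. The function name count_atoms_and_bonds is hypothetical, and the manager argument is duck-typed so that any object offering those two methods will do.

```python
from typing import Dict, Iterable, Tuple


def count_atoms_and_bonds(manager, comp_ids: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each known component ID to (atom_count, bond_count)."""
    available = manager.get_available_components()
    counts: Dict[str, Tuple[int, int]] = {}
    for comp_id in comp_ids:
        if comp_id not in available:
            continue  # skip IDs that the loaded dictionary does not contain
        summary = manager.get_component_summary(comp_id)
        counts[comp_id] = (summary["atom_count"], summary["bond_count"])
    return counts
```

With a CCDDataManager instance this would be called as count_atoms_and_bonds(manager, ["ALA", "GLY"]).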
Data Format
The CCD data includes the following information:
- Atom Data (from cca.bcif):
comp_id: Component/residue identifier (e.g., “ALA”)
atom_id: Atom name within the component (e.g., “CA”)
type_symbol: Element symbol (e.g., “C”, “N”, “O”)
Additional properties like charge, coordinates, etc.
- Bond Data (from ccb.bcif):
comp_id: Component/residue identifier
atom_id_1: First atom in the bond
atom_id_2: Second atom in the bond
value_order: Bond order (“SING”, “DOUB”, “AROM”, etc.)
pdbx_aromatic_flag: Aromaticity indicator (“Y” or “N”)
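To make the record layout concrete, the sketch below builds a few bond dictionaries by hand using the documented keys and tallies them by bond order. The records are a hand-written excerpt for illustration, not output from the library, and the helper name bond_order_counts is hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hand-written excerpt using the documented bond keys (illustrative only).
sample_bonds: List[Dict[str, str]] = [
    {"comp_id": "ALA", "atom_id_1": "N", "atom_id_2": "CA",
     "value_order": "SING", "pdbx_aromatic_flag": "N"},
    {"comp_id": "ALA", "atom_id_1": "CA", "atom_id_2": "C",
     "value_order": "SING", "pdbx_aromatic_flag": "N"},
    {"comp_id": "ALA", "atom_id_1": "C", "atom_id_2": "O",
     "value_order": "DOUB", "pdbx_aromatic_flag": "N"},
]


def bond_order_counts(bonds: List[Dict[str, str]]) -> Counter:
    """Count bonds per value_order (e.g. SING, DOUB)."""
    return Counter(bond["value_order"] for bond in bonds)


print(bond_order_counts(sample_bonds))
```

The same function would accept the list returned by get_component_bonds, assuming its dictionaries carry the value_order key as described above.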
Performance Notes
- Data is loaded lazily on first access.
- In-memory lookup structures provide O(1) access by component ID and atom ID.
- Initial loading may take a few seconds for the full CCD dataset.
- Once loaded, lookups are served from these in-memory structures.
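One way to combine lazy loading with constant-time lookups is a pair of dictionaries that are built the first time a query arrives. The sketch below illustrates the idea under stated assumptions: the class name LazyAtomIndex and the loader callback are hypothetical, and the internal layout of CCDDataManager may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]


class LazyAtomIndex:
    """Hypothetical index: parses once, then answers from dictionaries."""

    def __init__(self, loader: Callable[[], List[Record]]):
        self._loader = loader  # called at most once, on first access
        self._by_comp: Optional[Dict[str, List[Record]]] = None
        self._by_key: Dict[Tuple[str, str], Record] = {}

    def _ensure_loaded(self) -> None:
        if self._by_comp is not None:
            return  # already built, nothing to do
        by_comp: Dict[str, List[Record]] = defaultdict(list)
        for atom in self._loader():
            by_comp[atom["comp_id"]].append(atom)
            self._by_key[(atom["comp_id"], atom["atom_id"])] = atom
        self._by_comp = dict(by_comp)

    def atoms_for(self, comp_id: str) -> List[Record]:
        """All atoms of one component; a single dictionary lookup."""
        self._ensure_loaded()
        return self._by_comp.get(comp_id, [])

    def atom(self, comp_id: str, atom_id: str) -> Optional[Record]:
        """One atom by (component, atom) key, or None if absent."""
        self._ensure_loaded()
        return self._by_key.get((comp_id, atom_id))
```

Dictionary lookups are O(1) on average, so the cost of the one-time parse is paid only by the first query.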