CCD Analyzer

This module parses Chemical Component Dictionary (CCD) BinaryCIF files, downloads them automatically when they are missing, and builds in-memory data structures for fast atom and bond lookups by residue (component) ID and atom ID.

Classes

class hbat.ccd.ccd_analyzer.CCDDataManager(ccd_folder: str | None = None)

Bases: object

Manages Chemical Component Dictionary data with efficient lookup capabilities.

This class handles automatic download of CCD BinaryCIF files and provides optimized in-memory data structures for fast lookups of atoms and bonds by component ID and atom ID.

Methods

ensure_files_exist

Ensure CCD BinaryCIF files exist, downloading if necessary.

load_atoms_data

Load and parse atom data from CCD BinaryCIF file into memory.

load_bonds_data

Load and parse bond data from CCD BinaryCIF file into memory.

get_component_atoms

Get all atoms for a specific component.

get_component_bonds

Get all bonds for a specific component.

get_atom_by_id

Get a specific atom by component and atom ID.

get_bonds_involving_atom

Get all bonds involving a specific atom.

get_available_components

Get set of all available component IDs.

get_component_summary

Get summary information for a component.

extract_residue_bonds_data

Extract bond information for a list of residues in a format suitable for constants generation.

Attributes

ccd_folder: str

Path to folder storing CCD BinaryCIF files

atom_file: str

Path to the CCD atom data file (cca.bcif)

bond_file: str

Path to the CCD bond data file (ccb.bcif)

atom_url: str

URL for downloading CCD atom data (https://models.rcsb.org/cca.bcif)

bond_url: str

URL for downloading CCD bond data (https://models.rcsb.org/ccb.bcif)

__init__(ccd_folder: str | None = None)

Initialize the CCD data manager.

Parameters:

ccd_folder – Path to folder for storing CCD BinaryCIF files. If None, uses the user’s ~/.hbat/ccd-data directory.

ensure_files_exist() → bool

Ensure CCD BinaryCIF files exist, downloading if necessary.

Returns:

True if files are available, False if download failed
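The download-if-missing behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not HBAT's actual implementation: the helper `ensure_file` and its `fetch` parameter are hypothetical names introduced here, and only the file names and URLs come from the attributes listed on this page.

```python
import os
import urllib.request
from typing import Callable


def ensure_file(
    path: str,
    url: str,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> bool:
    """Return True if `path` exists, downloading it from `url` if needed.

    Hypothetical helper for illustration only; not part of the hbat API.
    """
    if os.path.isfile(path):
        return True  # already present, nothing to download
    # Create the target folder on first use.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    try:
        fetch(url, path)  # urlretrieve(url, filename) saves the response to `path`
    except OSError:
        # Network errors from urllib (URLError, HTTPError) are OSError subclasses.
        return False
    return os.path.isfile(path)
```

A caller could then require both files, for example `ensure_file(atom_file, "https://models.rcsb.org/cca.bcif") and ensure_file(bond_file, "https://models.rcsb.org/ccb.bcif")`, which mirrors the single True/False result documented for `ensure_files_exist`.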

load_atoms_data() → bool

Load and parse atom data from CCD BinaryCIF file into memory.

Returns:

True if successful, False otherwise

load_bonds_data() → bool

Load and parse bond data from CCD BinaryCIF file into memory.

Returns:

True if successful, False otherwise

get_component_atoms(comp_id: str) → List[Dict]

Get all atoms for a specific component.

Parameters:

comp_id – Component identifier (e.g., ‘ALA’, ‘GLY’)

Returns:

List of atom dictionaries for the component

get_component_bonds(comp_id: str) → List[Dict]

Get all bonds for a specific component.

Parameters:

comp_id – Component identifier (e.g., ‘ALA’, ‘GLY’)

Returns:

List of bond dictionaries for the component

get_atom_by_id(comp_id: str, atom_id: str) → Dict | None

Get a specific atom by component and atom ID.

Parameters:
  • comp_id – Component identifier

  • atom_id – Atom identifier

Returns:

Atom dictionary if found, None otherwise

get_bonds_involving_atom(comp_id: str, atom_id: str) → List[Dict]

Get all bonds involving a specific atom.

Parameters:
  • comp_id – Component identifier

  • atom_id – Atom identifier

Returns:

List of bond dictionaries involving the atom
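Because a bond record names its two atoms in `atom_id_1` and `atom_id_2`, an atom can appear in either position. The sketch below shows one way to select such bonds and derive the bonded partners from a plain list of bond dictionaries. The function names and the sample records are hand-written for illustration and are not taken from the hbat source or loaded from the CCD files.

```python
from typing import Dict, List

# Hand-written sample records using the bond fields documented on this page.
sample_bonds: List[Dict[str, str]] = [
    {"comp_id": "ALA", "atom_id_1": "N", "atom_id_2": "CA", "value_order": "SING"},
    {"comp_id": "ALA", "atom_id_1": "CA", "atom_id_2": "C", "value_order": "SING"},
    {"comp_id": "ALA", "atom_id_1": "CA", "atom_id_2": "CB", "value_order": "SING"},
    {"comp_id": "ALA", "atom_id_1": "C", "atom_id_2": "O", "value_order": "DOUB"},
]


def bonds_involving(bonds: List[Dict[str, str]], atom_id: str) -> List[Dict[str, str]]:
    """Keep bonds where the atom appears as either endpoint."""
    return [b for b in bonds if atom_id in (b["atom_id_1"], b["atom_id_2"])]


def bonded_partners(bonds: List[Dict[str, str]], atom_id: str) -> List[str]:
    """Return the atom on the other end of each bond involving `atom_id`."""
    partners = []
    for b in bonds_involving(bonds, atom_id):
        other = b["atom_id_2"] if b["atom_id_1"] == atom_id else b["atom_id_1"]
        partners.append(other)
    return partners


print(bonded_partners(sample_bonds, "CA"))  # ['N', 'C', 'CB']
```

The same post-processing should apply to the list returned by `get_bonds_involving_atom`, assuming its dictionaries carry the fields listed under Data Format.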

get_available_components() → Set[str]

Get set of all available component IDs.

Returns:

Set of component identifiers

get_component_summary(comp_id: str) → Dict

Get summary information for a component.

Parameters:

comp_id – Component identifier

Returns:

Dictionary with component summary

extract_residue_bonds_data(residue_list: List[str]) → Dict[str, Dict]

Extract bond information for a list of residues in a format suitable for constants generation.

Parameters:

residue_list – List of residue codes to extract data for

Returns:

Dictionary mapping residue codes to their bond information
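This page does not specify the layout of the inner dictionaries. As a rough sketch of the general idea only, the code below groups a flat list of bond records by residue code. The function name `group_bonds_by_residue` and the output keys (`bonds`, `bond_count`, `atom1`, `atom2`, `order`) are assumptions made for this example and may differ from what `extract_residue_bonds_data` actually returns.

```python
from collections import defaultdict
from typing import Dict, List


def group_bonds_by_residue(
    bonds: List[Dict[str, str]], residue_list: List[str]
) -> Dict[str, Dict]:
    """Group bond records by comp_id, keeping only the requested residues.

    Hypothetical output layout; the real method's format is not documented here.
    """
    wanted = set(residue_list)
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for bond in bonds:
        if bond["comp_id"] in wanted:
            grouped[bond["comp_id"]].append(
                {
                    "atom1": bond["atom_id_1"],
                    "atom2": bond["atom_id_2"],
                    "order": bond["value_order"],
                }
            )
    # Residues with no bond records are left out of the result.
    return {
        res: {"bonds": grouped[res], "bond_count": len(grouped[res])}
        for res in residue_list
        if res in grouped
    }


# Hand-written sample records, not loaded from the CCD files.
records = [
    {"comp_id": "GLY", "atom_id_1": "N", "atom_id_2": "CA", "value_order": "SING"},
    {"comp_id": "GLY", "atom_id_1": "C", "atom_id_2": "O", "value_order": "DOUB"},
    {"comp_id": "ALA", "atom_id_1": "CA", "atom_id_2": "CB", "value_order": "SING"},
]
result = group_bonds_by_residue(records, ["GLY", "SER"])
print(result["GLY"]["bond_count"])  # 2
```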

Examples

Basic usage of CCDDataManager:

from hbat.ccd.ccd_analyzer import CCDDataManager

# Initialize with default directory
manager = CCDDataManager()

# Or specify custom directory
manager = CCDDataManager("/path/to/ccd/data")

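# Ensure files are downloaded; stop early if they are unavailable
if not manager.ensure_files_exist():
    raise SystemExit("CCD files are not available (download failed)")
print("CCD files ready")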

# Get all atoms for a residue
ala_atoms = manager.get_component_atoms("ALA")
for atom in ala_atoms:
    print(f"{atom['atom_id']}: {atom['type_symbol']}")

# Get bonds for a residue
ala_bonds = manager.get_component_bonds("ALA")
for bond in ala_bonds:
    print(f"{bond['atom_id_1']} - {bond['atom_id_2']}: {bond['value_order']}")

# Get specific atom
ca_atom = manager.get_atom_by_id("ALA", "CA")
if ca_atom:
    print(f"CA atom type: {ca_atom['type_symbol']}")

# Get bonds involving specific atom
ca_bonds = manager.get_bonds_involving_atom("ALA", "CA")
print(f"CA participates in {len(ca_bonds)} bonds")

# Get component summary
summary = manager.get_component_summary("ALA")
print(f"Alanine has {summary['atom_count']} atoms and {summary['bond_count']} bonds")

Data Format

The CCD data includes the following information:

Atom Data (from cca.bcif):
  • comp_id: Component/residue identifier (e.g., “ALA”)

  • atom_id: Atom name within the component (e.g., “CA”)

  • type_symbol: Element symbol (e.g., “C”, “N”, “O”)

  • Additional properties like charge, coordinates, etc.

Bond Data (from ccb.bcif):
  • comp_id: Component/residue identifier

  • atom_id_1: First atom in the bond

  • atom_id_2: Second atom in the bond

  • value_order: Bond order (“SING”, “DOUB”, “AROM”, etc.)

  • pdbx_aromatic_flag: Aromaticity indicator (“Y” or “N”)
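Since each record is a plain dictionary keyed by the field names above, ordinary Python tools are enough to summarise it. The sketch below tallies bond orders and picks out bonds flagged as aromatic. The records are hand-written in the documented shape for illustration and are not read from `ccb.bcif`.

```python
from collections import Counter
from typing import Dict, List

# Illustrative bond records in the documented shape (not loaded from the CCD).
phe_bonds: List[Dict[str, str]] = [
    {"comp_id": "PHE", "atom_id_1": "CG", "atom_id_2": "CD1",
     "value_order": "DOUB", "pdbx_aromatic_flag": "Y"},
    {"comp_id": "PHE", "atom_id_1": "CD1", "atom_id_2": "CE1",
     "value_order": "SING", "pdbx_aromatic_flag": "Y"},
    {"comp_id": "PHE", "atom_id_1": "CB", "atom_id_2": "CG",
     "value_order": "SING", "pdbx_aromatic_flag": "N"},
    {"comp_id": "PHE", "atom_id_1": "C", "atom_id_2": "O",
     "value_order": "DOUB", "pdbx_aromatic_flag": "N"},
]

# Count how many bonds have each bond order.
order_counts = Counter(b["value_order"] for b in phe_bonds)

# Collect the atom pairs of bonds flagged as aromatic.
aromatic_pairs = [
    (b["atom_id_1"], b["atom_id_2"])
    for b in phe_bonds
    if b["pdbx_aromatic_flag"] == "Y"
]

print(order_counts["SING"], order_counts["DOUB"])  # 2 2
print(aromatic_pairs)  # [('CG', 'CD1'), ('CD1', 'CE1')]
```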

Performance Notes

  • Data is loaded lazily on first access

  • In-memory lookup structures provide O(1) access by component and atom ID

  • Initial loading may take a few seconds for the full CCD dataset

  • Once loaded, lookups are served from the in-memory structures and need no further file parsing
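The combination of lazy loading and constant-time lookup noted above can be pictured with a small sketch. The class `LazyAtomIndex`, its `loader` argument, and the `load_count` counter are hypothetical names used only for this illustration. How `CCDDataManager` organises its data internally is not documented on this page, so treat this as one plausible design rather than the actual one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]


class LazyAtomIndex:
    """Builds dictionary indexes on first use (illustrative, not hbat code)."""

    def __init__(self, loader: Callable[[], List[Record]]) -> None:
        self._loader = loader  # called once, on the first lookup
        self._by_comp: Optional[Dict[str, List[Record]]] = None
        self._by_key: Dict[Tuple[str, str], Record] = {}
        self.load_count = 0  # exposed only to show that loading happens once

    def _ensure_loaded(self) -> None:
        if self._by_comp is not None:
            return
        self.load_count += 1
        by_comp: Dict[str, List[Record]] = {}
        for atom in self._loader():
            by_comp.setdefault(atom["comp_id"], []).append(atom)
            self._by_key[(atom["comp_id"], atom["atom_id"])] = atom
        self._by_comp = by_comp

    def atoms(self, comp_id: str) -> List[Record]:
        """All atoms of a component: one dictionary lookup."""
        self._ensure_loaded()
        assert self._by_comp is not None
        return self._by_comp.get(comp_id, [])

    def atom(self, comp_id: str, atom_id: str) -> Optional[Record]:
        """A single atom: one lookup keyed by (comp_id, atom_id)."""
        self._ensure_loaded()
        return self._by_key.get((comp_id, atom_id))


def load_sample() -> List[Record]:
    # Stand-in for parsing cca.bcif; returns hand-written records.
    return [
        {"comp_id": "ALA", "atom_id": "N", "type_symbol": "N"},
        {"comp_id": "ALA", "atom_id": "CA", "type_symbol": "C"},
        {"comp_id": "GLY", "atom_id": "CA", "type_symbol": "C"},
    ]


index = LazyAtomIndex(load_sample)
print(index.load_count)                      # 0, nothing loaded yet
print(index.atom("ALA", "CA")["type_symbol"])  # C
print(len(index.atoms("ALA")))               # 2
print(index.load_count)                      # 1, loaded exactly once
```

Keying one dictionary by component ID and another by the (component ID, atom ID) pair trades extra memory for average constant-time access, which fits the one-time loading cost described in the notes above.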