RDKit interop

Lahuta exposes RDKit-based objects directly through lahuta.rdkit and through topology accessors. This is useful when you want chemistry-oriented graph operations (atoms, bonds, conformers, monomer metadata) while still using Lahuta for parsing and high-performance structural analysis.

In most biology workflows, the pattern is: parse structure with LahutaSystem, build topology once, then read RDKit handles from topology.

from lahuta import LahutaSystem

system = LahutaSystem("core/data/ubi.cif")
topology = system.get_or_build_topology()

mol = topology.molecule()
conf = topology.conformer(0)

print("atoms:", mol.getNumAtoms())
print("bonds:", mol.getNumBonds())
print("conformer atoms:", conf.getNumAtoms())

atom0 = mol.getAtomWithIdx(0)
print("atom 0:", atom0.getSymbol(), atom0.getAtomicNum())

In practice, this gives you two complementary views of the same structure:

LahutaSystem.props for fast NumPy-style coordinate and label analysis.
topology.molecule() for chemistry graph traversal and atom/bond metadata.
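
As a small illustration of the graph-side view, the sketch below tallies element symbols using only the accessors already shown above (getNumAtoms, getAtomWithIdx, getSymbol). The _StubAtom and _StubMol classes and the element_counts helper are hypothetical stand-ins, not part of Lahuta; they exist only so the snippet runs without a structure file. With Lahuta you would presumably pass topology.molecule() instead of the stub.

```python
from collections import Counter


def element_counts(mol) -> Counter:
    """Count atoms per element symbol via index-based atom access."""
    return Counter(
        mol.getAtomWithIdx(i).getSymbol() for i in range(mol.getNumAtoms())
    )


# Hypothetical stand-ins (not part of Lahuta) that mimic the three
# accessors used above, so the example is runnable on its own.
class _StubAtom:
    def __init__(self, symbol):
        self._symbol = symbol

    def getSymbol(self):
        return self._symbol


class _StubMol:
    def __init__(self, symbols):
        self._atoms = [_StubAtom(s) for s in symbols]

    def getNumAtoms(self):
        return len(self._atoms)

    def getAtomWithIdx(self, idx):
        return self._atoms[idx]


# Five atoms standing in for a tiny fragment: one N, three C, one O.
counts = element_counts(_StubMol(["N", "C", "C", "O", "C"]))
print(counts.most_common())
```

Because the helper is duck-typed, it depends only on the three method names and not on any particular molecule class.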

You can also construct and edit molecules directly from lahuta.rdkit when needed, though we do not recommend it, as this is not Lahuta's intended use case.

import numpy as np
from lahuta.rdkit import BondType, Conformer, RWMol

mol = RWMol()
c0 = mol.addAtom(6)  # carbon
c1 = mol.addAtom(6)  # carbon
o2 = mol.addAtom(8)  # oxygen
mol.addBond(c0, c1, BondType.SINGLE)
mol.addBond(c1, o2, BondType.SINGLE)

conf = Conformer(mol.getNumAtoms())
conf.setAllAtomPositions(
    np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.7, 0.0, 0.0]], dtype=np.float64)
)
mol.addConformer(conf, assign_id=True)
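
Hand-written coordinates are easy to get wrong, so it can help to check bonded distances before attaching a conformer. The sketch below is plain Python and does not call Lahuta at all. bond_lengths is a hypothetical helper, and the coordinate and bond lists simply mirror the values used in the example above.

```python
import math

# Same coordinates and bonded index pairs as the example above.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.7, 0.0, 0.0)]
bonds = [(0, 1), (1, 2)]


def bond_lengths(coords, bonds):
    """Return the Euclidean distance for each bonded pair, in coordinate units."""
    return {(i, j): math.dist(coords[i], coords[j]) for i, j in bonds}


lengths = bond_lengths(coords, bonds)
for (i, j), d in lengths.items():
    print(f"bond {i}-{j}: {d:.2f}")
```

A distance far outside the range you expect for a covalent bond usually points to a typo in the coordinates or a wrong atom index in the bond list.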

RDKit interop also works inside pipeline tasks. If your task needs chemistry-level operations, declare a dependency on "topology" and then access the molecule through ctx.get_topology().molecule().

from lahuta.pipeline import Pipeline, PipelineContext
from lahuta.sources import FileSource


def summarize_graph(ctx: PipelineContext) -> dict:
    top = ctx.get_topology()
    mol = top.molecule() if top is not None else None
    return {
        "atoms": int(mol.getNumAtoms()) if mol is not None else 0,
        "bonds": int(mol.getNumBonds()) if mol is not None else 0,
    }


p = Pipeline(FileSource(["core/data/ubi.cif"]))
p.add_task(name="graph_summary", task=summarize_graph, depends=["topology"])
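
Because the task function only touches ctx.get_topology(), molecule(), getNumAtoms() and getNumBonds(), its logic can be checked without running a pipeline. The sketch below repeats the function and calls it with hypothetical stand-in objects (_StubMol, _StubTopology, _StubContext); none of these are Lahuta classes, and they model only the calls the task makes.

```python
def summarize_graph(ctx) -> dict:
    """Same logic as the pipeline task above, without the Lahuta type hint."""
    top = ctx.get_topology()
    mol = top.molecule() if top is not None else None
    return {
        "atoms": int(mol.getNumAtoms()) if mol is not None else 0,
        "bonds": int(mol.getNumBonds()) if mol is not None else 0,
    }


# Hypothetical stand-ins (not part of Lahuta) exposing only what the task uses.
class _StubMol:
    def getNumAtoms(self):
        return 3

    def getNumBonds(self):
        return 2


class _StubTopology:
    def molecule(self):
        return _StubMol()


class _StubContext:
    def __init__(self, topology):
        self._topology = topology

    def get_topology(self):
        return self._topology


# Exercise both branches: topology available, and topology missing.
with_top = summarize_graph(_StubContext(_StubTopology()))
without_top = summarize_graph(_StubContext(None))
print(with_top, without_top)
```

Returning zero counts when no topology is available keeps the result shape stable, at the cost of making an empty structure and a missing topology indistinguishable; whether that trade-off suits you depends on how the results are consumed downstream.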

For most users, the best default is to start with Lahuta's own APIs and reach for RDKit interop only where you specifically need graph-centric chemistry features.