
Topology

Topology is Lahuta's computed interpretation layer on top of a parsed LahutaSystem. In biological terms, LahutaSystem gives you atoms, residues, chains, and coordinates as read from the structure file, while Topology adds computed structural chemistry: neighborhood relationships, inferred bonds, rings, functional groups, and atom typing used by contact analysis.

This separation is intentional: you can quickly inspect structure arrays through system.props and build topology only when you need chemistry-aware analysis.

from pathlib import Path

from lahuta import LahutaSystem

system = LahutaSystem(str(Path("core/data/ubi.cif")))
ok = system.build_topology()
if not ok:
    raise RuntimeError("Topology build failed")

top = system.get_topology()
print("residues:", len(top.residues))
print("rings:", len(top.rings))
print("groups:", len(top.groups))
print("typed atoms:", len(top.atom_records))

Customizing Topology Build

You can customize how topology is computed with TopologyBuildingOptions and TopologyComputers.

TopologyBuildingOptions controls numerical and chemistry options:

  • cutoff: neighbor-search cutoff used during bond perception.
  • compute_nonstandard_bonds: include nonstandard/coordination-style bonds.
  • atom_typing_method: atom typing backend (MolStar, Arpeggio, GetContacts).
  • auto_heal: resolve required internal dependencies automatically.
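
The cutoff option limits the distance at which atom pairs count as neighbors. This page does not describe Lahuta's internal neighbor search. The sketch below shows one common technique, a uniform cell list, and assumes coordinates are plain (x, y, z) tuples. The function find_neighbor_pairs is hypothetical and is not part of Lahuta.

```python
import math
from collections import defaultdict
from itertools import product


def find_neighbor_pairs(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, whose distance is <= cutoff.

    Hypothetical helper illustrating a cell-list search; not Lahuta's API.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    # Bin each atom into a cubic cell whose edge equals the cutoff, so any
    # neighbor lies in the same cell or in one of its 26 adjacent cells.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(idx)

    cutoff_sq = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() does not insert keys, so iterating cells stays safe.
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j:
                        d_sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                        if d_sq <= cutoff_sq:
                            pairs.add((i, j))
    return sorted(pairs)


atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(find_neighbor_pairs(atoms, 4.5))  # [(0, 1)]
```

With a cell edge equal to the cutoff, each atom is compared only against atoms in its immediate neighborhood, not against every other atom in the structure.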

TopologyComputers controls which computation stages are requested. It is a bitmask enum, so you can combine stages with | or pass a list.

from lahuta import AtomTypingMethod, TopologyBuildingOptions, TopologyComputers

opts = TopologyBuildingOptions()
opts.cutoff = 4.5
opts.compute_nonstandard_bonds = True
opts.atom_typing_method = AtomTypingMethod.MolStar

ok = system.build_topology(
    opts,
    include=TopologyComputers.Standard | TopologyComputers.Rings | TopologyComputers.AtomTyping,
)
if not ok:
    raise RuntimeError("Topology build failed")

top = system.get_topology()
print("rings computed:", top.has_computed(TopologyComputers.Rings))
print("typing computed:", top.has_computed(TopologyComputers.AtomTyping))
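
TopologyComputers is described above as a bitmask enum. The snippet below uses Python's standard enum.IntFlag to show what combining flags with | and passing a list amount to. The class StageFlags, its member values, and the helper to_mask are hypothetical stand-ins, not Lahuta's definitions.

```python
import operator
from enum import IntFlag
from functools import reduce


class StageFlags(IntFlag):
    """Hypothetical stand-in for a stage bitmask; values are illustrative only."""
    Neighbors = 1
    Bonds = 2
    Residues = 4
    Rings = 8
    AtomTyping = 16


def to_mask(include):
    """Normalize a single flag or a list of flags into one combined mask."""
    if isinstance(include, (list, tuple)):
        # OR all list entries together, starting from the empty mask.
        return reduce(operator.or_, include, StageFlags(0))
    return StageFlags(include)


combined = StageFlags.Rings | StageFlags.AtomTyping
from_list = to_mask([StageFlags.Rings, StageFlags.AtomTyping])

print(combined == from_list)            # True
print(StageFlags.Bonds in combined)     # False
print(bool(combined & StageFlags.Rings))  # True
```

Both spellings produce the same integer mask. Checking for a stage is a bitwise AND against that mask.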

These stage flags matter most on the generic (non-AlphaFold) pathway.
In model mode (InputType.AlphaFold), Lahuta uses a specialized model topology route instead of the full generic inference flow.

Dependency Resolution And Incremental Build

Topology stages depend on each other, and Lahuta resolves those dependencies automatically when needed.

The core relationships are:

  • Bonds requires Neighbors
  • Rings requires Bonds and Residues
  • AtomTyping requires Rings

This means you can request a high-level stage and Lahuta will run the required prerequisites. In practice, you pay only for what you need. If you only need Neighbors, you can stop there. If you later need Rings, Lahuta adds only the missing work.

from lahuta import LahutaSystem, TopologyComputers

# Avoid the name `sys`, which shadows Python's standard-library module.
inc_system = LahutaSystem("core/data/ubi.cif")

# First request: only neighbors
inc_system.build_topology(include=TopologyComputers.Neighbors)
top = inc_system.get_topology()
print("neighbors:", top.has_computed(TopologyComputers.Neighbors))  # True
print("bonds:", top.has_computed(TopologyComputers.Bonds))          # False

# Later request: bonds (completed neighbors stay available)
inc_system.build_topology(include=TopologyComputers.Bonds)
top = inc_system.get_topology()
print("bonds:", top.has_computed(TopologyComputers.Bonds))          # True

# Later request: atom typing (pulls in remaining prerequisites such as rings)
inc_system.build_topology(include=TopologyComputers.AtomTyping)
top = inc_system.get_topology()
print("rings:", top.has_computed(TopologyComputers.Rings))          # True
print("typing:", top.has_computed(TopologyComputers.AtomTyping))    # True

Calling build_topology(...) again is normal. Previously completed stages stay available, and new calls focus on any newly requested stages instead of starting from scratch.
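
The dependency rules listed earlier form a small directed graph. The sketch below shows how a requested stage can expand into its prerequisites and how previously completed stages can be skipped on a later call. The DEPENDENCIES table copies the relationships on this page, and Neighbors and Residues are assumed to have no prerequisites. The function plan_stages is hypothetical and does not describe how Lahuta is implemented internally.

```python
# Prerequisites as listed on this page; Neighbors and Residues are assumed
# to have none. The graph is assumed to be acyclic.
DEPENDENCIES = {
    "Neighbors": [],
    "Residues": [],
    "Bonds": ["Neighbors"],
    "Rings": ["Bonds", "Residues"],
    "AtomTyping": ["Rings"],
}


def plan_stages(requested, completed=()):
    """Return the stages to run, prerequisites first, skipping completed ones.

    Hypothetical helper: a depth-first walk over DEPENDENCIES.
    """
    done = set(completed)
    order = []

    def visit(stage):
        if stage in done:
            return
        for dep in DEPENDENCIES[stage]:
            visit(dep)
        done.add(stage)
        order.append(stage)

    for stage in requested:
        visit(stage)
    return order


# First call: nothing has been computed yet.
print(plan_stages(["Neighbors"]))  # ['Neighbors']
# Later calls add only the missing work.
print(plan_stages(["Bonds"], completed=["Neighbors"]))  # ['Bonds']
print(plan_stages(["AtomTyping"], completed=["Neighbors", "Bonds"]))
# ['Residues', 'Rings', 'AtomTyping']
```

This mirrors the incremental example above. Each call returns only the stages that are neither done nor already planned, in an order that respects their prerequisites.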

If you only need specific stages, request only those stages:

system2 = LahutaSystem("core/data/ubi.cif")
system2.build_topology(include=TopologyComputers.Neighbors)
top2 = system2.get_topology()

print("neighbors:", top2.has_computed(TopologyComputers.Neighbors))
print("bonds:", top2.has_computed(TopologyComputers.Bonds))

Working With Topology Data

Once topology exists, you can access:

  • top.residues: residue container view.
  • top.atom_records: per-atom typed records.
  • top.rings: ring views with geometry (center, normal).
  • top.groups: functional-group views.

top = system.get_topology()

first_res = top.residues[0]
print("first residue:", first_res)

if top.rings:
    ring0 = top.rings[0]
    print("ring size:", ring0.size, "aromatic:", ring0.aromatic)

from lahuta import AtomType

hydrophobic = top.atoms_with_type(AtomType.Hydrophobic)
print("hydrophobic atoms:", len(hydrophobic))
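
Each ring view exposes a center and a normal. This page does not say how Lahuta computes them. A common approach, sketched below as an assumption, takes the center as the mean of the ring-atom coordinates and the normal from Newell's method, which stays stable for slightly non-planar rings. The function ring_geometry and its input (ring atoms as (x, y, z) tuples in ring order) are hypothetical.

```python
import math


def ring_geometry(coords):
    """Return (center, unit_normal) for an ordered ring of (x, y, z) atoms.

    Hypothetical helper: centroid plus Newell's polygon normal.
    """
    n = len(coords)
    if n < 3:
        raise ValueError("a ring needs at least three atoms")
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))

    # Newell's method: accumulate contributions from consecutive atom pairs,
    # wrapping around so the last atom connects back to the first.
    nx = ny = nz = 0.0
    for i in range(n):
        x1, y1, z1 = coords[i]
        x2, y2, z2 = coords[(i + 1) % n]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("degenerate ring: normal is undefined")
    return center, (nx / length, ny / length, nz / length)


# Planar six-membered ring with 1.39 Å sides in the xy-plane.
hexagon = [(1.39 * math.cos(k * math.pi / 3), 1.39 * math.sin(k * math.pi / 3), 0.0) for k in range(6)]
center, normal = ring_geometry(hexagon)
print("center:", center)
print("normal:", normal)
```

The normal's sign depends on the order in which the atoms are traversed. Code comparing ring orientations (for example, for stacking) would typically use the absolute angle between normals.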

You can also re-run atom typing on an existing topology:

from lahuta import AtomTypingMethod

top.assign_typing(AtomTypingMethod.Arpeggio)

Generic vs Model Topology

Lahuta has two topology pathways, selected by how the system is created.

Generic pathway:

  • Default when you create LahutaSystem(path) without model mode.
  • Intended for regular structure inputs (experimental PDB/mmCIF-style parsing, with ligands, DNA/RNA, ions, lipids, etc.).
  • Runs the full generic topology flow (neighbor/bond/ring/typing stages as requested).

Model pathway:

  • Use for AlphaFold-derived model CIF inputs.
  • Create the system with input_type=InputType.AlphaFold.
  • Uses a specialized, stricter model path optimized for AlphaFold-like files.
  • Does not run the full generic neighbor/bond/ring/atom-typing inference flow.

from lahuta import InputType, LahutaSystem

model_system = LahutaSystem("core/data/fubi.cif", input_type=InputType.AlphaFold)
model_system.build_topology()
print("is model origin:", model_system.is_model)

In pipeline analysis, this same choice is controlled with:

  • p.params("system").is_model = True for AlphaFold-derived model CIF inputs.
  • p.params("topology").flags = ... and p.params("topology").atom_typing_method = ... to customize topology computation.