System

LahutaSystem is the main Python object for parsing structure data. You start with a structure file and keep this object through most of the analysis. Lahuta supports PDB, CIF (compressed or uncompressed), and GRO + XTC files (for MD data).

from pathlib import Path

from lahuta import LahutaSystem

path = Path("core/data/ubi.cif")
system = LahutaSystem(str(path))

print("atoms:", system.n_atoms)
print("source:", system.file_name)
print("is_model_input:", system.is_model)

Once the system is loaded, system.props gives you NumPy-based structural arrays. These arrays are the fast analysis layer for atom-level work: coordinates, atom names, residue ids, residue names, and chain labels.

p = system.props

print("positions:", p.positions.shape, p.positions.dtype)  # copy
print("centroid [A]:", p.positions.mean(axis=0))
print("atom names:", p.names[:8])
print("residue names:", p.resnames[:8])
print("residue ids:", p.resids[:8])
print("chains:", p.chainlabels[:8])
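
As a sketch of how these arrays combine in practice, the example below selects alpha carbons in one chain with a boolean mask and computes their centroid. It uses small synthetic arrays in place of system.props so that it runs without a structure file. The values are made up, and it assumes that names and chainlabels are per-atom string arrays aligned row by row with positions, which the example above suggests but does not state.

```python
import numpy as np

# Synthetic stand-ins for p.positions, p.names, and p.chainlabels (made-up values).
positions = np.array(
    [
        [0.0, 0.0, 0.0],  # N,  chain A
        [1.0, 0.0, 0.0],  # CA, chain A
        [2.0, 1.0, 0.0],  # C,  chain A
        [3.0, 1.0, 0.0],  # CA, chain A
        [9.0, 9.0, 9.0],  # CA, chain B
    ]
)
names = np.array(["N", "CA", "C", "CA", "CA"])
chainlabels = np.array(["A", "A", "A", "A", "B"])

# Boolean mask: alpha carbons that belong to chain A.
mask = (names == "CA") & (chainlabels == "A")

# Index the coordinate array with the mask, then average over atoms.
ca_xyz = positions[mask]
ca_centroid = ca_xyz.mean(axis=0)

print("selected atoms:", int(mask.sum()))
print("CA centroid [A]:", ca_centroid)
```

With real data, the same mask pattern should carry over by replacing the synthetic arrays with the corresponding attributes of system.props, provided the assumption about array alignment holds.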

For coordinates, Lahuta exposes both a copy (p.positions) and a zero-copy view (p.positions_view). Use positions_view when you want to minimize memory movement in large runs.

xyz_copy = p.positions
xyz_view = p.positions_view
print("same shape:", xyz_copy.shape == xyz_view.shape)

Note, however, that in the vast majority of cases the processing you do after getting the positions is much more expensive than the difference between copying and viewing the underlying data. Unless you know what you are doing, use p.positions.
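
The general difference between a copy and a view can be illustrated with plain NumPy. This is a sketch of the memory semantics only, not of Lahuta internals: the buffer below is a made-up stand-in for the coordinate storage, and whether p.positions_view is writable is not stated here, so treating it as read-only is the safer assumption.

```python
import numpy as np

# Made-up stand-in for the coordinate buffer owned by the parsed system.
buffer = np.zeros((4, 3), dtype=np.float64)

xyz_copy = buffer.copy()  # independent memory, analogous to a copy
xyz_view = buffer.view()  # shares memory with the buffer, analogous to a view

print("copy shares memory:", np.shares_memory(buffer, xyz_copy))
print("view shares memory:", np.shares_memory(buffer, xyz_view))

# Writing to the copy leaves the source untouched.
xyz_copy[0] = [1.0, 2.0, 3.0]
print("buffer row 0 after editing copy:", buffer[0])

# Writing to the source is immediately visible through the view.
buffer[1] = [4.0, 5.0, 6.0]
print("view row 1 after editing buffer:", xyz_view[1])
```

A copy is therefore safe to modify in place, while a view avoids the allocation but stays tied to the lifetime and contents of the underlying data.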

You can also inspect residue-level grouping directly from the parsed system, even before topology is built:

residues = system.residues
print("n_residues:", len(residues))
print("first residue:", residues[0])

Accessing system.residues does not trigger topology computation. It is a direct residue view over the parsed structure.
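
To make the idea of residue-level grouping concrete, the sketch below groups atom indices by chain, residue id, and residue name using synthetic per-atom arrays. It is an illustration of the concept only: it does not show how system.residues is implemented, and the arrays are made-up stand-ins for p.resids, p.resnames, and p.chainlabels.

```python
import numpy as np

# Synthetic per-atom arrays (made-up values) standing in for system.props data.
resids = np.array([1, 1, 1, 2, 2, 1, 1])
resnames = np.array(["MET", "MET", "MET", "GLN", "GLN", "MET", "MET"])
chainlabels = np.array(["A", "A", "A", "A", "A", "B", "B"])

# Map each (chain, residue id, residue name) key to the indices of its atoms.
groups = {}
keys = zip(chainlabels.tolist(), resids.tolist(), resnames.tolist())
for atom_index, key in enumerate(keys):
    groups.setdefault(key, []).append(atom_index)

print("n_residues:", len(groups))
for (chain, resid, resname), atom_indices in groups.items():
    print(chain, resid, resname, "atoms:", atom_indices)
```

Including the chain label in the key matters because residue ids commonly restart in each chain, as they do for residue 1 in chains A and B here.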

When you move to contact analysis or other chemistry-aware analysis, you explicitly build topology from this system. The next page covers that in detail: Topology.

For AlphaFold-derived model CIF files, use model input mode explicitly:

from lahuta import InputType, LahutaSystem

model_system = LahutaSystem("core/data/fubi.cif", input_type=InputType.AlphaFold)
model_system.build_topology()
print("is_model_input:", model_system.is_model)

InputType.AlphaFold is optimized for AlphaFold-like model CIF inputs. It is stricter than generic parsing but much faster.

Continue to Topology for the full topology concepts and APIs.