System
LahutaSystem is the main Python object for parsing structure data. You start with a structure file and keep this object through most of the analysis.
Lahuta supports PDB, CIF (compressed or uncompressed), and GRO + XTC files (for MD data).
from pathlib import Path
from lahuta import LahutaSystem
path = Path("core/data/ubi.cif")
system = LahutaSystem(str(path))
print("atoms:", system.n_atoms)
print("source:", system.file_name)
print("is_model_input:", system.is_model)
Once the system is loaded, system.props gives you NumPy-based structural arrays.
These arrays are the fast analysis layer for atom-level work: coordinates, atom names, residue ids, residue names, and chain labels.
p = system.props
print("positions:", p.positions.shape, p.positions.dtype) # copy
print("centroid [A]:", p.positions.mean(axis=0))
print("atom names:", p.names[:8])
print("residue names:", p.resnames[:8])
print("residue ids:", p.resids[:8])
print("chains:", p.chainlabels[:8])
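As a sketch of the atom-level work these arrays support, the example below selects atoms with boolean masks and computes a centroid per chain. It uses small hand-made NumPy arrays as stand-ins for p.positions, p.names, and p.chainlabels. The values are invented for illustration and Lahuta itself is not imported.

```python
import numpy as np

# Hypothetical stand-ins for p.positions, p.names and p.chainlabels.
positions = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [2.0, 4.0, 6.0],
])
names = np.array(["N", "CA", "N", "CA"])
chainlabels = np.array(["A", "A", "B", "B"])

# Boolean mask selecting all C-alpha atoms.
ca_mask = names == "CA"
ca_positions = positions[ca_mask]

# Per-chain centroid: one boolean mask per unique chain label.
centroids = {
    str(chain): positions[chainlabels == chain].mean(axis=0)
    for chain in np.unique(chainlabels)
}

print("CA atoms:", ca_positions.shape[0])
for chain, center in centroids.items():
    print("chain", chain, "centroid:", center)
```

Because all arrays are indexed by atom, a mask built from one array (names, chain labels) can be applied directly to any other.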
For coordinates, Lahuta exposes both a copy (p.positions) and a zero-copy view (p.positions_view). Use positions_view when you want to minimize memory movement in large runs.
xyz_copy = p.positions
xyz_view = p.positions_view
print("same shape:", xyz_copy.shape == xyz_view.shape)
Note, however, that in the vast majority of cases the processing you do after getting the positions is far more expensive than the difference between a copy and a view of the underlying data. Unless you know what you are doing, use p.positions.
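The general difference between a copy and a view can be illustrated with plain NumPy. This is a minimal sketch using a hypothetical coordinate buffer, not Lahuta's own arrays. It assumes positions_view behaves like an ordinary NumPy view; whether that view is writable, or how long it stays valid, is not covered here.

```python
import numpy as np

# Hypothetical stand-in for the coordinate buffer behind a loaded system.
buffer = np.zeros((5, 3), dtype=np.float64)

xyz_view = buffer.view()  # shares memory with the buffer (zero-copy)
xyz_copy = buffer.copy()  # owns independent memory

print("view shares memory:", np.shares_memory(buffer, xyz_view))
print("copy shares memory:", np.shares_memory(buffer, xyz_copy))

# A change to the underlying buffer is visible through the view only.
buffer[0] = [1.0, 2.0, 3.0]
print("view row 0:", xyz_view[0])
print("copy row 0:", xyz_copy[0])
```

A copy is the safer default because later changes to the underlying data cannot alter it.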
You can also inspect residue-level grouping directly from the parsed system, even before topology is built:
residues = system.residues
print("n_residues:", len(residues))
print("first residue:", residues[0])
Accessing system.residues does not trigger topology computation. It is a direct residue view over the parsed structure.
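To make the idea of residue-level grouping concrete, the sketch below groups atoms by residue id with NumPy. The arrays are hypothetical stand-ins for p.resids and p.resnames from a single chain. This is not how system.residues is implemented, and a real multi-chain structure would need to group by chain label as well.

```python
import numpy as np

# Hypothetical stand-ins for p.resids and p.resnames (one entry per atom).
resids = np.array([1, 1, 1, 2, 2, 3])
resnames = np.array(["MET", "MET", "MET", "GLN", "GLN", "ILE"])

# Unique residue ids, index of each residue's first atom, and atoms per residue.
unique_ids, first_index, counts = np.unique(
    resids, return_index=True, return_counts=True
)

print("n_residues:", len(unique_ids))
for rid, start, n in zip(unique_ids, first_index, counts):
    print(int(rid), resnames[start], "atoms:", int(n))
```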
When you move to contact analysis or other chemistry-aware analysis, you explicitly build topology from this system. The next page covers that in detail: Topology.
For AlphaFold-derived model CIF files, use model input mode explicitly:
from lahuta import InputType, LahutaSystem
model_system = LahutaSystem("core/data/fubi.cif", input_type=InputType.AlphaFold)
model_system.build_topology()
print("is_model_input:", model_system.is_model)
InputType.AlphaFold is optimized for AlphaFold-like model CIF inputs: it is stricter than generic parsing but performs much better.
Continue to Topology for the full topology concepts and APIs.