CLI

The Lahuta CLI is a convenient way to run structural analyses without writing Python. It is useful when you want fast, repeatable runs over many structures, producing JSONL outputs that you can post-process afterwards.

Showcase

Start by calling --help to see the available commands and options.

$ lahuta --help
Usage: lahuta [global-options] <subcommand> [subcommand-options]
Author: Besian I. Sejdiu (@bisejdiu)

Lahuta runs structural analysis, such as contact computations, at scale.

Available subcommands:
  compaction-rg    Compute radius of gyration with pLDDT/DSSP trimming.
  contacts         Compute inter-atomic contacts.
  createdb         Create a database from AlphaFold2 models.
  dssp             Compute DSSP secondary-structure assignments.
  extract          Extract data from AlphaFold2 model files or databases.
  positions        Extract 3D atomic coordinates from model files or databases.
  quality-metrics  Compute per-protein group metrics for pLDDT and DSSP signals.
  sasa-sr          Compute Shrake-Rupley solvent accessible surface area.
  shape-metrics    Compute tensor-based shape metrics with pLDDT/DSSP trimming.

Compute contacts from a directory of files

$ lahuta contacts -d core/data -e .cif -p arpeggio --stdout --threads 4
[info] Writing to: stdout
{"file_path":"core/data/1kx2_small.cif","success":true,"provider":"arpeggio","contact_type":"All","contacts":[...]}
[info] Contact computation completed successfully!

See --provider for how to switch interaction engines.
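
Because each result is a single JSON object per line, the output is straightforward to post-process. A minimal Python sketch, assuming the output was saved to a file named contacts.jsonl (the file name is hypothetical) and that the field names match the sample record above:

import json

# Stream the JSONL output and report the number of contacts per file.
# "success", "file_path", and "contacts" follow the sample record above.
with open("contacts.jsonl") as fh:
    for line in fh:
        record = json.loads(line)
        if record.get("success"):
            print(record["file_path"], len(record["contacts"]))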

When you process AlphaFold model directories, add --is_af2_model so Lahuta can apply built-in optimizations.

$ lahuta contacts -d /Users/bsejdiu/data/UP000005640_9606_HUMAN --is_af2_model -o out.jsonl -t 10
[info] Source directory: /Users/bsejdiu/data/UP000005640_9606_HUMAN
[info] Extensions: .cif, .cif.gz
[info] Recursive: No
[info] Writing to: out.jsonl
[info] Directory: scanning directory '/Users/bsejdiu/data/UP000005640_9606_HUMAN', filter='*.cif, *.cif.gz'; batch_size=200 (recursive=no)
[progress] contacts: done=23586 ok=23586 skip=0 inflight=0 rate=1814.83/s elapsed=00:13.0
[info] contacts pipeline summary (in seconds):
[info]   total:              13.050
[info]   cpu:               129.309
[info]   io:                  0.085
[info]   ingest:              0.075
[info]   prepare:             0.010
[info]   flush:               0.000
[info]   setup:               0.006
[info]   compute:           129.304
[info] contacts pipeline items:
[info]   total:               23586
[info]   processed:           23586
[info]   skipped:                 0
[info]   throughput:        1807.33 items/s
[info] contacts pipeline resources:
[info]   stages:                  1
[info]   threads requested:      10
[info]   threads used:           10
[info]   all thread safe:       yes
[info]   run token:               1
[info] Contact computation completed successfully!
[info] Output written to: out.jsonl
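
On runs this large, it can be useful to pull out any inputs that failed so they can be re-run. A short sketch under the same record-layout assumption as above:

import json

# Collect file paths whose records report success == false.
failed = []
with open("out.jsonl") as fh:
    for line in fh:
        record = json.loads(line)
        if not record.get("success", False):
            failed.append(record.get("file_path"))
print(f"{len(failed)} inputs failed")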

Extract AlphaFold model metadata

For AlphaFold-like model files, extract returns sequence-level information such as the amino-acid sequence, pLDDT confidence classes, and DSSP annotations.

$ lahuta extract --fields sequence,plddt -f core/data/models/AF-P0CL56-F1-model_v4.cif.gz --is_af2_model --stdout
[info] Writing to: stdout
{"model":"core/data/models/AF-P0CL56-F1-model_v4.cif.gz","sequence":"MRLKKRFKKFFISRKEYEKIEEILDIGLAKAMEETKDDELLTYDEIKELLGDK"}
{"model":"core/data/models/AF-P0CL56-F1-model_v4.cif.gz","plddt_sequence":"PLLLLLLHHHHEEEEEEEEEEEEEEEEEEEEEEEEHEEHHHHHEEHHEHHHHL"}
[info] Extraction completed successfully!
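
In this example the extractor emits one JSON line per field, keyed by "model". A small sketch that merges those lines back into one record per model (the input file name is hypothetical):

import json
from collections import defaultdict

# Merge per-field JSON lines that share the same "model" key.
records = defaultdict(dict)
with open("extract.jsonl") as fh:
    for line in fh:
        entry = json.loads(line)
        records[entry.pop("model")].update(entry)

for model, fields in records.items():
    print(model, len(fields.get("sequence", "")))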

Compute SASA-SR on an AlphaFold directory

sasa-sr computes Shrake-Rupley solvent accessible surface area. It is currently enabled only for AlphaFold models (--is_af2_model).

$ lahuta sasa-sr -d /Users/bsejdiu/data/UP000005640_9606_HUMAN --is_af2_model -o out.jsonl -t 10
[info] Writing to: out.jsonl
[info] SASA-SR probe radius: 1.4
[info] SASA-SR points: 128
[info] Directory: scanning directory '/Users/bsejdiu/data/UP000005640_9606_HUMAN', filter='*.cif, *.cif.gz, *.pdb, *.pdb.gz'; batch_size=512 (recursive=no)
[progress] sasa-sr: done=23586 ok=23586 skip=0 inflight=0 rate=317.47/s elapsed=01:13.5
[info] sasa-sr pipeline summary (in seconds):
[info]   total:              73.528
[info]   cpu:               734.975
[info]   io:                  0.076
[info]   ingest:              0.065
[info]   prepare:             0.011
[info]   flush:               0.000
[info]   setup:               0.004
[info]   compute:           734.971
[info] sasa-sr pipeline items:
[info]   total:               23586
[info]   processed:           23586
[info]   skipped:                 0
[info]   throughput:         320.78 items/s
[info] sasa-sr pipeline resources:
[info]   stages:                  2
[info]   threads requested:      10
[info]   threads used:           10
[info]   all thread safe:       yes
[info]   run token:               1
[info] SASA-SR computation completed successfully!
[info] Output written to: out.jsonl
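
The probe radius and point count reported in the log are the two knobs of the Shrake-Rupley method: each atom's solvent-expanded sphere is sampled with a fixed number of test points, and its accessible area is the sphere area times the fraction of points not buried by neighboring atoms. For intuition, here is one common way such points are generated (a golden-spiral distribution); this illustrates what the points parameter controls, not necessarily how Lahuta implements it:

import math

def sphere_points(n: int):
    """Spread n roughly uniform points on a unit sphere using the
    golden-spiral construction often used for Shrake-Rupley sampling."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # height in (-1, 1)
        r = math.sqrt(1.0 - z * z)          # circle radius at that height
        theta = golden * i
        pts.append((r * math.cos(theta), r * math.sin(theta), z))
    return pts

# 128 matches the point count reported in the log above.
points = sphere_points(128)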

Compute DSSP assignments

DSSP summarizes local secondary structure (helix, strand, coil-like states).

$ lahuta dssp -f core/data/1kx2_small.cif --stdout
[info] Writing to: stdout
{"model":"core/data/1kx2_small.cif","dssp":"CCCSCHHHHHHHSTTSSTTTTGGGPPPTTCSTTTHHHHTTCSTTHHHHHHHTCTTSPGGGGCSSPPHHHHHHHHHHHTSCC"}
[info] DSSP computation completed successfully!
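
The one-letter codes follow the standard DSSP alphabet (H/G/I for helices, E/B for strands, the remaining states coil-like), so simple composition statistics fall out directly. A quick sketch using the string printed above:

# Secondary-structure fractions from a DSSP string. H/G/I are helix
# codes, E/B are strand codes; everything else is counted as coil-like.
HELIX, STRAND = set("HGI"), set("EB")

def ss_fractions(dssp: str):
    helix = sum(c in HELIX for c in dssp) / len(dssp)
    strand = sum(c in STRAND for c in dssp) / len(dssp)
    return helix, strand, 1.0 - helix - strand

print(ss_fractions("CCCSCHHHHHHHSTTSSTTTTGGGPPPTTCSTTTHHHHTTCSTTHHHHHHHTCTTSPGGGGCSSPPHHHHHHHHHHHTSCC"))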

Build a database and read from it

For better performance, create a database once, then run extraction and analytics from the database.

$ lahuta createdb -d /data/UP000005640_9606_HUMAN -o human_af_db --threads 10 --batch-size 1000
[info] Creating database...
[info] Database creation completed successfully!

$ lahuta extract --fields sequence,plddt --database human_af_db -o human_seq_plddt.jsonl --threads 10
[info] Writing to: human_seq_plddt.jsonl
[info] Extraction completed successfully!
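
As a quick sanity check on the extracted file, each model's pLDDT class string should be exactly as long as its sequence (as in the extract example earlier on this page). A sketch assuming the same per-field line layout:

import json
from collections import defaultdict

# Re-assemble per-field lines by model, then compare string lengths.
fields = defaultdict(dict)
with open("human_seq_plddt.jsonl") as fh:
    for line in fh:
        entry = json.loads(line)
        fields[entry.pop("model")].update(entry)

bad = [m for m, f in fields.items()
       if len(f.get("sequence", "")) != len(f.get("plddt_sequence", ""))]
print(f"{len(bad)} models with a sequence/pLDDT length mismatch")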