CLI¶
The Lahuta CLI is a convenient way to run structural analyses without writing Python. It is built for fast, repeatable runs over many structures, producing JSONL outputs that are easy to post-process.
Showcase¶
Start by calling --help to see the available commands and options.
$ lahuta --help
Usage: lahuta [global-options] <subcommand> [subcommand-options]
Author: Besian I. Sejdiu (@bisejdiu)
Lahuta runs structural analysis, such as contacts computations, at scale.
Available subcommands:
compaction-rg Compute radius of gyration with pLDDT/DSSP trimming.
contacts Compute inter-atomic contacts.
createdb Create a database from alphafold2 models.
dssp Compute DSSP secondary-structure assignments.
extract Extract data from AlphaFold2 model files or databases.
positions Extract 3D atomic coordinates from model files or databases.
quality-metrics Compute per-protein group metrics for pLDDT and DSSP signals.
sasa-sr Compute Shrake-Rupley solvent accessible surface area.
shape-metrics Compute tensor-based shape metrics with pLDDT/DSSP trimming.
Compute contacts from a directory of files¶
$ lahuta contacts -d core/data -e .cif -p arpeggio --stdout --threads 4
[info] Writing to: stdout
{"file_path":"core/data/1kx2_small.cif","success":true,"provider":"arpeggio","contact_type":"All","contacts":[...]}
[info] Contact computation completed successfully!
See --provider for how to switch interaction engines.
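Because each result is one JSON object per line, post-processing is straightforward. Below is a minimal sketch that pipes the stdout stream into Python and counts contacts per file; the field names (file_path, success, contacts) follow the example record above, while the per-contact payload is provider-specific and left opaque here.

# parse_contacts.py -- count contacts per file from `lahuta contacts --stdout`.
# Usage: lahuta contacts -d core/data -e .cif -p arpeggio --stdout | python parse_contacts.py
import json
import sys

for line in sys.stdin:
    line = line.strip()
    if not line.startswith("{"):  # skip [info]/[progress] log lines
        continue
    rec = json.loads(line)
    if rec.get("success"):
        print(f'{rec["file_path"]}: {len(rec["contacts"])} contacts')
    else:
        print(f'{rec["file_path"]}: failed', file=sys.stderr)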
When you process directories of AlphaFold models, add --is_af2_model so Lahuta can apply built-in optimizations.
$ lahuta contacts -d /Users/bsejdiu/data/UP000005640_9606_HUMAN --is_af2_model -o out.jsonl -t 10
[info] Source directory: /Users/bsejdiu/data/UP000005640_9606_HUMAN
[info] Extensions: .cif, .cif.gz
[info] Recursive: No
[info] Writing to: out.jsonl
[info] Directory: scanning directory '/Users/bsejdiu/data/UP000005640_9606_HUMAN', filter='*.cif, *.cif.gz'; batch_size=200 (recursive=no)
[progress] contacts: done=23586 ok=23586 skip=0 inflight=0 rate=1814.83/s elapsed=00:13.0
[info] contacts pipeline summary (in seconds):
[info] total: 13.050
[info] cpu: 129.309
[info] io: 0.085
[info] ingest: 0.075
[info] prepare: 0.010
[info] flush: 0.000
[info] setup: 0.006
[info] compute: 129.304
[info] contacts pipeline items:
[info] total: 23586
[info] processed: 23586
[info] skipped: 0
[info] throughput: 1807.33 items/s
[info] contacts pipeline resources:
[info] stages: 1
[info] threads requested: 10
[info] threads used: 10
[info] all thread safe: yes
[info] run token: 1
[info] Contact computation completed successfully!
[info] Output written to: out.jsonl
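For a run of this size you rarely want to load out.jsonl all at once; streaming it line by line keeps memory flat. A minimal sketch, assuming the records follow the contacts example shown earlier:

# tally_contacts.py -- stream a large contacts output and tally outcomes.
import json

ok = failed = 0
with open("out.jsonl") as fh:
    for line in fh:
        rec = json.loads(line)
        if rec.get("success"):
            ok += 1
        else:
            failed += 1
print(f"ok={ok} failed={failed}")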
Extract AlphaFold model metadata¶
For AlphaFold-like model files, extract emits sequence-level information such as the amino-acid sequence, pLDDT confidence classes, and DSSP annotations.
$ lahuta extract --fields sequence,plddt -f core/data/models/AF-P0CL56-F1-model_v4.cif.gz --is_af2_model --stdout
[info] Writing to: stdout
{"model":"core/data/models/AF-P0CL56-F1-model_v4.cif.gz","sequence":"MRLKKRFKKFFISRKEYEKIEEILDIGLAKAMEETKDDELLTYDEIKELLGDK"}
{"model":"core/data/models/AF-P0CL56-F1-model_v4.cif.gz","plddt_sequence":"PLLLLLLHHHHEEEEEEEEEEEEEEEEEEEEEEEEHEEHHHHHEEHHEHHHHL"}
[info] Extraction completed successfully!
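Note that extract emits one JSON line per requested field, so a model can appear more than once (one line with sequence, one with plddt_sequence, as above). The sketch below tallies the class letters in plddt_sequence; the key names match the example output, but the meaning of each letter is defined by Lahuta, so the script only counts them. The seq_plddt.jsonl filename is a hypothetical output from a run with -o seq_plddt.jsonl.

# plddt_classes.py -- per-model breakdown of pLDDT class letters.
import json
from collections import Counter

with open("seq_plddt.jsonl") as fh:  # hypothetical `-o seq_plddt.jsonl` output
    for line in fh:
        rec = json.loads(line)
        if "plddt_sequence" not in rec:
            continue
        counts = Counter(rec["plddt_sequence"])
        total = sum(counts.values())
        breakdown = ", ".join(f"{cls}: {n / total:.0%}" for cls, n in counts.most_common())
        print(f'{rec["model"]}: {breakdown}')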
Compute SASA-SR on an AlphaFold directory¶
sasa-sr computes Shrake-Rupley solvent-accessible surface area. It is currently enabled only for AlphaFold models (--is_af2_model).
$ lahuta sasa-sr -d /Users/bsejdiu/data/UP000005640_9606_HUMAN --is_af2_model -o out.jsonl -t 10
[info] Writing to: out.jsonl
[info] SASA-SR probe radius: 1.4
[info] SASA-SR points: 128
[info] Directory: scanning directory '/Users/bsejdiu/data/UP000005640_9606_HUMAN', filter='*.cif, *.cif.gz, *.pdb, *.pdb.gz'; batch_size=512 (recursive=no)
[progress] sasa-sr: done=23586 ok=23586 skip=0 inflight=0 rate=317.47/s elapsed=01:13.5
[info] sasa-sr pipeline summary (in seconds):
[info] total: 73.528
[info] cpu: 734.975
[info] io: 0.076
[info] ingest: 0.065
[info] prepare: 0.011
[info] flush: 0.000
[info] setup: 0.004
[info] compute: 734.971
[info] sasa-sr pipeline items:
[info] total: 23586
[info] processed: 23586
[info] skipped: 0
[info] throughput: 320.78 items/s
[info] sasa-sr pipeline resources:
[info] stages: 2
[info] threads requested: 10
[info] threads used: 10
[info] all thread safe: yes
[info] run token: 1
[info] SASA-SR computation completed successfully!
[info] Output written to: out.jsonl
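This guide does not reproduce the sasa-sr record layout, so before writing downstream analysis it is worth inspecting one record to see which fields your version emits:

# peek_sasa.py -- print the field names of the first sasa-sr record.
import json

with open("out.jsonl") as fh:
    first = json.loads(next(fh))
print(sorted(first.keys()))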
Compute DSSP assignments¶
DSSP summarizes local secondary structure (helix, strand, and coil-like states).
$ lahuta dssp -f core/data/1kx2_small.cif --stdout
[info] Writing to: stdout
{"model":"core/data/1kx2_small.cif","dssp":"CCCSCHHHHHHHSTTSSTTTTGGGPPPTTCSTTTHHHHTTCSTTHHHHHHHTCTTSPGGGGCSSPPHHHHHHHHHHHTSCC"}
[info] DSSP computation completed successfully!
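The dssp string is one character per residue, so composition metrics fall out of simple counting. A minimal sketch follows; the H/G/I-as-helix and E/B-as-strand grouping is a common DSSP convention, an assumption you should confirm against Lahuta's DSSP alphabet.

# dssp_fractions.py -- secondary-structure composition from `lahuta dssp --stdout`.
# Usage: lahuta dssp -f core/data/1kx2_small.cif --stdout | python dssp_fractions.py
import json
import sys

HELIX = set("HGI")   # assumed grouping: 3-10, alpha, and pi helices
STRAND = set("EB")   # assumed grouping: strand and isolated bridge

for line in sys.stdin:
    line = line.strip()
    if not line.startswith("{"):
        continue
    rec = json.loads(line)
    ss = rec["dssp"]
    helix = sum(c in HELIX for c in ss) / len(ss)
    strand = sum(c in STRAND for c in ss) / len(ss)
    print(f'{rec["model"]}: helix {helix:.0%}, strand {strand:.0%}, other {1 - helix - strand:.0%}')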
Build a database and read from it¶
For better performance, create a database once, then run extraction and analytics from the database.
$ lahuta createdb -d /data/UP000005640_9606_HUMAN -o human_af_db --threads 10 --batch-size 1000
[info] Creating database...
[info] Database creation completed successfully!
$ lahuta extract --fields sequence,plddt --database human_af_db -o human_seq_plddt.jsonl --threads 10
[info] Writing to: human_seq_plddt.jsonl
[info] Extraction completed successfully!
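The database-backed output has the same shape as the file-based extract shown earlier, so one pass over it suffices to index the records by model for downstream joins. A sketch, assuming the model/sequence/plddt_sequence keys from that example:

# index_extract.py -- merge per-field extract lines into one dict per model.
import json

by_model = {}
with open("human_seq_plddt.jsonl") as fh:
    for line in fh:
        rec = json.loads(line)
        fields = {k: v for k, v in rec.items() if k != "model"}
        by_model.setdefault(rec["model"], {}).update(fields)

print(f"{len(by_model)} models indexed")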