/*!
\mainpage Lightweight Object-Oriented Structure library (LOOS)
Copyright © 2008-2024, Tod D. Romo, Alan Grossfield\n
Department of Biochemistry and Biophysics\n
School of Medicine & Dentistry, University of Rochester\n
\image html https://www.gnu.org/graphics/gplv3-with-text-84x42.png
<hr>
<h2>Quick Links</h2>
- <a href="https://github.com/GrossfieldLab/loos">Clone LOOS from GitHub</a>
- <a href="https://github.com/GrossfieldLab/loos/wiki">Online tutorials and documentation</a>
- <a href="http://membrane.urmc.rochester.edu/sites/default/files/loos.pdf">Download a tutorial (pdf) on using LOOS</a>
- <a href="https://t.co/bznyDZBNd4?amp=1">YouTube video describing LOOS</a>
- Ask the developers a question at loos.maintainer [AT] gmail.com or by opening an issue on GitHub.
- \subpage building "Building LOOS"
- \subpage tools "Tools included with LOOS"
- \subpage selections "Selection Language"
- \subpage common_options "Common Options for Tools"
- \subpage formats "Supported File Formats"
- \subpage citing "Citing LOOS in published work"
- \subpage exceptions "Exceptions in LOOS and PyLOOS"
- \subpage changes "Changes"
- \subpage faq "FAQ"
- \ref faq_pyloos "FAQ for PyLOOS"
- Posters and presentations on LOOS
- <a href="http://membrane.urmc.rochester.edu/sites/default/files/posters_bps_2024/loos-2024.pdf">2024 Biophysical Society Poster</a>
\section intro Introduction
Welcome to LOOS, a product of the Grossfield Lab at the University of
Rochester Medical School and the Department of Biochemistry and Biophysics.
Our goal in developing LOOS is to make it easier to analyze molecular
dynamics simulations. LOOS is a code library for developing new analysis
applications, designed to simplify the common tasks (reading structure and
trajectory files, selecting atoms, computing geometric quantities) found in
almost every application. Moreover, it is distributed with a
number of useful standalone tools, ranging from simple things like radial
distribution functions and radii of gyration to principal component
analysis to sophisticated methods for analyzing statistical errors.
LOOS has several major features:
- It transparently reads the native file formats for most major
biomolecular simulation packages, including CHARMM, NAMD, gromacs,
AMBER, and Tinker.
- It uses an expressive syntax to allow selection of atoms using
all available metadata (e.g. atom number, residue name, etc).
For more information see \subpage selections "Selection Language".
- It allows novel analysis applications to be developed rapidly, with
minimal programming skill required. For example, atoms are referred
to using reference-counted shared pointers, which retain the benefits
of using pointers (e.g. rapid, lightweight copying) without requiring
the developer to do manual memory management.
- It should run on any unix-like environment, and is tested under
multiple linux versions and Mac OSX.
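To illustrate the shared-pointer point above, here is a minimal sketch using \c std::shared_ptr and a hypothetical \c Atom struct. LOOS's own atom and group classes are not used here, and their details differ. Copying a group copies only the pointers, so a change made through one copy is visible through the other. Each atom is freed automatically when the last pointer to it goes away.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an atom; a real atom carries far more metadata.
struct Atom {
    std::string name;
    double x = 0.0, y = 0.0, z = 0.0;
};

// A "group" is modeled here as a vector of reference-counted atom pointers.
using AtomPtr = std::shared_ptr<Atom>;
using Group = std::vector<AtomPtr>;

// Copying a group copies the pointers (cheap), not the atoms themselves,
// so both groups refer to the same underlying Atom objects.
Group copyGroup(const Group& g) { return g; }

// Shift every atom in a group along x; the change is seen by every copy.
void translateX(Group& g, double dx) {
    for (auto& a : g) {
        assert(a != nullptr);  // every entry must point at a live atom
        a->x += dx;
    }
}
```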
For assistance using LOOS, to suggest a patch, to request a feature, or simply
to offer positive feedback, email loos.maintainer [AT] gmail.com. The
latest version of LOOS can be found at our <a href="https://github.com/GrossfieldLab/loos">GitHub page</a>.
We strongly suggest that you also follow LOOS on GitHub.
\section Applications
Although we primarily view LOOS as a development platform -- a tool for
making tools -- it is distributed with a number of prebuilt applications.
The included tools were developed in the course of research in the
Grossfield lab, but we believe them to be generally useful enough to merit
their inclusion. Some of the code for these programs is found in the Tools/
directories, while other related programs are grouped together as Packages
(e.g. Packages/Convergence/).
For more information, see the
\subpage tools "Tools page"
In addition to providing valuable functionality (principal component
analysis, structure alignment, etc), these applications can also be useful
as templates for developing new applications using LOOS. We have taken a
general design approach of developing relatively simple, single-purpose
tools, as we believe that makes it easier to quickly add functionality and
experiment with analysis methods, without the overhead of integrating with
a larger package. Many (if not most) analyses involve the same sequence of
steps: read a description of the system (e.g. a PDB, PSF, parmtop, or gro
file), select which atoms will be examined, and then, for each frame in a
trajectory, compute some geometric quantity using the coordinates (e.g.
their centroid or moments of inertia).
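The read/select/compute sequence above can be sketched in plain C++. This is a minimal sketch and not LOOS code. \c AtomInfo, \c selectIndices(), and \c centroidTimeSeries() are hypothetical names. The system is modeled as a vector of per-atom metadata, and each trajectory frame as a vector of coordinates in the same atom order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Coord = std::array<double, 3>;

// Hypothetical per-atom metadata, as read from a system file (PDB, PSF, ...).
struct AtomInfo {
    std::string name;
    int resid;
};

// Step 2: pick the indices of the atoms matching a predicate (the "selection").
std::vector<std::size_t> selectIndices(const std::vector<AtomInfo>& atoms,
                                       const std::function<bool(const AtomInfo&)>& pred) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < atoms.size(); ++i)
        if (pred(atoms[i])) idx.push_back(i);
    return idx;
}

// Step 3: for one frame, compute the centroid of the selected atoms.
Coord centroid(const std::vector<Coord>& frame, const std::vector<std::size_t>& idx) {
    assert(!idx.empty());
    Coord c{0.0, 0.0, 0.0};
    for (std::size_t i : idx)
        for (int k = 0; k < 3; ++k) c[k] += frame[i][k];
    for (int k = 0; k < 3; ++k) c[k] /= static_cast<double>(idx.size());
    return c;
}

// Repeat step 3 for every frame of the trajectory.
std::vector<Coord> centroidTimeSeries(const std::vector<std::vector<Coord>>& traj,
                                      const std::vector<std::size_t>& idx) {
    std::vector<Coord> out;
    out.reserve(traj.size());
    for (const auto& frame : traj) out.push_back(centroid(frame, idx));
    return out;
}
```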
\section Bugs
There are none...only features. So don't worry about them!
If you do run into an unexpected feature, either mail us directly (loos.maintainer [AT] gmail.com) or raise an issue on GitHub.
\section future Future Plans
<ul>
<li> More extensive manual, including developer's tutorial
<li> More applications
</ul>
<hr>
\section license License
<I>LOOS (Lightweight Object-Oriented Structure library)</I>\n
Copyright © 2008-2024, Tod D. Romo, Alan Grossfield\n
Department of Biochemistry and Biophysics\n
School of Medicine & Dentistry, University of Rochester\n
This package (LOOS) is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation under version 3 of the License.
This package is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
\image html https://raw.github.com/GrossfieldLab/loos/main/images/grossfield_logo.jpg
*/
/*! \page citing Citing LOOS in published work
If you use LOOS in your work, please reference the following
publications:
Romo, T.D., Grossfield, A. "LOOS: An extensible platform for the
structural analysis of simulations." 31st Annual International
Conference of the IEEE EMBS (2009): 2332-2335
Romo, T. D., Leioatts, N. and Grossfield, A., "Lightweight Object Oriented Structure Analysis: Tools for building tools to analyze molecular dynamics simulations", J. Comput. Chem. (2014): 2305-2318
In addition, we would appreciate it if you also mentioned the GitHub page [https://github.com/GrossfieldLab/loos](https://github.com/GrossfieldLab/loos).
*/
/*! \page formats Supported File Formats
LOOS reads the native file formats for most major biomolecular simulation packages.
<H3> Structure Files </H3>
<table align="center" border="1" style="width:80%">
<tr align="center"><th>Package</th> <th>File Suffix</th> <th>Notes</th></tr>
<tr align="center"> <td> Amber </td> <td> prmtop </td> <td> </td> </tr>
<tr align="center"> <td> CHARMM/NAMD </td> <td> pdb </td> <td> </td> </tr>
<tr align="center"> <td> CHARMM/NAMD </td> <td> psf </td> <td> </td> </tr>
<tr align="center"> <td> CHARMM </td> <td> crd </td> <td> </td> </tr>
<tr align="center"> <td> Gromacs </td> <td> gro </td> <td> </td> </tr>
<tr align="center"> <td> Tinker </td> <td> xyz </td> <td> </td> </tr>
</table>
<H3> Trajectory Files </H3>
<table align="center" border="1" style="width:80%">
<tr align="center"><th>Package</th> <th>File Suffix</th> <th>Notes</th></tr>
<tr align="center"> <td> Amber </td> <td> nc, netcdf </td> <td> velocities </td> </tr>
<tr align="center"> <td> Amber</td> <td> crd, mdcrd </td> <td> velocities (if NetCDF)</td> </tr>
<tr align="center"> <td> Amber</td> <td> inpcrd, rst, rst7 </td> <td> </td> </tr>
<tr align="center"> <td> CHARMM/NAMD </td> <td> dcd </td> <td> </td> </tr>
<tr align="center"> <td> CHARMM/NAMD </td> <td> pdb </td> <td> concatenated PDBs or sets of PDBs</td> </tr>
<tr align="center"> <td> Gromacs </td> <td> xtc </td> <td> </td> </tr>
<tr align="center"> <td> Gromacs </td> <td> trr </td> <td> velocities, pressure, virial, forces <br> (Only velocities available with Trajectory Class)</td> </tr>
<tr align="center"> <td> Tinker </td> <td> arc </td> <td> </td> </tr>
</table>
LOOS can write structures out in PDB format or a native pseudo-xml format, and can write
trajectories in either DCD or XTC format.
*/
/*! \page selections Selection Language
\section Language Description
The selection string parser is a relatively simple parser patterned
after C/PERL expressions and includes support for PERL-style regular
expressions via Boost. There are two kinds of literals supported:
strings and numbers. Numbers are any valid integer. Strings are
delimited by either single quotes or double quotes, so both of the
following are valid strings:
\verbatim
"a string"
'another string'
\endverbatim
An important caveat to integer numbers is that LOOS assumes that none
will be negative. In other words, no atomid nor resid nor number
extracted from a segid (see \ref magops_explained magical ops
below) will evaluate to a
negative number. The relational operators < and <= will behave
differently if either operand is a negative number. In this case,
they will evaluate to false, for reasons that will become obvious when
you read about the magical operators below...
The parser also recognizes a small set of keywords that evaluate to
Atom properties. These keywords fall into two types as well: those
that evaluate to a number (id, resid) and those that evaluate to a
string (name, resname, chainid, and segname or segid). Keep in mind that keywords
are not substitutions, but are more like a pre-defined function that
returns that atom property. So you cannot put a keyword in a string
and expect it to be substituted with the appropriate value, for example.
\subsection relops Relational Operators
<table align="center" border="1" style="width:80%">
<tr align="left"><th>Operator</th><th>Operation</th> <th>Strings</th><th>Numbers</th><th>Example</th></tr>
<tr align="left"><td>></td><td>Greater than</td><td>yes</td><td>yes</td><td>resid > 10</td></tr>
<tr align="left"><td>>=</td><td>Greater than or equal to</td><td>yes</td><td>yes</td><td>resid >= 10</td></tr>
<tr align="left"><td><=</td><td>Less than or equal to</td><td>yes</td><td>yes</td><td>resid <= 50</td></tr>
<tr align="left"><td><</td><td>Less than</td><td>yes</td><td>yes</td><td>resid < 50</td></tr>
<tr align="left"><td>==</td><td>Exactly equals</td><td>yes</td><td>yes</td><td>name == "CA"</td></tr>
<tr align="left"><td>!=</td><td>Not exactly equal</td><td>yes</td><td>yes</td><td>segname != "SOLV"</td></tr>
<tr align="left"><td>=~</td><td>Regular expression match</td><td>yes</td><td>no</td><td>name =~ "^(C[A]?|N|O)$"</td></tr>
</table>
\subsection logops Logical Operators
<table align="center" border="1" style="width:80%">
<tr align="left"><th>Operator</th><th>Operation</th><th>Example</th></tr>
<tr align="left"><td>&&</td><td>Logical And</td><td>name == "CA" && segid == "PROT"</td></tr>
<tr align="left"><td>||</td><td>Logical Or</td><td>segid == "SOLV" || segid == "BULK"</td></tr>
<tr align="left"><td>!</td><td>Not (Negate)</td><td>!(segid == "SOLV")</td></tr>
</table>
\subsection magops Magical Operators
<table align="center" border="1" style="width:80%">
<tr align="left"><th>Operator</th><th>Operation</th><th>Example</th></tr>
<tr align="left"><td>-></td><td>Extracts a number from a string</td><td>segid -> "L(\d+)"</td></tr>
</table>
\subsection keywords Keywords
<table align="center" border="1" style="width:80%">
<tr align="left" valign="top"><th>Keyword</th><th>Atom Property</th><th>Evaluates to...</th><th>Operators</th></tr>
<tr align="left" valign="top"><td>name</td><td>Atom name</td><td>string</td><td>>, >=, <=, <, ==, !=, =~</td></tr>
<tr align="left" valign="top"><td>id</td><td>Atom ID</td><td>number</td><td>>, >=, <=, <, ==, !=</td></tr>
<tr align="left" valign="top"><td>index</td><td>Atom index in model file (0-based)</td><td>number</td><td>>, >=, <=, <, ==, !=</td></tr>
<tr align="left" valign="top"><td>resname</td><td>Residue name</td><td>string</td><td>>, >=, <=, <, ==, !=, =~</td></tr>
<tr align="left" valign="top"><td>resid</td><td>Residue ID</td><td>number</td><td>>, >=, <=, <, ==, !=</td></tr>
<tr align="left" valign="top"><td>segid</td><td>Atom segid</td><td>string</td><td>>, >=, <=, <, ==, !=, =~</td></tr>
<tr align="left" valign="top"><td>segname</td><td>Synonym for segid</td><td>string</td><td>>, >=, <=, <, ==, !=, =~</td></tr>
<tr align="left" valign="top"><td>chainid</td><td>Chain ID</td><td>string</td><td>>, >=, <=, <, ==, !=, =~</td></tr>
<tr align="left" valign="top"><td>all</td><td>Evaluates to true</td><td>number</td><td></td></tr>
<tr align="left" valign="top"><td>hydrogen</td><td>Evaluates to true if atom is a hydrogen</td><td>number</td><td></td></tr>
<tr align="left" valign="top"><td>backbone</td><td>Evaluates to true if atom is a backbone atom (nucleic acids and proteins, and includes hydrogens)</td><td>number</td><td></td></tr>
</table>
Notes:\n
The \c hydrogen selector looks for low-mass atoms with names starting with H. In
order to work correctly when hydrogen mass repartitioning is used, the threshold
mass has been set to 4.1 amu. This means the selector will produce false positive
matches if the system contains helium.
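Here is a minimal sketch of the rule described above. It assumes an atom is treated as a hydrogen when its name starts with H <I>and</I> its mass is below the 4.1 amu cutoff. \c looksLikeHydrogen() is a hypothetical name, and the exact test inside LOOS (e.g. case handling) may differ. Helium, typically named HE with a mass of about 4.0 amu, passes both checks; that is the false positive mentioned above.

```cpp
#include <cassert>
#include <string>

// Mass cutoff quoted in the text (amu); high enough to tolerate hydrogen
// mass repartitioning, low enough to exclude carbon, nitrogen, oxygen, etc.
constexpr double kHydrogenMassCutoff = 4.1;

// Assumed rule: name begins with 'H' and mass is below the cutoff.
bool looksLikeHydrogen(const std::string& name, double mass) {
    return !name.empty() && name[0] == 'H' && mass < kHydrogenMassCutoff;
}
```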
The \c all keyword is used to force a selection string to match all
atoms in instances where a selection is required. For example, a
program to align frames of a trajectory DCD to a reference structure
might require a selection to pick which atoms to use when computing
the rotations and then another selection to pick which atoms are
actually rotated. If you wanted to apply the rotation to all atoms,
you just use the \c all keyword, i.e.
\verbatim
aligner --selection='name == "CA" && segid =~ "BAR[12]"' --transform='all' foo.pdb foo.dcd newfoo
\endverbatim
\subsection regexps Regular Expression Matching
The regular expression matching operator "=~" deserves special
attention. Its use is more restrictive than the other operators in
that it can only take a keyword that evaluates to a string on the
left-hand side and a string on the right-hand side. So, the following
expressions are valid:
\verbatim
name =~ "CA"
name =~ "^(C|O|N)$"
segid =~ "PROT|HEME"
\endverbatim
While the following are not valid:
\verbatim
resid =~ "10[0-9][0-9]"
segid =~ 0010
name =~ resname
\endverbatim
The regular expression syntax supported is the PERL syntax as
implemented by the Boost libraries. While you can write regular
expressions that look a lot like globbing (a la VMD selections), keep
in mind that it isn't globbing. It's a regular expression, which is
more powerful anyway... You do need to be careful though that your
shell does not munge any of the regex operators. It's a good idea to
use single quotes when you're writing regular expressions in a shell, or to use configuration
files to do the arguments instead (see the
<a href="https://github.com/GrossfieldLab/loos/wiki/Using-config-files-instead-of-the-command-line">wiki</a>
for a discussion of how to do that).
The string equality operators ("==" and "!=") both consider the
<I>entire</I> string.
\verbatim
"CA" == "C" --> false
"C" == "C" --> true
\endverbatim
You can use the "=~" operator to perform a substring match.
\verbatim
"CA" == "C" --> false
"C" == "C" --> true
"CA" =~ "C" --> true
\endverbatim
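In C++ terms, the distinction maps roughly onto \c operator== versus \c std::regex_search. This sketch uses the standard library's ECMAScript regular expressions rather than Boost's Perl syntax. The two agree for simple patterns like these, but the code is an analogy, not the parser's implementation.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "==" compares the entire string.
bool exactlyEquals(const std::string& s, const std::string& t) { return s == t; }

// "=~" succeeds if the pattern matches anywhere in the string, so
// std::regex_search (not std::regex_match) is the closer analogue.
bool regexMatches(const std::string& s, const std::string& pattern) {
    return std::regex_search(s, std::regex(pattern));
}
```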
This brings up an important point about using regular expressions: be
careful of unexpected substring matches. For example, let's say you
want to pick out all backbone atoms and you write this
selection string:
\verbatim
name =~ "C|CA|CB|O|N"
\endverbatim
Now look what happens when the following atom names are matched:
\verbatim
"CG" --> true
"CD1" --> true
"NE" --> true
"OH2" --> true
\endverbatim
The problem is that the regular expression is not constrained, so even
though you explicitly put "CA" and "CB" in there, you also have a "C"
which says <I>any</I> atom name with a "C" in it is a match. If
you want to match a string <I>exactly</I> with a regular expression,
you must anchor it:
\verbatim
name =~ "^(C|CA|CB|O|N)$"
\endverbatim
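You can reproduce the same pitfall with \c std::regex_search, which, like "=~", accepts a match anywhere in the string. The following minimal sketch applies the unanchored and anchored patterns to a handful of atom names. The helper \c matchingNames() is hypothetical, and ECMAScript regular expressions stand in for Boost here.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the names that the pattern matches, using search semantics like "=~".
std::vector<std::string> matchingNames(const std::vector<std::string>& names,
                                       const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> hits;
    for (const auto& n : names)
        if (std::regex_search(n, re)) hits.push_back(n);
    return hits;
}
```

With the names C, CA, CG, CD1, NE, OH2, and N, the unanchored pattern matches all seven, while the anchored pattern matches only C, CA, and N.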
\subsection magops_explained Magical Operations
There is currently only one "magical operator" defined: "->". This
operator takes a string keyword on the left-hand side (i.e. name,
resname, or segid/segname) and a string on the right-hand side
representing a regular expression pattern. It will then try to
extract a numeric value (integer) from the subexpression matches. For
example, suppose you have a range of segments that all follow a
pattern such as "L1", "L2", "L3", ..., "L120". The regular
expression "L(\d+)" matches these, and the pattern within the
parentheses is a subexpression. So,
\verbatim
(segid->"L(\d+)") >= 10 && (segid->"L(\d+)") <= 50
\endverbatim
will match segids "L10" through "L50". Since each matched
subexpression will be examined for a valid integer conversion, the
following will work as expected:
\verbatim
segid->"(L|PG)(\d+)"
\endverbatim
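The following sketch shows how the extraction could work. It assumes the value comes from the first capture group that consists entirely of digits, and that -1 is returned when the pattern does not match at all. The function name \c extractNumber() is hypothetical, and \c std::regex stands in for Boost.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Return the integer held by the first all-digit capture group, or -1 if the
// pattern does not match (or no capture group converts to an integer).
long extractNumber(const std::string& s, const std::string& pattern) {
    std::smatch m;
    if (!std::regex_search(s, m, std::regex(pattern)))
        return -1;
    for (std::size_t i = 1; i < m.size(); ++i) {   // group 0 is the whole match
        const std::string sub = m[i].str();
        bool allDigits = !sub.empty();
        for (char c : sub)
            if (!std::isdigit(static_cast<unsigned char>(c))) allDigits = false;
        if (allDigits) return std::stol(sub);
    }
    return -1;
}
```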
There is a small hitch with the magical operator. If there is no
match, it evaluates to -1. But this is a valid int, so it would seem that you cannot do
the following:
\verbatim
segid->"L(\d+)" <= 100
\endverbatim
since -1 <= 100 is true, so the expression would match every segid, including
those that do not fit the pattern at all. It would, unless the <= operator
were also a little bit special. Fortunately, it is. If either operand
is a negative number, both the < and <= operators assume that
this is a flag for a null match and return false. It's a bit of a kludge, but it works...
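The special-casing of < and <= can be expressed as two small comparison functions. This sketch restates the behavior described above and is not the parser's code; \c selLess() and \c selLessEq() are hypothetical names. The > and >= operators need no special case: a -1 sentinel already makes them false for any non-negative right-hand side.

```cpp
#include <cassert>

// A negative operand is treated as the "no match" flag from "->",
// so the comparison is forced to false.
bool selLess(long a, long b) {
    if (a < 0 || b < 0) return false;
    return a < b;
}

bool selLessEq(long a, long b) {
    if (a < 0 || b < 0) return false;
    return a <= b;
}
```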
<hr>
\section kahuna Putting It All Together...
When you perform a selection on an AtomicGroup using the selection
language, the expression is evaluated once for each atom in the
group. If it evaluates to "true" (integer 1), then the atom is added
to the new selection. Only one atom is considered at a time.
Here are some example selections:
\verbatim
Extract C-alphas:
name == "CA"
Solvent:
segid == "SOLV" || segid == "BULK"
Solvent heavy atoms (oxygens only):
name =~ "O" && (segid == "SOLV" || segid == "BULK")
C-alphas from a range of residues:
name == "CA" && resid >= 10 && resid <= 50
\endverbatim
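Each example above is evaluated as a per-atom predicate. The following minimal sketch shows that loop, using a hypothetical \c Atom struct and \c filterAtoms() helper. The last example is written by hand as a C++ function here, whereas in LOOS the selection string is parsed rather than compiled.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Atom {
    std::string name;
    int resid;
    std::string segid;
};

// Evaluate the expression once per atom; atoms for which it is true
// form the new group, in their original order.
std::vector<Atom> filterAtoms(const std::vector<Atom>& group,
                              const std::function<bool(const Atom&)>& expr) {
    std::vector<Atom> result;
    for (const auto& a : group)
        if (expr(a)) result.push_back(a);
    return result;
}

// Hand-written equivalent of: name == "CA" && resid >= 10 && resid <= 50
bool caInRange(const Atom& a) {
    return a.name == "CA" && a.resid >= 10 && a.resid <= 50;
}
```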
\subsection Usage
Most tools based on LOOS will accept selection strings from the
command-line. They must be enclosed in quotes though so they are all
one argument to the tool. If you're using regular expressions, it's a
good idea to use single quotes to prevent your shell from
misinterpreting the regular expression operators. Depending on your
shell's quoting rules, backslash escapes may also need doubling.
You can store your selection in a file if you want. To use it then,
use the back-quote feature of your shell to "cat" your selection
file. Since your selection must be one argument, you must enclose the
back-quote within double-quotes, i.e.
\verbatim
a_tool_name "`cat myselection.txt`" arg arg arg
\endverbatim
If you store your selection in a file, then you can also use
comments. A comment is anything after a "#" on a line. Here's an
example of a selection in a file:
\verbatim
### Select water oxygens only...
# Pick out any atom that contains an oxygen
name =~ "O" &&
(segid == "SOLV" || # any segment named SOLV
segid == "BULK") # or named BULK
\endverbatim
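Here is a minimal sketch of the comment handling described above. Everything from "#" to the end of a line is dropped, and the remaining lines are joined into one selection string. \c readSelection() is a hypothetical helper. This naive version would also truncate a "#" that appears inside a quoted string, which a real parser would need to handle.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Strip "#" comments line by line and join what remains with spaces,
// producing a single selection string.
std::string readSelection(std::istream& in) {
    std::string line, out;
    while (std::getline(in, line)) {
        const auto hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);  // drop the comment
        if (!out.empty()) out += ' ';
        out += line;
    }
    return out;
}
```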
*/
/*! \page common_options Common Command-Line Options for Tools
Many LOOS tools use a common set of command-line options (through
the \c OptionsFramework ). These options are organized into
groups that tend to be used together. Not all tools will support
all options. Options may also appear in two forms: a "long" form
where the option is written out following two hyphens, or a "short"
form that is a single character following a single hyphen. An option may
also have a value associated with it, and it can be assigned either
using an equals sign, e.g. \c --verbosity=3, or by just following
the option with a space, e.g. \c --verbosity 3 (with the short forms, the space is optional, e.g. \c -v3). Additionally, some
options are "boolean" in that they turn specific behavior on or off.
These options are turned on by assigning them a 1 (true) and off by assigning a 0 (false);
for example, \c --brief=1 turns on brief output.
\subsection common_options_table Common Options
<table align="center" border="1" style="width:100%">
<tr align="left"><th>Long Name</th> <th>Short Name</th> <th>Description</th> <th>Example</th></tr>
<tr align="left" valign="top">
<td>\-\-fullhelp</td>
<td></td>
<td>Give a lot more information about how the tool works and how to use it.</td>
<td>\-\-fullhelp</td>
</tr>
<tr align="left" valign="top">
<td>\-\-prefix</td>
<td>\-p</td>
<td>Sets the prefix for files written out by the tool.</td>
<td> \-p sim1 </td>
</tr>
<tr align="left" valign="top">
<td>\-\-verbosity</td>
<td>\-v</td>
<td>Sets the output/logging level of a tool. Higher numbers mean more verbose output.</td>
<td> \-v 3 </td>
</tr>
<tr align="left" valign="top">
<td>\-\-selection</td>
<td>\-s</td>
<td>Select atoms for the tool to operate on</td>
<td> \-s backbone </td>
</tr>
<tr align="left" valign="top">
<td>\-\-modeltype</td>
<td></td>
<td> Specify the type of model file being used. LOOS will automatically
assign a file-type based on the suffix for a filename (e.g. pdb, psf, ...).
If you use a different convention, you may need to manually tell
the tool what kind of file you are using. The tool's \c --fullhelp
output will list the available types.</td>
<td> \-\-modeltype pdb </td>
</tr>
<tr align="left" valign="top">
<td>\-\-trajtype</td>
<td></td>
<td> Specify the type of trajectory file being used. LOOS will automatically
assign a file-type based on the suffix for a filename (e.g. dcd, xtc, ...).
If you use a different convention, you may need to manually tell
the tool what kind of file you are using. The tool's \c --fullhelp
output will list the available types.</td>
<td> \-\-trajtype dcd </td>
</tr>
<tr align="left" valign="top">
<td>\-\-outtrajtype</td>
<td></td>
<td> Specify the type of trajectory file being written. LOOS will automatically
assign a file-type based on the suffix for a filename (e.g. dcd, xtc, ...).
If you use a different convention, you may need to manually tell
the tool what kind of file you are using. The tool's \c --fullhelp
output will list the available types.</td>
<td> \-\-outtrajtype xtc </td>
</tr>
<tr align="left" valign="top">
<td>\-\-skip</td>
<td>\-k</td>
<td> Skip the first N frames of the trajectory (or trajectories)</td>
<td> \-k 50 </td>
</tr>
<tr align="left" valign="top">
<td>\-\-stride</td>
<td>\-i</td>
<td> Read every ith frame from the trajectory (or trajectories)</td>
<td> \-i 10 </td>
</tr>
<tr align="left" valign="top">
<td>\-\-range</td>
<td>\-r</td>
<td> Specifies a range-list of frames to operate on from a trajectory. See below
for more details.</td>
<td> \-r 50:10: </td>
</tr>
</table>
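The suffix-based detection mentioned for \c --modeltype, \c --trajtype, and \c --outtrajtype can be sketched as follows. This sketch assumes the type is simply the text after the last '.' in the file's base name, with an explicitly given type taking precedence. \c inferFileType() is a hypothetical name, and LOOS's actual lookup tables are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Return forcedType if one was given (as with --modeltype/--trajtype);
// otherwise return the suffix after the last '.' in the file's base name.
std::string inferFileType(const std::string& filename, const std::string& forcedType = "") {
    if (!forcedType.empty()) return forcedType;       // explicit option wins
    const auto slash = filename.find_last_of('/');
    const auto dot = filename.find_last_of('.');
    // No dot, a dot only in a directory name, or a trailing dot: no suffix.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash) ||
        dot + 1 == filename.size())
        return "";
    return filename.substr(dot + 1);
}
```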
\subsection ranges Specifying Ranges
The range option (through \c parseIndexRange() ) in LOOS tools is a versatile method of picking exactly what
frames from a trajectory (or \c MultiTrajectory) you want the tool to use. A range-spec
can be a frame number, a range of frames, or a range of frames with a stride.
A range-spec can also be a list of range-specs separated by commas. So, you can pick a single
frame by giving the frame number. For example,
\code{.sh}
foo -r 9 model.pdb sim.dcd
\endcode
\c foo will only use the 10th frame from \c sim.dcd. <B>Remember, frame numbers are 0-based!</B> Similarly,
\code{.sh}
foo -r 0,1,2,3,4 model.pdb sim.dcd
\endcode
\c foo will only use the first 5 frames.
Literal ranges of frames can be
specified using an Octave/MATLAB-like syntax of start:stop (<B>inclusive</B>),
\code{.sh}
foo -r 0:99 model.pdb sim.dcd
\endcode
\c foo will use the first 100 frames, while
\code{.sh}
foo -r 100:199 model.pdb sim.dcd
\endcode
will skip the first 100 frames, and use the next 100 frames.
A stride can also be given with the syntax of start:stride:stop,
\code{.sh}
foo -r 10:2:99 model.pdb sim.dcd
\endcode
\c foo will now skip the first 10 frames, then take every other frame through the 100th frame.
You do not need to know how long a trajectory is in order to use the range notation to
specify a skip and a stride (this was not true in older versions of LOOS). Simply
leave off the end,
\code{.sh}
foo -r 10: model.pdb sim.dcd
\endcode
\c foo will then use all but the first 10 frames of the trajectory. Note that you must have a colon
after the number; otherwise you will be telling the tool to use <em>only</em> frame index 10 (i.e.
the 11th frame).
You can also set a stride without knowing how long a trajectory is,
\code{.sh}
foo -r 10:2: model.pdb sim.dcd
\endcode
Here, the first 10 frames are skipped and then every other frame is used for
the rest of the trajectory.
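The range rules above can be captured in a short expansion routine. This is a sketch, not LOOS's \c parseIndexRange(), whose signature and error handling are not described here. \c expandRange() is a hypothetical name. The sketch assumes the trajectory length is known, so that open-ended specs like \c 10: and \c 10:2: can run to the last frame.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split s on delim, keeping a trailing empty field so that "10:" -> {"10", ""}.
std::vector<std::string> splitFields(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(s);
    std::string field;
    while (std::getline(in, field, delim)) parts.push_back(field);
    if (!s.empty() && s.back() == delim) parts.push_back("");
    return parts;
}

// Expand a comma-separated range-spec into 0-based frame indices for a
// trajectory with nframes frames. Each item is N, start:stop, start:stride:stop,
// start:, or start:stride:, with stop inclusive and an empty stop meaning
// "through the last frame".
std::vector<std::size_t> expandRange(const std::string& spec, std::size_t nframes) {
    std::vector<std::size_t> frames;
    for (const auto& item : splitFields(spec, ',')) {
        const auto f = splitFields(item, ':');
        if (f.size() == 1) {                       // a single frame index
            frames.push_back(std::stoul(f[0]));
            continue;
        }
        if (f.size() != 2 && f.size() != 3)
            throw std::invalid_argument("bad range-spec: " + item);
        const std::size_t start = std::stoul(f[0]);
        const std::size_t stride = (f.size() == 3) ? std::stoul(f[1]) : 1;
        if (stride == 0)
            throw std::invalid_argument("stride must be positive: " + item);
        std::size_t stop;
        if (f.back().empty()) {
            if (nframes == 0) continue;            // open-ended range, empty trajectory
            stop = nframes - 1;                    // run through the last frame
        } else {
            stop = std::stoul(f.back());           // explicit, inclusive stop
        }
        for (std::size_t i = start; i <= stop; i += stride) frames.push_back(i);
    }
    return frames;
}
```

Under these assumptions, expanding \c 10:2:99 yields frames 10, 12, ..., 98 (45 frames). Expanding \c 10:2: on a 20-frame trajectory yields 10, 12, 14, 16, 18.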
*/
}
/*! \page building Building LOOS
The best source of information on building LOOS is <A
href="https://github.com/GrossfieldLab/loos/blob/main/INSTALL.md">INSTALL.md</A>,
found in the top level directory of the LOOS distribution. It includes detailed
instructions for building LOOS on various Linux distributions and OSX, as well
as for using Conda.
*/
/*! \page tools Summary of Tools
Below is a summary of the tools currently distributed with LOOS. To get a
detailed summary of the command line arguments, run the program without
arguments or using "-h". Nearly all tools also support the
"--fullhelp" option, which will display more detailed help
information, including examples of how to use the tool.
In the documentation, the term "system" or "model" refers to a file that
describes the contents of a system, such as a PDB file or one of the inputs
from the various simulation packages (PSF, parmtop, gro, etc). All of the
tools are now package agnostic, in the sense that they will take any of the
supported file formats as input. However, not all files provide all of the
needed information; for example, the order_params tool requires connectivity
information to function properly, so the user must run it with a system file
type that has that information, such as a CHARMM/NAMD PSF file or a PDB with
CONECT records.
At present, LOOS assumes that all periodic boxes are rectangular, and
will produce incorrect answers if trajectories using different box shapes
(e.g. truncated octahedron) are used in programs which make use of
periodicity (e.g. rdf). We have no immediate plans to generalize the
code to handle other periodicities, but are willing to reconsider if
there is significant demand from users.
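To see why the box shape matters, here is a minimal sketch of the minimum-image convention for a rectangular box. Each Cartesian component of a displacement is wrapped independently by the corresponding box length. That per-axis wrapping is only valid when the box axes are orthogonal, which is why non-rectangular cells need different handling. The function names are hypothetical, and this is not LOOS's implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Minimum-image displacement b - a in a rectangular box with side lengths
// box[k]; each component is shifted by a whole number of box lengths so it
// lies within half a box length of zero.
Vec3 minimumImage(const Vec3& a, const Vec3& b, const Vec3& box) {
    Vec3 d{};
    for (int k = 0; k < 3; ++k) {
        d[k] = b[k] - a[k];
        d[k] -= box[k] * std::round(d[k] / box[k]);
    }
    return d;
}

// Distance between a and b under periodic boundary conditions.
double pbcDistance(const Vec3& a, const Vec3& b, const Vec3& box) {
    const Vec3 d = minimumImage(a, b, box);
    return std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
}
```

For example, in a 10 x 10 x 10 box, points at x = 1 and x = 9 are 2 apart through the periodic boundary, not 8.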
LOOS uses angstroms as the output unit of distance, even when the input
coordinates are in other units (e.g. nanometers for GROMACS files). All output
is in plain text format, and follows general unix/linux conventions. Every
program's output starts with a series of comment lines, marked by beginning
with a "#", echoing the command line used to invoke the program, the user, the
date the program was run, the working directory, and the version of LOOS used.
Any additional information, such as the meanings of various columns of output,
is also provided on lines marked with "#".
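Because the header and column descriptions all sit on lines starting with "#", the numeric output can be loaded with very little code. The following minimal sketch skips blank lines and "#" lines and splits the rest on whitespace. The helper name \c readColumns() is hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated numeric columns, skipping blank lines and the
// "#" header/comment lines at the top of a tool's output.
std::vector<std::vector<double>> readColumns(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] == '#') continue;
        std::istringstream fields(line);
        std::vector<double> row;
        double v;
        while (fields >> v) row.push_back(v);
        rows.push_back(row);
    }
    return rows;
}
```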
LOOS tools are designed to make it easy to plot the results generated. As a
rule, files are formatted such that they can be cleanly plotted using the
gnuplot plotting program, standard with most modern linux distributions. In
addition, if matrices or vectors are written out (e.g. from the svd tool or one
of the tools from the ENM package), the format is consistent with that used by
Matlab, Octave and numpy.
<HR>
<B><I> Categories of Tools </I> </B>
- \subpage other "Macromolecule Tools"
- \subpage manipulation "Manipulating trajectory files"
- \subpage convergence "Assessing statistical errors and convergence"
- \subpage density "3D density distributions"
- \subpage hbond "Hydrogen Bonding"
- \subpage pca "Principal component analysis"
- \subpage membranes "Membrane systems"
- \subpage voronoi "Voronoi decomposition"
- \subpage enm "Elastic network models"
- \subpage cluster "Clustering"
- \subpage user "User-created tools"
\page cluster Clustering
<h2> Clustering Tools </h2>
There are a few different sets of clustering tools. Some are in Packages/Clustering while others are python-based.
<DL>
<DT> <B> cluster-kgs </B>
<DD> Performs hierarchical clustering, using an NMRClust-like method to determine the optimal number of clusters to retain. Takes a frame-to-frame distance matrix as input, e.g. the output of rmsds, multi-rmsds, or all_contacts.py. See also cluster_pops.py.
<DT> <B> cluster-structures.py </B>
<DD> Performs k-means clustering on one or more trajectories, using RMSD as the metric. Has the option to do a t-SNE transformation first.
<DT> <B> hierarchical-cluster.py </B>
<DD> Performs hierarchical clustering given a distance matrix, using scipy's hierarchical clustering library. If given multiple trajectories, the tool can report the cluster populations for each trajectory.
<DT> <B> cluster_pops.py </B>
<DD> Extracts the cluster populations from the output of cluster-kgs.
<DT> <B> frame-picker.py </B>
<DD> Creates a trajectory for each cluster identified by cluster-kgs.
</DL>
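Several of these tools start from a frame-to-frame distance matrix. Purely as an illustration of hierarchical clustering on such a matrix, the Python sketch below does naive average-linkage agglomeration until a requested number of clusters remains. It is not the NMRClust-like criterion cluster-kgs uses to pick the number of clusters, nor scipy's implementation; the fixed `n_clusters` stopping rule and the function name `average_linkage` are assumptions.

```python
def average_linkage(dist, n_clusters):
    """Naive average-linkage agglomerative clustering.

    dist: symmetric list-of-lists of pairwise distances (e.g. frame RMSDs).
    Repeatedly merges the two closest clusters until n_clusters remain and
    returns the clusters as sorted lists of frame indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average distance over all cross-cluster pairs
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        del clusters[y]          # y > x, so index x is unaffected
        clusters[x] = merged
    return clusters


# Hypothetical 4-frame RMSD matrix: frames 0,1 are similar, as are 2,3
rmsd = [
    [0.0, 0.5, 3.0, 3.2],
    [0.5, 0.0, 2.9, 3.1],
    [3.0, 2.9, 0.0, 0.4],
    [3.2, 3.1, 0.4, 0.0],
]
print(average_linkage(rmsd, 2))  # [[0, 1], [2, 3]]
```

The cluster populations are then just the lengths of the returned lists. This brute-force search is cubic or worse in the number of frames, which is one reason real tools rely on optimized libraries.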
\page convergence Convergence
<h2>Convergence Analysis Tools</h2>
A collection of tools for assessing statistical error and
convergence, found in the Packages/Convergence/ directory:
<DL>
<DT> <B> assign_frames </B>
<DD> Given a trajectory and a set of fiducial structures (histogram
centers), assign each frame in the trajectory to a histogram bin.
Part of the workflow for computing the effective sample size. (See
effsize.pl)
<DT> <B> avgconv </B>
<DD> Computes the RMSD between the average structures at time i
and time i+1 of a trajectory. The "locally optimal" flag determines
whether the trajectory is globally aligned first or whether each
block of frames used in the average is aligned prior to averaging.
<DT> <B> bcom </B>
<DD> Implements the block covariance overlap method. Briefly, this
is analogous to block averaging: the trajectory is broken up into
blocks of a given size, the PCA is computed for each block, and
the covariance overlap is calculated between each block's PCA and
the PCA for the entire trajectory. This is then repeated for
increasing block sizes. A Z-score for the bcom result can also
be calculated (using the --zscore=1 flag and optionally setting
the number of "tries" to use).
<DT> <B>block_average</B>
<DD> Reads a simple column-formatted text file, and computes the block-averaged
standard error as a function of block size. The plateau value
is the best estimate for the true standard error.
Reference: Flyvbjerg, H. & Petersen, H. G.
J. Chem. Phys., 1989, 91, 461-466
<DT> <B> block_avgconv </B>
<DD> Block-averaging of RMSD between average structures for a
trajectory. "Range" in this case is the range of block sizes
and not strictly which frames of the trajectory to use.
<DT> <B> bootstrap_overlap.pl </B>
<DD> Perl program to compute the bcom and bootstrapped bcom for
a trajectory, generating a plot of their ratio and an
exponential fit. Also generates a plot of the residual error in
the fit. Use the "--help" option for more details. Note that
the number of block sizes used is somewhat conservative, so it's
probably a good idea to use a low number of block sizes
initially to get a quick idea of how good or bad the sampling
is, and then use the higher number of blocks for a more detailed
analysis. Also note that plotting requires gnuplot. If you do
not have gnuplot installed (or do not like gnuplot), use the
"--noplot" flag to disable this.
<DT> <B> boot_bcom </B>
<DD> Bootstrapped bcom is similar to bcom above, but rather than
using contiguous blocks, it uses a bootstrap procedure by
randomly selecting frames from the trajectory to build
decorrelated blocks. If no seed for the random number generator
is given, LOOS will pick a default (based on the current system
clock). The --replicates option determines how many blocks are
generated for a given size.
<DT> <B> chist </B>
<DD> Calculates either a cumulative histogram (where each output
row is the histogram up to that point), or a windowed histogram.
<DT> <B> coscon </B>
<DD> Computes the cosine content for varying windows of a
trajectory, based on Hess, B. "Convergence of sampling in
protein simulations." Phys Rev E (2002) 65(3):031910
<DT> <B> decorr_time </B>
<DD> Decorrelation time, as computed by structural histogram
analysis. The default values for the range of N-values,
repetitions, and bin fraction are taken from the paper below and
may need to be changed, particularly if you are using a
trajectory you suspect is undersampled.
Reference: Lyman & Zuckerman, J Phys Chem B (2007) 111:12876-82
<DT> <B> effsize.pl </B>
<DD> Perl front-end to the effective sample size tools (ufidpick,
assign_frames, hierarchy, neff). If you want to apply the
Zuckerman-style effective sample size method (see the entry for neff,
below), you probably should use this script instead of the individual
tools, since this tool automates the process of picking fiducial
structures (the frames that will be the centers of your histogram
bins), assigning the frames from the trajectories to those bins,
working out the mean first passage time between bins, and computing
the effective sample size. Reference: Lyman & Zuckerman, Biophys J
(2006) 91:164-72
<DT> <B> fidpick </B>
<DD> Picks fiducial structures for structural histograms.
Reference: Lyman & Zuckerman, Biophys J (2006) 91:164-72
<DT> <B> hierarchy </B>
<DD>Given a trajectory whose structures have been binned into
states via reference structures, computes the mean first passage time
between states and then constructs a hierarchy of states based on
exchange rates. Used to generate input for neff.
Based on Zhang, Bhatt, and Zuckerman; JCTC, DOI: 10.1021/ct1002384
and code provided by the Zuckerman Lab
(http://www.ccbb.pitt.edu/Faculty/zuckerman/software.html)
<DT> <B> neff </B>
<DD> Computes effective sample size given an assignment and
state file (from hierarchy).
Based on Zhang, Bhatt, and Zuckerman; JCTC, DOI: 10.1021/ct1002384
and code provided by the Zuckerman Lab
(http://www.ccbb.pitt.edu/Faculty/zuckerman/software.html)
<DT> <B> qcoscon </B>
<DD> Computes a "quick" cosine content using the entire trajectory
for the top few modes, based on Hess, B. "Convergence of
sampling in protein simulations." Phys Rev E (2002) 65(3):031910
<DT> <B> sortfids </B>
<DD>Sorts fiducials (from fidpick) in order of decreasing bin
population.
<DT> <B> ufidpick </B>
<DD> Picks a set of fiducial structures from a trajectory using
a uniform distribution.
Reference: Lyman & Zuckerman, J Phys Chem B (2007) 111:12876-82
</DL>
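The block-averaging idea behind block_average (Flyvbjerg & Petersen) can be sketched in a few lines of Python. This illustrates the general method under the assumption of non-overlapping, equal-sized blocks with any leftover points discarded; it is not the LOOS implementation, and `block_standard_error` and the sample series are hypothetical.

```python
import math


def block_standard_error(data, block_size):
    """Standard error of the mean estimated from non-overlapping blocks.

    The series is cut into floor(N / block_size) contiguous blocks (leftover
    points at the end are dropped), each block is averaged, and the standard
    error is the standard deviation of the block means divided by
    sqrt(number of blocks).
    """
    n_blocks = len(data) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    means = [
        sum(data[k * block_size:(k + 1) * block_size]) / block_size
        for k in range(n_blocks)
    ]
    grand = sum(means) / n_blocks
    # Sample variance of the block means (n_blocks - 1 in the denominator)
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(var / n_blocks)


# Scan block sizes on a made-up correlated series
series = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2, 1.0, 0.8, 1.1, 0.9]
for b in (1, 2, 3, 4, 6):
    print(b, round(block_standard_error(series, b), 4))
```

With a block size of 1 this reduces to the ordinary standard error of the mean, which underestimates the true error for correlated samples; once blocks are longer than the correlation time the estimate should level off, and that plateau is the value the tool description refers to.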
\page density
<h2>Density Tools</h2>
A collection of tools for working with 3D density distributions,
found in Packages/DensityTools. This includes new classes such as
loos::DensityTools::DensityGrid and loos::DensityTools::SimpleMeta.
Some of the grid tools read from stdin and write to stdout so they can be
chained, for example:
<pre>
water-hist foo.pdb foo.dcd | gridgauss 4 2 | grid2xplor >foo.map
</pre>
Water-specific tools may let you restrict the analysis to waters that are
"internal" to a protein. This is done by one of three methods: axis, box, and grid.
Axis picks waters that are within a given radius from the principal
axis of the protein. Box uses the bounding box of a protein. Grid
uses a grid mask to only consider waters within the non-zero portion
of the mask. Note that "water" and "protein" are just LOOS
selections so there are no restrictions on what part of your system
can be used for building the histogram (e.g. ligand density in a
binding simulation).
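The box method can be illustrated with a short Python sketch: take the axis-aligned bounding box of the protein's atoms and test whether each water's position falls inside it. This assumes one representative point per water (e.g. the oxygen); the optional `pad` margin, the function names, and the coordinates are hypothetical, and LOOS's own handling of the box edges may differ.

```python
def bounding_box(coords, pad=0.0):
    """Axis-aligned bounding box of (x, y, z) coordinates.

    pad is a hypothetical margin (angstroms) added on every side.
    """
    lo = [min(c[k] for c in coords) - pad for k in range(3)]
    hi = [max(c[k] for c in coords) + pad for k in range(3)]
    return lo, hi


def is_inside_box(point, box):
    """True if point lies within the box (boundaries inclusive)."""
    lo, hi = box
    return all(lo[k] <= point[k] <= hi[k] for k in range(3))


# Made-up protein atoms and water oxygens (angstroms)
protein = [(0.0, 0.0, 0.0), (10.0, 5.0, 8.0), (4.0, -2.0, 3.0)]
waters = [(5.0, 1.0, 4.0), (12.0, 1.0, 4.0)]
box = bounding_box(protein)
print([is_inside_box(w, box) for w in waters])  # [True, False]
```

Repeating this test for every water in every frame yields one 0/1 row per time point, the same shape as the "internal water matrix" described for water-inside.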
<DL>
<DT> <B> blobid </B>
<DD> Identifies "blobs" in a grid using a flood-fill algorithm
<DT> <B> blob_stats </B>
<DD> Takes a blobid'd grid and prints out statistics about each blob
<DT> <B> contained </B>
<DD> Given a thresholded grid and a trajectory, counts the
number of atoms that are contained within the grid segment at
each time point.
<DT> <B> grid2ascii </B>
<DD> Converts a grid into a serialized ASCII representation
<DT> <B> grid2xplor </B>
<DD> Converts a grid into an XPLOR compatible ASCII electron
density map. Requires that the "type" of the grid be specified
(i.e. float, double [default], etc.)
<DT> <B> gridgauss </B>
<DD> Convolves a grid with a Gaussian kernel for smoothing
<DT> <B> gridinfo </B>
<DD> Prints out basic information about a grid
<DT> <B> gridmask </B>
<DD> Applies a binary mask (a grid containing ints) to a density grid
of doubles. Any grid point where the mask is non-zero is
copied into a new density grid. All other voxels are set to zero.
Use this to "clip" out unwanted blobs in a grid.
<DT> <B> gridscale </B>
<DD> Applies a constant scaling to a grid. Assumes a density
grid (i.e. grid of doubles)
<DT> <B> gridstat </B>
<DD> Simple statistics about the data stored in a density grid
(i.e. grid of doubles)
<DT> <B> peakify </B>
<DD> Finds peaks in a density grid using a threshold cutoff.
Blobs are found by flood-filling the grid and the centroid of
the blob is used as the peak.
<DT> <B> pick_blob </B>
<DD> Given a grid mask (i.e. grid of ints), creates another grid
mask containing only the requested blobs (those near a given
atom or grid point).
<DT> <B> water-count </B>
<DD> Counts the number of waters inside a protein at each
time-point. Requires an "internal water matrix" (see water-inside).
<DT> <B> water-extract </B>
<DD> Extracts "internal" waters from a trajectory and
concatenates them into a single PDB for visualizing water
pockets/channels.
<DT> <B> water-hist </B>
<DD> Creates a density grid (histogram) that represents water
locations throughout the trajectory. Bulk water can be
explicitly added into the histogram by using the --bulked option
(this is useful for transmembrane proteins).
Scaling the density by the bulk water density currently only
works for membrane systems (or systems where the bulk water
lies in a plane). Use the --scale option along with the --bulk
option to specify what z-range to use for the bulk density estimate.
<DT> <B> water-inside </B>
<DD> Classifies waters as being inside a protein (by various
user-specified criteria) over the course of a trajectory. The
state of all waters is written as a large matrix where the rows
represent time, columns represent different waters, and a 1
means the water is inside at time t.
<DT> <B> water-sides </B>
<DD> Similar to water-inside, but classifies water based on
which side of a membrane it lies (or whether it's internal).
</DL>
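The flood-fill step used by blobid (and by peakify before it takes blob centroids) can be sketched in Python as connected-component labeling of the voxels above a threshold. The 6-neighbor (face-adjacent) connectivity, the nested-list grid layout, and the name `label_blobs` are assumptions for illustration; LOOS's DensityGrid classes and connectivity rules may differ.

```python
from collections import deque


def label_blobs(grid, threshold):
    """Label connected regions ("blobs") of a 3D grid by flood fill.

    grid is a nested list indexed grid[k][j][i]. Voxels with value >=
    threshold are foreground; neighbours are the 6 face-adjacent voxels.
    Returns (labels, n_blobs), where labels holds 0 for background and
    1..n_blobs for blob ids.
    """
    nk, nj, ni = len(grid), len(grid[0]), len(grid[0][0])
    labels = [[[0] * ni for _ in range(nj)] for _ in range(nk)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_blobs = 0
    for k in range(nk):
        for j in range(nj):
            for i in range(ni):
                if grid[k][j][i] < threshold or labels[k][j][i]:
                    continue
                n_blobs += 1
                labels[k][j][i] = n_blobs
                queue = deque([(k, j, i)])
                # Breadth-first flood fill from this seed voxel
                while queue:
                    ck, cj, ci = queue.popleft()
                    for dk, dj, di in steps:
                        a, b, c = ck + dk, cj + dj, ci + di
                        if (0 <= a < nk and 0 <= b < nj and 0 <= c < ni
                                and not labels[a][b][c]
                                and grid[a][b][c] >= threshold):
                            labels[a][b][c] = n_blobs
                            queue.append((a, b, c))
    return labels, n_blobs


# One made-up z-slice containing two separate dense regions
density = [[[1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 2.0]]]
labels, n = label_blobs(density, 0.5)
print(n, labels)  # 2 [[[1, 1, 0, 0], [0, 0, 0, 2]]]
```

Using an explicit queue rather than recursion avoids hitting Python's recursion limit on large blobs.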
\page enm
<h2>Elastic Network Models</h2>
A collection of tools for working with ENMs, found in the
Packages/ElasticNetworks/ directory:
<DL>
<DT> <B>anm</B>
<DD> Computes the anisotropic network model for a structure.
Reference: Atilgan, et al., Biophys. J. 80, 505-515, (2001).
<DT> <B>gnm</B>
<DD> Computes the Gaussian network model for a structure. Reference:
Bahar, et al, Folding and Design 2, 173-181, 1997.
<DT> <B>vsa</B>
<DD> Computes the vibrational subsystem analysis model for a
structure. Reference: Woodcock, et al, J. Chem. Phys., 129, 214109-9,
2008
<DT> <B>psf-masses</B>
<DD> Copies atom masses from a PSF into the occupancy field of a PDB
<DT> <B>heavy-ca</B>
<DD> Places the total mass for a residue into its CA (for a PDB with masses)
<DT> <B>enmovie</B>
<DD> Creates a DCD depicting motion along the axes taken from an ENM result
<DT> <B>flucc2b</B>
<DD> Computes B-values based on ENM (<I>now deprecated</I>)
<DT> <B>eigenflucc</B>
<DD> Computes the fluctuations from either ENM or PCA output.
These can be mapped to B-values in a structure.
</DL>
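As an illustration of what gnm builds internally, the Gaussian network model starts from a Kirchhoff (connectivity) matrix: -1 for each pair of nodes within a cutoff distance, with each diagonal entry equal to that node's number of contacts. The Python sketch below constructs it; the 7 Å default cutoff and the name `kirchhoff` are illustrative assumptions, not necessarily what the LOOS tool uses.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Gaussian network model Kirchhoff matrix.

    Off-diagonal entries are -1 for node pairs within cutoff (angstroms),
    0 otherwise; each diagonal entry is the node's contact count, so every
    row sums to zero.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # Contact: -1 off-diagonal, +1 on both diagonal entries
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


# Three hypothetical C-alpha positions: two in contact, one isolated
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(kirchhoff(ca))
# [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

Diagonalizing this matrix with any linear-algebra package gives the GNM modes. Because every row sums to zero, the uniform vector is always an eigenvector with eigenvalue zero, which is one reason the ENM output contains zero modes.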
<H3>Important note</H3><p>
The ENM tools return <i>all</i> eigenpairs, including the
zero-modes. The results are ordered such that the first 6 entries
correspond to the zero modes in a typical case. This means that
when you specify a mode to subsequent analysis tools (such as
porcupine or eigenflucc), you <i>must</i> account for this
(i.e. add 6 to the mode requested).
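Rather than hard-coding 6, the number of leading zero modes can be checked from the eigenvalues themselves. A minimal Python sketch, assuming the eigenvalues are listed smallest first and treating anything below a hypothetical tolerance `tol` as zero (numerical zero modes are rarely exactly 0.0). Whether a particular tool counts modes from 0 or from 1 is not stated here, so confirm that before applying the offset.

```python
def zero_mode_offset(eigenvalues, tol=1e-6):
    """Count the leading (near-)zero modes in a smallest-first eigenvalue list.

    tol is an assumed cutoff below which an eigenvalue is treated as zero.
    """
    offset = 0
    for value in eigenvalues:
        if abs(value) >= tol:
            break
        offset += 1
    return offset


# Made-up ANM-like spectrum: six rigid-body modes, then non-trivial modes
eigvals = [0.0, 1e-13, 2e-13, 5e-13, 8e-13, 1e-12, 0.031, 0.045, 0.102]
offset = zero_mode_offset(eigvals)
print(offset)                # 6
print(eigvals[offset + 0])   # lowest non-trivial mode: 0.031
```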
\page hbond
<h2>Hydrogen Bonding Tools</h2>
A set of tools for analyzing hydrogen bonding, found in
Packages/HydrogenBonds/ directory:
<DL>
<DT><B>hbonds</B>
<DD> Finds putative h-bonds based on angle and distance
<DT><B>hmatrix</B>
<DD> Writes out a binary matrix indicating which atoms
have possible h-bonds at each time-point in a trajectory.
<DT><B>hcorrelation</B>
<DD> Computes the time-correlation function for h-bonds.
<DT><B>native-hbs.py</B>
<DD> Tracks hydrogen bonds from a reference across a trajectory
</DL>
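A distance-plus-angle test of the kind hbonds applies can be sketched in Python. Which distance (H to acceptor versus donor to acceptor), which angle definition, and which cutoffs LOOS uses are not stated here; the H...A distance, the D-H...A angle, and the 3.0 Å / 150° thresholds below are illustrative assumptions, and the function names are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    u = [a[k] - b[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    dot = sum(u[k] * v[k] for k in range(3))
    cos_t = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against round-off just outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.0, min_angle=150.0):
    """Geometric h-bond test on H...A distance and D-H...A angle.

    max_dist (angstroms) and min_angle (degrees) are illustrative
    assumptions, not LOOS defaults.
    """
    return (math.dist(hydrogen, acceptor) <= max_dist
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)


d, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(is_hbond(d, h, (2.9, 0.0, 0.0)))  # linear, H...A = 1.9 A -> True
print(is_hbond(d, h, (1.0, 1.9, 0.0)))  # 90-degree angle -> False
```

Evaluating such a test for each candidate donor/acceptor pair in every frame produces a 0/1 time series per pair, which is the general kind of data hmatrix writes out and hcorrelation analyzes.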
\page manipulation Trajectory manipulation
<h2> Trajectory manipulation tools </h2>
A set of tools for manipulating structure and trajectory
files, found in the Tools/ directory:
<DL>
<DT> <B>aligner</B>
<DD> Align structures in a trajectory against the average using an
iterative refinement scheme. Can read any LOOS trajectory format, but
will write the aligned trajectory as a DCD.
<DT><B>center-molecule</B>
<DD>A more flexible tool for centering molecules. Can use
different subsets for calculating the center, controlling what is
translated, and what goes to the output. It can also reimage the
molecule and center only within the x,y plane.
<DT><B>center-pdb</B>
<DD>Read in a structure file, shift its centroid to the origin, and
write a new pdb file to stdout.
<DT><B>clipper</B>
<DD>Manually clip a model using arbitrary sets of clipping planes.
Outputs a PDB.
<DT><B>concat-selection</B>
<DD>Concatenates atoms from a trajectory into a single PDB. Useful
for visualizing where a selection has been over the course of a trajectory.
<DT><B>convert2pdb</B>
<DD>Read in a structure file in a LOOS-supported format, and write it
out as a pdb file.