Skip to content

Commit

Permalink
Moving files to monitor, adding LICENSE and README
Browse files Browse the repository at this point in the history
  • Loading branch information
aterrel committed Nov 29, 2011
1 parent 0113d86 commit 7df79a5
Show file tree
Hide file tree
Showing 57 changed files with 627 additions and 137 deletions.
458 changes: 458 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

160 changes: 23 additions & 137 deletions README
Original file line number Diff line number Diff line change
@@ -1,146 +1,32 @@
tacc_stats John L. Hammond
Jun 14 2011
tacc_stats
==========

tacc_stats is a job-oriented and logically structured version of the
conventional sysstat system monitor.
A job-oriented system monitor, analyzation, and visualization tool.

1. Invocation

On each Ranger compute node tacc_stats runs every 10-minutes (through
cron), and at the beginning and end of every job (through SGE
prolog/epilog). In addition, tacc_stats may be directly invoked by
the user (or application) although we have not advertised this.
Authors
-------

WARNING Lack of coordination.
John Hammond ([email protected]) - system monitors
Andy R. Terrel ([email protected]) - analysis and visualization

2. Data Handling
Copyright
---------
(C) 2011 University of Texas at Austin

On each invocation, tacc_stats collects and records system statistics
to a structured text file on ram backed storage local to the node.
Stats files are rotated at every night at 23:55 localtime, and
archived at sometime between 02:00--04:00 localtime to Ranger's
/scratch filesystem. A stats file created at epoch time EPOCH, on
node HOSTNAME, will be stored locally as /var/log/tacc_stats/EPOCH,
and archived at
/scratch/projects/tacc_stats/archive/HOSTNAME/EPOCH.gz. For example
stats collected on Jun 14 2011 on i101-101, might correspond to files
/var/log/tacc_stats/1308027601 and
/scratch/projects/tacc_stats/archive/i101-101.ranger.tacc.utexas.edu/1308027601.gz.
License
-------

WARNING Do not expect all stats files to be created at midnight
exactly, or even approximately. As nodes are rebooted, new
stats_files will be created as soon as a job begins or the cron task
runs.
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

WARNING Stats from a given job on a give host may span multiple files.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.

WARNING Expect stats files to be missing occasionally, as nodes may
crash before they can be archived. Since we use ram backed storage
these files do not survive a reboot.

3. Stats File Format

A stats file consists of a multiline header, followed my one or more
record groups. The first few lines of the header identify the version
of tacc_stats, the FQDN of the host, it's uname, it's uptime in seconds, and
other properties to be specified.

$tacc_stats 1.0.2
$hostname i101-101.ranger.tacc.utexas.edu
$uname Linux x86_64 2.6.18-194.32.1.el5_TACC #18 SMP Mon Mar 14 22:24:19 CDT 2011
$uptime 4753669

These are followed by schema descriptors for each of the types collected:

!amd64_pmc CTL0,C CTL1,C CTL2,C CTL3,C CTR0,E,W=48 CTR1,E,W=48 CTR2,E,W=48 CTR3,E,W=48
!cpu user,E,U=cs nice,E,U=cs system,E,U=cs idle,E,U=cs iowait,E,U=cs irq,E,U=cs softirq,E,U=cs
!lnet tx_msgs,E rx_msgs,E rx_msgs_dropped,E tx_bytes,E,U=B rx_bytes,E,U=B rx_bytes_dropped,E
!ps ctxt,E processes,E load_1 load_5 load_15 nr_running nr_threads
...

A schema descriptor consists of the character '!' followed by the
type, followed by a space separated list of elements. Each element
consists of a key name, followed by a comma-separated list of options;
the options currently used are:
E meaning that the counter is an event counter,
W=<BITS> meaning that the counter is <BITS> wide (as opposed to 64),
C meaning that the value is a control register, not a counter,
U=<STR> meaning that the value is in units specified by <STR>.

Note especially the event and width options. Certain counters, such
as the performance counters are subject to rollover, and as such their
widths must be known for the values to be interpreted correctly.

WARNING The archived stats files do not account for rollover. This
task is left for postprocessing.

A record group consists of a blank line, a line containing the epoch
time of the record and the current jobid, zero of more lines of marks
(each starting with the % character), and several lines of statistics.


1307509201 1981063
%begin 1981063
amd64_pmc 11 4259958 4391234 4423427 4405240 235835341001110 187269740525248 62227761639015 177902917871843
amd64_pmc 10 4259958 4391234 4405239 4423427 221601328309784 187292967300939 47879507215852 174113618669738
amd64_pmc 13 4259958 4405238 4391234 4423427 211997466129346 215850892876689 2218837366391 233806061617899
amd64_pmc 12 4392928 4259958 4391234 4423427 6782043270201 102683296940807 2584394368284 174209034378272
...
cpu 11 429720418 0 1685980 43516346 447875 155 3443
cpu 10 429988676 0 1675476 43150935 559410 8 283
...
net ib0 0 0 55915434547 0 0 0 0 0 0 0 0 0 159301288 0 46963995550 0 0 97 0 0 0 31404022 0
...
ps - 4059349377 507410 1600 1600 1600 18 373
...

Each line of statistics contains the type (amd64_pmc, cpu, net,
ps,...), the device (11,10,13,12,...,ib0,-...), followed by the
counter values in the order given by the schema. Note that when we
cannot meaningfully attach statistics to a device, we use '-' as the
device name.

4. Types

The types currently collected on Ranger are:

amd64_pmc AMD Opteron performance counters (per core),
block block device statistics (per device),
cpu scheduler accounting (per CPU),
ib_sw InfiniBand usage,
llite Lustre filesystem usage (per mount),
lnet Lustre network usage,
mem memory usage (per socket)
net network device usage (per device)
numa weird NUMA statistics (per socket),
ps process statistics,
sysv_shm SysV shared memory segment usage,
tmpfs ram-backed filesystem usage (per mount),
vfs dentry/file/inode cache usage,
vm virtual memory statistics.

For the keys associated with each type, see the appropriate schema.
For the source and meanings of the counters, see the tacc_stats source
https://github.com/jhammond/tacc_stats, the CentOS 5.6 kernel source,
especially Documentation/*, and the manpages, especially proc(5).

I have not tracked down the meanings of all counters. However, if I
did (and it wasn't obvious from the counter name) then I put that
information in the source (see for example block.c).

WARNING Due to a bug in Lustre, llite overreports read_bytes.

WARNING Some event counters (from ib_sw, numa, and possibly others)
suffer from occasional dips. This may be due to non-atomic accesses
in the (kernel) code that presents the counter, a bug in tacc_stats,
or some other condition. Spurious rollover is easy to detect,
however, because a naive adjustment produced a riduculously large
delta.

WARNING We never reset counters, thus to determine the number of
events that occurred during a job, you must subtract the value at
begin from end.

WARNING Due to a quirk in the Opteron performance counter
architecture, we do not assign the same set of events to each core,
see amd64_pmc.c in the tacc_stats source for details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
File renamed without changes.
File renamed without changes.
146 changes: 146 additions & 0 deletions monitor/README
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
tacc_stats John L. Hammond
Jun 14 2011

tacc_stats is a job-oriented and logically structured version of the
conventional sysstat system monitor.

1. Invocation

On each Ranger compute node tacc_stats runs every 10-minutes (through
cron), and at the beginning and end of every job (through SGE
prolog/epilog). In addition, tacc_stats may be directly invoked by
the user (or application) although we have not advertised this.

WARNING Lack of coordination.

2. Data Handling

On each invocation, tacc_stats collects and records system statistics
to a structured text file on ram backed storage local to the node.
Stats files are rotated at every night at 23:55 localtime, and
archived at sometime between 02:00--04:00 localtime to Ranger's
/scratch filesystem. A stats file created at epoch time EPOCH, on
node HOSTNAME, will be stored locally as /var/log/tacc_stats/EPOCH,
and archived at
/scratch/projects/tacc_stats/archive/HOSTNAME/EPOCH.gz. For example
stats collected on Jun 14 2011 on i101-101, might correspond to files
/var/log/tacc_stats/1308027601 and
/scratch/projects/tacc_stats/archive/i101-101.ranger.tacc.utexas.edu/1308027601.gz.

WARNING Do not expect all stats files to be created at midnight
exactly, or even approximately. As nodes are rebooted, new
stats_files will be created as soon as a job begins or the cron task
runs.

WARNING Stats from a given job on a give host may span multiple files.

WARNING Expect stats files to be missing occasionally, as nodes may
crash before they can be archived. Since we use ram backed storage
these files do not survive a reboot.

3. Stats File Format

A stats file consists of a multiline header, followed my one or more
record groups. The first few lines of the header identify the version
of tacc_stats, the FQDN of the host, it's uname, it's uptime in seconds, and
other properties to be specified.

$tacc_stats 1.0.2
$hostname i101-101.ranger.tacc.utexas.edu
$uname Linux x86_64 2.6.18-194.32.1.el5_TACC #18 SMP Mon Mar 14 22:24:19 CDT 2011
$uptime 4753669

These are followed by schema descriptors for each of the types collected:

!amd64_pmc CTL0,C CTL1,C CTL2,C CTL3,C CTR0,E,W=48 CTR1,E,W=48 CTR2,E,W=48 CTR3,E,W=48
!cpu user,E,U=cs nice,E,U=cs system,E,U=cs idle,E,U=cs iowait,E,U=cs irq,E,U=cs softirq,E,U=cs
!lnet tx_msgs,E rx_msgs,E rx_msgs_dropped,E tx_bytes,E,U=B rx_bytes,E,U=B rx_bytes_dropped,E
!ps ctxt,E processes,E load_1 load_5 load_15 nr_running nr_threads
...

A schema descriptor consists of the character '!' followed by the
type, followed by a space separated list of elements. Each element
consists of a key name, followed by a comma-separated list of options;
the options currently used are:
E meaning that the counter is an event counter,
W=<BITS> meaning that the counter is <BITS> wide (as opposed to 64),
C meaning that the value is a control register, not a counter,
U=<STR> meaning that the value is in units specified by <STR>.

Note especially the event and width options. Certain counters, such
as the performance counters are subject to rollover, and as such their
widths must be known for the values to be interpreted correctly.

WARNING The archived stats files do not account for rollover. This
task is left for postprocessing.

A record group consists of a blank line, a line containing the epoch
time of the record and the current jobid, zero of more lines of marks
(each starting with the % character), and several lines of statistics.


1307509201 1981063
%begin 1981063
amd64_pmc 11 4259958 4391234 4423427 4405240 235835341001110 187269740525248 62227761639015 177902917871843
amd64_pmc 10 4259958 4391234 4405239 4423427 221601328309784 187292967300939 47879507215852 174113618669738
amd64_pmc 13 4259958 4405238 4391234 4423427 211997466129346 215850892876689 2218837366391 233806061617899
amd64_pmc 12 4392928 4259958 4391234 4423427 6782043270201 102683296940807 2584394368284 174209034378272
...
cpu 11 429720418 0 1685980 43516346 447875 155 3443
cpu 10 429988676 0 1675476 43150935 559410 8 283
...
net ib0 0 0 55915434547 0 0 0 0 0 0 0 0 0 159301288 0 46963995550 0 0 97 0 0 0 31404022 0
...
ps - 4059349377 507410 1600 1600 1600 18 373
...

Each line of statistics contains the type (amd64_pmc, cpu, net,
ps,...), the device (11,10,13,12,...,ib0,-...), followed by the
counter values in the order given by the schema. Note that when we
cannot meaningfully attach statistics to a device, we use '-' as the
device name.

4. Types

The types currently collected on Ranger are:

amd64_pmc AMD Opteron performance counters (per core),
block block device statistics (per device),
cpu scheduler accounting (per CPU),
ib_sw InfiniBand usage,
llite Lustre filesystem usage (per mount),
lnet Lustre network usage,
mem memory usage (per socket)
net network device usage (per device)
numa weird NUMA statistics (per socket),
ps process statistics,
sysv_shm SysV shared memory segment usage,
tmpfs ram-backed filesystem usage (per mount),
vfs dentry/file/inode cache usage,
vm virtual memory statistics.

For the keys associated with each type, see the appropriate schema.
For the source and meanings of the counters, see the tacc_stats source
https://github.com/jhammond/tacc_stats, the CentOS 5.6 kernel source,
especially Documentation/*, and the manpages, especially proc(5).

I have not tracked down the meanings of all counters. However, if I
did (and it wasn't obvious from the counter name) then I put that
information in the source (see for example block.c).

WARNING Due to a bug in Lustre, llite overreports read_bytes.

WARNING Some event counters (from ib_sw, numa, and possibly others)
suffer from occasional dips. This may be due to non-atomic accesses
in the (kernel) code that presents the counter, a bug in tacc_stats,
or some other condition. Spurious rollover is easy to detect,
however, because a naive adjustment produced a riduculously large
delta.

WARNING We never reset counters, thus to determine the number of
events that occurred during a job, you must subtract the value at
begin from end.

WARNING Due to a quirk in the Opteron performance counter
architecture, we do not assign the same set of events to each core,
see amd64_pmc.c in the tacc_stats source for details.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

0 comments on commit 7df79a5

Please sign in to comment.