Commit

First attempt search and replace for new name
stephenlienharrell committed Jan 10, 2025
1 parent 45353f9 commit 6cc76f4
Showing 48 changed files with 186 additions and 186 deletions.
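
The commit message describes a search and replace, but the page does not say which tool performed it. The hunks below are consistent with a plain, case-sensitive substitution of the literal string `tacc_stats`: `tacc_statsd` becomes `hpcperfstatsd`, while `taccstats`, `TACC_jobid`, and `tacc-stats02` are left alone. A minimal Python sketch of that kind of substitution follows; the function names are illustrative and not taken from the repository, and the sketch rewrites file contents only (it does not rename files or directories).

```python
from pathlib import Path

OLD_NAME = "tacc_stats"
NEW_NAME = "hpcperfstats"


def rename_in_text(text: str) -> str:
    """Replace every literal, case-sensitive occurrence of OLD_NAME."""
    return text.replace(OLD_NAME, NEW_NAME)


def rename_in_tree(root: Path) -> int:
    """Rewrite text files under root in place; return how many files changed."""
    changed = 0
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the Git metadata directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            original = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # assume non-UTF-8 files are binary and leave them alone
        updated = rename_in_text(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

For example, `rename_in_text("bin_PROGRAMS = tacc_statsd")` returns `bin_PROGRAMS = hpcperfstatsd`, whereas `rename_in_text("systemctl start taccstats")` returns its input unchanged. This is why the service name, spec file, and configuration path still carry the old spelling after the commit.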
4 changes: 2 additions & 2 deletions Dockerfile
@@ -21,8 +21,8 @@ RUN pip install -r requirements.txt

# copy project
COPY --chown=hpcperfstats:hpcperfstats . .
# This includes the tacc_stats.ini
#COPY --chown=hpcperfstats:hpcperfstats ./tacc_stats.ini .
# This includes the hpcperfstats.ini
#COPY --chown=hpcperfstats:hpcperfstats ./hpcperfstats.ini .


RUN pip install .
6 changes: 3 additions & 3 deletions MANIFEST.in
@@ -1,3 +1,3 @@
recursive-include tacc_stats/site/machine/templates *.html
recursive-include tacc_stats/site/tacc_stats_site/templates *html
recursive-include tacc_stats/site/tacc_stats_site/media *png
recursive-include hpcperfstats/site/machine/templates *.html
recursive-include hpcperfstats/site/hpcperfstats_site/templates *html
recursive-include hpcperfstats/site/hpcperfstats_site/media *png
56 changes: 28 additions & 28 deletions README.md
@@ -1,4 +1,4 @@
tacc_stats Documentation {#mainpage}
hpcperfstats Documentation {#mainpage}
========================

[![DOI](https://zenodo.org/badge/21212519.svg)](https://zenodo.org/badge/latestdoi/21212519)
@@ -18,19 +18,19 @@ Albert Lu <br />

Description
-----------------
The tacc_stats package provides the tools to monitor resource usage of HPC systems at multiple levels of resolution.
The hpcperfstats package provides the tools to monitor resource usage of HPC systems at multiple levels of resolution.

The package is split into an `autotools`-based `monitor` subpackage and a Python `setuptools`-based `tacc_stats` subpackage. `monitor` performs the online data collection and transmission in a production environment while `tacc_stats` performs the data curation and analysis in an offline environment.
The package is split into an `autotools`-based `monitor` subpackage and a Python `setuptools`-based `hpcperfstats` subpackage. `monitor` performs the online data collection and transmission in a production environment while `hpcperfstats` performs the data curation and analysis in an offline environment.

Building and installing the `tacc_stats-2.3.5-1.el7.x86_64.rpm` package with the `taccstats.spec` file will build and install a systemd service `taccstats`. This service launches a daemon with an overhead of 3% on a single core when configured to sample at a frequency of 1Hz. It is typically configured to sample at 5 minute intervals, with samples taken at the start and end of every job as well. The TACC Stats daemon, `tacc_statsd`, is controlled by the `taccstats` service and sends the data directly to a RabbitMQ server over the administrative ethernet network. RabbitMQ must be installed and running on the server in order for the data to be received.
Building and installing the `hpcperfstats-2.3.5-1.el7.x86_64.rpm` package with the `taccstats.spec` file will build and install a systemd service `taccstats`. This service launches a daemon with an overhead of 3% on a single core when configured to sample at a frequency of 1Hz. It is typically configured to sample at 5 minute intervals, with samples taken at the start and end of every job as well. The TACC Stats daemon, `hpcperfstatsd`, is controlled by the `taccstats` service and sends the data directly to a RabbitMQ server over the administrative ethernet network. RabbitMQ must be installed and running on the server in order for the data to be received.

Installing the `tacc_stats` module will setup a Django-based web application along with tools for extracting the data from the RabbitMQ server and feeding them into a PostgreSQL database.
Installing the `hpcperfstats` module will setup a Django-based web application along with tools for extracting the data from the RabbitMQ server and feeding them into a PostgreSQL database.

Code Access
-----------
To get access to the tacc_stats source code clone this repository:
To get access to the hpcperfstats source code clone this repository:

git clone https://github.com/TACC/tacc_stats
git clone https://github.com/TACC/hpcperfstats


----------------------------------------------------------------------------
@@ -43,7 +43,7 @@ First ensure the RabbitMQ library and header file are installed on the build and

[librabbitmq-devel-0.5.2-1.el6.x86_64](ftp://fr2.rpmfind.net/linux/epel/6/x86_64/librabbitmq-devel-0.5.2-1.el6.x86_64.rpm)

`./configure; make; make install` will then successfully build the `tacc_statsd` executable for many systems. If Xeon Phi coprocessors are present on your system they can be monitored with the `--enable-mic` flag. Additionally the configuration options, `--disable-infiniband`, `--disable-lustre`, `--disable-hardware` will disable infiniband, Lustre Filesystem, and Hardware Counter monitoring which are all enabled by default. Disabling RabbitMQ will result in a legacy build of `tacc_statsd` that relies on the shared filesystem to transmit data. This mode is not recommended and currently used for testing purposes only. If libraries or header files are not found than add their paths to the include and library paths with the `CPPFLAGS` and/or `LDFLAGS` vars as is standard in autoconf based installations.
`./configure; make; make install` will then successfully build the `hpcperfstatsd` executable for many systems. If Xeon Phi coprocessors are present on your system they can be monitored with the `--enable-mic` flag. Additionally the configuration options, `--disable-infiniband`, `--disable-lustre`, `--disable-hardware` will disable infiniband, Lustre Filesystem, and Hardware Counter monitoring which are all enabled by default. Disabling RabbitMQ will result in a legacy build of `hpcperfstatsd` that relies on the shared filesystem to transmit data. This mode is not recommended and currently used for testing purposes only. If libraries or header files are not found than add their paths to the include and library paths with the `CPPFLAGS` and/or `LDFLAGS` vars as is standard in autoconf based installations.

There will be a configuration file, `/etc/taccstats/taccstats.conf`, after installation. This file contains the fields

@@ -56,7 +56,7 @@ There will be a configuration file, `/etc/taccstats/taccstats.conf`, after insta
`freq 600`


`server` should be set to the hostname or IP hosting the RabbitMQ server, `queue` to the system/cluster name that is being monitored, `port` to the RabbitMQ port (5672 is default), and `freq` to the desired sampling frequency in seconds. The file and settings can be reloaded into a running `tacc_statsd` daemon with a SIGHUP signal.
`server` should be set to the hostname or IP hosting the RabbitMQ server, `queue` to the system/cluster name that is being monitored, `port` to the RabbitMQ port (5672 is default), and `freq` to the desired sampling frequency in seconds. The file and settings can be reloaded into a running `hpcperfstatsd` daemon with a SIGHUP signal.

An RPM can be built for deployment using the `taccstats.spec` file. The most straightforward approach to build this is to setup your rpmbuild directory then run

@@ -68,15 +68,15 @@ The `taccstats.spec` file `sed`s the `taccstats.conf` file to the correct server

`sed -i 's/default/frontera/' src/taccstats.conf`

`tacc_statsd` can be started, stopped, and restarted using `systemctl start taccstats`, `systemctl stop taccstats`, and `systemctl restart taccstats`.
`hpcperfstatsd` can be started, stopped, and restarted using `systemctl start taccstats`, `systemctl stop taccstats`, and `systemctl restart taccstats`.

In order to notify `tacc_stats` of a job beginning, echo the job id into `/var/run/TACC_jobid` on each node where the job is running. It order to notify
In order to notify `hpcperfstats` of a job beginning, echo the job id into `/var/run/TACC_jobid` on each node where the job is running. It order to notify
it of a job ending echo `-` into `/var/run/TACC_jobid` on each node where the job is running. This can be accomplished in the job scheduler prolog and
epilog for example.

#### Job Scheduler Configuration
-------
In order for tacc_stats to correcly label records with JOBIDs it is required that
In order for hpcperfstats to correcly label records with JOBIDs it is required that
the job scheduler prolog and epilog contain the lines


@@ -96,23 +96,23 @@ for example,

1837137|sharrell|project140208|2018-08-01T18:18:51|2018-08-02T11:44:51|2018-07-29T08:05:43|normal|1-00:00:00|jobname|COMPLETED|8|104|c420-[024,073],c421-[051-052,063-064,092-093]

If using SLURM the `sacct_gen.py` script that will be installed with the `tacc_stats` subpackage may be used.
If using SLURM the `sacct_gen.py` script that will be installed with the `hpcperfstats` subpackage may be used.
This script generates a file for each date with the name format `year-month-day.txt`, e.g. `2018-11-01.txt`.

#### `tacc_stats` subpackage
#### `hpcperfstats` subpackage
To install TACC Stats on the machine where data will be processed, analyzed, and the webserver hosted follow these
steps:

1. Download the package and setup the Python3 virtual environment. TACC Stats is Python3 dependent.
```
$ virtualenv machinename --system-site-packages
$ cd machinename; source bin/activate
$ git clone https://github.com/TACC/tacc_stats
$ git clone https://github.com/TACC/hpcperfstats
```
`tacc_stats` is a pure Python package. Dependencies should be automatically downloaded
`hpcperfstats` is a pure Python package. Dependencies should be automatically downloaded
and installed when installed via `pip`. The package must first be configured however
in the `tacc_stats.ini` file.
2. The initialization file, `tacc_stats.ini`, controls all the configuration options and has
in the `hpcperfstats.ini` file.
2. The initialization file, `hpcperfstats.ini`, controls all the configuration options and has
the following content and descriptions
```
## Basic configuration options - modify these
@@ -121,7 +121,7 @@ the following content and descriptions
# data_dir = where data is stored
[DEFAULT]
machine = ls5
data_dir = /hpc/tacc_stats_site/%(machine)s
data_dir = /hpc/hpcperfstats_site/%(machine)s
server = tacc-stats02.tacc.utexas.edu
## RabbitMQ Configuration
@@ -139,17 +139,17 @@ host_name_ext = %(machine)s.tacc.utexas.edu
dbname = %(machine)s_db
```
Set these paths as needed. The `accounting_path` will contain an accounting file for each date, e.g. `2018-11-01.txt`. The raw stats data will be stored in the `archive_dir` and processed stats data in the TimeScale database `dbname`. `machine` should match the system name used in the RabbitMQ server `QUEUE` field and is the RabbitMQ `QUEUE` that the monitoring agent sends the data too. This is the only field that needs to match settings in the `monitor` subpackage. `host_name_ext` is the extension required to each compute node hostname in order to build a FQDN. This will match to directory names created in the `archive_dir`.
3. Install `tacc_stats`
3. Install `hpcperfstats`
```
$ pip install -e tacc_stats/
$ pip install -e hpcperfstats/
```
4. Start the RabbitMQ server reader in the background, e.g.
```
$ nohup listend.py > /tmp/listend.log
```
Raw stats files will now be generated in the `archive_dir`.
5. A PostgreSQL database must be setup on the host. To do this, after installation of PostgreSQL
and the `tacc_stats` Python module
and the `hpcperfstats` Python module
```
$ sudo su - postgres
$ psql
@@ -187,12 +187,12 @@ ServerAdmin [email protected]
ServerName stats.webserver.tacc.utexas.edu
ServerAlias stats.webserver.tacc.utexas.edu
WSGIDaemonProcess s2-stats python-home=/stats/stampede2 python-path=/stats/stampede2/tacc_stats:/stats/stampede2/lib/python3.7/site-packages user=sharrell
WSGIDaemonProcess s2-stats python-home=/stats/stampede2 python-path=/stats/stampede2/hpcperfstats:/stats/stampede2/lib/python3.7/site-packages user=sharrell
WSGIProcessGroup s2-stats
WSGIScriptAlias / /tacc_stats/site/tacc_stats_site/wsgi.py process-group=s2-stats
WSGIScriptAlias / /hpcperfstats/site/hpcperfstats_site/wsgi.py process-group=s2-stats
WSGIApplicationGroup %{GLOBAL}
<Directory /stats/stampede2/tacc_stats/tacc_stats/site/tacc_stats_site>
<Directory /stats/stampede2/hpcperfstats/hpcperfstats/site/hpcperfstats_site>
<Files wsgi.py>
Require all granted
</Files>
@@ -210,11 +210,11 @@ where the 4 optional arguments have the following meaning

- `start_date` : the start of the date range, e.g. `"2013-09-25"` (default is today)
- `end_date` : the end of the date range, e.g. `"2013-09-26"` (default is `start_date`)
- `-dir` : the directory to store pickled dictionaries (default is set in tacc_stats.ini)
- `-dir` : the directory to store pickled dictionaries (default is set in hpcperfstats.ini)
- `-jobids` : individual jobids to pickle (default is all jobs)

No arguments results in all jobs from the previous day getting pickled and stored in the `pickles_dir`
defined in `tacc_stats.ini`. On Stampede argumentless `job_pickles.py` is run every 24 hours as a `cron` job
defined in `hpcperfstats.ini`. On Stampede argumentless `job_pickles.py` is run every 24 hours as a `cron` job
set-up by the user.


@@ -226,7 +226,7 @@ dictionary with the following key layers:

job : 1st key Job ID
host : 2nd key Host node used by Job ID
type : 3rd key TYPE specified in tacc_stats
type : 3rd key TYPE specified in hpcperfstats
device : 4th key device belonging to type

For example, to access Job ID `101`'s stats data on host `c560-901` for
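
The `%(machine)s` placeholders in the `hpcperfstats.ini` excerpt above have the form used by basic interpolation in Python's standard `configparser` module, with values in `[DEFAULT]` visible from every other section. The sketch below assumes the file is read that way, which the diff itself does not confirm. It reuses only the keys visible in the hunks, and the section name `[EXAMPLE]` is hypothetical because the real section headers are not shown.

```python
import configparser

SAMPLE_INI = """
[DEFAULT]
machine = ls5
data_dir = /hpc/hpcperfstats_site/%(machine)s

# Hypothetical section name, used only for this illustration.
[EXAMPLE]
host_name_ext = %(machine)s.tacc.utexas.edu
dbname = %(machine)s_db
"""


def load_settings(ini_text: str, section: str = "EXAMPLE") -> dict:
    """Parse ini_text and return one section with placeholders expanded."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # items(section) includes the [DEFAULT] keys and applies interpolation.
    return dict(parser.items(section))


settings = load_settings(SAMPLE_INI)
print(settings["data_dir"])  # /hpc/hpcperfstats_site/ls5
print(settings["dbname"])    # ls5_db
```

Under this assumption, changing `machine` alone moves the data directory, database name, and host name extension together. Only the literal `tacc_stats_site` path segment needed editing in this commit.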
4 changes: 2 additions & 2 deletions docker-instructions.txt
@@ -3,9 +3,9 @@
dnf install docker git podman-compose

# May need to use an ssh git address if you plan to commit to the repo
git clone https://github.com/TACC/tacc_stats.git
git clone https://github.com/TACC/hpcperfstats.git

cd tacc_stats
cd hpcperfstats

git checkout sharrell-docker

2 changes: 1 addition & 1 deletion monitor/ChangeLog
@@ -1,2 +1,2 @@
3/24/2016 The TACC Stats online monitoring binary, tacc_stats, is now built using autotools.
3/24/2016 The TACC Stats online monitoring binary, hpcperfstats, is now built using autotools.
The binary can be built in a cron driven mode or a daemon driven mode (preferred).
2 changes: 1 addition & 1 deletion monitor/NEWS
@@ -1,2 +1,2 @@
The TACC Stats online monitoring binary, tacc_stats, is now built using autotools.
The TACC Stats online monitoring binary, hpcperfstats, is now built using autotools.

14 changes: 7 additions & 7 deletions monitor/README
@@ -1,14 +1,14 @@
Stats Data
----------

### Raw stats data: generated by `tacc_stats`
### Raw stats data: generated by `hpcperfstats`

A raw stats file consists of a multiline header, followed my one or more
record groups. The first few lines of the header identify the version
of tacc_stats, the FQDN of the host, it's uname, it's uptime in seconds, and
of hpcperfstats, the FQDN of the host, it's uname, it's uptime in seconds, and
other properties to be specified.

$tacc_stats 1.0.2
$hpcperfstats 1.0.2
$hostname i101-101.ranger.tacc.utexas.edu
$uname Linux x86_64 2.6.18-194.32.1.el5_TACC #18 SMP Mon Mar 14 22:24:19 CDT 2011
$uptime 4753669
@@ -134,8 +134,8 @@ There is a large variety of data collected and summarized below:
`vm` virtual memory statistics.


For the source and meanings of the counters, see the tacc_stats source
`https://github.com/TACC/tacc_stats`, the CentOS 5.6 kernel source,
For the source and meanings of the counters, see the hpcperfstats source
`https://github.com/TACC/hpcperfstats`, the CentOS 5.6 kernel source,
especially `Documentation/*`, and the manpages, especially proc(5).


@@ -149,7 +149,7 @@ architectures.

\warning Some event counters (from ib_sw, numa, and possibly others)
suffer from occasional dips. This may be due to non-atomic accesses
in the (kernel) code that presents the counter, a bug in tacc_stats,
in the (kernel) code that presents the counter, a bug in hpcperfstats,
or some other condition. Spurious rollover is easy to detect,
however, because a naive adjustment produced a riduculously large
delta.
@@ -160,5 +160,5 @@ begin from end.

\warning Due to a quirk in the Opteron performance counter
architecture, we do not assign the same set of events to each core,
see `amd64_pmc.c` in the tacc_stats source for details.
see `amd64_pmc.c` in the hpcperfstats source for details.
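
The `monitor/README` hunk above changes the documented first header line from `$tacc_stats 1.0.2` to `$hpcperfstats 1.0.2`. If newly built daemons emit the new key, which this diff suggests but does not prove, anything that reads raw stats files may meet either spelling. The Python sketch below assumes that header lines are `$key value` pairs, as in the excerpt, and that the header ends at the first line that does not start with `$`. The function names are illustrative and are not part of the project.

```python
KNOWN_PROGRAMS = ("tacc_stats", "hpcperfstats")


def parse_header(lines):
    """Collect leading `$key value` lines into a dict; stop at the first other line."""
    header = {}
    for line in lines:
        if not line.startswith("$"):
            break
        key, _, value = line[1:].rstrip("\n").partition(" ")
        header[key] = value
    return header


def program_version(header):
    """Return (program, version) for whichever known program key is present."""
    for name in KNOWN_PROGRAMS:
        if name in header:
            return name, header[name]
    return None


sample = [
    "$hpcperfstats 1.0.2\n",
    "$hostname i101-101.ranger.tacc.utexas.edu\n",
    "$uptime 4753669\n",
]
print(program_version(parse_header(sample)))  # ('hpcperfstats', '1.0.2')
```

Accepting both keys keeps archives written before the rename readable alongside newer files.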

2 changes: 1 addition & 1 deletion monitor/configure.ac
@@ -1,4 +1,4 @@
AC_INIT([tacc_stats], [2.3.5], [[email protected]])
AC_INIT([hpcperfstats], [2.3.5], [[email protected]])
AM_INIT_AUTOMAKE([-Wall -Werror -Wno-portability])
AC_PROG_CC
AM_PROG_CC_C_O
34 changes: 17 additions & 17 deletions monitor/src/Makefile.am
@@ -9,13 +9,13 @@ systemduserunit_DATA = \
taccstats.conf
endif

bin_PROGRAMS = tacc_statsd
bin_PROGRAMS = hpcperfstatsd

STATS_DIR_PATH = /var/log/tacc_stats
STATS_LOCK_PATH = /var/lock/tacc_stats
STATS_DIR_PATH = /var/log/hpcperfstats
STATS_LOCK_PATH = /var/lock/hpcperfstats
JOBID_FILE_PATH = /var/run/TACC_jobid

tacc_statsd_CPPFLAGS = \
hpcperfstatsd_CPPFLAGS = \
-D_GNU_SOURCE \
-DSTATS_PROGRAM=\"@PACKAGE@\" \
-DSTATS_VERSION=\"@PACKAGE_VERSION@\" \
@@ -24,11 +24,11 @@ tacc_statsd_CPPFLAGS = \
-DJOBID_FILE_PATH=\"$(JOBID_FILE_PATH)\"

if DEBUG
tacc_statsd_CPPFLAGS += \
hpcperfstatsd_CPPFLAGS += \
-DDEBUG
endif

tacc_statsd_SOURCES = \
hpcperfstatsd_SOURCES = \
amd64_pmc.h \
amd64_df.h \
collect.h \
@@ -53,14 +53,14 @@ tacc_statsd_SOURCES = \
trace.h

if RABBITMQ
tacc_statsd_SOURCES += \
hpcperfstatsd_SOURCES += \
monitor.c \
stats_buffer.c \
stats_buffer.h
tacc_statsd_CPPFLAGS += \
hpcperfstatsd_CPPFLAGS += \
-DRABBITMQ
else
tacc_statsd_SOURCES += \
hpcperfstatsd_SOURCES += \
main.c \
stats_file.c \
stats_file.h
@@ -122,17 +122,17 @@ TYPES += \
ib_sw.c
endif

tacc_statsd_LDFLAGS =
hpcperfstatsd_LDFLAGS =
if OPA
TYPES += \
opa.c
tacc_statsd_LDFLAGS += \
hpcperfstatsd_LDFLAGS += \
-lpthread \
-lmemusage
endif

if LUSTRE
tacc_statsd_SOURCES += \
hpcperfstatsd_SOURCES += \
lustre_obd_to_mnt.c \
lustre_obd_to_mnt.h

@@ -143,23 +143,23 @@ TYPES += \
endif

if MIC
tacc_statsd_SOURCES += \
hpcperfstatsd_SOURCES += \
miclib.h
TYPES += \
mic.c
endif

tacc_statsd_SOURCES += \
hpcperfstatsd_SOURCES += \
$(TYPES)

tacc_statsd_LDFLAGS += \
hpcperfstatsd_LDFLAGS += \
-lm

nodist_tacc_statsd_SOURCES = \
nodist_hpcperfstatsd_SOURCES = \
stats.x

if GPU
nodist_tacc_statsd_SOURCES += \
nodist_hpcperfstatsd_SOURCES += \
nvml.h
TYPES += \
nvidia_gpu.c
2 changes: 1 addition & 1 deletion monitor/src/amqp_listen.c
@@ -251,7 +251,7 @@ int main(int argc, char *argv[])
close(STDIN_FILENO);
close(STDOUT_FILENO);
close(STDERR_FILENO);
syslog(LOG_INFO, "Starting tacc_stats consuming daemon.\n");
syslog(LOG_INFO, "Starting hpcperfstats consuming daemon.\n");
consume(hostname, port, archive_dir);

exit(EXIT_SUCCESS);
2 changes: 1 addition & 1 deletion monitor/src/main.c
@@ -140,7 +140,7 @@ int main(int argc, char *argv[])
else
FATAL("invalid command `%s'\n", cmd_str);

// Ensures only one tacc_stats is running at any time
// Ensures only one hpcperfstats is running at any time
lock_fd = open_lock_timeout(STATS_LOCK_PATH, lock_timeout);
if (lock_fd < 0)
FATAL("cannot acquire lock\n");
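
The comment in this hunk says the lock at `STATS_LOCK_PATH` keeps a second instance from running, but the body of `open_lock_timeout` is not shown here. The Python sketch below illustrates the general single-instance pattern and is not a port of the C helper. It assumes a Unix system and an advisory `flock` lock that is retried until a timeout. The function name and the polling interval are illustrative.

```python
import fcntl
import os
import time


def acquire_instance_lock(path, timeout):
    """Return a locked file descriptor, or -1 if the lock stays busy past timeout seconds."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Non-blocking exclusive lock; raises BlockingIOError if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                return -1
            time.sleep(0.05)
```

A caller would treat a negative return the way the hunk treats `lock_fd < 0` and exit. The lock is released when the descriptor is closed or the process exits. Because this commit changes `STATS_LOCK_PATH` in `Makefile.am`, a build from before the rename and one from after it would use different lock files and would not exclude each other.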