WIP: Add Performance-CoPilot on compute nodes #341

Open
wants to merge 27 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
- 1952c0c Add pcp role to compute build playbook. (eatluri, Dec 19, 2019)
- 701139f variable to enable/disable pcp install. (eatluri, Dec 19, 2019)
- 8c9315f Install pcp packages on compute node. (eatluri, Dec 19, 2019)
- 8e511be Template out PCP archive config file to C-nodes (eatluri, Dec 19, 2019)
- 70f1ea2 Add pcp archive logger config template (eatluri, Dec 19, 2019)
- 62c85f6 Add var to define storage path for PCP archives. (eatluri, Dec 19, 2019)
- 0e761e9 Remove file that defines the secondary logger. (eatluri, Dec 19, 2019)
- 35ad1c0 Add task to template out supremm config to C-nodes (eatluri, Dec 19, 2019)
- eeae982 Add pmlogger-supremm config file template. (eatluri, Dec 19, 2019)
- a72a500 Task to template out process logging config file. (eatluri, Dec 19, 2019)
- bd8d5e9 Add process logging config file template. (eatluri, Dec 19, 2019)
- 001c29e Task to enable logging module for desired Agents(PMDA) (eatluri, Dec 19, 2019)
- b60efdd Task templating out global process capture config. (eatluri, Dec 19, 2019)
- bcbf480 Add global process capture config file template. (eatluri, Dec 19, 2019)
- 2a513f0 Task templating out pcp-pmlogger config to C-nodes. (eatluri, Dec 19, 2019)
- ac4edbf Add pcp-pmlogger config file template. (eatluri, Dec 19, 2019)
- d6c8609 Restart pmcd to apply config changes to PMDAs (eatluri, Dec 19, 2019)
- 9784101 Merge branch 'patch-fix-easybuild-error' into feat-add-pcp-on-compute (eatluri, Dec 19, 2019)
- c25ead4 Merge branch 'patch-fix-easybuild-error' into feat-add-pcp-on-compute (eatluri, Jan 13, 2020)
- e3320f4 Merge branch 'patch-fix-easybuild-error' into feat-add-pcp-on-compute (eatluri, Jan 14, 2020)
- 7a79138 Merge branch 'feat-openstack' into feat-add-pcp-on-compute (eatluri, Mar 6, 2020)
- 4830378 Enable and start pcp (eatluri, Mar 6, 2020)
- 270febd Create user pcp and add it to pcp group (eatluri, Mar 9, 2020)
- b795ef2 Remove pcp user creation, user created during rpm install. (eatluri, Apr 6, 2020)
- 0c3c6ab Correct the task name (eatluri, Apr 6, 2020)
- 537debb Change the pcp log dir value to pcp home dir. (eatluri, Apr 17, 2020)
- 9d0ef59 Add pcp 4.3.2 to compute nodes for XDMoD 9.0 (eatluri, May 21, 2021)

compute-build.yaml (1 addition, 0 deletions)
@@ -3,3 +3,4 @@
roles:
  - compute_build_image
  - install_ww_bin
  - {role: pcp, tags: pcp, when: enable_pcp}

group_vars/all (4 additions, 0 deletions)
@@ -168,6 +168,10 @@
user_create_scripts_path: "/opt/{{ user_create_scripts }}"
user_create_script_repo: "https://gitlab.rc.uab.edu/tr27p/ohpc_user_create.git"

# PCP
enable_pcp: true
PCP_LOG_DIR: "/home/pcp/supremm"

# RabbitMQ
rabbitmq_provision: false
rabbitmq_user: "reggie"

roles/pcp/tasks/main.yml (88 additions, 0 deletions)
@@ -0,0 +1,88 @@
---

- name: Install PCP related packages
  yum:
    name:
      - pcp-4.3.2
      - pcp-manager-4.3.2
      - pcp-conf-4.3.2
      - pcp-libs-4.3.2
      - python-pcp-4.3.2
      - perl-PCP-PMDA-4.3.2
      - pcp-system-tools-4.3.2
      - pcp-pmda-slurm-4.3.2
      - pcp-pmda-gpfs-4.3.2
      - pcp-pmda-lustre-4.3.2
      - pcp-pmda-infiniband-4.3.2
      - pcp-pmda-mic-4.3.2
      - pcp-pmda-nvidia-gpu-4.3.2
      - pcp-pmda-nfsclient-4.3.2
      - pcp-pmda-perfevent-4.3.2
      - pcp-pmda-json-4.3.2
    state: present

- name: pmlogger control file template
  template:
    src: control.j2
    dest: /etc/pcp/pmlogger/control
    owner: root
    group: root
    mode: 0644

- name: remove /etc/pcp/pmlogger/control.d/local
  file:
    path: /etc/pcp/pmlogger/control.d/local
    state: absent

- name: pmlogger supremm config file template
  template:
    src: pmlogger-supremm.config.j2
    dest: /etc/pcp/pmlogger/pmlogger-supremm.config
    owner: root
    group: root
    mode: 0644

- name: process logging config file template
  template:
    src: hotproc.conf.j2
    dest: /var/lib/pcp/pmdas/proc/hotproc.conf
    owner: root
    group: root
    mode: 0644

- name: flag logging PMDAs for installation (.NeedInstall)
  file:
    path: '/var/lib/pcp/pmdas/{{ item }}'
    state: touch
    owner: root
    group: root
  loop:
    - slurm/.NeedInstall
    - nvidia/.NeedInstall
    - gpfs/.NeedInstall
    - nfsclient/.NeedInstall
    - perfevent/.NeedInstall
    - mic/.NeedInstall

- name: Configure Global process capture
  template:
    src: pmcd.conf.j2
    dest: /etc/pcp/pmcd/pmcd.conf
    owner: root
    group: root
    mode: 0644

- name: Disable daily archive rollup
  template:
    src: pcp-pmlogger.j2
    dest: /etc/cron.d/pcp-pmlogger
    owner: root
    group: root
    mode: 0644

- name: enable and start pmcd and pmlogger
  systemd:
    name: "{{ item }}"
    enabled: true
    state: started
  loop: [pmcd, pmlogger]
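
The task that touches `.NeedInstall` under /var/lib/pcp/pmdas only drops empty marker files; as we understand PCP's start-up scripts, pmcd runs the Install step of each flagged PMDA the next time it starts. The Python sketch below models just that marker bookkeeping in a temporary directory so it can run anywhere. `flag_pmdas` and `pending_pmdas` are hypothetical helper names, and the temporary directory stands in for /var/lib/pcp/pmdas.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# PMDAs flagged by the role's .NeedInstall task, copied from main.yml.
FLAGGED = ["slurm", "nvidia", "gpfs", "nfsclient", "perfevent", "mic"]


def flag_pmdas(pmdas_dir: Path, names: list[str]) -> list[Path]:
    """Touch <pmdas_dir>/<name>/.NeedInstall for each PMDA, like the Ansible file task."""
    markers = []
    for name in names:
        marker = pmdas_dir / name / ".NeedInstall"
        # Created here so the sketch runs in an empty directory; on a compute
        # node the PMDA packages are expected to provide these directories.
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.touch(exist_ok=True)
        markers.append(marker)
    return markers


def pending_pmdas(pmdas_dir: Path) -> list[str]:
    """Sorted names of PMDAs whose directory still holds a .NeedInstall marker."""
    return sorted(marker.parent.name for marker in pmdas_dir.glob("*/.NeedInstall"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stands in for /var/lib/pcp/pmdas
        flag_pmdas(root, FLAGGED)
        print(pending_pmdas(root))
```

Comparing the flagged names with the entries in pmcd.conf.j2 is one way to catch drift between the two files: `mic` is flagged here but has no pmcd.conf entry.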

roles/pcp/templates/control.j2 (52 additions, 0 deletions)
@@ -0,0 +1,52 @@
#
# PCP archive logging configuration/control
#
# This file is used by various of the PCP archive logging administrative
# tools to perform maintenance on the pmlogger instances running on
# the local host.
#
# This file contains one line per host to be logged, fields are
# Host name of host to be logged
# P(rimary) is this the primary logger? y or n
# S(ocks) should this logger be launched with pmsocks? y or n
# Directory full pathname to directory where archive logs are
# to be maintained ... note all scripts "cd" to here as
# a first step
# Args optional additional arguments to pmlogger and/or pmnewlog
#

# === VARIABLE ASSIGNMENTS ===
#
# DO NOT REMOVE OR EDIT THE FOLLOWING LINE
$version=1.1

# if pmsocks is being used, edit the IP address for $SOCKS_SERVER
#$SOCKS_SERVER=123.456.789.123

# for remote loggers running over a WAN with potentially long delays
$PMCD_CONNECT_TIMEOUT=150
$PMCD_REQUEST_TIMEOUT=120

# === LOGGER CONTROL SPECIFICATIONS ===
#
#Host P? S? directory args

# local primary logger
#
# (LOCALHOSTNAME is expanded to local: in the first column,
# and to `hostname` in the fourth (directory) column.)
#
LOCALHOSTNAME y n "{{ PCP_LOG_DIR }}/pmlogger/$(date +%Y)/$(date +%m)/LOCALHOSTNAME/$(date +%Y)-$(date +%m)-$(date +%d)" -r -c /etc/pcp/pmlogger/pmlogger-supremm.config

# Note: if multiple pmloggers for the same host (e.g. both primary and
# non-primary loggers are active), then they MUST use different
# directories

# local non-primary logger
#LOCALHOSTNAME n n PCP_LOG_DIR/pmlogger/mysummary -r -T24h10m -c config.Summary

# remote host
#remote n n PCP_LOG_DIR/pmlogger/remote -r -T24h10m -c config.remote

# thru the firewall via socks
#distant n y PCP_LOG_DIR/pmlogger/distant -r -T24h10m -c config.distant
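
Each logger line of the control file carries the five fields listed in its header: host, primary flag, socks flag, directory and extra arguments. The primary logger's directory column uses `$(date ...)` command substitutions, which this configuration assumes the PCP scripts evaluate in a shell, so archives land in a per-host, per-day directory under `PCP_LOG_DIR`. The Python sketch below parses a rendered control line with `shlex` and approximates that expansion for a fixed date. `ControlEntry`, `parse_control_line`, `expand_directory` and the host name `c0001` are illustrative, and only the three `date` formats used here are handled.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass
from datetime import date


@dataclass
class ControlEntry:
    """One logger line of /etc/pcp/pmlogger/control."""
    host: str
    primary: bool
    socks: bool
    directory: str
    args: list[str]


def parse_control_line(line: str) -> ControlEntry | None:
    """Return the fields of a logger line, or None for blanks, comments and $VAR lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "$")):
        return None
    # shlex honours the double quotes around the directory column.
    host, primary, socks, directory, *args = shlex.split(stripped)
    return ControlEntry(host, primary == "y", socks == "y", directory, args)


def expand_directory(directory: str, hostname: str, day: date) -> str:
    """Approximate the shell expansion of the directory column for one day.

    Only $(date +%Y), $(date +%m) and $(date +%d) are substituted, plus the
    LOCALHOSTNAME placeholder.
    """
    for fmt in ("%Y", "%m", "%d"):
        directory = directory.replace(f"$(date +{fmt})", day.strftime(fmt))
    return directory.replace("LOCALHOSTNAME", hostname)


if __name__ == "__main__":
    line = ('LOCALHOSTNAME y n "/home/pcp/supremm/pmlogger/$(date +%Y)/$(date +%m)/'
            'LOCALHOSTNAME/$(date +%Y)-$(date +%m)-$(date +%d)" '
            '-r -c /etc/pcp/pmlogger/pmlogger-supremm.config')
    entry = parse_control_line(line)
    print(expand_directory(entry.directory, "c0001", date(2020, 4, 17)))
```

With `PCP_LOG_DIR` rendered as `/home/pcp/supremm` (the group_vars value), the example prints `/home/pcp/supremm/pmlogger/2020/04/c0001/2020-04-17`.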

roles/pcp/templates/hotproc.conf.j2 (4 additions, 0 deletions)
@@ -0,0 +1,4 @@
#pmdahotproc
Version 1.0

( (uname != "root") && (uname != "rpc") && (uname != "rpcuser") && (uname != "dbus") && (uname != "avahi") && (uname != "munge") && (uname != "ntp") && (uname != "nagios") && (uname != "postfix") && (uname != "pcp") && (uname != "libstoragemgmt") && (uname != "chrony") && (uname != "polkitd") ) || cpuburn > 0.1
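
hotproc.conf selects which processes the proc PMDA tracks as "hot": any process not owned by one of the 13 listed system accounts, plus any process whose `cpuburn` exceeds 0.1 whoever owns it. The Python sketch below mirrors that predicate so the exclusion list can be reviewed or unit-tested outside PCP. `is_hot` is a hypothetical name, and `cpuburn` is treated as a plain number rather than read from the proc PMDA.

```python
# Accounts excluded by the hotproc.conf predicate, copied from the template.
EXCLUDED_USERS = frozenset({
    "root", "rpc", "rpcuser", "dbus", "avahi", "munge", "ntp",
    "nagios", "postfix", "pcp", "libstoragemgmt", "chrony", "polkitd",
})


def is_hot(uname: str, cpuburn: float) -> bool:
    """Python rendering of the hotproc.conf predicate.

    A process is tracked if its owner is not in EXCLUDED_USERS, or if its
    cpuburn value is above 0.1 regardless of owner.
    """
    return uname not in EXCLUDED_USERS or cpuburn > 0.1


if __name__ == "__main__":
    print(is_hot("alice", 0.0), is_hot("root", 0.05), is_hot("root", 0.5))
```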

roles/pcp/templates/pcp-pmlogger.j2 (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
#
# Performance Co-Pilot crontab entries for a monitored site
# with one or more pmlogger instances running
#
# daily processing of archive logs (with compression enabled)
#10 0 * * * pcp /usr/libexec/pcp/bin/pmlogger_daily -X xz -x 3
10 0 * * * pcp /usr/libexec/pcp/bin/pmlogger_daily -M -k forever
# every 30 minutes, check pmlogger instances are running
25,55 * * * * pcp /usr/libexec/pcp/bin/pmlogger_check -C
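
The cron file runs `pmlogger_daily` once a day at 00:10 and `pmlogger_check` at minutes 25 and 55 of every hour. The small Python sketch below expands only the minute and hour fields to confirm that schedule: 48 checks a day, i.e. every 30 minutes as the comment says. It handles `*`, single values and comma lists, not ranges or steps, and `expand_field` and `daily_runs` are hypothetical names.

```python
from __future__ import annotations


def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field written as '*', a single number or a comma list.

    Ranges ("1-5") and steps ("*/15") are not handled in this sketch.
    """
    if field == "*":
        return list(range(low, high + 1))
    values = sorted(int(part) for part in field.split(","))
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}-{high}")
    return values


def daily_runs(minute: str, hour: str) -> list[str]:
    """HH:MM times at which a crontab entry with these fields fires each day."""
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


if __name__ == "__main__":
    print(daily_runs("10", "0"))          # pmlogger_daily entry
    print(len(daily_runs("25,55", "*")))  # pmlogger_check entry
```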

roles/pcp/templates/pmcd.conf.j2 (21 additions, 0 deletions)
@@ -0,0 +1,21 @@
#
# Name Id IPC IPC Params File/Cmd
# Performance Metrics Domain Specifications
# This file is automatically generated during the build
root 1 pipe binary /var/lib/pcp/pmdas/root/pmdaroot
pmcd 2 dso pmcd_init /var/lib/pcp/pmdas/pmcd/pmda_pmcd.so
proc 3 pipe binary /var/lib/pcp/pmdas/proc/pmdaproc -d -A
xfs 11 pipe binary /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
slurm 23 pipe binary perl /var/lib/pcp/pmdas/slurm/pmdaslurm.pl
linux 60 pipe binary /var/lib/pcp/pmdas/linux/pmdalinux
nfsclient 62 pipe binary perl /var/lib/pcp/pmdas/nfsclient/pmdanfsclient.pl
mmv 70 dso mmv_init /var/lib/pcp/pmdas/mmv/pmda_mmv.so
nvidia 120 pipe binary /var/lib/pcp/pmdas/nvidia/pmdanvidia -d 120
jbd2 122 dso jbd2_init /var/lib/pcp/pmdas/jbd2/pmda_jbd2.so
perfevent 127 pipe binary /var/lib/pcp/pmdas/perfevent/pmdaperfevent -d 127
gpfs 135 pipe binary perl /var/lib/pcp/pmdas/gpfs/pmdagpfs.pl

[access]
disallow ".*" : store;
disallow ":*" : store;
allow "local:*" : all;
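
Each pmcd.conf line before `[access]` declares one PMDA: name, domain ID, IPC type (`dso` or `pipe`), an IPC parameter (the init function for a DSO, the protocol for a pipe) and the library path or command, and no two PMDAs may share a domain ID. The Python sketch below splits the template into PMDA entries and raw access rules and reports duplicate IDs. `Domain`, `parse_pmcd_conf` and `duplicate_ids` are hypothetical names, and the sketch does no validation beyond that.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Domain:
    """One PMDA line of pmcd.conf."""
    name: str
    domain_id: int
    ipc: str            # "dso" or "pipe"
    params: str         # DSO entry point (e.g. pmcd_init) or pipe protocol (binary)
    command: list[str]  # shared library path, or the command that starts the PMDA


def parse_pmcd_conf(text: str) -> tuple[list[Domain], list[str]]:
    """Split pmcd.conf text into PMDA entries and the raw lines of the [access] section."""
    domains: list[Domain] = []
    access: list[str] = []
    in_access = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "[access]":
            in_access = True
        elif in_access:
            access.append(line)
        else:
            name, domain_id, ipc, params, *command = line.split()
            domains.append(Domain(name, int(domain_id), ipc, params, command))
    return domains, access


def duplicate_ids(domains: list[Domain]) -> list[int]:
    """Return every domain ID claimed by more than one PMDA, sorted."""
    seen: set[int] = set()
    duplicates: set[int] = set()
    for domain in domains:
        if domain.domain_id in seen:
            duplicates.add(domain.domain_id)
        seen.add(domain.domain_id)
    return sorted(duplicates)
```

Run over the pmcd.conf.j2 template above, it should return 12 PMDA entries and no duplicate domain IDs.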