feat(slurmctld): Support InfluxDB #70

Open
wants to merge 8 commits into base: main

jamesbeedy (Contributor) commented Jan 27, 2025

These changes add and modify code to support InfluxDB as a job profiling database.

Added pip requirements: netifaces-plus, influxdb

Changes:

  • cluster_name changed to a public property of the charm and is now stored in the peer-relation application data.

  • Removed default cluster-name charm config and replaced with generated name "charmed-hpc-XXXX" if no charm config is supplied.

  • Added _on_start charm event to allow slurmctld-peer interface time to join after _on_install hook completes and before the slurm.conf is written (now in _on_start()).

  • Added influxdb interface for relating to influxdb charm.

  • Improve the influxdb relation by using relation-joined instead of relation-changed and fix an error with departing units stuck in the new_nodes stored state by removing them from new_nodes when a unit departs.

  • Add new_node stored state reconciliation on slurmd_departed for nodes that have been removed from the relation but remain in new_nodes in stored state.

  • Fix the slurmdbd joined/changed race by adding logic to the config rendering that will disallow the AltAuth/jwt_key from making it into the configuration until we actually have the key from slurmctld.

  • Remove unused slurmd override template.

  • Update slurmdbd.pid filepath and add slurmdbd override.

  • Adjusted unit tests to account for new cluster_name changes.

  • Added slurmctld-peer interface for storing cluster_name.

  • Drive-by fix for slurmdbd failing to start and reporting a false-positive active status, using an extra conditional in check_status. Fixes "Slurmdbd false positive active status" (#74) and "Slurmdbd failing to start" (#75).
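
For reviewers, the generated-name fallback described above could look roughly like the sketch below. This is an illustration only, not the code in this PR: `generate_cluster_name`, `resolve_cluster_name`, and the choice of lowercase letters and digits for the "XXXX" suffix are assumptions.

```python
import secrets
import string


def generate_cluster_name(prefix: str = "charmed-hpc", suffix_len: int = 4) -> str:
    """Build a name of the form "charmed-hpc-XXXX" (hypothetical helper).

    The suffix alphabet is an assumption; the PR only states the "XXXX" shape.
    """
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_len))
    return f"{prefix}-{suffix}"


def resolve_cluster_name(configured: str) -> str:
    """Use the operator-supplied cluster name if set, else generate one."""
    return configured if configured else generate_cluster_name()
```

In a charm, the resolved name would then be written once to the peer-relation application data so that every unit reads the same value.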

Fixes: #52, #56, #18
Depends on: charmed-hpc/hpc-libs#66, charmed-hpc/hpc-libs#65, charmed-hpc/hpc-libs#64

TODO: Integration tests

These changes add and modify code pertaining to
the support of InfluxDB for a job profiling database.

Added pip requirements: netifaces-plus, influxdb

Changes:
* cluster_name changed to a public property of the charm and is now stored in the peer-relation application data.

* Removed default cluster-name charm config and replaced with generated name "charmed-hpc-XXXX" if no charm config is supplied.

* Added _on_start charm event to allow slurmctld-peer interface time to join after _on_start hook is called and before the slurm.conf is written.

* Added influxdb interface for relating to influxdb charm.

* Adjusted unit tests to account for new cluster_name changes.

* Added slurmctld-peer interface for storing cluster_name.
jamesbeedy and others added 4 commits January 27, 2025 12:26
A race condition occurs when slurmctld is related to slurmrestd
and hasn't written the slurm config file yet.

These changes add an additional check in check_status for the
existence of the slurm.conf.

Fixes: charmed-hpc#71
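
A minimal sketch of the kind of guard this commit describes, assuming a standalone function rather than the charm's actual `check_status` method. The status strings and the `daemon_active` parameter are hypothetical.

```python
from pathlib import Path


def check_status(conf_path: Path, daemon_active: bool) -> str:
    """Report a status string, refusing to go active before slurm.conf exists.

    Hypothetical stand-in for the charm's check_status; only the
    file-existence guard reflects the change described in the commit.
    """
    if not conf_path.is_file():
        # The config has not been written yet, so related applications
        # such as slurmrestd must not see this unit as ready.
        return "waiting: slurm.conf not yet written"
    if not daemon_active:
        return "blocked: slurmctld is not running"
    return "active"
```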
@jamesbeedy jamesbeedy changed the title Support InfluxDB feat(slurmctld): Support InfluxDB Jan 29, 2025
These changes improve the influxdb relation by using
relation-joined instead of relation-changed and fix an error
with departing units stuck in the new_nodes stored state by
removing them from new_nodes when a unit departs.
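
The new_nodes cleanup can be pictured as a simple filter. The sketch below assumes new_nodes is a list of node names and that the set of nodes still on the relation is available; `reconcile_new_nodes` is a hypothetical name, not the function in this PR.

```python
from typing import Iterable, List


def reconcile_new_nodes(new_nodes: Iterable[str], related_nodes: Iterable[str]) -> List[str]:
    """Drop nodes from new_nodes that are no longer present on the relation.

    Order of the remaining entries is preserved.
    """
    still_related = set(related_nodes)
    return [node for node in new_nodes if node in still_related]
```

For example, if `new_nodes` holds `["node-0", "node-1"]` and only `node-1` remains related, the reconciled list is `["node-1"]`.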

* Add logging for new_node reconcile accounting.

* These changes fix the slurmdbd joined/changed race by
adding logic to the config rendering that will disallow
the AltAuth/jwt_key from making it into the configuration
until we actually have the key from slurmctld.

* Remove unused slurmd override template.

* Update slurmdbd.pid filepath and add slurmdbd override.
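
The jwt_key gating described above might be sketched as follows. This assumes the rendered settings are the slurmdbd.conf options `AuthAltTypes` and `AuthAltParameters`; the function name and the dictionary-based rendering are illustrative and not taken from this PR.

```python
from typing import Dict, Optional


def auth_alt_settings(jwt_key_path: Optional[str]) -> Dict[str, str]:
    """Return alternate-auth settings only once the JWT key is available.

    Until slurmctld has shared the key, nothing is emitted, so the
    rendered config never points at a key that does not exist yet.
    """
    if not jwt_key_path:
        return {}
    return {
        "AuthAltTypes": "auth/jwt",
        "AuthAltParameters": f"jwt_key={jwt_key_path}",
    }
```

Returning an empty mapping keeps the caller simple: it can merge the result into the rest of the config unconditionally.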
@jamesbeedy jamesbeedy marked this pull request as ready for review January 29, 2025 23:01
@jamesbeedy jamesbeedy requested a review from a team as a code owner January 29, 2025 23:01
@jamesbeedy jamesbeedy requested review from jedel1043 and dsloanm and removed request for a team January 29, 2025 23:01
@NucciTheBoss NucciTheBoss self-requested a review January 30, 2025 14:21
@NucciTheBoss NucciTheBoss added the enhancement New feature or request label Jan 30, 2025
jamesbeedy and others added 2 commits January 31, 2025 08:34
Use the same command to test with slurm and ssh, and add the `-s` option to the `hostname` command.