feat(slurmctld): Support InfluxDB #70
Open: jamesbeedy wants to merge 8 commits into charmed-hpc:main from jamesbeedy:influxdb_interface
These changes add and modify code pertaining to the support of InfluxDB for a job profiling database.
Added pip requirements: netifaces-plus, influxdb
Changes:
* cluster_name changed to a public property of the charm and is now stored in the peer-relation application data.
* Removed default cluster-name charm config and replaced with generated name "charmed-hpc-XXXX" if no charm config is supplied.
* Added _on_start charm event to allow slurmctld-peer interface time to join after _on_start hook is called and before the slurm.conf is written.
* Added influxdb interface for relating to influxdb charm.
* Adjusted unit tests to account for new cluster_name changes.
* Added slurmctld-peer interface for storing cluster_name.
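The cluster name fallback described above can be sketched roughly as follows. This is a minimal illustration and not the charm's code: the function name `resolve_cluster_name`, the precedence (stored peer data first, then charm config), and the shape of the "XXXX" suffix (four random lowercase alphanumerics) are all assumptions.

```python
import secrets
import string
from typing import Optional

# Assumption: "XXXX" stands for four random lowercase alphanumeric characters.
_SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def resolve_cluster_name(configured: Optional[str], stored: Optional[str]) -> str:
    """Return the cluster name to use (hypothetical helper).

    A name already stored in the peer-relation application data wins, so the
    name stays stable across hooks. Otherwise the charm config value is used,
    and a "charmed-hpc-XXXX" name is generated only when neither is set.
    """
    if stored:
        return stored
    if configured:
        return configured
    suffix = "".join(secrets.choice(_SUFFIX_ALPHABET) for _ in range(4))
    return f"charmed-hpc-{suffix}"
```

Storing the resolved name in peer-relation application data is what would keep a generated name from changing on later hook runs.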
jamesbeedy commented Jan 27, 2025
A race condition occurs when slurmctld is related to slurmrestd and hasn't written the slurm config file yet. These changes add an additional check in check_status for the existence of the slurm.conf.
Fixes: charmed-hpc#71
Signed-off-by: Jason C. Nucciarone <[email protected]>
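A minimal sketch of the kind of guard this commit describes, assuming the check amounts to testing for the file on disk. The default path, the return shape, and the message text are hypothetical; the charm's actual check_status may differ.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical default location; the real path depends on how Slurm is installed.
DEFAULT_SLURM_CONF = Path("/etc/slurm/slurm.conf")


def check_status(slurm_conf: Path = DEFAULT_SLURM_CONF) -> Tuple[bool, str]:
    """Return (ready, message); not ready until slurm.conf exists on disk."""
    if not slurm_conf.is_file():
        # Related applications such as slurmrestd should not be served yet.
        return False, f"Waiting for {slurm_conf.name} to be written"
    return True, "ready"
```

A relation handler could call this first and defer the event while the first element is False.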
These changes improve the influxdb relation by using relation-joined instead of relation-changed and fix an error with departing units stuck in the new_nodes stored state by removing them from new_nodes when a unit departs.
* Add logging for new_node reconcile accounting.
* These changes fix the slurmdbd joined/changed race by adding logic to the config rendering that will disallow the AltAuth/jwt_key from making it into the configuration until we actually have the key from slurmctld.
* Remove unused slurmd override template.
* Update slurmdbd.pid filepath and add slurmdbd override.
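The jwt_key gating in the config rendering might look something like the sketch below. `AuthAltTypes` and `AuthAltParameters` are real slurmdbd.conf options, but the helper name `auth_alt_settings`, the key path, and the idea of passing the key in as an optional string are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical key location, used only for illustration.
JWT_KEY_PATH = "/var/lib/slurm/jwt_hs256.key"


def auth_alt_settings(jwt_key: Optional[str]) -> Dict[str, str]:
    """Return the alternate-auth settings to merge into slurmdbd.conf.

    Nothing is emitted until slurmctld has actually provided the key, so a
    config rendered on relation-joined never references a missing jwt_key.
    """
    if not jwt_key:
        return {}
    return {
        "AuthAltTypes": "auth/jwt",
        "AuthAltParameters": f"jwt_key={JWT_KEY_PATH}",
    }
```

Rendering would then merge the returned mapping into the rest of the configuration, so the same code path works both before and after the key arrives.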
jamesbeedy force-pushed the influxdb_interface branch from 5cd7aac to 552329a on January 29, 2025 22:58
jamesbeedy requested review from jedel1043 and dsloanm and removed request for a team on January 29, 2025 23:01
jamesbeedy commented Jan 31, 2025
Use the same command to test with slurm and ssh, add the `-s` option to the `hostname` command.
These changes add and modify code pertaining to the support of InfluxDB for a job profiling database.
Added pip requirements: netifaces-plus, influxdb
Changes:
cluster_name changed to a public property of the charm and is now stored in the peer-relation application data.
Removed default cluster-name charm config and replaced with generated name "charmed-hpc-XXXX" if no charm config is supplied.
Added _on_start charm event to allow slurmctld-peer interface time to join after _on_install hook completes and before the slurm.conf is written (now in _on_start()).
Added influxdb interface for relating to influxdb charm.
Improve the influxdb relation by using relation-joined instead of relation-changed and fix an error with departing units stuck in the new_nodes stored state by removing them from new_nodes when a unit departs.
Add new_node stored state reconciliation on slurmd_departed for nodes that have been removed from the relation but remain in new_nodes in stored state.
Fix the slurmdbd joined/changed race by adding logic to the config rendering that will disallow the AltAuth/jwt_key from making it into the configuration until we actually have the key from slurmctld.
Remove unused slurmd override template.
Update slurmdbd.pid filepath and add slurmdbd override.
Adjusted unit tests to account for new cluster_name changes.
Added slurmctld-peer interface for storing cluster_name.
Drive-by fix for slurmdbd failing to start and for a false-positive active status, using an extra conditional in check_status. Fixes "Slurmdbd false positive active status" #74 and "Slurmdbd failing to start" #75.
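As an illustration of the new_nodes reconciliation listed above, a minimal sketch follows. It assumes new_nodes is a list of node names kept in stored state and that the names still present on the relation are available as an iterable; the function name `reconcile_new_nodes` is hypothetical.

```python
from typing import Iterable, List


def reconcile_new_nodes(new_nodes: Iterable[str], related_nodes: Iterable[str]) -> List[str]:
    """Drop entries from new_nodes that are no longer on the slurmd relation.

    Order of the surviving entries is preserved so stored state stays stable.
    """
    still_related = set(related_nodes)
    return [node for node in new_nodes if node in still_related]
```

Running this on slurmd_departed would clear out nodes that left the relation but were still recorded as new.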
Fixes: #52, #56, #18
Depends on: charmed-hpc/hpc-libs#66, charmed-hpc/hpc-libs#65, charmed-hpc/hpc-libs#64
TODO: Integration tests