Fix cod slurm role #489

Merged: 6 commits merged into jprorama:dev on Nov 1, 2024

eesaanatluri commented Oct 31, 2024

This PR fixes errors seen during the cod build. scontrol reconfigure failed because of changes made to slurm.conf by the cod_slurm role.

Fixes https://gitlab.rc.uab.edu/rc/gpfs5-migration/-/issues/62

This avoids errors in the slurmctld status output saying that FastSchedule is deprecated and should be removed from the config for Slurm 20.x.

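The PR does not show the template diff. As a minimal sketch, assuming the rendered slurm.conf still contained a FastSchedule line, the option could be filtered out before the file is written. The helper name is hypothetical; the sketch relies on Slurm treating parameter names case-insensitively and leaves commented-out lines alone:

```python
# Hypothetical helper: only the option name comes from the commit message;
# filtering the rendered text is an assumed approach, not the role's code.
DEPRECATED_OPTIONS = {"fastschedule"}


def strip_deprecated_options(conf_text: str) -> str:
    """Drop slurm.conf lines that set a deprecated option (case-insensitive)."""
    kept = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if "=" in line and key in DEPRECATED_OPTIONS:
            continue  # skip e.g. "FastSchedule=1"; "#FastSchedule=1" is kept
        kept.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(kept) + ("\n" if conf_text.endswith("\n") else "")
```

In a templated role the simpler fix is to delete the line from the template itself; a filter like this is only useful when the config is assembled from sources you do not control.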
CMDaemon must be restarted immediately after the change that freezes CMD's updates to the Slurm config file. Otherwise, CMD will undo the changes and restore the default Slurm config.

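A sketch of the freeze-then-restart pairing, assuming Bright's CMDaemon reads a FrozenFile directive from cmd.conf and only honours a change to it after a restart. The directive syntax, the step labels, and both helpers are assumptions for illustration, not the role's actual tasks:

```python
# Assumed names: the FrozenFile directive syntax and these step labels are
# illustrative; the PR does not show the actual cmd.conf edit or task names.
FREEZE_STEP = "freeze slurm.conf in cmd.conf"
RESTART_STEP = "restart cmd"


def frozen_file_line(paths: list[str]) -> str:
    """Render a FrozenFile directive telling CMDaemon not to overwrite `paths`."""
    quoted = ", ".join(f'"{p}"' for p in paths)
    return f"FrozenFile = {{ {quoted} }}"


def restart_follows_freeze(steps: list[str]) -> bool:
    """True if the CMD restart is the very next step after the freeze step."""
    if FREEZE_STEP not in steps:
        return False
    i = steps.index(FREEZE_STEP)
    return i + 1 < len(steps) and steps[i + 1] == RESTART_STEP
```

Running restart_follows_freeze over a role's task order is one way to catch a regression where another task slips in between the freeze and the restart.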
This fixes the error seen when running scontrol reconfigure in the next step.

The cod_slurm role templates the Slurm config and stops CMD from overwriting slurm.conf. The desired config must be in place when the compute node build starts, not after, so that it is picked up during the build. This way, slurmd does not need to be restarted on the compute node.

This gives the slurmctld service time to start up after a restart, so that scontrol reconfigure does not fail with an error saying slurmctld is unreachable.

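The commit message does not say how the delay is implemented; a fixed pause is one option. The sketch below swaps in polling instead: it waits until scontrol ping reports the controller as UP, with a timeout, before running scontrol reconfigure. The exact "is UP" output text and all helper names are assumptions:

```python
import subprocess
import time
from typing import Callable


def slurmctld_reachable() -> bool:
    """Return True if `scontrol ping` reports the controller as UP.

    Assumes output like "Slurmctld(primary) at <host> is UP".
    """
    try:
        result = subprocess.run(
            ["scontrol", "ping"], capture_output=True, text=True, timeout=10
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # Slurm tools missing, or the call hung
    return "is UP" in result.stdout


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() every `interval` seconds until it passes or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def reconfigure_when_ready(timeout: float = 60.0) -> None:
    """Hypothetical step: wait for slurmctld, then run `scontrol reconfigure`."""
    if not wait_until(slurmctld_reachable, timeout=timeout):
        raise RuntimeError("slurmctld did not come up within the timeout")
    subprocess.run(["scontrol", "reconfigure"], check=True)
```

Polling returns as soon as slurmctld answers, whereas a fixed sleep either wastes time on a fast restart or is still too short on a slow one.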
diedpigs merged commit a4e655b into jprorama:dev on Nov 1, 2024
1 check failed