Cannot load stack-mvapich2/2.3.7 module on Hercules #1985

Closed
zach1221 opened this issue Nov 7, 2023 · 17 comments

@zach1221
Collaborator

zach1221 commented Nov 7, 2023

Description

When running the UFS-WM regression tests, GNU compilation fails on Hercules because the stack-mvapich2 module, which is loaded by the ufs_hercules.gnu.lua modulefile, cannot be loaded.

image

zach1221 added the "bug" (Something isn't working) label Nov 7, 2023
@zach1221
Collaborator Author

zach1221 commented Nov 7, 2023

#1942
@natalie-perlin fyi

@natalie-perlin
Collaborator

The module stack-mvapich2/2.3.7 lists two prerequisites, slurm/22.05.8 and mvapich2/2.3.7. However, mvapich2/2.3.7 loads slurm/23.02.6, so the prerequisites for stack-mvapich2/2.3.7 cannot be met. The modulefile needs to be changed to resolve this conflict. Opening an issue in the spack-stack repository.
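The conflict described above can be illustrated with a small shell sketch. It is not part of the modulefiles or of spack-stack: `missing_prereqs` is a hypothetical helper that compares a list of required modules against a colon-separated list of loaded modules, which is the format Lmod and Environment Modules use for the `LOADEDMODULES` variable. The example values are the versions quoted in this comment.

```shell
# Hypothetical helper (not part of Lmod or spack-stack): print the required
# modules that are absent from a colon-separated list of loaded modules.
missing_prereqs() {
    loaded=$1
    shift
    missing=""
    for required in "$@"; do
        case ":$loaded:" in
            *":$required:"*) ;;                 # exact name/version is loaded
            *) missing="$missing $required" ;;  # not loaded, or only another version is
        esac
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Versions from the comment above: mvapich2/2.3.7 pulls in slurm/23.02.6,
# while stack-mvapich2/2.3.7 asks for slurm/22.05.8.
loaded_example="slurm/23.02.6:mvapich2/2.3.7"
missing_prereqs "$loaded_example" slurm/22.05.8 mvapich2/2.3.7
```

On a system with a module environment, passing `"$LOADEDMODULES"` as the first argument would check the modules actually loaded in the current shell.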

@natalie-perlin
Collaborator

Added a comment to the existing issue where the Hercules mvapich2 module has been discussed.
JCSDA/spack-stack#861 (comment)

@climbfuji
Collaborator

Can you switch to spack-stack-1.5.1 there? Otherwise I'll fix it on Hercules in the 1.5.0 tree.

@zach1221
Collaborator Author

zach1221 commented Nov 7, 2023

Does anything else need to be changed in the Hercules GNU Lua file to use 1.5.1, other than the modulefile path, of course?

@climbfuji
Collaborator

climbfuji commented Nov 7, 2023 via email

@zach1221
Collaborator Author

zach1221 commented Nov 7, 2023

Which of these 1.5.1 stack installations should I use: ue-gcc12-mvap2 or unified-env?
image

@climbfuji
Collaborator

unified-env, please.

@climbfuji
Collaborator

We rebuilt it after testing in ue-gcc12-mvap2.

@zach1221
Collaborator Author

zach1221 commented Nov 7, 2023

Do you know what could have changed on Hercules to cause this issue with 1.5.0 between when we tested and merged PR 1920 and now?

@climbfuji
Collaborator

I think I know what's going on and I can fix it this morning after my meetings.

@zach1221
Collaborator Author

zach1221 commented Nov 7, 2023

> I think I know what's going on and I can fix it this morning after my meetings.

Ok, let me know if you're able to fix the existing 1.5.0. I wonder if Hercules system admins are messing with the slurm modules?

@climbfuji
Collaborator

> > I think I know what's going on and I can fix it this morning after my meetings.
>
> Ok, let me know if you're able to fix the existing 1.5.0. I wonder if Hercules system admins are messing with the slurm modules?

They did, yes.

@climbfuji
Collaborator

Basically they removed the slurm version that mvapich2 was built with for 1.5.0, shortly after that build was done, and replaced it with the new version 23.x.y. I used that to build mvapich2 for 1.5.1. But I think it's possible to change the 1.5.0 modulefiles and it still works. I will work on this now.

@zach1221
Collaborator Author

zach1221 commented Nov 7, 2023

> Basically they removed the slurm version that mvapich2 was built with for 1.5.0, shortly after that build was done, and replaced it with the new version 23.x.y. I used that to build mvapich2 for 1.5.1. But I think it's possible to change the 1.5.0 modulefiles and it still works. I will work on this now.

Ok, thank you, much appreciated. Would it be worth it to reach out to the Hercules admins to try to prevent something like this from happening again? Maybe going forward they can leave the old slurm module when adding new versions.

@climbfuji
Collaborator

> > Basically they removed the slurm version that mvapich2 was built with for 1.5.0, shortly after that build was done, and replaced it with the new version 23.x.y. I used that to build mvapich2 for 1.5.1. But I think it's possible to change the 1.5.0 modulefiles and it still works. I will work on this now.
>
> Ok, thank you, much appreciated. Would it be worth it to reach out to the Hercules admins to try to prevent something like this from happening again? Maybe going forward they can leave the old slurm module when adding new versions.

No, please not ;-) who knows what they'll end up doing. About going forward, yes, I did talk to them about that already and they are aware of the need for consistency and some sort of backward compatibility.

@climbfuji
Collaborator

@zach1221 I fixed this on Hercules in spack-stack 1.5.0. It's a manual fix, but since we won't be updating 1.5.0 anymore (1.5.1 is ready), that's good enough. I ran the following commands successfully on Hercules:

```
./compile.sh hercules "-DAPP=S2SWA -D32BIT=ON -DCCPP_SUITES=FV3_GFS_v17_coupled_p8,FV3_GFS_v17_coupled_p8_ugwpv1" "" gnu YES YES 2>&1 | tee compile.log
./rt.sh -e -n cpld_control_p8 gnu -a gsd-hpcs 2>&1 | tee rt_cpld_control_p8_gnu.log
```
