Add documentation about configuring SLURM on GH200 #12

Merged: 12 commits, Feb 20, 2025

msimberg
Contributor

This is a starting point for docs on configuring Slurm for GH200. It covers running a single rank per GPU and multiple ranks per GPU. It calls out the default process mode, which can be a big footgun when several ranks share a GPU, and recommends using MPS. The MPS wrapper script is the one from the current knowledge base.

I expect some refactoring might be useful if/when there are MI300 docs for Slurm, as some of the content might be similar, but I haven't attempted to move that into a generic section yet.

@msimberg msimberg requested a review from bcumming February 17, 2025 12:53
@msimberg msimberg force-pushed the slurm-gh200 branch 2 times, most recently from 7922cd8 to 3211a0b Compare February 17, 2025 16:21
preview available: https://docs.tds.cscs.ch/12

@msimberg msimberg requested a review from RMeli February 19, 2025 10:20
@bcumming bcumming merged commit a2be996 into main Feb 20, 2025
1 check passed
@RMeli RMeli deleted the slurm-gh200 branch February 20, 2025 08:37
@RMeli RMeli mentioned this pull request Feb 25, 2025