Update running.md
yandthj authored Jul 30, 2024
1 parent e058a73 commit 4051324
Showing 1 changed file with 2 additions and 2 deletions.
docs/Documentation/Systems/Kestrel/running.md (4 changes: 2 additions & 2 deletions)
@@ -15,7 +15,7 @@ Standard CPU-based compute nodes on Kestrel have 104 cores and 240G of usable RA


### GPU Nodes
-Kestrel has 132 GPU nodes with 4 NVIDIA H100 GPUs, each with 80GB memory. These have Dual socket AMD Genoa 64-core processors (128 cores total) with about 350G of usable RAM. The GPU nodes also have 3.20TB of NVMe local disk.
+Kestrel has 132 GPU nodes with 4 NVIDIA H100 GPUs, each with 80GB memory. These have Dual socket AMD Genoa 64-core processors (128 cores total) with about 350G of usable RAM. The GPU nodes also have 3.4TB of NVMe local disk.



@@ -88,7 +88,7 @@ To request use of a GPU, use the flag `--gpus=<quantity>` with sbatch, srun, or

**If your job will require more than the default 1 CPU core and 1G of CPU RAM per core allocated**, you must request the quantity of cores and/or RAM that you will need, by using additional flags such as `--ntasks=` or `--mem=`. To request all of the memory available on the GPU node, use `--mem=0`.

-The GPU nodes also have 3.20 TB of local disk space. Note that other jobs running on the same GPU node could also be using this space. Slurm is unable to divide this space to separate jobs on the same node like it does for memory or CPUs. If you need to ensure that your job has exclusive access to all of the disk space, you'll need to use the `--exclusive` flag to prevent the node from being shared with other jobs.
+The GPU nodes also have 3.4 TB of local disk space. Note that other jobs running on the same GPU node could also be using this space. Slurm is unable to divide this space to separate jobs on the same node like it does for memory or CPUs. If you need to ensure that your job has exclusive access to all of the disk space, you'll need to use the `--exclusive` flag to prevent the node from being shared with other jobs.

!!! warning
    A job with the `--exclusive` flag will be allocated all of the CPUs and GPUs on a node, but is only allocated as much memory as requested. Use the flag `--mem=0` to request all of the CPU RAM on the node.
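As a rough illustration of the resource-request flags discussed in the diff above (`--gpus`, `--ntasks`, `--mem`), a batch script for a GPU job might look like the sketch below. Only those three flags come from the documentation text; the allocation handle, walltime, and application command are placeholders.

```bash
#!/bin/bash
# Minimal sketch of a GPU job script using the flags described above.
# The account, walltime, and application command are placeholders.
#SBATCH --account=<your_allocation>   # placeholder allocation handle
#SBATCH --time=1:00:00                # example walltime
#SBATCH --gpus=2                      # request 2 of the node's 4 H100 GPUs
#SBATCH --ntasks=32                   # more CPU cores than the default of 1
#SBATCH --mem=80G                     # more CPU RAM than the default 1G per core

srun my_gpu_application               # placeholder executable
```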

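Along the same lines, a sketch of an exclusive-node request that follows the warning above, again with placeholder account, walltime, and application:

```bash
#!/bin/bash
# Sketch of an exclusive GPU-node job, per the warning above: --exclusive
# allocates all CPUs and GPUs (and sole use of the local NVMe disk), but
# memory is still limited to what is requested, so --mem=0 asks for all of
# the node's usable CPU RAM. Account, walltime, and the application command
# are placeholders.
#SBATCH --account=<your_allocation>
#SBATCH --time=2:00:00
#SBATCH --exclusive                   # whole node: no sharing of CPUs, GPUs, or local disk
#SBATCH --mem=0                       # all of the CPU RAM on the node

srun my_gpu_application
```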