rrdesi_mpi performance notes #191

Open
dmargala opened this issue Mar 25, 2021 · 0 comments

@dmargala
Contributor

This is a summary of a quick performance study of rrdesi_mpi. From the profile (see image below), the computational bottleneck appears to be in calc_zchi2/calc_zchi2_one, at the dot product between a sparse resolution matrix and a spectral template. For GPU offloading, it might be beneficial to "stack" or "batch" the operations in calc_zchi2/calc_zchi2_one, but that would require a deeper investigation of how targets, templates, and redshifts are distributed amongst MPI ranks.
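To make the "stack"/"batch" idea concrete, here is a minimal sketch, assuming the resolution matrix is a scipy.sparse matrix and that the template basis for each trial redshift is a dense (npix, nbasis) array. The function names per_redshift, batched and make_example, and the (nz, npix, nbasis) layout, are hypothetical and not redrock's API; the sketch also leaves out whatever weighting and least-squares solve follow the product in calc_zchi2_one. It only shows that one product per redshift can be replaced by a single sparse-dense multiply over all redshifts.

```python
import numpy as np
import scipy.sparse as sp


def per_redshift(R, templates):
    """One sparse-dense product per trial redshift, as in a per-z loop.

    R:         (npix, npix) sparse matrix (hypothetical stand-in for a
               spectrum's resolution matrix)
    templates: (nz, npix, nbasis) dense array, one template basis per
               trial redshift (hypothetical layout)
    """
    return np.stack([R.dot(t) for t in templates])


def batched(R, templates):
    """The same products computed with a single sparse-dense multiply."""
    nz, npix, nbasis = templates.shape
    # (nz, npix, nbasis) -> (npix, nz * nbasis): every trial redshift's
    # basis vectors become extra columns of one wide dense matrix.
    stacked = templates.transpose(1, 0, 2).reshape(npix, nz * nbasis)
    out = R.dot(stacked)
    # Undo the stacking to recover the (nz, npix, nbasis) layout.
    return out.reshape(npix, nz, nbasis).transpose(1, 0, 2)


def make_example(npix=500, nbasis=3, nz=40, halfwidth=5, seed=0):
    """Random banded sparse matrix plus random templates for testing."""
    rng = np.random.default_rng(seed)
    offsets = list(range(-halfwidth, halfwidth + 1))
    # Each diagonal gets exactly the length its offset allows.
    diagonals = [rng.random(npix - abs(k)) for k in offsets]
    R = sp.diags(diagonals, offsets, shape=(npix, npix), format="csr")
    templates = rng.random((nz, npix, nbasis))
    return R, templates
```

The motivation is that one large product generally keeps an accelerator busier than many small ones; whether this stacking is practical inside redrock still depends on how work is split across MPI ranks.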

Single Cori Haswell node performance:

#- DESI environment used for cascades, swapping in your own copy of redrock
source /global/cfs/cdirs/desi/software/desi_environment.sh 21.2
module unload redrock

git clone https://github.com/desihub/redrock
export PYTHONPATH=$(pwd)/redrock/py:$PYTHONPATH
export PATH=$(pwd)/redrock/bin:$PATH
export RR_TEMPLATE_DIR=/global/common/software/desi/cori/desiconda/20200801-1.4.0-spec/code/redrock-templates/0.7.2

#- Run redrock with 32x MPI parallelism on an interactive node (~3m20s)
salloc -N 1 -C haswell -q interactive -t 1:00:00
export OMP_NUM_THREADS=1
cd /global/cfs/cdirs/desi/spectro/redux/cascades/tiles/80605/20201215/
time srun -n 32 -c 2 rrdesi_mpi spectra-0-80605-20201215.fits -o $SCRATCH/redrock-0-80605-20201215.h5 -z $SCRATCH/zbest-0-80605-20201215.fits
...
Computing redshifts took: 147.6 seconds
Writing zscan data took: 1.1 seconds
Writing zbest data took: 24.9 seconds
Total run time: 191.2 seconds

real	3m18.792s
user	0m0.071s
sys	0m0.037s

Use python -m cProfile ... with a single MPI rank to generate a profile:

srun -n 1 -c 2 --cpu-bind=cores python -m cProfile -o profile.pstats $(which rrdesi_mpi) spectra-0-80605-20201215.fits -o $SCRATCH/redrock-0-80605-20201215.h5 -z $SCRATCH/zbest-0-80605-20201215.fits
...
Computing redshifts took: 2639.9 seconds
Writing zscan data took: 1.2 seconds
Writing zbest data took: 24.8 seconds
Total run time: 2693.3 seconds

[Figure: profile of rrdesi_mpi run with a single MPI rank (rrdesi_mpi_n1_profile)]
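The profile.pstats file can also be inspected as text with Python's standard pstats module, for example to list the most expensive calls or only those matching calc_zchi2. The helper below is a sketch; top_functions and its parameters are hypothetical names, not part of redrock.

```python
import io
import pstats


def top_functions(path, n=15, sort_key="cumulative", restrict=None):
    """Return the n most expensive entries of a cProfile dump as text.

    path:     file written by `python -m cProfile -o profile.pstats ...`
    sort_key: "cumulative" (time including callees) or "tottime" (own time)
    restrict: optional regular expression applied to the function
              descriptions before the top-n cut, e.g. "calc_zchi2"
    """
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    # Drop long directory prefixes so the table stays readable.
    stats.strip_dirs().sort_stats(sort_key)
    if restrict:
        stats.print_stats(restrict, n)
    else:
        stats.print_stats(n)
    return buf.getvalue()
```

For example, print(top_functions("profile.pstats", sort_key="tottime", restrict="calc_zchi2")) would show the calc_zchi2 entries ordered by their own time.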

Half DGX node performance:

#- start an interactive session using half a DGX node
salloc -C dgx -N 1 -G 4 -c 64 -t 60
module load python cuda/11.1.1 gcc openmpi

#- create a fresh gpu+mpi ready conda env
conda create -n gpu-redrock-dgx
source activate gpu-redrock-dgx
conda install -y numpy scipy numba pyyaml astropy matplotlib
pip install fitsio healpy speclite cupy-cuda111 h5py
#- build mpi4py from source
git clone https://bitbucket.org/mpi4py/mpi4py.git
cd mpi4py/
python setup.py build
python setup.py install
cd ..
#- desi specific pip installs
pip install git+https://github.com/desihub/desiutil.git
pip install git+https://github.com/desihub/desitarget.git
pip install git+https://github.com/desihub/desispec.git

#- Install redrock in develop mode for experimenting
git clone https://github.com/desihub/redrock
cd redrock
pip install -e .

export RR_TEMPLATE_DIR=/global/common/software/desi/cori/desiconda/20200801-1.4.0-spec/code/redrock-templates/0.7.2
export OMP_NUM_THREADS=1
cd /global/cfs/cdirs/desi/spectro/redux/cascades/tiles/80605/20201215/
time srun -n 32 -c 2 --cpu-bind=cores rrdesi_mpi spectra-0-80605-20201215.fits -o $SCRATCH/redrock-0-80605-20201215.h5 -z $SCRATCH/zbest-0-80605-20201215.fits
...
Computing redshifts took: 81.7 seconds
Writing zscan data took: 4.6 seconds
Writing zbest data took: 0.1 seconds
Total run time: 93.6 seconds

real	1m58.986s
user	0m0.020s
sys	0m0.043s
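The logged timings can be compared directly. A small sketch using only the numbers reported above follows. Two caveats apply. First, the single-rank run was collected under cProfile, which adds overhead, so the derived 32-rank speedup and efficiency are likely overstated. Second, the half-DGX run launches the same 32-rank srun command as the Haswell run. The names RUNS, ratio and summarize are illustrative only.

```python
# Timings (seconds) copied from the logs above. "wall" is the `real`
# value reported by `time`; the profiled 1-rank run has no `time` output.
RUNS = {
    "haswell_32": {"compute": 147.6, "total": 191.2, "wall": 3 * 60 + 18.792},
    "haswell_1_cprofile": {"compute": 2639.9, "total": 2693.3},
    "dgx_half_32": {"compute": 81.7, "total": 93.6, "wall": 1 * 60 + 58.986},
}


def ratio(slow, fast, key):
    """How many times longer run `slow` took than run `fast` for one timer."""
    return RUNS[slow][key] / RUNS[fast][key]


def summarize(nranks=32):
    """Scaling on Haswell plus Haswell vs half-DGX ratios, as text."""
    speedup = ratio("haswell_1_cprofile", "haswell_32", "compute")
    lines = [
        f"Haswell 32-rank vs profiled 1-rank compute: {speedup:.1f}x "
        f"({speedup / nranks:.0%} parallel efficiency)",
    ]
    for key in ("compute", "total", "wall"):
        lines.append(
            f"Haswell vs half-DGX ({key}): "
            f"{ratio('haswell_32', 'dgx_half_32', key):.2f}x"
        )
    return "\n".join(lines)


print(summarize())
```

With these numbers the Haswell scaling works out to about 17.9x (roughly 56% efficiency), and the half-DGX run is about 1.81x, 2.04x and 1.67x faster than the Haswell run for the compute, total and wall-clock times respectively.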
dmargala mentioned this issue Feb 17, 2022