Tutorial on running 2x2_sim
NOTE: This tutorial has been superseded by the April 2024 edition.
This tutorial is intended to provide a high-level understanding of how the 2x2 simulation chain works, how to run it interactively (at NERSC), and how you can modify it, if desired. The larnd-sim detector simulation makes use of GPUs, and NERSC is, for us, an especially convenient and abundant source of GPU resources, so that's where we've been running the chain. We are happy to provide support for anyone who wants to run the chain elsewhere, but that's not on the agenda for this session. Nor is this session intended to provide details on the underlying simulation packages. Nor is it intended to describe how we've been running larger-scale productions at NERSC.
The chain is based on a small set of separate, decoupled packages that "communicate" only via the files that they produce. These packages are:
- GENIE: The event generator. Responsible for taking the NuMI flux files and the geometry description, and generating neutrino interactions.
- edep-sim: The Geant4 wrapper. Responsible for taking the outgoing particles from GENIE interactions and propagating them through the geometry, recording the particle trajectories and any energy deposited in active ("sensitive") detector volumes.
GENIE and edep-sim are combined in a single step, run-edep-sim, which we run in two parallel paths, with different GENIE geometries (but the same edep-sim geometry, namely the full rock + hall + detector geometry):
- The "nu" path, where GENIE sees a geometry that contains the hall and detectors (MINERvA and the 2x2), but none of the surrounding rock.
- The "rock" path, where GENIE sees a geometry that just has the rock and an empty hall.
The purpose of this two-path approach is to keep the rock interactions in a separate sample, so that they can be reused later if needed, conserving computational resources.
Returning to the packages that make up the chain:
- hadd: From ROOT; responsible for merging multiple edep-sim outputs into a single file. We do this in order to decouple the walltime requirements of GENIE+edep-sim from those of the later steps.
- The spill builder: A simple ROOT macro that takes "nu" and "rock" edep-sim files and, based on POT per spill, spill structure, and spill separation, overlays the events into spills.
At this point, the ROOT files from the spill builder are passed on to the MINERvA chain. For the 2x2, the steps continue:
- larnd-sim: The detector simulation for the charge (LArPix) and light readout. Written in Python but with the "heavy lifting" compiled to GPU (CUDA) binary using Numba.
- ndlar_flow: Calibration and low-level reconstruction. Written in numpy-based Python using Peter's "h5flow" framework.
- Validation plotter: Produces matplotlib-based validation plots as multi-page PDFs, for the various preceding steps.
Also worth mentioning: The "g4numi" package is responsible for running the (Geant-based) NuMI beamline simulation and producing the "dk2nu" flux files that GENIE consumes. However, we don't run g4numi as an explicit step in this chain. Instead, we've been using a static set of dk2nu files copied over from a previous g4numi production run at the Fermilab cluster.
If you'd like to follow this tutorial directly, you will need a computing account at NERSC. To request an account, contact Callum Wilkinson. Assuming you have an account, you'll want to run these steps on the new Perlmutter system, which provides GPUs. To log in:
ssh saul-p1.nersc.gov
If you want to run the chain elsewhere, make sure that GPUs are available. The Wilson Cluster at Fermilab and the S(3)DF cluster at SLAC are a couple of options. The main change you will need to make will be the container setup (see the next section). NERSC has its own "Shifter" container runtime, which can import Docker containers from Dockerhub. Meanwhile, WC and SDF support Singularity/Apptainer. A container is used for the steps prior to larnd-sim, so you will need to modify the tops of those scripts in order to enter the container using apptainer instead of Shifter. You will also need to replace the "module load" commands, which are NERSC-specific.
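As a rough sketch (not the exact production setup), the Shifter entry at the top of a run script could be replaced by an Apptainer invocation along these lines; the .sif path and bind location below are placeholders:
# Sketch only: Apptainer replacement for the Shifter entry in a run script.
# The .sif location is a placeholder; bind your 2x2_sim checkout as needed.
apptainer exec --bind /path/to/2x2_sim \
    /path/to/containers/sim2x2_genie_edep.LFG_testing.20230228.v2.sif \
    /bin/bash --init-file /environment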
This chain relies on a container to provide the "non-Python" dependencies for the steps prior to larnd-sim: ROOT, GENIE, Geant, edep-sim, etc. The container is built using Apptainer (formerly known as Singularity). The resulting .sif file can be used for running on a non-NERSC system. For running at NERSC, we use singularity2docker to convert the .sif to a Docker image, which then gets uploaded to DockerHub (as mjkramer/sim2x2) and imported into Shifter. The repository also contains a script (in admin/) to pull the container from DockerHub for use with Singularity if not using Shifter.
The important point here is that the Shifter image is already imported and available to all users on Perlmutter, so there's nothing you need to do. If you'd like to enter the container interactively, do:
shifter --image=mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2 -- /bin/bash --init-file /environment
If you need to rebuild or modify the container, see the 2x2Containers repo.
To get started, clone the repository:
git clone https://github.com/DUNE/2x2_sim.git
The following needs to be run just once, in a fresh clone of the repo, from the host OS (e.g. a fresh login), not the container. It's responsible for setting up necessary Python virtual environments, etc. Run it from the top level (or root) of the 2x2_sim directory:
admin/install_everything.sh
There are seven subdirectories that contain the individual steps in the chain. In the order in which they're run:
- run-edep-sim (includes GENIE)
- run-hadd
- run-spill-build
- run-convert2h5
- run-larnd-sim
- run-ndlar-flow
- validation
Within each of these subdirectories, there's a corresponding "run script", e.g. run_edep_sim.sh. These scripts should be run directly from the native Perlmutter OS, not from inside the container. The scripts themselves will take care of entering the container, loading any necessary modules or Python environments, etc.
The run scripts do not take any command-line arguments. Instead, they are controlled entirely by environment variables which, by convention, begin with ARCUBE_. Some important common environment variables:
- ARCUBE_RUNTIME: The container runtime to use when running the 2x2 sim. Current valid options are SHIFTER and SINGULARITY; the default is SHIFTER. (See the sketch just after this list for a Singularity setup.)
- ARCUBE_CONTAINER: Name of the container to use when running the 2x2 sim. The name is slightly different between Shifter (the name of the container on DockerHub) and Singularity (the name of the container .sif file).
- ARCUBE_CONTAINER_DIR: Path/directory where the Singularity container is stored. Not used with Shifter.
- ARCUBE_DIR: The top-level (or root) location of the 2x2_sim directory (e.g. /path/to/2x2_sim). This is needed for Singularity to properly bind the directory, ensuring it is mounted when using networked file systems.
- ARCUBE_OUT_NAME: The name of the output directory. Output filenames will also be prefixed by $ARCUBE_OUT_NAME. By convention, we set ARCUBE_OUT_NAME to the name of the "production", separated by a period from the abbreviated name of the step, e.g. MiniRun3.larnd.
- ARCUBE_INDEX: For a multiple-file "production", this is the ID of the file being produced. It is included as part of the output filename. For MiniRun3, the run-edep-sim ARCUBE_INDEX ran from 0 to 10239, but we then hadded those files in blocks of 10, so that ARCUBE_INDEX ran from 0 to 1023 for subsequent steps. For the purpose of this tutorial, we will use 0 to 9 and just 0, respectively.
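For completeness, here is a minimal sketch of how the runtime-related variables might look when using Singularity/Apptainer instead of Shifter; the container filename and directories below are placeholders, not the exact names used in production:
# Sketch only: Singularity runtime configuration (paths are placeholders).
export ARCUBE_RUNTIME='SINGULARITY'
export ARCUBE_CONTAINER='sim2x2_genie_edep.LFG_testing.20230228.v2.sif'
export ARCUBE_CONTAINER_DIR='/path/to/containers'
export ARCUBE_DIR='/path/to/2x2_sim'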
Within each subdirectory, e.g. run-edep-sim, the output files will appear in e.g. run-edep-sim/output/$ARCUBE_OUT_NAME. Within this "subsubsubdirectory", you will find further "all-caps" directories that indicate the type of the file, e.g. GENIE or LARNDSIM. Finally, the files themselves have names that roughly look like e.g. ${ARCUBE_OUT_NAME}.${ARCUBE_INDEX}.LARNDSIM.h5
The examples below are more-or-less copy-pasted from e.g. run-edep-sim/tests/test_MiniRun3.nu.edep-sim.sh. If you're copy-pasting from here, you'll want to first do
TWOBYTWO_SIM=/path/to/your/clone/of/2x2_sim
When all is said and done, you'll have a file with 200 spills. (The 10 edep-sim files of 1E15 POT each add up to 1E16 POT, which, at roughly 5E13 POT per spill, corresponds to 200 spills.)
First we generate a set of 10 "nu" files:
cd $TWOBYTWO_SIM/run-edep-sim
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_CHERRYPICK='0'
export ARCUBE_DET_LOCATION='ProtoDUNE-ND'
export ARCUBE_DK2NU_DIR='/global/cfs/cdirs/dune/users/2x2EventGeneration/NuMI_dk2nu/newtarget-200kA_20220409'
export ARCUBE_EDEP_MAC='macros/2x2_beam.mac'
export ARCUBE_EXPOSURE='1E15'
export ARCUBE_GEOM='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_noRock.gdml'
export ARCUBE_GEOM_EDEP='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_withRock.gdml'
export ARCUBE_TUNE='D22_22a_02_11b'
export ARCUBE_XSEC_FILE='/global/cfs/cdirs/dune/users/2x2EventGeneration/inputs/NuMI/D22_22a_02_11b.all.LFG_testing.20230228.spline.xml'
export ARCUBE_OUT_NAME='test_MiniRun3.nu'
for i in $(seq 0 9); do
ARCUBE_INDEX=$i ./run_edep_sim.sh &
done
wait
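Once the ten background jobs have finished, a quick sanity check is to list the output directory, which follows the layout convention described earlier:
# Still in run-edep-sim; the "all-caps" type directories should now exist.
ls output/test_MiniRun3.nu/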
Then we generate a set of 10 "rock" files:
cd $TWOBYTWO_SIM/run-edep-sim
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_CHERRYPICK='0'
export ARCUBE_DET_LOCATION='ProtoDUNE-ND-Rock'
export ARCUBE_DK2NU_DIR='/global/cfs/cdirs/dune/users/2x2EventGeneration/NuMI_dk2nu/newtarget-200kA_20220409'
export ARCUBE_EDEP_MAC='macros/2x2_beam.mac'
export ARCUBE_EXPOSURE='1E15'
export ARCUBE_GEOM='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_justRock.gdml'
export ARCUBE_GEOM_EDEP='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_withRock.gdml'
export ARCUBE_TUNE='D22_22a_02_11b'
export ARCUBE_XSEC_FILE='/global/cfs/cdirs/dune/users/2x2EventGeneration/inputs/NuMI/D22_22a_02_11b.all.LFG_testing.20230228.spline.xml'
export ARCUBE_OUT_NAME='test_MiniRun3.rock'
for i in $(seq 0 9); do
ARCUBE_INDEX=$i ./run_edep_sim.sh &
done
wait
Now we hadd together the "nu" files. With ARCUBE_HADD_FACTOR set to 10 and ARCUBE_INDEX set to 0, the ten edep-sim outputs (indices 0 through 9) are merged into a single file with index 0:
cd $TWOBYTWO_SIM/run-hadd
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_HADD_FACTOR='10'
export ARCUBE_IN_NAME='test_MiniRun3.nu'
export ARCUBE_OUT_NAME='test_MiniRun3.nu.hadd'
export ARCUBE_INDEX='0'
./run_hadd.sh
And likewise for the "rock" files:
cd $TWOBYTWO_SIM/run-hadd
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_HADD_FACTOR='10'
export ARCUBE_IN_NAME='test_MiniRun3.rock'
export ARCUBE_OUT_NAME='test_MiniRun3.rock.hadd'
export ARCUBE_INDEX='0'
./run_hadd.sh
Next, we build spills by overlaying the "nu" and "rock" samples:
cd $TWOBYTWO_SIM/run-spill-build
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_NU_NAME='test_MiniRun3.nu.hadd'
export ARCUBE_NU_POT='1E16'
export ARCUBE_ROCK_NAME='test_MiniRun3.rock.hadd'
export ARCUBE_ROCK_POT='1E16'
export ARCUBE_OUT_NAME='test_MiniRun3.spill'
export ARCUBE_INDEX='0'
./run_spill_build.sh
Then we run the convert2h5 step, which converts the spill-built edep-sim output into the HDF5 format used by the later steps:
cd $TWOBYTWO_SIM/run-convert2h5
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_SPILL_NAME='test_MiniRun3.spill'
export ARCUBE_OUT_NAME='test_MiniRun3.convert2h5'
export ARCUBE_INDEX='0'
./run_convert2h5.sh
Now we run larnd-sim. This step requires a GPU, ideally all to itself, to ensure that enough GPU memory is available. Each Perlmutter login node has one A100 GPU, which may or may not be hogged by someone else; you can check by running nvidia-smi. If the GPU is in use, you can try logging into other login nodes until you hit the jackpot, or you can request interactive access to a compute node by running
salloc -q interactive -C gpu -t 20
which will give you 20 minutes on a GPU node, which has four A100 GPUs. Whether on a login or compute node, you can run:
cd $TWOBYTWO_SIM/run-larnd-sim
export ARCUBE_CONVERT2H5_NAME='test_MiniRun3.convert2h5'
export ARCUBE_OUT_NAME='test_MiniRun3.larnd'
export ARCUBE_INDEX='0'
./run_larnd_sim.sh
Next, run ndlar_flow on the larnd-sim output:
cd $TWOBYTWO_SIM/run-ndlar-flow
export ARCUBE_IN_NAME='test_MiniRun3.larnd'
export ARCUBE_OUT_NAME='test_MiniRun3.flow'
export ARCUBE_INDEX='0'
./run_ndlar_flow.sh
Finally, produce the validation plots:
cd $TWOBYTWO_SIM/validation
export ARCUBE_EDEP_NAME='test_MiniRun3.convert2h5'
export ARCUBE_LARND_NAME='test_MiniRun3.larnd'
export ARCUBE_FLOW_NAME='test_MiniRun3.flow'
export ARCUBE_OUT_NAME='test_MiniRun3.plots'
export ARCUBE_INDEX='0'
./run_validation.sh
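The resulting multi-page PDFs land in the usual per-step output location, e.g.:
# Validation plots, following the standard output layout.
ls output/test_MiniRun3.plots/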
For the overlaying of the MINERvA and 2x2 geometries, see https://github.com/lbl-neutrino/GeoMergeFor2x2
For the 2x2Containers repo mentioned above, see https://github.com/lbl-neutrino/2x2Containers