Tutorial on running 2x2_sim
NOTE: This tutorial has been superseded by the April 2024 edition.
This tutorial is intended to provide a high-level understanding of how the 2x2 simulation chain works, how to run it interactively (at NERSC), and how you can modify it, if desired. The larnd-sim detector simulation makes use of GPUs, and NERSC is, for us, an especially convenient and abundant source of GPU resources, so that's where we've been running the chain. We are happy to provide support for anyone who wants to run the chain elsewhere, but that's not on the agenda for this session. Nor is this session intended to provide details on the underlying simulation packages. Nor is it intended to describe how we've been running larger-scale productions at NERSC.
The chain is based on a small set of separate, decoupled packages that "communicate" only via the files that they produce. These packages are:
- GENIE: The event generator. Responsible for taking the NuMI flux files and the geometry description, and generating neutrino interactions.
- edep-sim: The Geant4 wrapper. Responsible for taking the outgoing particles from GENIE interactions and propagating them through the geometry, recording the particle trajectories and any energy deposited in active ("sensitive") detector volumes.
GENIE and edep-sim are combined in a single step, run-edep-sim, which we run in two parallel paths, with different GENIE geometries (but the same edep-sim geometry, namely the full rock + hall + detector geometry):
- The "nu" path, where GENIE sees a geometry that contains the hall and detectors (MINERvA and the 2x2), but none of the surrounding rock.
- The "rock" path, where GENIE sees a geometry that just has the rock and an empty hall.
The purpose of this two-path approach is to keep the rock interactions in a separate sample, so that they can be reused later if needed, conserving computational resources.
Returning to the packages that make up the chain:
- hadd: From ROOT; responsible for merging multiple edep-sim outputs into a single file. We do this in order to decouple the walltime requirements of GENIE+edep-sim from those of the later steps.
- The spill builder: A simple ROOT macro that takes "nu" and "rock" edep-sim files and, based on POT per spill, spill structure, and spill separation, overlays the events into spills.
At this point, the ROOT files from the spill builder are passed on to the MINERvA chain. For the 2x2, the steps continue:
- larnd-sim: The detector simulation for the charge (LArPix) and light readout. Written in Python but with the "heavy lifting" compiled to GPU (CUDA) binary using Numba.
- ndlar_flow: Calibration and low-level reconstruction. Written in numpy-based Python using Peter's "h5flow" framework.
- Validation plotter: Produces matplotlib-based validation plots as multi-page PDFs, for the various preceding steps.
Also worth mentioning: The "g4numi" package is responsible for running the (Geant-based) NuMI beamline simulation and producing the "dk2nu" flux files that GENIE consumes. However, we don't run g4numi as an explicit step in this chain. Instead, we've been using a static set of dk2nu files copied over from a previous g4numi production run at the Fermilab cluster.
If you'd like to follow this tutorial directly, you will need a computing account at NERSC. To request an account, contact Callum Wilkinson. Assuming you have an account, you'll want to run these steps on the new Perlmutter system, which provides GPUs. To log in:
ssh saul-p1.nersc.gov
If you want to run the chain elsewhere, make sure that GPUs are available. The Wilson Cluster at Fermilab and the S(3)DF cluster at SLAC are a couple of options. The main change you will need to make will be the container setup (see the next section). NERSC has its own "Shifter" container runtime, which can import Docker containers from Dockerhub. Meanwhile, WC and SDF support Singularity/Apptainer. A container is used for the steps prior to larnd-sim, so you will need to modify the tops of those scripts in order to enter the container using apptainer instead of Shifter. You will also need to replace the "module load" commands, which are NERSC-specific.
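As a rough sketch (not the exact production setup), the Shifter entry at the top of a run script could be replaced by an Apptainer invocation along these lines; the .sif path and bind location below are placeholders:
# Sketch only: Apptainer replacement for the Shifter entry in a run script.
# The .sif location is a placeholder; bind your 2x2_sim checkout as needed.
apptainer exec --bind /path/to/2x2_sim \
    /path/to/containers/sim2x2_genie_edep.LFG_testing.20230228.v2.sif \
    /bin/bash --init-file /environment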
This chain relies on a container to provide the "non-Python" dependencies for the steps prior to larnd-sim: ROOT, GENIE, Geant, edep-sim, etc. The container is built using Apptainer (formerly known as Singularity). The resulting .sif file can be used for running on a non-NERSC system. For running at NERSC, we use singularity2docker to convert the .sif to a Docker image, which then gets uploaded to DockerHub (as mjkramer/sim2x2) and imported into Shifter. The repository also contains a script (in admin/) to pull the container from DockerHub for use with Singularity if not using Shifter.
The important point here is that the Shifter image is already imported and available to all users on Perlmutter, so there's nothing you need to do. If you'd like to enter the container interactively, do:
shifter --image=mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2 -- /bin/bash --init-file /environment
If you need to rebuild or modify the container, see the 2x2Containers repo.
To get started, clone the repository:
git clone https://github.com/DUNE/2x2_sim.git
The following needs to be run just once, in a fresh clone of the repo, from the host OS (e.g. a fresh login), not the container. It's responsible for setting up necessary Python virtual environments, etc. Run it from the top level (or root) of the 2x2_sim directory:
admin/install_everything.sh
There are seven subdirectories that contain the individual steps in the chain. In the order in which they're run:
- run-edep-sim (includes GENIE)
- run-hadd
- run-spill-build
- run-convert2h5
- run-larnd-sim
- run-ndlar-flow
- validation
Within each of these subdirectories, there's a corresponding "run script", e.g. run_edep_sim.sh. These scripts should be run directly from the native Perlmutter OS, not from inside the container. The scripts themselves will take care of entering the container, loading any necessary modules or Python environments, etc.
The run scripts do not take any command-line arguments. Instead, they are controlled entirely by environment variables which, by convention, begin with ARCUBE_. Some important common environment variables:
- ARCUBE_RUNTIME: The container runtime to use when running the 2x2 sim. Current valid options are SHIFTER and SINGULARITY; the default is SHIFTER. (See the sketch just after this list for a Singularity setup.)
- ARCUBE_CONTAINER: Name of the container to use when running the 2x2 sim. The name is slightly different between Shifter (the name of the container on DockerHub) and Singularity (the name of the container .sif file).
- ARCUBE_CONTAINER_DIR: Path/directory where the Singularity container is stored. Not used with Shifter.
- ARCUBE_DIR: The top-level (or root) location of the 2x2_sim directory (e.g. /path/to/2x2_sim). This is needed for Singularity to properly bind the directory, ensuring it is mounted when using networked file systems.
- ARCUBE_OUT_NAME: The name of the output directory. Output filenames will also be prefixed by $ARCUBE_OUT_NAME. By convention, we set ARCUBE_OUT_NAME to the name of the "production", separated by a period from the abbreviated name of the step, e.g. MiniRun3.larnd.
- ARCUBE_INDEX: For a multiple-file "production", this is the ID of the file being produced. It is included as part of the output filename. For MiniRun3, the run-edep-sim ARCUBE_INDEX ran from 0 to 10239, but we then hadded those files in blocks of 10, so that ARCUBE_INDEX ran from 0 to 1023 for subsequent steps. For the purpose of this tutorial, we will use 0 to 9 and just 0, respectively.
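For completeness, here is a minimal sketch of how the runtime-related variables might look when using Singularity/Apptainer instead of Shifter; the container filename and directories below are placeholders, not the exact names used in production:
# Sketch only: Singularity runtime configuration (paths are placeholders).
export ARCUBE_RUNTIME='SINGULARITY'
export ARCUBE_CONTAINER='sim2x2_genie_edep.LFG_testing.20230228.v2.sif'
export ARCUBE_CONTAINER_DIR='/path/to/containers'
export ARCUBE_DIR='/path/to/2x2_sim'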
Within each subdirectory, e.g. run-edep-sim, the output files will appear in e.g. run-edep-sim/output/$ARCUBE_OUT_NAME. Within this "subsubsubdirectory", you will find further "all-caps" directories that indicate the type of the file, e.g. GENIE or LARNDSIM. Finally, the files themselves have names that roughly look like e.g. ${ARCUBE_OUT_NAME}.${ARCUBE_INDEX}.LARNDSIM.h5
The examples below are more-or-less copy-pasted from e.g. run-edep-sim/tests/test_MiniRun3.nu.edep-sim.sh. If you're copy-pasting from here, you'll want to first do
TWOBYTWO_SIM=/path/to/your/clone/of/2x2_sim
When all is said and done, you'll have a file with 200 spills. (The 10 edep-sim files of 1E15 POT each add up to 1E16 POT, which, at roughly 5E13 POT per spill, corresponds to 200 spills.)
First we generate a set of 10 "nu" files:
cd $TWOBYTWO_SIM/run-edep-sim
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_CHERRYPICK='0'
export ARCUBE_DET_LOCATION='ProtoDUNE-ND'
export ARCUBE_DK2NU_DIR='/global/cfs/cdirs/dune/users/2x2EventGeneration/NuMI_dk2nu/newtarget-200kA_20220409'
export ARCUBE_EDEP_MAC='macros/2x2_beam.mac'
export ARCUBE_EXPOSURE='1E15'
export ARCUBE_GEOM='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_noRock.gdml'
export ARCUBE_GEOM_EDEP='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_withRock.gdml'
export ARCUBE_TUNE='D22_22a_02_11b'
export ARCUBE_XSEC_FILE='/global/cfs/cdirs/dune/users/2x2EventGeneration/inputs/NuMI/D22_22a_02_11b.all.LFG_testing.20230228.spline.xml'
export ARCUBE_OUT_NAME='test_MiniRun3.nu'
for i in $(seq 0 9); do
ARCUBE_INDEX=$i ./run_edep_sim.sh &
done
wait
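Once the ten background jobs have finished, a quick sanity check is to list the output directory, which follows the layout convention described earlier:
# Still in run-edep-sim; the "all-caps" type directories should now exist.
ls output/test_MiniRun3.nu/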
Then we generate a set of 10 "rock" files:
cd $TWOBYTWO_SIM/run-edep-sim
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_CHERRYPICK='0'
export ARCUBE_DET_LOCATION='ProtoDUNE-ND-Rock'
export ARCUBE_DK2NU_DIR='/global/cfs/cdirs/dune/users/2x2EventGeneration/NuMI_dk2nu/newtarget-200kA_20220409'
export ARCUBE_EDEP_MAC='macros/2x2_beam.mac'
export ARCUBE_EXPOSURE='1E15'
export ARCUBE_GEOM='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_justRock.gdml'
export ARCUBE_GEOM_EDEP='geometry/Merged2x2MINERvA_v2/Merged2x2MINERvA_v2_withRock.gdml'
export ARCUBE_TUNE='D22_22a_02_11b'
export ARCUBE_XSEC_FILE='/global/cfs/cdirs/dune/users/2x2EventGeneration/inputs/NuMI/D22_22a_02_11b.all.LFG_testing.20230228.spline.xml'
export ARCUBE_OUT_NAME='test_MiniRun3.rock'
for i in $(seq 0 9); do
ARCUBE_INDEX=$i ./run_edep_sim.sh &
done
wait
Now we hadd together the "nu" files. With ARCUBE_HADD_FACTOR set to 10 and ARCUBE_INDEX set to 0, the ten edep-sim outputs (indices 0 through 9) are merged into a single file with index 0:
cd $TWOBYTWO_SIM/run-hadd
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_HADD_FACTOR='10'
export ARCUBE_IN_NAME='test_MiniRun3.nu'
export ARCUBE_OUT_NAME='test_MiniRun3.nu.hadd'
export ARCUBE_INDEX='0'
./run_hadd.sh
And likewise for the "rock" files:
cd $TWOBYTWO_SIM/run-hadd
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_HADD_FACTOR='10'
export ARCUBE_IN_NAME='test_MiniRun3.rock'
export ARCUBE_OUT_NAME='test_MiniRun3.rock.hadd'
export ARCUBE_INDEX='0'
./run_hadd.sh
Next, we build spills by overlaying the "nu" and "rock" samples:
cd $TWOBYTWO_SIM/run-spill-build
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_NU_NAME='test_MiniRun3.nu.hadd'
export ARCUBE_NU_POT='1E16'
export ARCUBE_ROCK_NAME='test_MiniRun3.rock.hadd'
export ARCUBE_ROCK_POT='1E16'
export ARCUBE_OUT_NAME='test_MiniRun3.spill'
export ARCUBE_INDEX='0'
./run_spill_build.sh
Then we run the convert2h5 step, which converts the spill-built edep-sim output into the HDF5 format used by the later steps:
cd $TWOBYTWO_SIM/run-convert2h5
export ARCUBE_CONTAINER='mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2'
export ARCUBE_SPILL_NAME='test_MiniRun3.spill'
export ARCUBE_OUT_NAME='test_MiniRun3.convert2h5'
export ARCUBE_INDEX='0'
./run_convert2h5.sh
Now we run larnd-sim. This step requires a GPU, ideally all to itself, to ensure that enough GPU memory is available. Each Perlmutter login node has one A100 GPU, which may or may not be hogged by someone else; you can check by running nvidia-smi. If the GPU is in use, you can try logging into other login nodes until you hit the jackpot, or you can request interactive access to a compute node by running
salloc -q interactive -C gpu -t 20
which will give you 20 minutes on a GPU node, which has four A100 GPUs. Whether on a login or compute node, you can run:
cd $TWOBYTWO_SIM/run-larnd-sim
export ARCUBE_CONVERT2H5_NAME='test_MiniRun3.convert2h5'
export ARCUBE_OUT_NAME='test_MiniRun3.larnd'
export ARCUBE_INDEX='0'
./run_larnd_sim.sh
Next, run ndlar_flow on the larnd-sim output:
cd $TWOBYTWO_SIM/run-ndlar-flow
export ARCUBE_IN_NAME='test_MiniRun3.larnd'
export ARCUBE_OUT_NAME='test_MiniRun3.flow'
export ARCUBE_INDEX='0'
./run_ndlar_flow.sh
Finally, produce the validation plots:
cd $TWOBYTWO_SIM/validation
export ARCUBE_EDEP_NAME='test_MiniRun3.convert2h5'
export ARCUBE_LARND_NAME='test_MiniRun3.larnd'
export ARCUBE_FLOW_NAME='test_MiniRun3.flow'
export ARCUBE_OUT_NAME='test_MiniRun3.plots'
export ARCUBE_INDEX='0'
./run_validation.sh
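The resulting multi-page PDFs land in the usual per-step output location, e.g.:
# Validation plots, following the standard output layout.
ls output/test_MiniRun3.plots/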
For the overlaying of the MINERvA and 2x2 geometries, see https://github.com/lbl-neutrino/GeoMergeFor2x2
For the 2x2Containers repo mentioned above, see https://github.com/lbl-neutrino/2x2Containers