Production changes and validation findings

YifanC edited this page Jun 11, 2023 · 55 revisions

MiniRun3

Planned changes

  • DONE [Andrew] include truth information from Andrew
  • DONE [Matt] update event ID scheme as discussed in calibration channel -- flux file, nu or rock, random seed, event ID from edep-sim
    • Convert "event-local" trackIDs to file-local IDs; add separate event_local_trackID to enable backtracking
    • How to incorporate triggers / markers / spill IDs? (longer term question)
    • Clarify: edep-sim "events" vs "built" events
  • DONE [Matt] Move spill period timing upstream to edep-sim spill ROOT files
  • DONE [Kevin] Yifan has a "bug" fix request for the truth-information edit in larnd-sim: https://github.com/DUNE/larnd-sim/blob/master/cli/simulate_pixels.py#L254-L256
    • This was done because the rest of the simulation expects each event to start at t0 = 0; the event time is then assigned separately at the end
  • DONE [Matt] ndlar_flow step
  • DONE [Matt] validation step
  • DONE [Yifan] update the light configuration to what Module 1-3 used
    • Livio said that in the 2x2 all LRS channels will run at 62.5 MHz (16-bit or 14-bit, to be confirmed), not the 100 MHz that larnd-sim currently configures by default. The length of the readout window is undecided, but it will be the same for all modules. The intent is to keep the window long enough to capture delayed activity (e.g. Michel electrons), with data size as the constraint. The maximum LRS readout window is 16.384 us.
    • point install_larnd_sim.sh to version with new light config
  • DONE [Yifan] (check on) the new LUT
    • point simulate_pixels.py to new LUT
  • DONE [Matt] Larger files
    • run-hadd step before spill building
  • [Matt] FHC
  • DONE [Matt] edep-sim: Don't save events with empty SegmentDetectors
    • pi0s? Now treated like gammas and neutrons in MarkTrajectories
    • new container: docker:mjkramer/sim2x2:genie_edep.LFG_testing.20230228.v2
  • DONE-ISH [Kevin] larnd-sim: Allow fixed random seed (for reproducibility)
    • --seed added; no longer always does SEED = int(time())
    • Verify that this actually makes larnd-sim reproducible
  • DONE [Matt] Add event_times to h5 file
    • Added as t_vert and t_spill in vertices dataset
  • DONE [Livio/Yifan] Update gains and thresholds for light system (based on Module 3 data)
    • Verify that it fixes the issues that Angela found (TODO?)
  • DONE [Matt] Rename eventID to vertexID in the HDF5 files
  • Rerun ndlar_flow on MiniRun3 as soon as improvements are implemented; don't wait for MiniRun4
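
As a quick check on the light-readout numbers above, the maximum window and the planned sampling rate can be turned into a sample count. This is only arithmetic on the two quoted figures (16.384 us and 62.5 MHz); the function name is hypothetical and nothing here is taken from the larnd-sim configuration.

```python
from fractions import Fraction


def samples_in_window(window_us, rate_mhz):
    """Samples in a readout window (hypothetical helper, not a larnd-sim function).

    us * MHz is dimensionless, so the product is the sample count.
    Inputs are decimal strings so that Fraction keeps the arithmetic exact.
    """
    return Fraction(window_us) * Fraction(rate_mhz)


# Maximum LRS window at the planned 2x2 sampling rate
n_samples = samples_in_window("16.384", "62.5")
print(n_samples)  # 1024
```

In other words, the 16.384 us limit corresponds to 1024 samples at 62.5 MHz, i.e. a 16 ns sample period.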

File format differences

  • Each spill is now 5E13 POT instead of 6.5E13
  • edep-sim singles files (nu and rock) only contain events that leave hits in sensitive detectors
  • edep-sim singles files contain sequential event IDs
    • Rewritten from the original GENIE event IDs so we don't have "gaps" from all of the (rock) events that leave no hits
    • The pass-thru GENIE tree contains consistently rewritten EvtNum values and has the same number of entries as EDepSimEvents
  • RunId is now consecutive and unique (within the nu and rock chains), encodes flux file ID and random seed
    • RunId 12345 exists for both the rock and nu chain; individual (pre-hadd) singles file uniquely ID'd by e.g. (12345, "rock")
  • edep-sim spill files no longer map 1-to-1 to (rock, nu) pairs of singles files
    • Now we hadd singles files together before spill building. Previously: 10,240 spill files. Now: 1,024.
  • edep-sim spill files (in ROOT format) contain rewritten RunIds ("TaggedRunIds") to distinguish nu from rock
    • TaggedRunId = RunId + (1E9 if rock else 0); still stored in the RunId field of TG4Event
  • edep-sim spill ROOT files now contain "absolute" timing, with spills separated by 1.2s
    • Previously clock was reset to zero for each spill; 1.2s separation was applied in HDF5 converter
  • Spill IDs are now globally unique
    • SpillID = 1E3 * SpillFileID + SpillIndexWithinFile
    • Roughly speaking, SpillFileID is, e.g., RunId / 10, assuming we hadd 10 singles files at a time before spillbuilding
    • In other words, SpillFileID runs from 0 to NumberOfSpillFiles-1
    • SpillIndexWithinFile runs from 0 to NumberOfSpillsInFile-1
  • In the spill ROOT files, event_spill_map now uses string(RunId) + " " + string(EventId) as the key, instead of the previous string(EventId)
  • In the HDF5 files, eventID is now called vertexID, to avoid confusion with the events from the (flow) event builder
  • The vertexID is 64 bits, and is calculated as 1E6 * TaggedRunId + EdepSimEventId
  • HDF5 trackIDs are renumbered (from the original edep-sim TrackIds) to be unique within each file
    • Start from 0, increase monotonically within the file (and hence within each spill)
    • All references (e.g. to parent track ID; from hit segments) updated accordingly
    • Original edep-sim track ID stored as local_trackID
  • edep-sim HDF5 files and larnd-sim files now contain GENIE pass-thru
  • The larnd-sim tracks (i.e. hit segments) dataset now uses z as the beam coordinate, for consistency with the other truth datasets. To help identify whether a file follows this convention or the old (z = drift) convention, the attribute zbeam is set to True on the tracks dataset.
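
The ID formulas in the list above can be sketched in a few lines of Python. The multipliers (1E9, 1E6, 1E3) and the key format come from the list; the function names are hypothetical, and the decoding step additionally assumes that EdepSimEventId stays below 1E6 and RunId below 1E9, which the list does not state explicitly.

```python
ROCK_OFFSET = 1_000_000_000  # 1E9 added to RunId for rock events
EVENT_FACTOR = 1_000_000     # 1E6 multiplier in vertexID
SPILL_FACTOR = 1_000         # 1E3 multiplier in SpillID


def tagged_run_id(run_id, is_rock):
    """TaggedRunId = RunId + (1E9 if rock else 0)."""
    return run_id + (ROCK_OFFSET if is_rock else 0)


def vertex_id(tagged_run, edep_event_id):
    """vertexID = 1E6 * TaggedRunId + EdepSimEventId (needs 64 bits)."""
    return EVENT_FACTOR * tagged_run + edep_event_id


def decode_vertex_id(vid):
    """Invert vertex_id. Assumes EdepSimEventId < 1E6 and RunId < 1E9."""
    tagged_run, edep_event_id = divmod(vid, EVENT_FACTOR)
    is_rock = tagged_run >= ROCK_OFFSET
    run_id = tagged_run - ROCK_OFFSET if is_rock else tagged_run
    return run_id, is_rock, edep_event_id


def spill_id(spill_file_id, spill_index_within_file):
    """SpillID = 1E3 * SpillFileID + SpillIndexWithinFile."""
    return SPILL_FACTOR * spill_file_id + spill_index_within_file


def spill_map_key(run_id, event_id):
    """event_spill_map key: string(RunId) + " " + string(EventId)."""
    return f"{run_id} {event_id}"


# Rock event 42 of RunId 12345
vid = vertex_id(tagged_run_id(12345, is_rock=True), 42)
print(vid)                       # 1000012345000042
print(decode_vertex_id(vid))     # (12345, True, 42)
print(spill_id(7, 3))            # 7003
print(spill_map_key(12345, 42))  # 12345 42
```

The example vertexID is well above 2^32, which illustrates why the field is 64 bits wide.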

See also p7-10 of these slides as well as this brainstorming writeup.
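
The trackID renumbering described in the list above can be illustrated with a minimal sketch. This is not the converter's actual code: the input layout (a list of events, each a list of dicts with trackID and parentID), the function name, and the use of -1 as the "no parent" marker are all assumptions made for illustration.

```python
def renumber_tracks(events, no_parent=-1):
    """Assign file-local trackIDs, keeping the original as local_trackID.

    `events` is a list of events; each event is a list of dicts with
    event-local "trackID" and "parentID" (hypothetical layout).
    """
    out = []
    next_id = 0  # file-local IDs start from 0 and only increase
    for event in events:
        # Map event-local ID -> file-local ID, valid for this event only
        mapping = {}
        for traj in event:
            mapping[traj["trackID"]] = next_id
            next_id += 1
        # Rewrite the ID and every reference to it (here: the parent ID)
        for traj in event:
            parent = traj["parentID"]
            out.append({
                "trackID": mapping[traj["trackID"]],
                "local_trackID": traj["trackID"],
                "parentID": no_parent if parent == no_parent else mapping[parent],
            })
    return out


# Two events that both use event-local IDs 0 and 1
events = [
    [{"trackID": 0, "parentID": -1}, {"trackID": 1, "parentID": 0}],
    [{"trackID": 0, "parentID": -1}, {"trackID": 1, "parentID": 0}],
]
for row in renumber_tracks(events):
    print(row)
```

In the real files, references from hit segments would need the same per-event mapping applied; the sketch only shows the parent reference.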

Reflowing (MiniRun3)

  • First round (~Apr 21)?
    • Fuzzy hit merging (single pixel)
    • Improved (edep-sim) truth backtracking (e.g. calib_prompt_hits)
    • GENIE summary
  • Second round (~Apr 30)?
    • Fixed refs from calib_final_hits to calib_prompt_hits
  • Longer term?
    • GENIE particle stack
    • Light reco (hits dataset)
    • Fuzzy hit merging (neighboring pixels)
  • Longer longer term
    • Associate light triggers to charge clusters

Variants of MiniRun3

  • Different thresholds (single higher-threshold sample)
    • Module-0 had low/high threshold runs -- use those numbers?

MiniRun4

Simulation improvements

Full 10 weeks up to larnd-sim? Depending on validation findings.

Checkpoints:

  • GENIE:
    • Change to DUNE base model 3.4
  • Geometry in edep-sim:
    • New cavern geometry for rock simulation
    • Optimized beam window
  • larnd-sim:
    • Global beam trigger
    • Robust truth backtracking
    • Map the light readout to physical channels
    • Revisit larpix rollover logic
  • ndlar-flow:
    • Event building with external trigger logic
    • Light product
  • Unify variable naming style
  • Check the neutrino event rate per spill in 2x2

Good to have:

  • Geometry in edep-sim:
    • Complete geometry
  • larnd-sim:
    • Improved light noise modeling
    • Realistic beam trigger
    • Enable different configurations for different modules
    • Revisit light truth backtracking
    • Fix the missing hits for segments
  • ndlar-flow:
    • Drop the redundant datasets in flow
    • Build more direct references
      • Prompt and final hits -> segments with fraction
    • Light reconstruction
    • Reco (light/charge hits) to true (events)
  • GENIE / edep-sim / larnd-sim reproducibility
  • The activity 50 ms after a spill?
  • MAYBE expand the TG4Event EventId to 64 bits, giving us consistency with the HDF5 "eventID", and freeing up the RunId for other purposes. Noe has suggested that it shouldn't be too much trouble to adapt the MINERvA chain.

Possible variable name/structure improvements

Thoughts on variable naming and structure in the output files, to improve clarity. These are planned for MiniRun4, since they amount to a large breaking change for analyzers and will need a coordinated pull request across repositories.

  • Keep any backward compatibility (e.g. through configurable dataset names)?
  • Condense reaction flags into a single reaction enum/number
    • Still keep flags?
    • How to enumerate the reactions (e.g. how fine grained)?
  • Rename tracks dataset to segments (larndsim/ndflow output)
  • Rename trackID to trajID (all output files)
  • Standardize ID variables to be varID (all output files)
  • Rename genie_hdr to mc_inter or mc_interactions or mc_hdr or something (convert/larndsim output)
  • Rename genie_stack to mc_stack (convert/larndsim output)
  • Remove charge/packets from ndflow output
  • Remove charge/raw_events from ndflow output

Future runs?

  • Further optimize rock muons with GeomVolSelectorRockBox?