Current version of spill building for nersc production #59

Closed
wants to merge 1 commit into from

jdkio (Contributor) commented Jan 26, 2024

This code is not ready.

I describe the current issues toward the end of this description.

First, this PR builds spills from multiple TG4Events using the spill time. overlaySinglesIntoSpillsSorted.C takes multiple streams and combines them into a single file. The hit times are set according to the flux spill structure. The events are inserted into the output sequentially, but as separate TG4Events.

dune-tms, however, is set up to use edep-sim's overlay code, which combines all the hits into a single TG4Event so we can process it as normal. Without this PR, the output of overlaySinglesIntoSpillsSorted.C reaches dune-tms as separate events ordered in time.

In theory, this PR fixes the issue by combining events using the TMS_Event.AddEvent functionality. It's a draft because the code is messy and broken and I'm low on time. I was trying to fix two issues.

  1. The 1.2e9 spill offset causes issues with floats. 100 + 1e9 ≈ 1e9 when using a float: adjacent floats are 64 apart at 1e9 and 128 apart at 1.2e9, so anything finer is rounded away. The output branches are floats, so a lot of precision is lost. Switching to doubles causes some of the crashes below. The causes aren't clear and I'll talk about them more at the end.
    One solution to the float issue was to subtract the spill offset from the hit time. First I tried passing the spill time into the hit and subtracting it there. This actually worked but was awkward, because the spill time has to be passed through both the hit and the true hit.
    The second solution was to undo the overlaySinglesIntoSpillsSorted.C time offset before creating the TMS_Event. This seemed to work but then started crashing again. That's what's currently in the code, and for some reason it's crashing; see below.

  2. The second issue is the crashes described below. I'm not sure what's causing them and I've run out of time to try to fix them. I'm not sure whether they come from adding events or from something else.
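
The float problem in item 1 can be reproduced in isolation. The sketch below is not code from this PR: `kSpillOffset`, `store_absolute`, `store_relative` and `float_spacing` are hypothetical names, and the offset value is simply the 1.2e9 quoted above. It shows that narrowing an absolute time to a float loses the small hit time, while subtracting the offset in double precision first keeps it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical constant: the thread only quotes "1.2e9" as the spill offset.
constexpr double kSpillOffset = 1.2e9;

// Narrows an absolute time straight to float, as a float output branch would.
float store_absolute(double hit_time) {
    return static_cast<float>(hit_time);
}

// Subtracts the spill offset in double precision, then narrows to float.
float store_relative(double hit_time, double spill_offset) {
    return static_cast<float>(hit_time - spill_offset);
}

// Distance from `value` to the next larger representable float.
float float_spacing(float value) {
    assert(std::isfinite(value));
    return std::nextafter(value, INFINITY) - value;
}
```

With these helpers, `float_spacing(1.2e9f)` is 128, `store_absolute(kSpillOffset + 50.0)` comes back as exactly 1.2e9, and `store_relative(kSpillOffset + 50.0, kSpillOffset)` comes back as 50.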

About the crashes:

I get all sorts of crashes. Sometimes doing a full make clean helps. Most of the time the crashes are intermittent: running the same code several times leads to different crashes, and then eventually it works. Sometimes I change something in the code and it then crashes every time for no apparent reason. The only thing to do is to change it back.

The current crash is:

#15 std::_Destroy<TMS_TrueParticle*> (__last=<optimized out>, __first=<optimized out>) at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v9_3_0/Linux64bit+3.10-2.17/include/c++/9.3.0/bits/stl_construct.h:137
#16 std::_Destroy<TMS_TrueParticle*, TMS_TrueParticle> (__last=0xff19d40, __first=<optimized out>) at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v9_3_0/Linux64bit+3.10-2.17/include/c++/9.3.0/bits/stl_construct.h:206
#17 std::vector<TMS_TrueParticle, std::allocator<TMS_TrueParticle> >::~vector (this=0x7ffdddd15d40, __in_chrg=<optimized out>) at /cvmfs/larsoft.opensciencegrid.org/products/gcc/v9_3_0/Linux64bit+3.10-2.17/include/c++/9.3.0/bits/stl_vector.h:677
#18 TMS_Event::~TMS_Event (this=0x7ffdddd15d20, __in_chrg=<optimized out>) at ../src/TMS_Event.h:19
#19 0x0000000000409592 in ConvertToTMSTree (filename=..., output_filename=...) at ConvertToTMSTree.cpp:171

Here are other examples:

*** Error in `ConvertToTMSTree.exe': corrupted size vs. prev_size: 0x000000000e13f6f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7ff84aa9b474]
/lib64/libc.so.6(+0x8156b)[0x7ff84aa9d56b]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libRIO.so(_ZN14TFileCacheReadD1Ev+0x147)[0x7ff84edbb547]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libTree.so(_ZN10TTreeCacheD0Ev+0x12)[0x7ff84d5430b2]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libTree.so(_ZN5TTreeD2Ev+0x145)[0x7ff84d562245]
/cvmfs/larsoft.opensciencegrid.org/products/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libTree.so(_ZN5TTreeD0Ev+0x12)[0x7ff84d562902]
ConvertToTMSTree.exe(_Z16ConvertToTMSTreeNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_+0x36c9)[0x40b2a9]
ConvertToTMSTree.exe(main+0xc6)[0x407946]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff84aa3e555]
ConvertToTMSTree.exe[0x407a8d]

*** Error in `ConvertToTMSTree.exe': munmap_chunk(): invalid pointer: 0x000000000e2ac270 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7f85d6c59474]
ConvertToTMSTree.exe(_ZN9TMS_EventD1Ev+0x3a2)[0x40dc52]
ConvertToTMSTree.exe(_Z16ConvertToTMSTreeNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_+0x192a)[0x40950a]
ConvertToTMSTree.exe(main+0xc6)[0x407946]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f85d6bfc555]
ConvertToTMSTree.exe[0x407a8d]

@jdkio jdkio requested a review from LiamOS January 26, 2024 04:32
@jdkio jdkio changed the title from "Commit spill building for production pileup. Has all sorts of crashes…" to "Current version of spill building for nersc production" Jan 26, 2024
LiamOS (Member) commented Jan 29, 2024

I have merged it onto liam_dev to play around with. If possible, could you post the files, commands and environment you were running with to get this crash?

In theory the float issue is solvable with some trickery, through renormalising things, or adding some extra bits in a clever way. We can discuss what exactly is needed sometime that suits you, whether that's soon or when you're back.
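
One possible reading of "adding some extra bits" is to store an integer spill index next to a float time measured from the start of that spill. The sketch below only illustrates that idea and is not a proposal taken from the dune-tms code: `SplitTime`, `split_time`, `join_time` and `kSpillPeriod` are hypothetical, and it assumes spills are spaced by exactly the 1.2e9 offset mentioned in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: consecutive spills are separated by exactly this offset.
constexpr double kSpillPeriod = 1.2e9;

// A time stored as "which spill" plus "how far into that spill".
struct SplitTime {
    std::int32_t spill;   // spill index, carries the large part of the time
    float time_in_spill;  // small remainder, safe to keep as a float
};

// Splits an absolute time into a spill index and a remainder.
SplitTime split_time(double absolute_time) {
    assert(absolute_time >= 0.0);
    const double spill = std::floor(absolute_time / kSpillPeriod);
    return {static_cast<std::int32_t>(spill),
            static_cast<float>(absolute_time - spill * kSpillPeriod)};
}

// Rebuilds the absolute time in double precision.
double join_time(const SplitTime& t) {
    return t.spill * kSpillPeriod + static_cast<double>(t.time_in_spill);
}
```

The float branch then only ever holds values smaller than one spill period, so its precision no longer depends on how many spills precede the hit.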

jdkio (Contributor, Author) commented Jan 29, 2024

Hopefully you can fix this before I get back in 2 months. I think we're going to want to run with a proper pileup simulation soon. Sorry for leaving it incomplete.

ConvertToTMSTree.exe /dune/data/users/abooth/Postdoc/Production/MiniProdN1p2-v1r1/run-spill-build/output/MiniProdN1p2_NDLAr_1E19_RHC.spill/EDEPSIM_SPILLS/MiniProdN1p2_NDLAr_1E19_RHC.spill.00001.EDEPSIM_SPILLS.root

You may want to ditch all this code and do it another way if it keeps giving you trouble. One idea would be to run a spill-building step before dune-tms that loads the edep-sim file and adds the events together into a single event.
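
A separate spill-building step of that kind could look roughly like the sketch below. It uses plain stand-in types rather than the edep-sim or dune-tms classes: `Hit`, `Event`, `build_spills` and `kSpillPeriod` are all hypothetical, and it assumes each hit's spill can be recovered by dividing its time by a fixed 1.2e9 spill spacing.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the edep-sim or dune-tms classes.
struct Hit { double time; double energy; };
struct Event { std::vector<Hit> hits; };

// Assumption: consecutive spills are separated by exactly this offset.
constexpr double kSpillPeriod = 1.2e9;

// Merges time-ordered single events into one Event per spill.
std::vector<Event> build_spills(const std::vector<Event>& events) {
    std::map<std::int64_t, Event> by_spill;  // keyed by spill index, kept sorted
    for (const Event& ev : events) {
        for (const Hit& hit : ev.hits) {
            assert(std::isfinite(hit.time));
            const auto spill =
                static_cast<std::int64_t>(std::floor(hit.time / kSpillPeriod));
            by_spill[spill].hits.push_back(hit);
        }
    }
    std::vector<Event> spills;
    for (auto& entry : by_spill) {
        spills.push_back(std::move(entry.second));
    }
    return spills;
}
```

Grouping hit by hit is a simplification; a real step would more likely assign whole events to a spill and would also need to carry over the truth information.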

@jdkio jdkio closed this Sep 10, 2024