processes for the paper #344
Comments
Hi Olivier, thanks. I would suggest adding also eemumu - first because it is quite different (much lighter in computations) and may have some interesting numbers, and second because this is what we had published in the CHEP proceedings, so it can be interesting to compare to those numbers. What do you think? |
looks good to me. |
I am documenting a few specificities of ggttggg in #346. Feel free to add more observations please! |
I have just merged PR #345. This contains a couple of useful things for the paper, following Olivier's suggestions.
The summary of all results for the five processes I look at (eemumu, ggtt+0,1,2,3g) is here: I tried several combinations.
There are quite a few differences, still to be understood/tweaked, between the two compilers and the two inlining options, but I would consider this the baseline for our comparison. Note that I give one CUDA number, and several SIMD numbers for C++. The nice thing is that a factor 4 between no SIMD and 512y SIMD for double (and a factor 8 for float) seems to always be there, also for ggttggg. The baseline of the baseline is the single CUDA result and the single 512y/C++ result.
As the complexity increases and the tests take longer, I reduce the number of events (or gpublocks/threads) for ggttggg. For C++, even a few events would be enough to reach plateau performance, but I always try to run a reasonable number for CUDA too. I always run in CUDA the same number of events as in C++, to compare the ME value. But in CUDA, for ggttgg and ggttggg I do a second run with more gpublocks/threads, to reach the plateau. Typically for V100 this is 64 blocks and 256 threads as a bare minimum (below that, the performance always drops by large factors). The detailed configs are here (look at exeArgs2 if it exists, else at exeArgs, for the CUDA blocks/threads).
Voila, that's my full set of performance numbers as of today. They will still evolve (especially with split kernels etc). I will also look at the two processes that Olivier suggested as a proof of concept of generation. (One final word of caution: I think I have some small functional bugs in the calculations, which I will look at. Relatedly, or maybe independently, the different compilers start giving quite different results on ggttggg... maybe it's just the order of adding the 1000 diagrams...) |
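To make the two rules of thumb above concrete, here is a minimal sketch (my assumptions: "512y" means AVX-512 instructions operating on 256-bit ymm registers, and the V100 grid is the one quoted above) of the ideal SIMD speedups and of the minimum number of events in flight on the GPU:

```cpp
#include <cstdio>

int main()
{
  // Ideal SIMD speedup = vector register width / floating-point word width.
  // Assumption: "512y" denotes AVX-512 instructions on 256-bit ymm registers.
  constexpr int ymmBits = 256;
  std::printf( "ideal 512y speedup, double: x%d\n", ymmBits / 64 ); // x4
  std::printf( "ideal 512y speedup, float : x%d\n", ymmBits / 32 ); // x8

  // Minimum CUDA grid quoted above for the V100 throughput plateau.
  constexpr int blocks = 64, threads = 256;
  std::printf( "minimum events in flight on V100: %d\n", blocks * threads ); // 16384
  return 0;
}
```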
Just to put it in full, as of today:
|
Hi @oliviermattelaer about the EFT Higgs in #351, in the end I have a physics question! I had two build problems
So my questions:
For the moment I will assume that I can simply remove both the helicity and mass arguments, and modify sxxxx and the rest accordingly, then submit a PR to review. But let me know please! |
Yes, technically this can be removed. Keeping it might be easier for the code generation, but it's just an if statement.
Yes, this is correct.
Yes, this is correct for this computation. |
Hi Olivier, thanks! Ok so I will remove those two arguments. |
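For reference, a minimal sketch of what the scalar wavefunction helper could look like once the helicity and mass arguments are dropped, assuming the standard HELAS convention that the scalar wavefunction itself is trivially 1 while the remaining components carry momentum bookkeeping; the signature, types and component ordering below are illustrative, not the actual generated cudacpp code:

```cpp
#include <complex>

using cxtype = std::complex<double>;

// Illustrative scalar wavefunction with the unused helicity and mass
// arguments removed: p is the four-momentum (E, px, py, pz) and nss is
// +1 for an outgoing and -1 for an incoming scalar.
inline void sxxxxx( const double p[4], const int nss, cxtype sc[3] )
{
  sc[0] = cxtype( p[0], p[3] ) * double( nss ); // momentum bookkeeping
  sc[1] = cxtype( p[1], p[2] ) * double( nss ); // momentum bookkeeping
  sc[2] = cxtype( 1., 0. );                     // a scalar wavefunction is just 1
}
```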
Hi @oliviermattelaer again, next physics question! In #358. I get a build warning from rambo, which makes me think that maybe a 2->1 process like gg>h is not a good example for this exercise (do we need phase space sampling at all?). Are we not always repeating the same ME calculation with the same momenta, independently of the random numbers? For the moment I would just ignore the warning anyway... let me know if you have other suggestions. (Anyway, this was very useful to find other issues in the code!). Thanks, Andrea |
Hi,
Yes, indeed 2->1 processes are special for the integration. But that is not the main point of this check, which is rather to test the case with a scalar and with a non-SM model.
Cheers,
Olivier
|
Hi Olivier, ok very good, then I will just keep the warning in the code and check that the ME generation works (indeed it does now). Thanks! |
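As an aside, a minimal sketch of why the rambo warning is expected here (assuming the partonic CM frame; the numerical value of the mass is illustrative): for a 2->1 process like gg>h, momentum conservation fixes the final state completely, so there are no phase space dimensions left to sample and the same momenta recur in every event:

```cpp
#include <cstdio>

int main()
{
  // For 2 -> 1 at fixed partonic sqrt(s) = mH, momentum conservation in the
  // CM frame determines the outgoing four-momentum completely, so no random
  // numbers are consumed and every "sampled" event is identical.
  const double mH = 125.0;                 // illustrative Higgs mass in GeV
  const double pH[4] = { mH, 0., 0., 0. }; // (E, px, py, pz) of the Higgs
  std::printf( "p(H) = (%g, %g, %g, %g) for every event\n", pH[0], pH[1], pH[2], pH[3] );
  return 0;
}
```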
I am not sure this issue is the best place for these comments, but since it is open I will add them here. I just want to give an overview of the processes we already have and the ones we should be adding, and why. Currently we have these 7 SA and 6 MAD processes for cudacpp
What I would like to add includes
Much lower priority, but eventually relevant for performance tests (runtime AND build speed!):
Comments welcome... cc @oliviermattelaer @roiser @zeniheisser @hageboeck @whhopkins @jtchilders @nscottnichols |
I would suggest the following processes for the paper:
In terms of processes to use to check that the code can handle most of the cases