Vectorise running coupling scale updates (port update_scale_coupling_vec to cudacpp with SIMD/GPU)?? Maybe not #964

Open
valassi opened this issue Aug 13, 2024 · 0 comments


After introducing more detailed counters in #962, it is now clear that the update of the running coupling scale is a moderate scalar bottleneck in some processes.

One example is ggttggg, where this step takes around 20% of the ME calculation time:
https://github.com/valassi/madgraph4gpu/blob/2169f6286a3f43c295c913118909a5e75c38cda8/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt#L676


```
*** (3-cuda) EXECUTE MADEVENT_CUDA x10 (create events.lhe) ***
--------------------
CUDACPP_RUNTIME_FBRIDGEMODE = (not set)
CUDACPP_RUNTIME_VECSIZEUSED = 8192
--------------------
81920 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! ICONFIG number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)
--------------------
Executing ' ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggttggg_x10_cudacpp > /tmp/avalassi/output_ggttggg_x10_cudacpp'
 [OPENMPTH] omp_get_max_threads/nproc = 1/4
 [NGOODHEL] ngoodhel/ncomb = 128/128
 [XSECTION] VECSIZE_USED = 8192
 [XSECTION] MultiChannel = TRUE
 [XSECTION] Configuration = 1
 [XSECTION] ChannelId = 1
 [XSECTION] Cross section = 2.332e-07 [2.3322993086656006E-007] fbridge_mode=1
 [UNWEIGHT] Wrote 303 events (found 1531 events)
 [COUNTERS] PROGRAM TOTAL                         :   17.9617s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1382s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0704s
 [COUNTERS] Fortran Random2Momenta         (  3 ) :    1.1767s for   467913 events => throughput is 2.51E-06 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.5383s for   180224 events => throughput is 2.99E-06 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    1.9975s for    90112 events => throughput is 2.22E-05 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.2803s for    90112 events => throughput is 3.11E-06 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.1079s for    90112 events => throughput is 1.20E-06 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1654s for   467913 events => throughput is 3.53E-07 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    1.5325s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0322s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :   11.9224s for    90112 events => throughput is 1.32E-04 events/s
 [COUNTERS] OVERALL NON-MEs                ( 21 ) :    6.0393s
 [COUNTERS] OVERALL MEs                    ( 22 ) :   11.9224s for    90112 events => throughput is 1.32E-04 events/s
```
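For concreteness, the following is a minimal sketch of what an event-vectorised coupling-scale update could look like on the cudacpp side, assuming a flat one-value-per-event layout and a simple one-loop running of alpha_s; the function name, signature and scale choice are illustrative assumptions, not the existing update_scale_coupling_vec interface.

```cpp
#include <cmath>
#include <cstddef>

// One-loop running of alpha_s from a reference value at the Z mass
// (standard textbook formula, used here only to make the loop body concrete).
inline double alphasOneLoop( double q2, double alphasMZ, double mz2, int nf )
{
  constexpr double pi = 3.14159265358979323846;
  const double b0 = ( 33. - 2. * nf ) / ( 12. * pi );
  return alphasMZ / ( 1. + alphasMZ * b0 * std::log( q2 / mz2 ) );
}

// Hypothetical event-vectorised scale/coupling update: one scale and one g_s
// per event, computed in a branch-free loop that a compiler can auto-vectorise
// with SIMD (or that could become one GPU thread per event). The name, the
// flat array layout and the per-event q2 input are assumptions for this
// sketch, not the actual madevent/cudacpp interface.
void updateScaleCouplingsVec( const double* q2,   // [nevt] per-event renormalisation scale squared
                              double* gs,         // [nevt] output: strong coupling g_s(q2)
                              std::size_t nevt,
                              double alphasMZ = 0.118,
                              double mz = 91.188,
                              int nf = 5 )
{
  constexpr double pi = 3.14159265358979323846;
  const double mz2 = mz * mz;
#pragma omp simd
  for( std::size_t ievt = 0; ievt < nevt; ievt++ )
  {
    const double as = alphasOneLoop( q2[ievt], alphasMZ, mz2, nf );
    gs[ievt] = std::sqrt( 4. * pi * as );
  }
}
```

The arithmetic in such a loop is trivially data-parallel; the difficulty discussed below is in the Fortran logic that chooses the per-event scale in the first place.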

Unlike the porting of phase space sampling #963, however, both the case for doing this and the chances of doing it successfully are much less obvious:

  • 20% of the ME time is not that much: ggttggg is still limited by the MEs themselves (and the simpler ggttgg does not have a scale bottleneck)
  • more importantly, the relevant Fortran functions, especially setclscales, look very difficult to port to data parallelism: they are full of if/then/else branches, which is likely to prevent lockstep processing (see the toy illustration after this list)
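As a toy illustration of the lockstep problem (the clamp logic and names below are invented for illustration and have nothing to do with the actual setclscales algorithm): SIMD lanes and GPU warps want every event to execute the same instructions, so per-event branches serialise lanes or cause warp divergence, and only simple branches can be rewritten branch-free with selects/masks.

```cpp
#include <algorithm>
#include <cstddef>

// Toy example only: a per-event, data-dependent if/then/else makes different
// events (SIMD lanes, or GPU threads in a warp) follow different paths.
void clampScaleBranchy( const double* q2in, double* q2out, std::size_t nevt, double q2min, double q2max )
{
  for( std::size_t ievt = 0; ievt < nevt; ievt++ )
  {
    if( q2in[ievt] < q2min )
      q2out[ievt] = q2min;        // some events take this branch...
    else if( q2in[ievt] > q2max )
      q2out[ievt] = q2max;        // ...while others take this one
    else
      q2out[ievt] = q2in[ievt];
  }
}

// A shallow branch like the above can be rewritten with selects (min/max or
// masked blends), keeping all events in lockstep.
void clampScaleBranchless( const double* q2in, double* q2out, std::size_t nevt, double q2min, double q2max )
{
  for( std::size_t ievt = 0; ievt < nevt; ievt++ )
    q2out[ievt] = std::min( std::max( q2in[ievt], q2min ), q2max );
}
```

This select/mask rewrite only works for shallow branches; the branching in setclscales is deep, nested and event-dependent, so it is not obvious that it could be flattened in the same way.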

So I am recording this here, but I am not convinced that it makes much sense at this stage.
