Cuda time profiles for DY+3j have high non-ME component #994

Open
valassi opened this issue Sep 11, 2024 · 2 comments

valassi commented Sep 11, 2024

Yesterday I ran some first tests of cuda DY+3j with the (OLD) timers in PR #948.

The cuda profiles are clearly weird:

  • there is a high non-ME component (here still called 'fortran overhead', as these are the old timers)
  • there is a high outside-madevent ('python/bash'? time spent deleting the applications??) component

This is for 500 events:

[avalassi@itscrd90 bash] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tlau/fromgridpacks> more pp_dy3j.mad/summary.txt 
pp_dy3j.mad/fortran/output.txt
[GridPackCmd.launch] OVERALL TOTAL    1945.6279 seconds
[madevent COUNTERS]  PROGRAM TOTAL    1910.3
[madevent COUNTERS]  Fortran Overhead 665.412
[madevent COUNTERS]  Fortran MEs      1244.89
--------------------------------------------------------------------------------
pp_dy3j.mad/cppnone/output.txt
[GridPackCmd.launch] OVERALL TOTAL    1920.0969 seconds
[madevent COUNTERS]  PROGRAM TOTAL    1896.82
[madevent COUNTERS]  Fortran Overhead 668.916
[madevent COUNTERS]  CudaCpp MEs      1223.65
[madevent COUNTERS]  CudaCpp HEL      4.2527
--------------------------------------------------------------------------------
pp_dy3j.mad/cppsse4/output.txt
[GridPackCmd.launch] OVERALL TOTAL    1336.0181 seconds
[madevent COUNTERS]  PROGRAM TOTAL    1313.34
[madevent COUNTERS]  Fortran Overhead 668.988
[madevent COUNTERS]  CudaCpp MEs      642.063
[madevent COUNTERS]  CudaCpp HEL      2.2873
--------------------------------------------------------------------------------
pp_dy3j.mad/cppavx2/output.txt
[GridPackCmd.launch] OVERALL TOTAL    960.2111 seconds
[madevent COUNTERS]  PROGRAM TOTAL    937.127
[madevent COUNTERS]  Fortran Overhead 667.996
[madevent COUNTERS]  CudaCpp MEs      267.903
[madevent COUNTERS]  CudaCpp HEL      1.2269
--------------------------------------------------------------------------------
pp_dy3j.mad/cpp512y/output.txt
[GridPackCmd.launch] OVERALL TOTAL    940.0347 seconds
[madevent COUNTERS]  PROGRAM TOTAL    917.336
[madevent COUNTERS]  Fortran Overhead 668.996
[madevent COUNTERS]  CudaCpp MEs      247.179
[madevent COUNTERS]  CudaCpp HEL      1.1605
--------------------------------------------------------------------------------
pp_dy3j.mad/cpp512z/output.txt
[GridPackCmd.launch] OVERALL TOTAL    1022.0703 seconds
[madevent COUNTERS]  PROGRAM TOTAL    997.125
[madevent COUNTERS]  Fortran Overhead 669.147
[madevent COUNTERS]  CudaCpp MEs      326.476
[madevent COUNTERS]  CudaCpp HEL      1.503
--------------------------------------------------------------------------------
pp_dy3j.mad/cuda/output.txt
[GridPackCmd.launch] OVERALL TOTAL    969.4855 seconds
[madevent COUNTERS]  PROGRAM TOTAL    853.823
[madevent COUNTERS]  Fortran Overhead 826.381
[madevent COUNTERS]  CudaCpp MEs      7.865
[madevent COUNTERS]  CudaCpp HEL      19.578
--------------------------------------------------------------------------------
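
For anyone who wants to compare the backends programmatically, here is a minimal sketch of how a summary in the format above could be read into a dictionary. It is not part of parseGridpackLogs.sh or of the repository: the helper name `parse_summary` and both regular expressions are assumptions based only on the lines shown above, and the embedded text is a two-backend excerpt of that output.

```python
import re

# Two-backend excerpt copied from the summary above (fortran and cuda only).
SUMMARY = """\
pp_dy3j.mad/fortran/output.txt
[GridPackCmd.launch] OVERALL TOTAL    1945.6279 seconds
[madevent COUNTERS]  PROGRAM TOTAL    1910.3
[madevent COUNTERS]  Fortran Overhead 665.412
[madevent COUNTERS]  Fortran MEs      1244.89
--------------------------------------------------------------------------------
pp_dy3j.mad/cuda/output.txt
[GridPackCmd.launch] OVERALL TOTAL    969.4855 seconds
[madevent COUNTERS]  PROGRAM TOTAL    853.823
[madevent COUNTERS]  Fortran Overhead 826.381
[madevent COUNTERS]  CudaCpp MEs      7.865
[madevent COUNTERS]  CudaCpp HEL      19.578
--------------------------------------------------------------------------------
"""

# Assumed layout of a counter line: "[source] LABEL   VALUE [seconds]"
COUNTER_RE = re.compile(r"^\[[^\]]+\]\s+(.+?)\s+([0-9.]+)(?:\s+seconds)?$")
# Assumed layout of a section header: "<process>/<backend>/output.txt"
HEADER_RE = re.compile(r"^[^/\s]+/([^/\s]+)/output\.txt$")


def parse_summary(text):
    """Return {backend: {counter label: seconds}} (hypothetical helper)."""
    results = {}
    backend = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            # A new "<process>/<backend>/output.txt" section starts here.
            backend = header.group(1)
            results[backend] = {}
            continue
        counter = COUNTER_RE.match(line)
        if counter and backend is not None:
            results[backend][counter.group(1)] = float(counter.group(2))
    return results


timings = parse_summary(SUMMARY)
for backend, counters in timings.items():
    print(backend, counters)
```

Separator lines match neither pattern and are skipped, so the same function should also work on the full summary as long as it keeps this layout.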
valassi self-assigned this Sep 11, 2024

valassi commented Sep 11, 2024

  • there is a high non-ME component (here still called 'fortran overhead', as these are the old timers)

Specifically, fortran and cpp have about 668s, while cuda has 826s.

  • there is a high outside-madevent ('python/bash'? time spent deleting the applications??) component

Specifically, fortran has 1945-1910 i.e. 35s, while cuda has 969-853 i.e. 116s.
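
The two differences above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the fortran and cuda sections of the summary, while the dictionary keys and the `breakdown` helper are made-up names for illustration.

```python
# Timings in seconds for 500 events, copied from the fortran and cuda summaries.
# "non_me" is the counter that the old timers label 'Fortran Overhead'.
timings = {
    "fortran": {"overall": 1945.6279, "program": 1910.3, "non_me": 665.412},
    "cuda": {"overall": 969.4855, "program": 853.823, "non_me": 826.381},
}


def breakdown(t):
    """Hypothetical helper: derive the two components discussed in this issue."""
    return {
        # Time spent outside madevent: OVERALL TOTAL minus PROGRAM TOTAL.
        "outside_madevent": t["overall"] - t["program"],
        # Share of the madevent time that is not matrix-element calculation.
        "non_me_fraction": t["non_me"] / t["program"],
    }


for backend, t in timings.items():
    b = breakdown(t)
    print(
        f"{backend:8s} outside madevent {b['outside_madevent']:6.1f}s, "
        f"non-ME share of madevent {100 * b['non_me_fraction']:5.1f}%"
    )
```

With these inputs the outside-madevent time comes out at about 35s for fortran and 116s for cuda, and the non-ME share of the madevent time at roughly 35% for fortran against roughly 97% for cuda.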

valassi added a commit to valassi/madgraph4gpu that referenced this issue Sep 11, 2024
… events

Note the large overhead in cuda results madgraph5#994

./parseGridpackLogs.sh pp_dy3j.mad | tee pp_dy3j.mad/summary.txt
valassi added a commit to valassi/madgraph4gpu that referenced this issue Sep 14, 2024
Note that there is still a large overhead in cuda results madgraph5#994, but on dy+4j this is background noise...

./parseGridpackLogs.sh  pp_dy4j.mad | tee pp_dy4j.mad/summary.txt
valassi changed the title from "Cuda time profiles for DY+3j have high non-ME component and high 'python/bash' component" to "Cuda time profiles for DY+3j have high non-ME component" on Sep 16, 2024

valassi commented Sep 16, 2024

I have split off the python/bash component into #1000 (it affects cuda, but not only cuda!). Here I keep only the non-ME madevent component (in cuda).
