
tRex implementation #1025

Status: Open. Wants to merge 91 commits into base: master.

Commits (91)
a3c2cde
added rex files
zeniheisser Oct 11, 2023
89629da
Merge branch 'madgraph5:master' into rexDev
zeniheisser Jan 29, 2024
ce4e536
major changes to REX and teawREX, plus first base for template runfil…
zeniheisser Feb 27, 2024
9f798d6
Merge branch 'madgraph5:master' into rexDev
zeniheisser Mar 4, 2024
77ee370
small fixes to rwgt code
zeniheisser Mar 4, 2024
d3815b8
changed submodule to my fork
zeniheisser Mar 4, 2024
333bb5c
fixes to get rwgt exporter working
zeniheisser Mar 4, 2024
0050026
small modifications and added files, checking fbridge which currently…
zeniheisser Mar 6, 2024
43ef2e8
added proper makefiles for rwgt_runners and rwgt_driver
zeniheisser Mar 13, 2024
fb7a254
separated REX into header and implementation, fixed compilation of P-…
zeniheisser Apr 9, 2024
d041b35
added rex files
zeniheisser Oct 11, 2023
15a2a65
major changes to REX and teawREX, plus first base for template runfil…
zeniheisser Feb 27, 2024
d5933a8
small fixes to rwgt code
zeniheisser Mar 4, 2024
b5827c2
changed submodule to my fork
zeniheisser Mar 4, 2024
5433aac
fixes to get rwgt exporter working
zeniheisser Mar 4, 2024
e5a95d3
small modifications and added files, checking fbridge which currently…
zeniheisser Mar 6, 2024
f755c17
added proper makefiles for rwgt_runners and rwgt_driver
zeniheisser Mar 13, 2024
8c78800
separated REX into header and implementation, fixed compilation of P-…
zeniheisser Apr 9, 2024
ad46244
rebase
Jun 17, 2024
0e4c680
new makefiles and export routines for rwgt_runner/driver
Jul 31, 2024
77488e3
added generic rwgt_runner header, modified runners and drivers to put…
Jul 31, 2024
ff6b3e1
added final necessary functionality for a library based implementatio…
zeniheisser Aug 7, 2024
4b9b82e
fixed handling for amps with multiple parton sets, now treats them pr…
zeniheisser Aug 9, 2024
12ae2f8
lots of bugfixes with indexing and memory management
zeniheisser Sep 9, 2024
ad48bbc
updated submodule
zeniheisser Sep 9, 2024
6c00da5
update submodule to merged upstream version
zeniheisser Sep 9, 2024
506b2e8
reverted to earlier mg5 branch
Sep 10, 2024
52016c2
modified makefiles to support gpu compilation, made cuda default targ…
Sep 10, 2024
00a081c
changed default backend back to cppauto for testing
zeniheisser Sep 16, 2024
c2a9e85
merged upstream and tweaks to handle interfaces
zeniheisser Sep 16, 2024
06bd58f
fixed submodule link
zeniheisser Sep 16, 2024
a166e74
separated REX and teawREX compilations
zeniheisser Sep 17, 2024
4f68de8
removed legacy comments
zeniheisser Sep 17, 2024
55fdda9
removed legacy code and files, renamed some functions to make naming …
zeniheisser Sep 23, 2024
4c01292
[param] regenerate gg_tt.mad for reference (in the usual directory an…
valassi Sep 16, 2024
9a50131
[param] in CODEGEN/generateAndCompare.sh move the changes to Cards/id…
valassi Sep 16, 2024
ea796d0
[param] regenerate gg_tt.mad after anticipating the changes to ident_…
valassi Sep 16, 2024
2aeb8a3
[param] ** COMPLETE PARAM ** regenerate all processes: param_card.inc…
valassi Sep 16, 2024
9f7bc60
[amd] in gg_tt.mad and CODEGEN, fix cudacpp.mk to find the correct pa…
valassi Sep 16, 2024
b4f1689
[amd] regenerate all processes with fixes for libamdhip64 in cudacpp.mk
valassi Sep 16, 2024
212a9e0
[amd] in tput/allTees.sh clarify that -cpponly and -nocuda exist whil…
valassi Sep 17, 2024
1358fcb
[amd] in tput/allTees.sh, on second thought add back -hip, but make t…
valassi Sep 17, 2024
5ecc699
[amd] rerun 96 tput tests on LUMI - many issues at build time and at …
valassi Sep 17, 2024
abb441e
[amd] revert 96 tput logs on LUMI
valassi Sep 17, 2024
cf602a1
[amd] in tput/throughputX.sh expose FPE crash #1003 on HIP and improv…
valassi Sep 17, 2024
47a15ab
[amd] in gg_tt.mad cudacpp.mk, try to work around the HIP crashes #10…
valassi Sep 17, 2024
07e0754
[amd] in gg_tt.mad cudacpp.mk, revert the previous commit (1)
valassi Sep 17, 2024
bed013d
[amd] in gg_tt.mad cudacpp.mk, try to work around HIP crashes #1003 b…
valassi Sep 17, 2024
07845c1
[amd] in gg_tt.mad cudacpp.mk, revert the previous commit (2)
valassi Sep 17, 2024
14ac1d9
[amd] in gg_tt.mad EventStatistics.h, try to work around HIP crashes …
valassi Sep 17, 2024
35de4df
[amd] in gg_tt.mad EventStatistics.h, revert the previous commit (1)
valassi Sep 17, 2024
f111828
[amd] in gg_tt.mad EventStatistics.h, work around HIP crashes #1003 b…
valassi Sep 17, 2024
4244662
[amd] in gg_tt.mad EventStatistics.h, revert the previous commit (2)
valassi Sep 17, 2024
305c781
[amd] in gg_tt.mad and CODEGEN EventStatistics.h, work around FPE cra…
valassi Sep 17, 2024
b4a7b35
[amd] in gg_tt.mad and CODEGEN EventStatistics.h, fix clang formatting
valassi Sep 17, 2024
15df2e6
[amd] regenerate all processes with the fix for #1003
valassi Sep 17, 2024
9ccc0d7
[gcc14] in gg_tt.mad and CODEGEN mgOnGpuVectors.h, distinguish betwee…
valassi Sep 18, 2024
c0a3dc6
[gcc14] in gg_tt.mad and CODEGEN mgOnGpuCxtypes.h, clarify that cxtyp…
valassi Sep 18, 2024
c6c6234
[clang] in gg_tt.mad and CODEGEN EventStatistics.h, work around FPE c…
valassi Sep 18, 2024
55dcb6b
[clang] regenerate all processes with fixes for clang16 FPE #1005 and…
valassi Sep 18, 2024
a647b4b
[clang] rerun 102 tput tests on itscrd90 - all ok
valassi Sep 18, 2024
5439b7d
[clang] ** COMPLETE CLANG ** rerun 30 tmad tests on itscrd90 - all as…
valassi Sep 18, 2024
1b67e65
[amd] rerun 96 tput builds and tests on LUMI worker node (small-g 72h…
valassi Sep 18, 2024
a45dcb5
[amd] in gq_ttq.mad and CODEGEN cudacpp.mk add optional debug flags f…
valassi Sep 18, 2024
4416181
[amd] regenerate all processes (just with some comments in cudacpp.mk)
valassi Sep 18, 2024
3cc0280
[amd] rerun 30 tmad tests on LUMI against AMD GPUs - all as expected …
valassi Sep 19, 2024
c4164fe
[amd] ** COMPLETE AMD ** revert to itscrd90 logs for tput/tmad tests
valassi Sep 19, 2024
6eea889
removed superfluous makefile, added default backend that prioritises …
zeniheisser Sep 23, 2024
85b185f
Merge branch 'madgraph5:master' into rexDev
zeniheisser Sep 23, 2024
0ba830d
added official licensing terms (LGPL3.0) and removed some unused code
zeniheisser Sep 23, 2024
c48fc6c
fixed indexing issue when modifying several parameters in the same SL…
zeniheisser Sep 24, 2024
052b61b
Merge branch 'master' into tREX
zeniheisser Sep 27, 2024
23d504b
separated tREX output into a specific reweighting plugin mode
zeniheisser Sep 27, 2024
36b7139
added file storing all the functionality for tREX output
zeniheisser Sep 27, 2024
e442637
changed native mg branch from rexCPP to gpucpp
zeniheisser Sep 27, 2024
eaa2a3b
updated submodule to point to latest gpucpp
zeniheisser Sep 30, 2024
deb07ae
major restructuring of compilation, such that symbols shared across s…
zeniheisser Oct 2, 2024
910bfce
merge upstream
zeniheisser Nov 4, 2024
63bbbfc
fixed processes with multiple non-interfering cross sections
zeniheisser Nov 6, 2024
d9fefe6
uncommented debug statements but set default debug flags to false and…
zeniheisser Nov 6, 2024
3f08dc4
fixed trex import to work in the dev structure of having the plugin o…
zeniheisser Dec 5, 2024
bc74e37
fixed formatting to match project
zeniheisser Dec 5, 2024
89545ec
additional formatting issue
zeniheisser Dec 5, 2024
eda824f
added some safety checks to avoid out of bounds issues in generic xml…
zeniheisser Nov 8, 2024
f3063fc
moved from storing event-node information as string_view to native nu…
zeniheisser Nov 13, 2024
f6969fe
small safety check for running tRex through MG interface, as well as …
zeniheisser Nov 18, 2024
f12ed00
added generic lheSoA format
zeniheisser Jan 20, 2025
8785101
fix to slha block parsing --- was previously case sensitive for ME re…
zeniheisser Jan 23, 2025
1a04413
checking -Bsymbolic for nvcc
zeniheisser Jan 23, 2025
2af839d
fixed bug where trex did not reset processes at initialisation so wou…
zeniheisser Feb 4, 2025
68406a7
fixed bug where pointers were initialised to the same vector, changed…
zeniheisser Feb 5, 2025
6 changes: 5 additions & 1 deletion epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/__init__.py
@@ -1,7 +1,7 @@
# Copyright (C) 2020-2024 CERN and UCLouvain.
# Licensed under the GNU Lesser General Public License (version 3 or later).
# Created by: O. Mattelaer (Sep 2021) for the MG5aMC CUDACPP plugin.
# Further modified by: O. Mattelaer, A. Valassi (2021-2024) for the MG5aMC CUDACPP plugin.
# Further modified by: O. Mattelaer, A. Valassi, Z. Wettersten (2021-2024) for the MG5aMC CUDACPP plugin.

# AV - Rename the plugin as CUDACPP_OUTPUT (even if the madgraph4gpu directory is still called CUDACPP_SA_OUTPUT)
# This can be used in mg5amcnlo in one of two ways:
@@ -36,15 +36,19 @@
###import PLUGIN.CUDACPP_OUTPUT.output as output # AV modify this to also allow MG5aMC_PLUGIN
__import__('%s.output'%PLUGIN_NAME)
output = sys.modules['%s.output'%PLUGIN_NAME]
__import__('%s.trex'%PLUGIN_NAME)
trex = sys.modules['%s.trex'%PLUGIN_NAME]
new_output = { 'madevent_simd' : output.SIMD_ProcessExporter,
'madevent_gpu' : output.GPU_ProcessExporter,
'standalone_cudacpp' : output.PLUGIN_ProcessExporter,
'standalone_trex' : trex.TREX_ProcessExporter,
# the following one are used for the second exporter class
# (not really needed so far but interesting if need
# specialization in the futur)
'standalone_simd' : output.SIMD_ProcessExporter,
'standalone_cuda' : output.GPU_ProcessExporter,
}
new_reweight = {'trex': trex.TREX_ReweightInterface}

# 2. Define new way to handle the cluster.
# Example: new_cluster = {'mycluster': MYCLUSTERCLASS}
@@ -1,7 +1,7 @@
// Copyright (C) 2020-2024 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: S. Roiser (Nov 2021) for the MG5aMC CUDACPP plugin.
// Further modified by: S. Roiser, J. Teig, A. Valassi (2021-2024) for the MG5aMC CUDACPP plugin.
// Further modified by: S. Roiser, J. Teig, A. Valassi, Z. Wettersten (2021-2024) for the MG5aMC CUDACPP plugin.

#ifndef BRIDGE_H
#define BRIDGE_H 1
@@ -255,18 +255,22 @@ namespace mg5amcCpu
throw std::logic_error( "Bridge constructor: FIXME! cannot choose gputhreads" ); // this should never happen!
m_gpublocks = m_nevt / m_gputhreads;
}
#ifdef MGONGPU_VERBOSE_BRIDGE
std::cout << "WARNING! Instantiate device Bridge (nevt=" << m_nevt << ", gpublocks=" << m_gpublocks << ", gputhreads=" << m_gputhreads
<< ", gpublocks*gputhreads=" << m_gpublocks * m_gputhreads << ")" << std::endl;
#endif
m_pmek.reset( new MatrixElementKernelDevice( m_devMomentaC, m_devGs, m_devRndHel, m_devRndCol, m_devChannelIds, m_devMEs, m_devSelHel, m_devSelCol, m_gpublocks, m_gputhreads ) );
#else
#ifdef MGONGPU_VERBOSE_BRIDGE
std::cout << "WARNING! Instantiate host Bridge (nevt=" << m_nevt << ")" << std::endl;
#endif
m_pmek.reset( new MatrixElementKernelHost( m_hstMomentaC, m_hstGs, m_hstRndHel, m_hstRndCol, m_hstChannelIds, m_hstMEs, m_hstSelHel, m_hstSelCol, m_nevt ) );
#endif // MGONGPUCPP_GPUIMPL
// Create a process object, read param card and set parameters
// FIXME: the process instance can happily go out of scope because it is only needed to read parameters?
// FIXME: the CPPProcess should really be a singleton? what if fbridgecreate is called from several Fortran threads?
CPPProcess process( /*verbose=*/false );
std::string paramCard = "../../Cards/param_card.dat";
std::string paramCard = "../Cards/param_card.dat"; // ZW: change default param_card.dat location to one dir down
Member:

That sounds dangerous... Can you comment more?

Contributor Author:

It should not be an issue --- basically, our previous path assumed that the param_card was at least two subdirectories up, but since I'm running tRex from SubProcesses, it will only be one directory up. This would only cause an issue if a file Cards/param_card.dat existed inside SubProcesses itself, and I see no reason why anyone would add one.

/*
#ifdef __HIPCC__
if( !std::experimental::filesystem::exists( paramCard ) ) paramCard = "../" + paramCard;
@@ -278,7 +282,12 @@ namespace mg5amcCpu
//if( !( stat( paramCard.c_str(), &dummyBuffer ) == 0 ) ) paramCard = "../" + paramCard; //
auto fileExists = []( std::string& fileName )
{ struct stat buffer; return stat( fileName.c_str(), &buffer ) == 0; };
if( !fileExists( paramCard ) ) paramCard = "../" + paramCard; // bypass std::filesystem #803
size_t paramCardCheck = 2; // ZW: check for paramCard up to 2 directories up
for( size_t k = 0; k < paramCardCheck; ++k )
{
if( fileExists( paramCard ) ) break; // bypass std::filesystem #803
paramCard = "../" + paramCard;
}
Member:

Ok, this makes sense (I guess). But is this only for HIP?

Contributor Author:

This is just a way to generalise our previous method, which checked whether the param_card was two directories up and otherwise assumed it was three (such as when there is a build directory). I believe this is general.

process.initProc( paramCard );
}
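As an editorial aside, the upward search pattern discussed in the review thread above can be sketched as a standalone snippet. The function name and the search depth here are illustrative, not the actual Bridge API; like the diff, it uses `stat()` to sidestep the `std::filesystem` issue (#803):

```cpp
#include <string>
#include <sys/stat.h>

// Look for fileName in the current directory, then prepend "../" up to
// maxUp times, returning the first path that exists (or an empty string
// if none does). Mirrors the stat()-based loop in the Bridge constructor.
inline std::string findUpwards( std::string fileName, size_t maxUp )
{
  auto fileExists = []( const std::string& name )
  { struct stat buffer; return stat( name.c_str(), &buffer ) == 0; };
  for( size_t k = 0; k <= maxUp; ++k )
  {
    if( fileExists( fileName ) ) return fileName;
    fileName = "../" + fileName;
  }
  return std::string();
}
```

With `maxUp = 2` this covers a card located in the working directory, one directory up, or two directories up, which matches the SubProcesses layout described in the comments.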

@@ -347,7 +356,9 @@ namespace mg5amcCpu
if( goodHelOnly ) return;
m_pmek->computeMatrixElements( useChannelIds );
copyHostFromDevice( m_hstMEs, m_devMEs );
#ifdef MGONGPU_VERBOSE_BRIDGE
flagAbnormalMEs( m_hstMEs.data(), m_nevt );
#endif
copyHostFromDevice( m_hstSelHel, m_devSelHel );
copyHostFromDevice( m_hstSelCol, m_devSelCol );
if constexpr( std::is_same_v<FORTRANFPTYPE, fptype> )
Expand Down Expand Up @@ -400,7 +411,9 @@ namespace mg5amcCpu
}
if( goodHelOnly ) return;
m_pmek->computeMatrixElements( useChannelIds );
#ifdef MGONGPU_VERBOSE_BRIDGE
flagAbnormalMEs( m_hstMEs.data(), m_nevt );
#endif
if constexpr( std::is_same_v<FORTRANFPTYPE, fptype> )
{
memcpy( mes, m_hstMEs.data(), m_hstMEs.bytes() );
@@ -1,7 +1,7 @@
// Copyright (C) 2020-2024 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: J. Teig (Jun 2023, based on earlier work by S. Roiser) for the MG5aMC CUDACPP plugin.
// Further modified by: O. Mattelaer, S. Roiser, J. Teig, A. Valassi (2020-2024) for the MG5aMC CUDACPP plugin.
// Further modified by: O. Mattelaer, S. Roiser, J. Teig, A. Valassi, Z. Wettersten (2020-2024) for the MG5aMC CUDACPP plugin.

#ifndef MG5AMC_GPURUNTIME_H
#define MG5AMC_GPURUNTIME_H 1
@@ -38,8 +38,11 @@ namespace mg5amcGpu
// *** FIXME! This will all need to be designed differently when going to multi-GPU nodes! ***
struct GpuRuntime final
{
GpuRuntime( const bool debug = true )
: m_debug( debug ) { setUp( m_debug ); }
GpuRuntime( const bool debug = false ) // ZW: default debug to false
: m_debug( debug )
{
setUp( m_debug );
}
~GpuRuntime() { tearDown( m_debug ); }
GpuRuntime( const GpuRuntime& ) = delete;
GpuRuntime( GpuRuntime&& ) = delete;
@@ -50,7 +53,7 @@ namespace mg5amcGpu
// Set up CUDA application
// ** NB: strictly speaking this is not needed when using the CUDA runtime API **
// Calling cudaSetDevice on startup is useful to properly book-keep the time spent in CUDA initialization
static void setUp( const bool debug = true )
static void setUp( const bool debug = false ) // ZW: default debug to false
{
// ** NB: it is useful to call cudaSetDevice, or cudaFree, to properly book-keep the time spent in CUDA initialization
// ** NB: otherwise, the first CUDA operation (eg a cudaMemcpyToSymbol in CPPProcess ctor) appears to take much longer!
@@ -71,7 +74,7 @@ namespace mg5amcGpu
// ** NB: strictly speaking this is not needed when using the CUDA runtime API **
// Calling cudaDeviceReset on shutdown is only needed for checking memory leaks in cuda-memcheck
// See https://docs.nvidia.com/cuda/cuda-memcheck/index.html#leak-checking
static void tearDown( const bool debug = true )
static void tearDown( const bool debug = false ) // ZW: default debug to false
{
if( debug ) std::cout << "__GpuRuntime: calling GpuDeviceReset()" << std::endl;
checkGpu( gpuDeviceReset() );
@@ -1,7 +1,7 @@
// Copyright (C) 2020-2024 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Jan 2022) for the MG5aMC CUDACPP plugin.
// Further modified by: J. Teig, A. Valassi (2022-2024) for the MG5aMC CUDACPP plugin.
// Further modified by: J. Teig, A. Valassi, Z. Wettersten (2022-2024) for the MG5aMC CUDACPP plugin.

#include "MatrixElementKernels.h"

@@ -60,7 +60,9 @@ namespace mg5amcCpu
#ifdef MGONGPU_CHANNELID_DEBUG
MatrixElementKernelBase::dumpNevtProcessedByChannel();
#endif
#ifdef MGONGPU_VERBOSE_FPES
MatrixElementKernelBase::dumpSignallingFPEs();
#endif
}

//--------------------------------------------------------------------------
@@ -134,7 +134,7 @@ namespace mg5amcCpu

// Does this host system support the SIMD used in the matrix element calculation?
// [NB: this is private, SIMD vectorization in mg5amc C++ code is currently only used in the ME calculations below MatrixElementKernelHost!]
static bool hostSupportsSIMD( const bool verbose = true );
static bool hostSupportsSIMD( const bool verbose = false ); // ZW: set verbose to false by default

private:

@@ -1,7 +1,7 @@
# Copyright (C) 2020-2024 CERN and UCLouvain.
# Licensed under the GNU Lesser General Public License (version 3 or later).
# Created by: A. Valassi (Mar 2024) for the MG5aMC CUDACPP plugin.
# Further modified by: A. Valassi (2024) for the MG5aMC CUDACPP plugin.
# Further modified by: A. Valassi, Z. Wettersten (2024) for the MG5aMC CUDACPP plugin.

#-------------------------------------------------------------------------------

@@ -10,7 +10,21 @@

# Set the default BACKEND (CUDA, HIP or C++/SIMD) choice
ifeq ($(BACKEND),)
override BACKEND = cppauto
override BACKEND = gpucpp
endif

# ZW: gpucpp backend checks if there is a GPU backend available before going to SIMD
# prioritises CUDA over HIP
ifeq ($(BACKEND),gpucpp)
ifeq ($(shell which nvcc 2>/dev/null),)
ifeq ($(shell which hipcc 2>/dev/null),)
override BACKEND = cppauto
else
override BACKEND = hip
endif
else
override BACKEND = cuda
endif
endif
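As an editorial aside, the gpucpp fallback above can be sketched as a plain shell script (illustrative only; in the PR the selection happens inside make via `$(shell which ...)` and `override`):

```shell
#!/bin/sh
# Pick a backend the way the gpucpp default does: prefer CUDA if nvcc is
# on the PATH, then HIP if hipcc is, otherwise fall back to the
# auto-detected SIMD C++ build (cppauto).
if command -v nvcc >/dev/null 2>&1; then
  BACKEND=cuda
elif command -v hipcc >/dev/null 2>&1; then
  BACKEND=hip
else
  BACKEND=cppauto
fi
echo "BACKEND=$BACKEND"
```

Note that this detection only checks for the compilers on the PATH; it does not verify that a usable GPU device is actually present.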

# Set the default FPTYPE (floating point type) choice