Merge pull request #701 from yandthj/timkupdates
Tim K Updates
yandthj authored Nov 11, 2024
2 parents a171fe0 + 2842e90 commit dfdf348
Showing 10 changed files with 1,002 additions and 684 deletions.
59 changes: 55 additions & 4 deletions docs/Documentation/Development/Compilers/rosetta_stone.md
@@ -27,8 +27,10 @@ The topics covered include:

* gcc
* gfortran
* Intel icc (Classic, not Clang)
* Intel ifort (Fortran, not Clang)
* Intel icc (Classic)
* Moving to Intel's new icx compiler
* Intel ifort (Fortran)
* Moving to Intel's new ifx compiler
* Cray C (Clang based)
* Cray Fortran (ftn)

@@ -632,7 +634,40 @@ Valid categories include
reports - Optimization Reports

openmp - OpenMP and Parallel Processing
```
## Moving to Intel's new compiler icx
The Intel compilers icc and icpc are being retired and replaced with icx and icpx.
Other than the name change, many people will not notice significant differences.
The document [https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-icc-users-to-dpcpp-or-icx.html](https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-icc-users-to-dpcpp-or-icx.html)
has details. Here are some important excerpts from that page.

ICX and ICC Classic use different compiler drivers. The Intel® C++ Compiler Classic
compiler drivers are icc, icpc, and icl. The Intel® oneAPI DPC++/C++ Compiler drivers
are icx and icpx. Use icx to compile and link C programs, and icpx for C++ programs.
Unlike the icc driver, icx does not use the file extension to determine whether to
compile as C or C++. Users must invoke icpx to compile C++ files. In addition to
providing a core C++ compiler, ICX/ICPX is also used to compile SYCL/DPC++ codes for the
Intel® oneAPI Data Parallel C++ Compiler when the additional flag "-fsycl" is passed.
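
As a minimal sketch of the driver change (the file names here are placeholders, not taken from the Intel documentation):

```
# C source: compile with icx (placeholder file names)
icx -O2 -c my_code.c

# C++ source: compile with icpx; icx will not switch to C++ based on the .cpp extension
icpx -O2 -c my_code.cpp

# SYCL/DPC++ source: add -fsycl
icpx -fsycl -O2 sycl_code.cpp -o sycl_app
```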
The major changes in compiler defaults are listed below:
* The Intel® oneAPI DPC++/C++ Compiler drivers are icx and icpx.
* Intel® C++ Compiler Classic uses the icc, icpc, or icl drivers, but this compiler will be deprecated in an upcoming release.
* DPC++/SYCL users can use the icx/icpx driver along with the -fsycl flag, which invokes ICX with SYCL extensions.
* Unlike Clang, the ICX default floating-point model was chosen to match ICC behavior: by default it is -fp-model=fast.
* Macro naming is changing. Please be sure to check the release notes for future macros to be included in ICX.
* No diagnostic numbers are listed for remarks, warnings, or notes. Every diagnostic is emitted with the corresponding compiler option to disable it.
* Compiler intrinsics cannot be automatically recognized without processor targeting options, unlike the behavior in Intel® C++ Compiler Classic. If you use intrinsics, read more in the documentation about intrinsic behavior changes.
## ifort
This discussion is for version 2021.6.0. ifort will be replaced in the near future by ifx, a Clang-backend-based alternative. ifx has most of the same options as ifort, with some Clang additions. In the Cray environment, if PrgEnv-intel is loaded, "cc" maps to icc.
@@ -920,7 +955,23 @@ Valid categories include
reports - Optimization Reports

openmp - OpenMP and Parallel Processing
```
## Moving to Intel's new compiler ifx
Intel® Fortran Compiler Classic (ifort) is now deprecated and will be discontinued in late 2024.
Intel recommends that customers transition now to the LLVM-based Intel® Fortran Compiler (ifx).
Other than the name change, some people will not notice significant differences. The new compiler
supports offloading to Intel GPUs. Kestrel and Swift do not have Intel GPUs, so this feature is not available at NREL.
One notable deletion from the new compiler is auto-parallelization. With ifort, the
-parallel compiler option enables auto-parallelization. That is not true for ifx; there
is no auto-parallelization feature with ifx.
For complete details please see: [https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-ifort-to-ifx.html](https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-ifort-to-ifx.html)
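
As a rough sketch of the difference (the source file name is a placeholder), a build that relied on auto-parallelization with ifort has no direct ifx equivalent and needs explicit parallelism instead:

```
# ifort: -parallel enables auto-parallelization (placeholder file name)
ifort -O2 -parallel prog.f90 -o prog_classic

# ifx: same optimization level, but no -parallel option;
# use explicit OpenMP (-qopenmp) or MPI if parallelism is needed
ifx -O2 prog.f90 -o prog_llvm
```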
## Cray CC
In the Cray environment, cc is a generic call for several different compilers. The compiler actually invoked is determined by the modules loaded. Here we discuss Cray C version 14.0.4. cc will detect if the program being compiled calls MPI routines; if so, it will build it as an MPI program. Cray C version 14.0.4 is Clang based with Cray enhancements.
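
For illustration, assuming the PrgEnv-cray module is loaded and using placeholder file names:

```
# Plain C program: cc invokes the Cray Clang-based compiler
cc -O2 hello.c -o hello

# A source that calls MPI routines: cc detects this and builds it as an MPI program
cc -O2 mpi_hello.c -o mpi_hello
```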
9 changes: 9 additions & 0 deletions docs/Documentation/Development/Debug_Tools/ddt.md
@@ -0,0 +1,9 @@
# DDT (Linaro Debugger)

*DDT is Linaro's (formerly ARM's) parallel GUI-based debugger*

DDT is a GUI-based parallel debugger that supports MPI, OpenMP, and CUDA.
It can be used with C, C++, Fortran, and Python. It shares much of its
infrastructure with Linaro's MAP and profiling tools. See the [Linaro-Forge](../Performance_Tools/Linaro-Forge/index.md) page for additional information.
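
A minimal sketch of starting a session, using the module name and `--connect` workflow shown on the Linaro-Forge pages:

```
# Load the Forge tools (from a compute node)
module load forge

# Start ddt and attach from the local Forge client
ddt --connect
```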


@@ -15,13 +15,27 @@ There are two options for how to run MAP. The first method is to use the remote
### Option 1: Remote Client Setup
Download the remote client from the [Linaroforge Website](https://www.linaroforge.com/downloadForge/). Select the client for your platform (Mac/Windows/Linux) and ensure the client version number matches the version number of the Linaro suite you are using. You can see all the versions of linaro-forge available using:

`$ module avail linaro-forge`
`$ module avail forge`

Once you have the client installed, you will need to configure it to connect to the host:

1. Open the Linaro Forge Client application
2. Select the configure option in the "Remote Launch" dropdown menu, click "Add", and set the hostname to "USER@HOST", where USER is your username and HOST is the host you are trying to connect to. We recommend using DAV nodes if available on your system.
3. In the Remote Installation Directory field, set the path to the Linaro installation on your host. (For example on Eagle this is: /nopt/nrel/apps/linaro-forge/##.#.# where ##.#.# represents the version number that must match your installation. Hint: use `$ module show linaro-forge/##.#.#` to get the path, do not include "/lib..." in the path)
3. In the Remote Installation Directory field, set the path to the Linaro installation on your host. This can be found by running the command:


```
dirname $(dirname $(which map))
```

For example:

```
module load forge/24.0.4
dirname $(dirname $(which map))
/nopt/nrel/apps/cpu_stack/software/forge/24.0.4
```

4. Hit "Test Remote Launch" to test the configuration.

Once the remote client is correctly set up, start a terminal and connect to the desired HPC system.
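
For example, where USER and HOST are your username and the system's login hostname:

```
ssh USER@HOST
```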
@@ -64,3 +78,81 @@ Once you have an appropriate build of your program to profile and either the Lin
You should now see the profiling data we described in the previous section [MAP](index.md). Please refer to that page as well as the [Linaro Forge Documentation](https://www.linaroforge.com/documentation/) for more details on what you can learn from such profiles.

![Linaro-MAP-Profile](../../../../../assets/images/Profiling/MAP-7.png)


## Debugging a program

The Forge debugger is ddt. It uses the same local client as map and perf-report. To get started, set up your local client version of Forge as described above in the section [MAP Setup - Option 1: Remote Client Setup](#option-1-remote-client-setup).

There are many ways to launch a debug session. Probably the simplest is to launch from an interactive session on a compute node.

Get an interactive session replacing MYACCOUNT with your account:

```
salloc --exclusive --mem=0 --tasks-per-node=104 --nodes=1 --time=01:00:00 --account=MYACCOUNT --partition=debug
```

As with map, your application needs to be compiled with the -g option. Here is a simple build with make. (Here we also have an OpenMP program, so we add the flag -fopenmp.)

```
make
cc -g -fopenmp -c triad.c
cc -g -fopenmp ex1.c triad.o -o exc
```

Our executable is *exc*.

We are going to need our remote directory, so we run *pwd*.

```
pwd
/kfs3/scratch/user/debug
```

We load the module:

```
module load forge/24.0.4
```

Then run the command:

```
ddt --connect
```

ddt is now running on the compute node, waiting for you to connect with the local client. Launch your local client. Then, under **Remote Launch:**, select the machine to which you want to connect. After a few seconds you will see a window announcing that ddt wants to connect to your client. Hit **Accept**.

![Linaro-MAP-Profile](../../../../../assets/images/Profiling/DDT-1.png)

After acceptance completes click **Run and debug a program**.

Here is where you need the directory for your program. Put the full path to your application in the **Application** box and the directory in **Working Directory**. We assume the Working Directory, the directory that would normally contain your data, is the same as your program directory.

This is an MPI program, so select MPI. After that you will see more options. For most programs the Implementation should be SLURM (generic). If this is not what is shown, or you know you need something else, select Change... to set it. For OpenMP programs, select that box also.

![Linaro-MAP-Profile](../../../../../assets/images/Profiling/DDT-2.png)


Finally, hit Run. After a few seconds you will see the debug window with the "main" source in the center window. You can set breakpoints by clicking in the leftmost column of the source window. To start your program, click the right-facing triangle in the top-left corner of the window.

![Linaro-MAP-Profile](../../../../../assets/images/Profiling/DDT-3.png)

See the full documentation for complete instructions. There is a copy of *userguide-forge.pdf* in the *doc* directory of the Forge installation.

```
module load forge
echo `dirname $(dirname $(which ddt))`/doc
/nopt/nrel/apps/cpu_stack/software/forge/24.0.4/doc
ls /nopt/nrel/apps/cpu_stack/software/forge/24.0.4/doc
RELEASE-NOTES stacks.dtd userguide-forge.pdf
```






@@ -7,7 +7,7 @@ Intel's C compiler icc has been around for many years. It is being retired and

Our example programs are hybrid MPI/OpenMP, so we'll show commands for building hybrid programs. If your program is pure MPI, the only change you need to make to the build process is to remove the compile-line option -fopenmp.

Sample makefile, source codes, and runscript for Kestrel can be found in our [Kestrel Repo](https://github.com/NREL/HPC/tree/master/kestrel) under the Toolchains folder. There are individual directories for source, makefiles, and scripts, or you can download the intel.tgz file containing all required files.
Sample makefile, source codes, and runscript for Kestrel can be found in our [Kestrel Repo](https://github.com/NREL/HPC/tree/master/kestrel) under the Toolchains folder. There are individual directories for source, makefiles, and scripts, or you can download the intel.tgz file containing all required files. The source differs slightly from what is shown here. There is an extra file *triad.c* that gets compiled along with the Fortran and C programs discussed below. This file does some "dummy" work to allow the programs to run for a few seconds.


### module loads for compile
@@ -49,7 +49,7 @@ Here's what the compile lines should be where we add the -fopenmp option for Opn
mpiifort -O3 -g -fopenmp ex1.f90
```

#### 2. C with: Intel MPI and Intel C compiler, older compiler (icc)
#### 2a. C with: Intel MPI and Intel C compiler, older compiler (icc)
```
mpiicc -O3 -g -fopenmp ex1.c -o ex_c
```
@@ -60,15 +60,28 @@ We can compile with the extra flag.

```
mpiicc -diag-disable=10441 -O3 -g -fopenmp ex1.c -o gex_c
```

#### 2b. Older compiler (icc) might not be available

Depending on the version of the compilers loaded, the message shown above might be replaced with one saying that icc is no longer available. In this case you **MUST** use icx. There are two ways to do that, shown below.

#### 3. C with: Intel MPI and Intel C compiler, newer compiler (icx)
#### 3a. C with: Intel MPI and Intel C compiler, newer compiler (icx)

```
export I_MPI_CC=icx
mpiicc -O3 -g -fopenmp ex1.c -o ex_c
```
Setting the environmental variable tells mpiicc to use icx (the newer Intel compiler) instead of icc.


#### 3b. C with: Intel MPI and Intel C compiler, newer compiler (icx)

```
mpiicx -O3 -g -fopenmp ex1.c -o ex_c
```
Explicitly running mpiicx will give you icx as the backend compiler.


### mpicc and mpif90 may not give you what you expect.
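
One quick check is the `-show` option, which MPICH-style wrappers such as Intel MPI's mpicc and mpif90 accept; it prints the underlying compile line without running it (output varies by installation):

```
# Print the backend compiler and flags each wrapper would use
mpicc -show
mpif90 -show
```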

@@ -179,7 +192,7 @@ Our srun command line options for 2 tasks per node and 3 threads per task are:
--mpi=pmi2 --cpu-bind=v,cores --threads-per-core=1 --tasks-per-node=2 --cpus-per-task=3
```

* --mpi=pmi2 : tells srun to use a particular launcher
* --mpi=pmi2 : tells srun to use a particular launcher (This is optional.)
* --cpu-bind=v,cores : discussed above
* --threads-per-core=1 : don't allow multiple threads to run on the same core. Without this option it is possible for multiple threads to end up on the same core, decreasing performance.
* --cpus-per-task=3 : The cpus-per-task should always be equal to OMP\_NUM\_THREADS, as shown in the example below.
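
Putting these options together, a launch line might look like the following sketch; the executable name ex_c comes from the build examples earlier on this page.

```
# 2 MPI ranks per node, 3 OpenMP threads per rank
export OMP_NUM_THREADS=3
srun --mpi=pmi2 --cpu-bind=v,cores --threads-per-core=1 \
     --tasks-per-node=2 --cpus-per-task=3 ./ex_c
```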
@@ -0,0 +1,7 @@
# Compile and run: *MPI*

### Introduction
The [ToolChains Intel](./intel.md) document goes into great detail on running with various settings and
with the old and new versions of the Intel compilers.

The mpi/normal section of [gpubuildandrun](../gpubuildandrun.md) shows how to build and run using the more standard version of MPI.
