05 Sep 15:38

p-costa

00065e6

v2.0

`CaNS 2.0` is finally released! 🎉

This is the most significant revision of our toolkit so far.

Co-authored by Pedro Costa, Massimiliano Fatica, and Josh Romero.

Summary

This release marks the completion of a fresh porting effort for massively parallel simulations on modern architectures, from one to thousands of GPUs. The focus is on performance, while ensuring a flexible and sustainable implementation that is easy to extend to more complex physical problems. We used OpenACC directives to accelerate loops and to handle host/device data transfers, interoperating with NVIDIA's cuFFT and the new cuDecomp domain decomposition library.

cuDecomp is the heart of the multi-GPU implementation. It underpins the solver's performance by providing a novel, hardware-adaptive parallelization of both the transposes in the Poisson/Helmholtz solver and the halo-exchange operations.

Despite its focus on performance, the implementation remains flexible, making it easy to switch between solver profiles: for instance, X-aligned default pencils, which are optimal for fully explicit time integration, or Z-aligned default pencils, which are optimal for wall flows with implicit time integration along Z.

Finally, another noteworthy (optional) feature is CaNS' new mixed-precision mode, in which the pressure Poisson equation is solved in lower precision. This mode can yield a large speedup for many-GPU calculations spanning multiple nodes.

Beyond these big-picture changes, many other impactful improvements make the solver more versatile and robust. All relevant changes are summarized below.

Changes:

GPU acceleration using OpenACC directives for loops and data movement, interfaced with CUDA whenever needed
Hardware-adaptive multi-GPU implementation using the cuDecomp library for transposes (seven possible communication backends) and halo exchanges (five possible communication backends), with different flavors of MPI, NCCL and NVSHMEM implementations
Lean memory footprint on GPUs, which can be made even leaner by exploiting cuDecomp's in-place transposes
Mixed-precision mode implemented on both CPUs and GPUs
Hybrid MPI-OpenMP parallelization is still supported on CPUs
Any default pencil orientation is supported, on both CPUs and GPUs
A fast-kernel mode is used by default to speed up the calculation of the prediction velocity, on both CPUs and GPUs
The 2DECOMP library is still used for the many-CPU parallelization of the Poisson solver, and for some of the parallel data I/O
Build process made much simpler and more robust, with the dependencies determined automatically
Refactoring of the FFT-based Fourier, cosine, and sine transforms on GPUs, together with the Gauss elimination kernels, with improvements both in terms of speed and maintainability
Support for uneven decompositions and odd numbers of grid points along any direction; perhaps surprisingly, setups with an odd number of points close to the desired resolution can at times result in a more efficient FFT computation
External domain decomposition libraries, cuDecomp and 2DECOMP, included as Git submodules
Many changes for improved performance and robustness, with a focus on minimizing the memory footprint and computational cost while keeping the tool versatile
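The remark about odd grid sizes reflects a general property of FFT libraries: transforms tend to be fastest when the transform length factors entirely into small primes (the cuFFT documentation, for instance, highlights sizes of the form 2^a * 3^b * 5^c * 7^d). A minimal sketch, assuming one wants to list such "smooth" sizes near a target resolution; the helper names, the search window, and the default set of radices are illustrative and not part of CaNS.

```python
def is_smooth(n, radices=(2, 3, 5, 7)):
    """Return True if n > 0 factors entirely into the given small primes."""
    if n < 1:
        return False
    for p in radices:
        while n % p == 0:
            n //= p
    return n == 1


def fft_friendly_sizes(target, window=8, radices=(2, 3, 5, 7)):
    """Return the sizes within +/- window of target that factor into the
    given radices, nearest to target first (ties broken by smaller size)."""
    lo = max(1, target - window)
    candidates = [n for n in range(lo, target + window + 1) if is_smooth(n, radices)]
    return sorted(candidates, key=lambda n: (abs(n - target), n))


if __name__ == "__main__":
    # Hypothetical desired resolution of 1030 points along one direction
    for n in fft_friendly_sizes(1030):
        print(n, "odd" if n % 2 else "even")
```

For a target of 1030 points, this lists the odd size 1029 = 3 * 7^3 ahead of 1024 = 2^10. Whether the odd size is actually faster depends on the FFT library and the hardware, so timing both candidates is advisable.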

Acknowledgements

CaNS 2.0 has been tested on several GPU-accelerated systems, such as Marconi 100, Meluxina, Perlmutter, Selene, Summit and Vega. We acknowledge the support from CoE RAISE, NERSC and EuroHPC, which enabled thorough testing of CaNS 2.0 on these state-of-the-art supercomputers.

25 Jul 01:22

p-costa

v1.3.1

3f04d2a

Summary

This release features some simplifications of the OpenMP code and the removal of the nthreadsmax input parameter. It was initially meant to fix an issue concerning boundary conditions, but the existing implementation turned out to be correct.

Changes

removes the unnecessary nthreadsmax input parameter (#28)
simplified OpenMP directives (#28)

Full Changelog: v1.3.0...v1.3.1

15 Jul 11:39

p-costa

v1.3.0

e37b3e0

Summary

This release features a mixed-precision mode where the Poisson equation can be solved using lower precision, which may be useful for certain setups.

Changes

Mixed-precision mode by @p-costa in #26, after discussions with @maxcuda and @romerojosh. For details on how to set it up, see the SINGLE_PRECISION_POISSON option in doc/INFO_COMPILING.md.

Contributors

romerojosh, maxcuda, and p-costa

16 May 15:02

p-costa

v1.2.0

0256f8c

Summary

This release features a more robust and friendly build process (still using Make). It also features some restructuring of the documentation.

Changes:

better build process with a few pre-defined profiles and automatic dependency generation (requires gawk). See doc/INFO_COMPILING.md
2DECOMP built as an external library
documentation files brought into the doc folder

(see #25)

Full Changelog: v1.1.5...v1.2.0

22 Apr 21:14

p-costa

v1.1.5

566aa72

Summary

This release features minor changes, adding a new checkpointing mode.

Changes:

new checkpointing mode that bounds the number of checkpoints per run to a maximum, set through a new parameter in the input file dns.in named nsaves_max; please see src/INFO_INPUT.md for more details;
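One way to picture the nsaves_max bound is a fixed number of checkpoint file slots that are reused cyclically, so the oldest checkpoint is overwritten once the bound is reached. A minimal sketch under that assumption; the helper names and the fld_NNN.bin file pattern are hypothetical, and the actual behavior is described in src/INFO_INPUT.md.

```python
def checkpoint_slot(isave, nsaves_max):
    """Slot index (0-based) for the isave-th checkpoint of a run, reusing
    slots cyclically so that at most nsaves_max files ever exist."""
    if nsaves_max < 1:
        raise ValueError("this sketch assumes nsaves_max >= 1")
    return isave % nsaves_max


def checkpoint_name(isave, nsaves_max):
    # Hypothetical file-name pattern, for illustration only
    return f"fld_{checkpoint_slot(isave, nsaves_max):03d}.bin"


if __name__ == "__main__":
    # Seven checkpoints written during a run bounded to three files
    print([checkpoint_name(i, nsaves_max=3) for i in range(7)])
```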

04 Mar 18:16

p-costa

v1.1.4

800283d

Summary

This release features minor changes, including performance improvements and bug fixes.

Changes:

implicit Z diffusion made considerably more efficient. For optimal performance, the code needs to be built with -D_DECOMP_Z, as explained in README.md;
new example files and grid mapping functions have been added (thanks @GabrieleBoga for the temporal boundary layer setup! #22);
other minor bugfixes;

Contributors

GabrieleBoga

03 Jan 11:54

p-costa

v1.1.3

2ca5d67

Summary

This release has one major feature: the option of treating diffusion implicitly along only one of the domain directions, the third one (z), where the grid can be non-uniform. Hence, CaNS can now be run (1) in fully explicit mode, (2) with implicit diffusion along all directions, or (3) with implicit diffusion only along z, which comes in handy when the grid is very fine only along z. See the Compilation section of README.md for how to activate this feature.

Changes:

Option for implicit diffusion only along z;
Minor changes in the Poisson solver to avoid scaling of the absolute pressure under certain combinations of BCs;
Added a two-dimensional Taylor-Green vortex case.
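With implicit diffusion along z only, each time step reduces to a set of independent tridiagonal systems, one per (x, y) column, which can be solved by Gaussian elimination specialized to tridiagonal matrices (the Thomas algorithm). A minimal sketch, assuming backward-Euler time stepping, a three-point second-derivative stencil on a non-uniform grid, and homogeneous Dirichlet values at both ends; CaNS's actual discretization (staggered grid, its own time-integration scheme, other boundary conditions) differs, and all function names are illustrative. The usage example solves for a random right-hand side and re-applies the operator as a simple residual check.

```python
import math
import random


def thomas(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (Thomas algorithm).

    a[0] and c[-1] are ignored. No pivoting: assumes diagonal dominance,
    which holds for the implicit diffusion operator below.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_z_coeffs(z, nu, dt):
    """Coefficients of (I - dt*nu*d2/dz2) at the interior points z[1..n] of a
    non-uniform grid; u = 0 is assumed at z[0] and z[n+1]."""
    n = len(z) - 2
    a, b, c = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        k = i + 1
        hm = z[k] - z[k - 1]  # spacing below point k
        hp = z[k + 1] - z[k]  # spacing above point k
        lower = -dt * nu * 2.0 / (hm * (hm + hp))
        upper = -dt * nu * 2.0 / (hp * (hm + hp))
        a[i] = lower if i > 0 else 0.0      # coupling to u = 0 at z[0] dropped
        c[i] = upper if i < n - 1 else 0.0  # coupling to u = 0 at z[n+1] dropped
        b[i] = 1.0 - lower - upper
    return a, b, c


def apply_tridiag(a, b, c, x):
    """Return A @ x for the tridiagonal matrix defined by a, b, c."""
    n = len(x)
    return [a[i] * (x[i - 1] if i > 0 else 0.0) + b[i] * x[i]
            + c[i] * (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


if __name__ == "__main__":
    # Grid on [0, 1] clustered near both ends, as for a wall-bounded flow
    n = 64
    z = [0.5 * (1.0 - math.cos(math.pi * k / (n + 1))) for k in range(n + 2)]
    a, b, c = implicit_z_coeffs(z, nu=1e-3, dt=1e-2)
    rhs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    u = thomas(a, b, c, rhs)
    residual = max(abs(r - s) for r, s in zip(apply_tridiag(a, b, c, u), rhs))
    print(f"max residual: {residual:.2e}")
```

Because each diagonal entry exceeds the sum of the off-diagonal magnitudes by at least one, elimination without pivoting is stable here, and the solution is never larger in magnitude than the right-hand side.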

03 Dec 16:06

p-costa

v1.1.2

e2290b0

Summary

This release adds a few minor features with respect to the previous major release, v1.1.0: a more robust input sanity check and slightly greater flexibility in the domain and processor grids.

Changes:

A robust check of the direct Helmholtz solver for cell- and face-centered variables has been enabled: if the code is built with the -D_DEBUG preprocessor flag, the Poisson equation and, if implicit diffusion is used, the three additional Helmholtz equations with boundary conditions at the cell faces are checked against random inputs at the beginning of any calculation (see sanity.f90);
dims(:) does not have to be divisible by 2 anymore;
possibility of using 1 grid point along a certain direction, rather than the previous minimum of 2;
fixes an input sanity check bug introduced in v1.1.1 (#19);
some nitpicking.

29 Oct 16:37

p-costa

v1.1.0

eeccb5c

Summary

This release features significant improvements in performance and scalability, and also enhances the modularity of the code and the implementation in general. Backward compatibility is preserved.

Changes:

x-aligned pencils are now used by default in the main branch, which results in improved speed and scalability;
support for uneven partitioning of the computational subdomains: the total number of grid points along one direction does not have to be divisible by the number of tasks;
simplified and unified the routines used for computing the prediction velocity with and without implicit diffusion;
improved the routines for imposing boundary conditions, and the MPI I/O checkpointing (based on those of SNaC);
support for an arbitrary number of boundary cells when imposing boundary conditions;
lots of polishing and minor improvements.
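With uneven partitioning, each task along a split direction holds either floor(n/p) or floor(n/p) + 1 grid points. A minimal sketch, assuming the extra points go to the last tasks and assuming an x-aligned pencil maps y onto dims[0] tasks and z onto dims[1] tasks; both conventions and the helper names are illustrative rather than taken from CaNS or 2DECOMP.

```python
def distribute(n, p):
    """Split n points over p tasks as evenly as possible.

    Returns (start, count) pairs with 0-based starts. The first p - n % p
    tasks get n // p points and the last n % p tasks get one extra point
    (an assumed convention).
    """
    if p < 1 or n < p:
        raise ValueError("need 1 <= p <= n")
    base, extra = divmod(n, p)
    counts = [base] * (p - extra) + [base + 1] * extra
    starts = [sum(counts[:i]) for i in range(p)]
    return list(zip(starts, counts))


def x_pencil_shape(ng, dims, coords):
    """Local (nx, ny, nz) of an x-aligned pencil: x is not split, y is split
    over dims[0] tasks and z over dims[1] tasks (assumed mapping)."""
    nx, ny, nz = ng
    _, ly = distribute(ny, dims[0])[coords[0]]
    _, lz = distribute(nz, dims[1])[coords[1]]
    return (nx, ly, lz)


if __name__ == "__main__":
    # 130 and 97 are not divisible by 4 and 3, respectively
    ng, dims = (128, 130, 97), (4, 3)
    total = 0
    for i in range(dims[0]):
        for j in range(dims[1]):
            shape = x_pencil_shape(ng, dims, (i, j))
            total += shape[0] * shape[1] * shape[2]
            print((i, j), shape)
    # The local sizes add up to the global grid
    assert total == ng[0] * ng[1] * ng[2]
```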

Releases: CaNS-World/CaNS