Releases: CaNS-World/CaNS

v2.0

05 Sep 15:38

CaNS 2.0 is finally released! 🎉

This is the most significant revision of our toolkit so far.

Co-authored by Pedro Costa, Massimiliano Fatica, and Josh Romero.

Summary

This release marks the completion of a fresh porting effort for massively parallel simulations on modern architectures, from one to thousands of GPUs, with a focus on performance while ensuring a flexible and sustainable implementation that is easy to extend to more complex physical problems. We used OpenACC directives to accelerate loops and to manage host/device data transfers, interoperating with NVIDIA's cuFFT and the new cuDecomp domain decomposition library.

cuDecomp is the heart of the multi-GPU implementation, ensuring the solver's performance by bringing a novel, hardware-adaptive parallelization of the transposes in the Poisson/Helmholtz solver, and of the halo-exchange operations.

Although quite performant, the implementation is also flexible, allowing for an easy change of solver profiles, such as X-aligned default pencils, which are optimal for a fully explicit time integration, or Z-aligned default pencils, which are optimal for a Z-implicit time integration for wall flows.

Finally, another noteworthy (optional) feature is CaNS' new mixed-precision mode, where the pressure Poisson equation is solved in lower precision. This mode makes a huge difference in performance for many-GPU calculations across multiple nodes.

In addition to these big-picture changes, there have been many impactful changes that make the solver more versatile and robust. All relevant changes are summarized below.

Changes:

  • GPU acceleration using OpenACC directives for loops and data movement, which is interfaced with CUDA whenever needed
  • Hardware-adaptive multi-GPU implementation using the cuDecomp library for transposes (seven possible communication backends) and halo exchanges (five possible communication backends), with different flavors of MPI, NCCL and NVSHMEM implementations
  • Lean memory footprint on GPUs, which can be made even leaner by exploiting cuDecomp's in-place transposes
  • Mixed-precision mode implemented on both CPUs and GPUs
  • Hybrid MPI-OpenMP parallelization is still supported on CPUs
  • Any default pencil orientation is supported, on both CPUs and GPUs
  • A fast-kernel mode is used by default to speed up the calculation of the prediction velocity, on both CPUs and GPUs
  • The 2DECOMP library is still used for the many-CPU parallelization of the Poisson solver, and some of the parallel data I/O
  • Build process made much simpler and more robust, with the dependencies determined automatically
  • Refactoring of the FFT-based Fourier, cosine, and sine transforms on GPUs, together with the Gauss elimination kernels, with improvements both in terms of speed and maintainability
  • Support for uneven decompositions and odd numbers along any direction; perhaps surprisingly, at times setups with odd numbers near the desired resolution may result in a more efficient FFT computation
  • External domain decomposition libraries, cuDecomp and 2DECOMP, loaded as Git submodules
  • Many changes for improved performance and robustness, with a focus on minimizing the memory footprint and computation intensity while keeping the tool versatile
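
The remark above about odd grid sizes follows from a general rule of thumb for FFT libraries: transform lengths that factor into small primes tend to be the efficient ones, whether or not they are even. The sketch below is not part of CaNS. The helper names and the choice of 2, 3, 5 and 7 as "small" primes are assumptions made for illustration. It lists candidate sizes near a target resolution:

```python
def prime_factors(n):
    """Return the prime factorization of n as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever is left after trial division is itself prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_fft_friendly(n, small_primes=(2, 3, 5, 7)):
    """True if n has no prime factor outside small_primes (assumed heuristic)."""
    return all(p in small_primes for p in prime_factors(n))


def friendly_sizes_near(n, width=8):
    """List FFT-friendly sizes within +/- width of the target size n."""
    return [m for m in range(n - width, n + width + 1)
            if m > 0 and is_fft_friendly(m)]


# Candidate sizes around a target of 250 points; 243 and 245 are odd.
print(friendly_sizes_near(250))
```

Whether a given size is actually faster depends on the FFT library and hardware, so candidates like these are best confirmed with a short timing run.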

Acknowledgements

CaNS 2.0 has been tested on several GPU-accelerated systems such as Marconi 100, Meluxina, Perlmutter, Selene, Summit and Vega. We acknowledge the support from CoE RAISE, NERSC and EuroHPC, which enabled thorough testing of CaNS 2.0 on these state-of-the-art supercomputers.

v1.3.1

25 Jul 01:22

Summary

This release features some simplifications of the OpenMP code and the removal of the nthreadsmax input parameter. It was originally meant to fix an issue concerning boundary conditions, but the existing implementation turned out to be correct.

Changes

  • removed the unnecessary nthreadsmax input parameter (#28)
  • simplified OpenMP directives (#28)

Full Changelog: v1.3.0...v1.3.1

v1.3.0

15 Jul 11:39

Summary

This release features a mixed-precision mode where the Poisson equation can be solved using lower precision, which may be useful for certain setups.

Changes

v1.2.0

16 May 15:02
0256f8c

Summary

This release features a more robust and friendly build process (still using Make). It also features some restructuring of the documentation.

Changes:

  • better build process with a few pre-defined profiles and automatic dependency generation (requires gawk). See doc/INFO_COMPILING.md
  • 2DECOMP built as an external library
  • documentation files brought into the doc folder

(see #25)

Full Changelog: v1.1.5...v1.2.0

v1.1.5

22 Apr 21:14

Summary

This release features minor changes, adding a new checkpointing mode.

Changes:

  • a new checkpointing mode bounds the number of checkpoints per run to a maximum, set with the new nsaves_max parameter in the input file dns.in; please see src/INFO_INPUT.md for more details.
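
One common way to bound the number of checkpoints is to cycle through a fixed set of file slots, so that the oldest save is overwritten once the limit is reached. The sketch below illustrates that idea only. The rotation scheme, the file-name patterns and the convention that a non-positive nsaves_max means "no limit" are all assumptions, and src/INFO_INPUT.md remains the reference for the actual behavior:

```python
def checkpoint_name(isave, nsaves_max):
    """Return a file name for the isave-th checkpoint (isave counts from 1).

    Hypothetical naming scheme, for illustration only.
    """
    if nsaves_max <= 0:
        # Assumed convention: no limit, one new file per save.
        return f"fld_{isave:07d}.bin"
    # Cycle through slots 1..nsaves_max so older files get overwritten.
    slot = (isave - 1) % nsaves_max + 1
    return f"fld_{slot:04d}.bin"


# Five saves with at most three files on disk.
names = [checkpoint_name(i, nsaves_max=3) for i in range(1, 6)]
print(names)
```

With this scheme the fourth save reuses the first slot, so disk usage stays bounded no matter how long the run is.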

v1.1.4

04 Mar 18:16

Summary

This release features minor changes, with performance improvements and bugfixes.

Changes:

  • implicit Z diffusion made considerably more efficient. For optimal performance, the code needs to be built with -D_DECOMP_Z, as explained in README.md;
  • new example files and grid mapping functions have been added (thanks @GabrieleBoga for the temporal boundary layer setup! #22);
  • other minor bugfixes;

v1.1.3

03 Jan 11:54

Summary

This release has one major feature: the option of using implicit diffusion along only one of the domain directions, the third one (z), where the grid can be non-uniform. Hence, CaNS can now be run with (1) fully explicit diffusion, (2) implicit diffusion along all directions, or (3) implicit diffusion only along the z-direction, which comes in handy for grids that are very fine along z only. See Compilation, under README.md, for how to activate this feature.

Changes:

  • Option for implicit diffusion only along z;
  • Minor changes in the Poisson solver to avoid scaling of the absolute pressure under certain combinations of BCs;
  • Added a two-dimensional Taylor-Green vortex case.
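
Treating diffusion implicitly along a single direction typically leads to one tridiagonal system per grid column, which Gauss elimination (the Thomas algorithm) solves in linear time. The sketch below is a minimal illustration and is not taken from the CaNS source. It assumes backward-Euler time stepping, a uniform grid spacing dz (CaNS allows a non-uniform z grid) and zero values at both boundaries, and the function names are hypothetical:

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system by Gauss elimination (Thomas algorithm).

    a, b, c are the sub-, main and super-diagonals and d is the right-hand
    side; a[0] and c[-1] do not affect the result. Inputs are not modified.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for k in range(1, n):
        denom = b[k] - a[k] * cp[k - 1]
        cp[k] = c[k] / denom
        dp[k] = (d[k] - a[k] * dp[k - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for k in range(n - 2, -1, -1):
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x


def implicit_diffusion_step(u, nu, dt, dz):
    """Advance du/dt = nu * d2u/dz2 by one backward-Euler step.

    u holds interior values; zero boundary values are assumed on both sides.
    """
    n = len(u)
    r = nu * dt / dz**2
    a = [-r] * n
    b = [1.0 + 2.0 * r] * n
    c = [-r] * n
    return thomas_solve(a, b, c, u)


u_new = implicit_diffusion_step([0.0, 1.0, 0.0, -2.0, 0.5], nu=0.1, dt=0.5, dz=0.25)
print(u_new)
```

Because the step is implicit, it remains stable even when nu * dt / dz**2 is large, which is why the approach pays off for grids that are very fine along z.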

v1.1.2

03 Dec 16:06
e2290b0

Summary

This release adds very minor features with respect to the previous major release, v1.1.0: a more robust input sanity check and slightly more flexible domain and processor grids.

Changes:

  • A very robust check of the direct Helmholtz solver for cell- and face-centered variables has been enabled: at the beginning of any calculation, the Poisson equation and, if implicit diffusion is used, the three additional Helmholtz equations with normal boundary conditions at the cell faces are checked for random inputs under sanity.f90, provided the code is built with the -D_DEBUG preprocessor flag;
  • dims(:) does not have to be divisible by 2 anymore;
  • possibility of using 1 grid point along a certain direction, rather than the previous minimum of 2;
  • fixes an input sanity check bug introduced in v1.1.1 (#19);
  • some nitpicking.

v1.1.0

29 Oct 16:37
eeccb5c

Summary

This release features significant improvements in terms of performance and scalability, and also enhances the code modularity and the implementation in general. Backward compatibility is preserved.

Changes:

  • x-aligned pencils are now used by default in the main branch, which results in improved speed and scalability;
  • support for uneven partitioning of the computational subdomains: the total number of grid points along one direction does not have to be divisible by the number of tasks;
  • simplified and unified the routines used for computing the prediction velocity with and without implicit diffusion;
  • improved the routines for imposing boundary conditions, and the MPI I/O checkpointing (based on those of SNaC);
  • support an arbitrary extent of boundary cells when imposing boundary conditions;
  • lots of polishing and minor improvements.
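
Uneven partitioning, as mentioned in the list above, amounts to giving some tasks one more grid point than others when the number of points is not a multiple of the number of tasks. The sketch below shows one simple way to do this. It is not the scheme used by CaNS or 2DECOMP: the function name is hypothetical, and assigning the extra points to the lowest ranks is an assumption.

```python
def local_extent(n, ntasks, rank):
    """Return (start, size) of the block of an n-point axis owned by rank.

    Indices are 0-based. The first (n % ntasks) ranks get one extra point;
    this distribution rule is an assumption made for illustration.
    """
    base, rem = divmod(n, ntasks)
    size = base + (1 if rank < rem else 0)
    start = rank * base + min(rank, rem)
    return start, size


# 10 grid points over 4 tasks: block sizes 3, 3, 2, 2.
for rank in range(4):
    print(rank, local_extent(10, 4, rank))
```

The blocks are contiguous and cover every point exactly once, and their sizes differ by at most one, which keeps the load imbalance small.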