v0.3.1

davidbeckingsale released this 21 Sep 16:30

This release contains some new RAJA features, plus a number of internal changes: more tests, conversion of nearly all unit tests to Google Test, improved test coverage, and compilation portability improvements (e.g., Intel, nvcc, msvc). Also, the file extension for all RAJA source files has been changed from *.cxx to *.cpp for consistency with the header file extension change in the last release. The source file extension change should not require users to change anything.

New features included in this release:

  • Execution policy modifications and additions: seq_exec is now strictly sequential (no SIMD, etc.), simd_exec forces SIMD vectorization, and loop_exec (new policy) lets the compiler optimize however it can, including SIMD. In other words, loop_exec now behaves the way simd_exec did in earlier releases, and 'no vector' pragmas have been added to all sequential implementations. NOTE: SIMD changes are still being evaluated with different compilers on different platforms. More information will be provided as we learn more.
  • Added support for atomic operations (min, max, inc, dec, and, or, xor, exchange, and CAS) for all programming model backends. These appear in the RAJA::atomic namespace.
  • Support added for Intel Threading Building Blocks backend (considered experimental at this point)
  • Added macros that will be used to mark features for future deprecation (please watch for this as we will be deprecating some features in the next release)
  • Added support for C++17 if CMake knows about it
  • Removed limit on number of ordered OpenMP reductions that can be used in a kernel
  • Removed compile-time error from memutils and added a portable aligned allocator
  • Improved ListSegment implementation
  • RAJA::Index_type is now ptrdiff_t instead of int
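
To illustrate what the Index_type change means in practice, here is a minimal sketch in standard C++ only. The alias `Index_type` below is a local stand-in for RAJA::Index_type, and `flatten` and `exceeds_int32` are hypothetical helpers invented for this example; the sketch also assumes a platform where ptrdiff_t is at least 64 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Local stand-in for RAJA::Index_type as of this release (assumption:
// this alias only mirrors the type named in the release notes).
using Index_type = std::ptrdiff_t;

// Assumption for this sketch: a 64-bit ptrdiff_t.
static_assert(sizeof(Index_type) >= 8, "sketch assumes 64-bit ptrdiff_t");

// Hypothetical helper: row-major offset of (row, col) in a 2D array.
inline Index_type flatten(Index_type row, Index_type col, Index_type ncols) {
  assert(ncols > 0 && col >= 0 && col < ncols);
  return row * ncols + col;
}

// Hypothetical helper: true when an offset cannot be stored in a 32-bit int.
inline bool exceeds_int32(Index_type offset) {
  return offset > static_cast<Index_type>(std::numeric_limits<std::int32_t>::max());
}
```

With a 32-bit int index, an offset such as `flatten(70000, 5, 70000)` would overflow; with a 64-bit ptrdiff_t it is representable. Loop bodies that declare their index as `int` may therefore see narrowing conversions and are probably best updated to take the wider type.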

Notable bug fixes included in this release:

  • Fixed strided_numeric_iterator to apply stride sign in comparison
  • Fixed bug in RangeStrideSegment when using CUDA
  • Fixed reducer logic for openmp_ordered policy
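
The stride-sign fix above can be illustrated with a small sketch. This is not RAJA's strided_numeric_iterator; `StridedIter`, `before`, and `count_steps` are hypothetical names used only to show why an ordering comparison on a strided iterator needs to account for the sign of the stride.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified strided iterator (not RAJA's actual class).
struct StridedIter {
  std::ptrdiff_t value;   // current index
  std::ptrdiff_t stride;  // nonzero step; negative for descending ranges

  StridedIter& operator++() { value += stride; return *this; }
  std::ptrdiff_t operator*() const { return value; }
};

// "a comes before b" in iteration order. Multiplying both sides by the
// sign of the stride flips the comparison for descending ranges.
inline bool before(const StridedIter& a, const StridedIter& b) {
  assert(a.stride != 0 && a.stride == b.stride);
  const std::ptrdiff_t sign = (a.stride < 0) ? -1 : 1;
  return sign * a.value < sign * b.value;
}

// Counts the iterations from begin up to, but not including, end.
inline std::ptrdiff_t count_steps(StridedIter begin, const StridedIter& end) {
  std::ptrdiff_t n = 0;
  for (; before(begin, end); ++begin) {
    ++n;
  }
  return n;
}
```

Without the sign factor, a plain `a.value < b.value` test would report a descending range such as 10 down to 0 with stride -2 as empty, because the begin value is already greater than the end value.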