Skip to content

Latest commit

 

History

History
770 lines (537 loc) · 24.2 KB

atomic_ref.asciidoc

File metadata and controls

770 lines (537 loc) · 24.2 KB

Test plan for atomic_ref

This is a test plan for the APIs described in SYCL 2020 section 4.15.3. Atomic references

1. Testing scope

The test plan covers non-atomic behavior, the latter will be covered in a separate test plan

1.1. Device coverage

All of the tests described below are performed on the default device that is selected on the CTS command line

2. Type coverage

All of the interface tests described are performed using each of the following types as the ReferencedType.

In Regular mode:

  • int

  • float

In full conformance mode:

  • int

  • unsigned int

  • long

  • unsigned long

  • float

In addition, if the device has aspect::atomic64, the following types are tested:

  • long long

  • unsigned long long

  • double

Additionaly run same tests for ReferencedType*.

Note: using T as ReferencedType or ReferencedType*.

3. Tests

Note: all tests run with different values of DefaultOrder and DefaultScope template parameters, for unsupported values tests are skiped. To check if the special values of DefaultOrder parameters are supported should use info::device::atomic_memory_order_capabilities, memory_order::relaxed should be supported on all devices. To check if the special values for DefaultScope parameter are supported should use info::device::atomic_memory_scope_capabilities.

3.1. SYCL2020 interface

All non static tests run on following cases:

  • single_task with referenced value is accessor element - should be skipped for address_space::local_space

  • parallel_for with nd_range parameter and referenced value local_accessor element - should be skipped for address_space::global_space

3.1.1. Members

For atomic_ref with the following DefaultOrder template parameter values:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

And the following values as DefaultScope template parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

And the following values as Space template parameter:

  • access::address_space::global_space

  • access::address_space::local_space

  • access::address_space::generic_space

Check the following:

  • default_read_order, default_write_order and default_read_modify_write_order members are const

  • required_alignment memeber is const and is equal to or grater than alignof(ReferencedType)

  • is_always_lock_free member is const bool

  • default_scope member is const and is equal to DefaultScope template parameter was used when object was created

If T is ReferencedType check that value_type is same as ReferencedType.

If T is ReferencedType* check that value_type is same as ReferencedType*.

If T is Integral or Floating check that atomic_ref::difference_type is same as atomic_ref::value_type.

If T is ReferencedType* check that atomic_ref::difference_type is same as ptrdiff_t.

For different cases of DefaultOrder template parameter check the following:

  • memory_order::relaxed

    • default_read_order is memory_order::relaxed

    • default_write_order is memory_order::relaxed

    • default_read_modify_write_order is memory_order::relaxed

  • memory_order::acq_rel

    • default_read_order is memory_order::acquire

    • default_write_order is memory_order::release

    • default_read_modify_write_order is memory_order::acq_rel

  • memory_order::seq_cst

    • default_read_order is memory_order::seq_cst

    • default_write_order is memory_order::seq_cst

    • default_read_modify_write_order is memory_order::seq_cst

3.1.2. Common constructors

Create an atomic_ref object using atomic_ref(T&) and another object using copy constructor in device code using the following values as DefaultOrder template parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

The following values as DefaultScope template parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

And the following values as Space template parameter:

  • access::address_space::global_space

  • access::address_space::local_space

  • access::address_space::generic_space

After calling constructors check for:

  • atomic_ref(T&)

    • If the stored value match with the value passed as T& using T load( memory_order order = default_read_order memory_scope scope = default_scope) const. Check returned value type of load member function.

  • atomic_ref(const atomic_ref&) noexcept

    • If the members of the newly created object match that of const atomic_ref& and the referenced value is valid using T load( memory_order order = default_read_order memory_scope scope = default_scope) const.

3.1.3. Member functions

To test member functions create an atomic_ref object using atomic_ref(T&) in device code using the following values as DefaultOrder template parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

And the following values as DefaultScope template parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

And the following values as Space template parameter:

  • access::address_space::global_space

  • access::address_space::local_space

  • access::address_space::generic_space

Note: for the member functions tests below that take memory_order and memory_scope as arguments, the following values of these parameters are used only if the FULL CONFORMANCE mode is enabled:

  • For memory_order:

    • memory_order::relaxed

    • memory_order::acquire

    • memory_order::release

    • memory_order::acq_rel

    • memory_order::seq_cst

  • For memory_scope:

    • memory_scope::work_item

    • memory_scope::sub_group

    • memory_scope::work_group

    • memory_scope::device

    • memory_scope::system

It should be noted that memory_order::acquire is used only for read operations, memory_order::release for write operations and memory_order::acq_rel for read-modify-write operations.

If the FULL CONFORMANCE mode is disabled only default values for functions arguments is used which are determined by atomic_ref template parameters that which object was created with.

Common member functions

bool is_lock_free()

Check if the function exists and returns true if atomic_ref::is_always_lock_free is equal to true. Check returned value type.

void store(T operand, memory_order order = default_write_order, memory_scope scope = default_scope)

For the following values as order parameter:

  • memory_order::relaxed

  • memory_order::release

  • memory_order::seq_cst

For the following values as scope parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

Check if the function stores operand to the object referenced by this atomic_ref.

T operator=(T desired) const noexcept

Same as store(desired): Check if the function stores desired to the object referenced by this atomic_ref. Check if returned value is desired and check returned value type.

operator T() const

Check if the function loads the value of the object referenced by this atomic_ref.

T exchange(T operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

For the following values as order parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

For the following values as scope parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

Check if replaces the value of the object referenced by this atomic_ref with value operand and returns the original value of the referenced object. Check returned value type.

bool compare_exchange_weak(T &expected, T desired, memory_order success, memory_order failure, memory_scope scope = default_scope) const

For the following values as success parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

For the following values as failure parameter:

  • memory_order::relaxed

  • memory_order::acquire

  • memory_order::seq_cst

For the following values as scope parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

For equal values: it attempts to replaces the value of the referenced object with the value of desired. This may not be checked since it is non-deterministic.

For unequal values: check if it assigns the original value of the referenced object to expected.

Also check if it returns true when the comparison operation and replacement operation were successful.

Check returned value type.

bool compare_exchange_weak(T &expected, T desired, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Same as above: Equivalent to compare_exchange_weak(expected, desired, order, order, scope) using following values as order parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

bool compare_exchange_strong(T &expected, T desired, memory_order success, memory_order failure, memory_scope scope = default_scope) const

For the following values as success parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

For the following values as failure parameter:

  • memory_order::relaxed

  • memory_order::acquire

  • memory_order::seq_cst

For the following values as scope parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

For equal values: check if it replaces the value of the referenced object with the value of desired.

For unequal values: check if it assigns the original value of the referenced object to expected.

Also check if it returns true when the comparison operation was successful.

Check returned value type.

bool compare_exchange_strong(T &expected, T desired, memory_order order = default_read_modify_write_order) const

Same as above: Equivalent to compare_exchange_strong(expected, desired, order, order, scope) using following values as order parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

Additional member functions for all specializations

For the following values as order parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

For the following values as scope parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

Test following functions:

T fetch_add(difference_type operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Check if it adds operand to the value of the object referenced by this atomic_ref and assigns the result to the value of the referenced object. If T is Floating check that new value of the referenced object is equal to (previous value + operand) +- epsilon. Check if it returns the original value of the referenced object. Check returned value type.

T operator+=(difference_type operand) const

Same as above: Equivalent to fetch_add(operand) + operand.

T fetch_sub(difference_type operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Check if it subtracts operand from the value of the object referenced by this atomic_ref and assigns the result to the value of the referenced object. If T is Floating check that new value of the referenced object is equal to (previous value + operand) +- epsilon. Check if it returns the original value of the referenced object. Check returned value type.

T operator-=(difference_type operand) const

Same as above: Equivalent to fetch_sub(operand) - operand.

Additional member functions available on an object of type atomic_ref<T> for integral T

For the following values as order parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

For the following values as scope parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

Test following functions if T is Integral, skip otherwise:

T fetch_and(T operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Check if it performs a bitwise AND between operand and the value of the object referenced by this atomic_ref, and assigns the result to the value of the referenced object. Check if it returns the original value of the referenced object. Check returned value type.

T operator&=(T operand) const

Same as above: Equivalent to fetch_and(operand) & operand.

T fetch_or(T operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Check if it performs a bitwise OR between operand and the value of the object referenced by this atomic_ref, and assigns the result to the value of the referenced object. Check if it returns the original value of the referenced object. Check returned value type.

T operator|=(T operand) const

Same as above: Equivalent to fetch_or(operand) | operand.

T fetch_xor(T operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Check if it performs a bitwise XOR between operand and the value of the object referenced by this atomic_ref, and assigns the result to the value of the referenced object. Check if it returns the original value of the referenced object. Check returned value type.

T operator^=(T operand) const

Same as above: Equivalent to fetch_xor(operand) ^ operand.

Additional member functions available on an object of type atomic_ref<T> for integral and pointer T

Test following functions if T is Integral or ReferencedType*, skip otherwise:

T operator++(int) const

Same as fetch_add: Equivalent to fetch_add(1).

T operator++() const

Same as fetch_add: Equivalent to fetch_add(1) + 1.

T operator--(int) const

Same as fetch_sub: Equivalent to fetch_sub(1).

T operator--() const

Same as fetch_sub: Equivalent to fetch_sub(1) - 1.

Additional member functions available on an object of type atomic_ref<T> for integral and floating T

For the following values as order parameter:

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

For the following values as scope parameter:

  • memory_scope::work_item

  • memory_scope::sub_group

  • memory_scope::work_group

  • memory_scope::device

  • memory_scope::system

Test following functions if T is Integral or Floating, skip otherwise:

T fetch_min(T operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Check if it computes the minimum of operand and the value of the object referenced by this atomic_ref, and assigns the result to the value of the referenced object. If T is Floating check that new value of the referenced object is equal to (minimum value + operand) +- epsilon. Check if it returns the original value of the referenced object. Check returned value type.

T fetch_max(T operand, memory_order order = default_read_modify_write_order, memory_scope scope = default_scope) const

Check if it computes the maximum of operand and the value of the object referenced by this atomic_ref, and assigns the result to the value of the referenced object. If T is Floating check that new value of the referenced object is equal to (maximum value + operand) +- epsilon. Check if it returns the original value of the referenced object. Check returned value type.

3.2. Stress tests

3.2.1. Atomicity for device scope

This test uses a simple-form parallel_for to atomically update a single scalar value from all work-items:

TYPE val{};
sycl::buffer buf{&val, {1}};
q.submit([&](sycl::handler &cgh) {
  sycl::accessor acc{buf, cgh};
  cgh.parallel_for({SIZE}, [=](auto i) {
     sycl::atomic_ref<TYPE, ORDER, SCOPE, SPACE> a_r{acc[0]};
     a_r.fetch_add(2);
  });
});

Verify that val == SIZE*2 after the loop completes. (The result must be accurate within some small error range when TYPE is a floating point type.)

This test is repeated for all combinations of supported values listed below:

These TYPE types:

  • int

  • float

  • long long (if sizeof(long long) == 8 then only if the device has aspect::atomic64)

  • double (only if the device has aspect::fp64 and if sizeof(double) == 8 only if the device has aspect::atomic64)

These SCOPE values which are supported by the device (according to info::device::atomic_memory_scope_capabilities):

  • memory_scope::device

  • memory_scope::system

These ORDER values which are supported by the device (according to info::device::atomic_memory_order_capabilities):

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

These SPACE values:

  • access::address_space::global_space

  • access::address_space::generic_space

3.2.2. Atomicity for work_group_scope

This test uses a parallel_for with nd_range parameter to atomically update array, each element from all work-items in one work-group:

std::array<TYPE, GROUP_RANGE> vals;
vals.fill(0);
sycl::buffer buf{vals.data(), {GROUP_RANGE}};
q.submit([&](sycl::handler &cgh) {
  sycl::accessor acc{buf, cgh};
  sycl::local_accessor<TYPE> lacc{{1}, cgh}
  cgh.parallel_for(sycl::nd_range(GROUP_RANGE * LOCAL_SIZE, LOCAL_SIZE),
                    [=](auto item) {
     sycl::atomic_ref<TYPE, ORDER, SCOPE, SPACE> a_r{lacc[0]};
     if (a_r.fetch_sub(2) == -(LOCAL_SIZE*2)) {
        acc[item.group_id()] = a_r.load();
     }
  });
});

Verify that val[i] == -LOCAL_SIZE*2 after the loop completes. (The result must be accurate within some small error range when TYPE is a floating point type.)

This test is repeated for all combinations of supported values listed below:

These TYPE types:

  • int

  • float

  • long long (if sizeof(long long) == 8 then only if the device has aspect::atomic64)

  • double (only if the device has aspect::fp64 and if sizeof(double) == 8 only if the device has aspect::atomic64)

These SCOPE values which are supported by the device (according to info::device::atomic_memory_scope_capabilities):

  • memory_scope::device

  • memory_scope::system

  • memory_scope::work_group

These ORDER values which are supported by the device (according to info::device::atomic_memory_order_capabilities):

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

These SPACE values:

  • access::address_space::local_space

  • access::address_space::generic_space

3.2.3. Aquire and release

This test uses a parallel_for with nd_range parameter to atomically update shared scalars in two threads within local group.

const TYPE val chaged_val = 42;
sycl::nd_range nd_range(GLOBAL_RANGE, 2);
std::array<bool, GLOBAL_RANGE / 2> res;
res.fill(false);
sycl::buffer buf{res.data(), {GLOBAL_RANGE/2}};
q.submit([&](sycl::handler &cgh) {
  sycl::accessor res_acc{buf, cgh};
  sycl::local_accessor<TYPE, 0> x{cgh};
  sycl::local_accessor<TYPE, 0> y{cgh};
  sycl::local_accessor<TYPE, 0> A{cgh};
  sycl::local_accessor<TYPE, 0> B{cgh};
  cgh.parallel_for(nd_range, sycl::range(2), [=](auto item) {
    sycl::atomic_ref<TYPE, ORDER, SCOPE, SPACE> refA{A};
    sycl::atomic_ref<TYPE, ORDER, SCOPE, SPACE> refB{B};
    refA.store(0);
    refB.store(0);
    sycl::group_barrier(item.get_group());
    if (item.get_local_id() == 0) {
        x = refA.load();
        refB.store(1);
      } else {
        y = refB.load();
        refA.store(1);
      }
    sycl::group_barrier(item.get_group());
      if (item.get_local_id() == 0)
        res_acc[item.get_group__id()] = !(x == 1 && y == 1);
    });
  });

Verify that all elements of res is true after the loop completes.

This test is repeated for all combinations of supported values listed below:

These TYPE types:

  • int

  • float

  • long long (if sizeof(long long) == 8 then only if the device has aspect::atomic64)

  • double (only if the device has aspect::fp64 and if sizeof(double) == 8 only if the device has aspect::atomic64)

These SCOPE values which are supported by the device (according to info::device::atomic_memory_scope_capabilities):

  • memory_scope::device

  • memory_scope::system

  • memory_scope::work_group

These ORDER values which are supported by the device (according to info::device::atomic_memory_order_capabilities):

  • memory_order::acq_rel

  • memory_order::seq_cst

These SPACE values:

  • access::address_space::local_space

  • access::address_space::generic_space

3.2.4. Ordering

This test uses a parallel_for with nd_range parameter to atomically read scalar in one thread, and then aquire all changes from it in another thread within local group.

size_t local_range = q.get_info<sycl::info::device::max_work_group_size>();
sycl::nd_range nd_range(GLOBAL_RANGE, local_range);
std::array<bool, GLOBAL_RANGE> res;
res.fill(false);
sycl::buffer buf{res.data(), {GLOBAL_RANGE}};
q.submit([&](sycl::handler &cgh) {
  sycl::accessor res_acc{buf, cgh};
  sycl::local_accessor<TYPE, 0> local_acc{cgh};
  sycl::local_accessor<bool> arr_acc{{local_range}, cgh};
  cgh.parallel_for(nd_range, [=](auto item) {
    arr_acc[item.get_local_id()] = false;
    sycl::group_barrier(item.get_group());
    sycl::atomic_ref<TYPE, memory_order::relaxed, SCOPE, SPACE> a_r{local_acc};
    arr_acc[item.get_local_id()] = true;
    a_r.store(item.get_local_id(), ORDER1);
    res_acc[item.get_global_id()] = (arr_acc[a_r.load(ORDER2)] == true);
  });
});

Verify that all elements of res is true after the loop completes.

This test is repeated for all combinations of supported values listed below:

These TYPE types:

  • int

  • float

  • long long (if sizeof(long long) == 8 then only if the device has aspect::atomic64)

  • double (only if the device has aspect::fp64 and if sizeof(double) == 8 only if the device has aspect::atomic64)

These SCOPE values which are supported by the device (according to info::device::atomic_memory_scope_capabilities):

  • memory_scope::device

  • memory_scope::system

  • memory_scope::work_group

These ORDER1 and ORDER2 values which are supported by the device (according to info::device::atomic_memory_order_capabilities):

  • ORDER1 is memory_order::release and ORDER2 is memory_order::acquire

  • ORDER1 and ORDER2 are both memory_order::seq_cst

These SPACE values:

  • access::address_space::local_space

  • access::address_space::generic_space

3.2.5. Atomicity with respect to atomic operations in host code

This check should use #ifdef __cpp_lib_atomic_ref so it can be skipped if std::atomic_ref is not supported.

This check is skipped if device doesn’t support aspect::usm_atomic_shared_allocations.

This check is skipped if device doesn’t support memory_scope::system in info::device::atomic_memory_scope_capabilities.

This test uses a simple-form parallel_for to atomically update a single scalar value from all work-items and simultainiously update it on host:

TYPE *pval = sycl::malloc_shared<TYPE>(1, q);
*pval = 0;
std::atomic_ref a_host{*pval};
auto event = q.submit([&](sycl::handler &cgh) {
  cgh.parallel_for({SIZE}, [=](auto i) {
     sycl::atomic_ref<TYPE, ORDER, sycl::memory_scope::system, SPACE> a_dev{*pval};
     a_dev++;
  });
});
for (int i = 0; i < COUNT; i++)
  a_host++;
event.wait();

Verify that *pval == SIZE+COUNT.

This test is repeated for all combinations of supported values listed below:

These ORDER values which are supported by the device (according to info::device::atomic_memory_order_capabilities):

  • memory_order::relaxed

  • memory_order::acq_rel

  • memory_order::seq_cst

These SPACE values:

  • access::address_space::global_space

  • access::address_space::generic_space

Run the same test as above with malloc_host instead of malloc_shared if device supports aspect::usm_atomic_host_allocations.