
Replace internally developed CSR GEMM with a call to MKL #2959

Open
Vika-F wants to merge 14 commits into main

Conversation

@Vika-F (Contributor) commented Oct 28, 2024

  • The internally developed CSR GEMM kernel was removed.
  • A call to MKL's sparse GEMM was added to the cluster assignment step of sparse K-means (see the assign_clusters kernel and the sketch after this list).
  • The incorrect use of the communicator was removed.
  • The sparse method was aligned with the dense method by removing the buggy handle_empty_clusters kernel from the sparse implementation. Both dense and sparse implementations now use the same kernel for empty-cluster handling and produce the same results for the same input data passed in sparse and dense layout, respectively.
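A minimal sketch (not the PR's actual code) of what the cluster assignment GEMM call could look like with oneMKL's sparse BLAS USM API. The function and variable names (assign_clusters_gemm_sketch, row_offsets, col_indices, values, centroids_t, distances_xc) are illustrative assumptions, and the centroids are passed pre-transposed to reflect the oneMKL 2025.0 workaround noted in the diff.

#include <cstdint>
#include <sycl/sycl.hpp>
#include <oneapi/mkl.hpp>

// distances_xc = A * centroids_t, where A is the CSR-encoded observation
// matrix (row_count x column_count) and centroids_t is the dense centroid
// matrix already transposed to column_count x cluster_count (row-major).
// This is the inner-product term used when assigning observations to clusters.
void assign_clusters_gemm_sketch(sycl::queue& q,
                                 std::int64_t row_count,
                                 std::int64_t column_count,
                                 std::int64_t cluster_count,
                                 std::int64_t* row_offsets, // CSR row offsets, size row_count + 1
                                 std::int64_t* col_indices, // CSR column indices
                                 float* values,             // CSR values
                                 float* centroids_t,        // pre-transposed centroids
                                 float* distances_xc) {     // row_count x cluster_count output
    namespace sparse = oneapi::mkl::sparse;

    // Wrap the three CSR arrays into an MKL sparse matrix handle.
    sparse::matrix_handle_t a_handle = nullptr;
    sparse::init_matrix_handle(&a_handle);
    sparse::set_csr_data(q, a_handle, row_count, column_count,
                         oneapi::mkl::index_base::zero,
                         row_offsets, col_indices, values);

    // Sparse gemm cannot take a transposed dense operand in oneMKL 2025.0
    // (the workaround mentioned in the diff), so the centroids are passed
    // pre-transposed and op(B) stays non-transposed.
    auto gemm_event = sparse::gemm(q,
                                   oneapi::mkl::layout::row_major,
                                   oneapi::mkl::transpose::nontrans, // op(A)
                                   oneapi::mkl::transpose::nontrans, // op(B)
                                   1.0f, a_handle,
                                   centroids_t, cluster_count, cluster_count, // B, columns, ldb
                                   0.0f, distances_xc, cluster_count);        // beta, C, ldc
    gemm_event.wait();

    sparse::release_matrix_handle(q, &a_handle).wait();
}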

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the appropriate label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
    None of the CI failures are related to the sparse K-means algorithm.
  • I have extended the testing suite if new functionality was introduced in this PR.
    No new functionality was introduced.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with the measured data, if a performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@Vika-F (Contributor, Author) commented Oct 31, 2024

/intelci: run

@Vika-F (Contributor, Author) commented Nov 4, 2024

/intelci: run

@Vika-F (Contributor, Author) commented Nov 6, 2024

@Vika-F (Contributor, Author) commented Nov 11, 2024

/intelci: run

@Vika-F Vika-F marked this pull request as ready for review November 12, 2024 08:51
@Vika-F Vika-F requested review from david-cortes-intel and ethanglaser and removed request for samir-nasibli and Alexsandruss November 12, 2024 08:52

const auto distances_ptr = distances.get_data();
// Workaround. Sparse gemm cannot accept transposed dense inputs in oneMKL 2025.0.

How will we track this?


const auto finalize_range =
bk::make_multiple_nd_range_2d({ num_clusters, local_size }, { 1, local_size });

// Compute the array of centroids by dividing the respective sums of observations
// by the number of observations in each centroid

Suggested change
// by the number of observations in each centroid
// by the number of observations in each centroid
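A hedged sketch of what a finalization kernel like the one described above might look like in SYCL: each row of the 2D nd-range handles one cluster, and its work-items divide that cluster's per-feature sums by the number of assigned observations. The names (finalize_centroids_sketch, sums, counts, centroids) and the work-group size are assumptions, not the PR's actual kernel.

#include <cstdint>
#include <sycl/sycl.hpp>

// All pointers are assumed to be USM device/shared allocations.
void finalize_centroids_sketch(sycl::queue& q,
                               std::int64_t cluster_count,
                               std::int64_t column_count,
                               const float* sums,          // cluster_count x column_count sums
                               const std::int32_t* counts, // observations per cluster
                               float* centroids) {         // cluster_count x column_count output
    const std::size_t local_size = 128; // assumed work-group size
    const auto global = sycl::range<2>(static_cast<std::size_t>(cluster_count), local_size);
    const auto local = sycl::range<2>(1, local_size);
    q.submit([&](sycl::handler& cgh) {
         cgh.parallel_for(sycl::nd_range<2>(global, local), [=](sycl::nd_item<2> item) {
             const std::int64_t cluster = item.get_global_id(0);
             const std::int32_t count = counts[cluster];
             if (count == 0)
                 return; // empty clusters are handled by a separate kernel
             // Work-items of this row stride over the feature columns.
             for (std::int64_t col = item.get_local_id(1); col < column_count;
                  col += static_cast<std::int64_t>(local_size)) {
                 centroids[cluster * column_count + col] =
                     sums[cluster * column_count + col] / static_cast<float>(count);
             }
         });
     }).wait();
}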

Comment on lines 45 to 46
#define INSTANTIATE_SINGLE_NODE(F, M, T) \
template struct ONEDAL_EXPORT infer_ops_dispatcher<dal::detail::data_parallel_policy, F, M, T>;

Suggested change
#define INSTANTIATE_SINGLE_NODE(F, M, T) \
template struct ONEDAL_EXPORT infer_ops_dispatcher<dal::detail::data_parallel_policy, F, M, T>;
#define INSTANTIATE_NON_DISTRIBUTED(F, M, T) \
template struct ONEDAL_EXPORT infer_ops_dispatcher<dal::detail::data_parallel_policy, F, M, T>;

Maybe that's better? With multi-tile GPUs or single-node multi-GPU setups we would be using SPMD on a single node?
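For illustration of the naming being discussed (not code from the PR): the macro instantiates the dispatcher only for the plain data-parallel policy, which also covers single-node multi-tile or multi-GPU SPMD runs, while a distributed build would instantiate a different policy. The INSTANTIATE_DISTRIBUTED macro and the spmd_data_parallel_policy type below are placeholders used only to show the contrast.

// Non-distributed instantiation, matching the suggestion above.
#define INSTANTIATE_NON_DISTRIBUTED(F, M, T) \
    template struct ONEDAL_EXPORT infer_ops_dispatcher<dal::detail::data_parallel_policy, F, M, T>;

// Hypothetical distributed counterpart; the policy type name is a placeholder.
#define INSTANTIATE_DISTRIBUTED(F, M, T) \
    template struct ONEDAL_EXPORT infer_ops_dispatcher<dal::detail::spmd_data_parallel_policy, F, M, T>;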

@ahuber21 (Contributor) commented:
Thanks for adding extensive & helpful comments!
