
README improvement #125

Merged — 2 commits merged into JuliaSIMD:master on Nov 10, 2023

Conversation

carstenbauer (Contributor)

See #124.

I've tried to clarify the README and have updated/rerun the examples on Julia 1.9 on a Linux machine with 8 threads.

The sections about `threadlocal` and disabling Polyester threads haven't gotten much attention (but the benchmarks in the latter have been rerun as well).

(I think the package needs an overhaul but this is beyond this PR which just tries to communicate the status quo a bit better.)

@chriselrod (Member) left a comment:

Thanks, looks great!


function axpy_polyester!(y, a, x)
    @batch for i in eachindex(y, x)
        y[i] = a * x[i] + y[i]
    end
end
@chriselrod (Member) commented:

Why no muladd?
Because Julia takes a strong stance against unexpectedly giving people better performance and accuracy, I'm pro-normalizing-muladd.

Zen CPUs probably don't execute muladd much faster than a separate mul and add, because they have two adders and two multiply units that can also perform FMA.
For the muladd version on Zen, the adders sit idle, so backend throughput is the same (but fewer instructions and uops make muladd friendlier to the frontend).
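For reference, the muladd variant under discussion would look like the sketch below. The function name `axpy_muladd!` is illustrative, not from the PR; the only change from the README's example is replacing `a * x[i] + y[i]` with `muladd`, which permits the compiler to emit a fused multiply-add where the hardware supports it.

```julia
using Polyester: @batch

# Hypothetical muladd variant of the README's axpy example.
# muladd(a, b, c) computes a*b + c, possibly as a single fma instruction.
function axpy_muladd!(y, a, x)
    @batch for i in eachindex(y, x)
        y[i] = muladd(a, x[i], y[i])
    end
    return y
end
```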

carstenbauer (Contributor, Author) commented:

I'm not strongly opposed to using muladd, I just don't think it matters here. The purpose of this example is to demonstrate how to use @batch. Optimizing the axpy kernel itself is an orthogonal issue IMO and only a distraction.

@chriselrod (Member) commented:

That's fine. FWIW, even on my Intel system, `muladd` wasn't faster; axpy isn't compute bound.


## Important Notes

* `Polyester.@batch` moves arrays to threads by turning them into [StrideArraysCore.PtrArray](https://github.com/JuliaSIMD/StrideArraysCore.jl)s. This means that, under `@batch`, slices create `view`s by default(!). You may want to start Julia with `--check-bounds=yes` while debugging.
@chriselrod (Member) commented:

There is no automatic `@inbounds` as of StrideArraysCore.jl 0.5, but using `--check-bounds=yes` is good advice anyway, especially because `@batch` itself adds `@inbounds`.
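A small sketch of the view semantics the README note describes (the function name `scale_columns!` is illustrative; this assumes the documented behavior that arrays become `PtrArray`s inside `@batch`, so slices act as views):

```julia
using Polyester: @batch

# Inside @batch, arrays are passed to threads as PtrArrays, so a slice
# like A[:, j] behaves as a view rather than a copy: mutating it writes
# back into A. Start Julia with `--check-bounds=yes` while debugging to
# override the @inbounds that @batch inserts.
function scale_columns!(A, factors)
    @batch for j in axes(A, 2)
        col = A[:, j]          # a view under @batch, not a copy
        col .*= factors[j]     # mutates A in place
    end
    return A
end
```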

carstenbauer (Contributor, Author) commented:

Not sure I understand.

  • Will arrays still be turned into PtrArrays?
  • Will slices still default to views?

I'll add a note about @batch automatically adding @inbounds.

(I have to say I'm not a fan of the automatic @inbounds and of defaulting to views inside @batch. For me, @batch is about low-overhead threading; the other two are optimizations that users can add themselves if they want to. But it's your package.)

@chriselrod (Member) commented:

> I think the package needs an overhaul but this is beyond this PR which just tries to communicate the status quo a bit better.

What do you have in mind?

codecov bot commented Nov 8, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f29a42e) 91.13% compared to head (7832539) 91.13%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #125   +/-   ##
=======================================
  Coverage   91.13%   91.13%           
=======================================
  Files           3        3           
  Lines         451      451           
=======================================
  Hits          411      411           
  Misses         40       40           


carstenbauer (Contributor, Author) commented:

> What do you have in mind?

Apart from what I said above (I'd personally drop the automatic @inbounds and the view behavior): threadlocal is a misnomer and very much broken, and per is somewhat of a misnomer to me as well. Additionally, disabling Polyester threads no longer shows the timing differences it used to, which warrants reconsidering that section/feature. Statements about nesting should also be revisited, given that @threads defaults to :dynamic these days, which is nestable with OK-ish performance.

@chriselrod chriselrod merged commit 0952d11 into JuliaSIMD:master Nov 10, 2023
30 of 31 checks passed
@chriselrod (Member) commented:

Thanks!

I don't really want to change PtrArray. I don't like copy-by-default, so even though it is now documented that slices of AbstractArrays are supposed to copy, maybe I'll make PtrArray no longer subtype AbstractArray.
We could drop the StrideArraysCore dependency and just make our own PtrArray type for passing arrays to the tasks.
