Modernize and optimize pixel format operations across platforms. #2645
/// <param name="numBytes">The number of bytes to shift by.</param>
/// <returns>The <see cref="Vector128{Byte}"/>.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Vector128<byte> ShiftRightBytesInVector(Vector128<byte> value, [ConstantExpected(Max = (byte)15)] byte numBytes)
There's something not right in one of the methods below on Arm64.
@tannergooding @saucecontrol Would either of you be able to do a quick readthrough and set me right?
Figured it out. It was left shift. I'd forgotten to offset.
/// Calculates <paramref name="x"/> % 4
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static nuint Modulo4(nuint x) => x & 3;
Just noting that `x % 4` should already get optimized to `x & 3` by the JIT since `x` is `nuint` (and therefore definitely unsigned).
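A small sketch of why the unsigned type matters for this equivalence. The helper names are hypothetical and exist only for illustration: for `nuint` the remainder and the mask always agree, whereas for a signed `int` the C# `%` operator keeps the sign of the dividend, so the two diverge for negative inputs.

```csharp
using System;

public static class Modulo4Sketch
{
    // Unsigned: x % 4 and x & 3 produce the same value for every input.
    public static nuint RemainderUnsigned(nuint x) => x % 4;

    public static nuint MaskUnsigned(nuint x) => x & 3;

    // Signed: % keeps the sign of the dividend, the bit mask does not.
    public static int RemainderSigned(int x) => x % 4;

    public static int MaskSigned(int x) => x & 3;
}
```

For instance, `RemainderSigned(-5)` is `-1` while `MaskSigned(-5)` is `3`, which is why a compiler can only substitute the mask freely when the operand is known to be non-negative.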
Thanks yeah, I just kept it the same for consistency.
ref Vector512<float> vs0 = ref Unsafe.Add(ref sourceBase, i);
ref Vector512<float> vd0 = ref Unsafe.Add(ref destinationBase, i);

- vd0 = Avx.Permute(vs0, control);
- Unsafe.Add(ref vd0, 1) = Avx.Permute(Unsafe.Add(ref vs0, 1), control);
- Unsafe.Add(ref vd0, 2) = Avx.Permute(Unsafe.Add(ref vs0, 2), control);
- Unsafe.Add(ref vd0, 3) = Avx.Permute(Unsafe.Add(ref vs0, 3), control);
+ vd0 = Vector512Utilities.Shuffle(vs0, control);
+ Unsafe.Add(ref vd0, (nuint)1) = Vector512Utilities.Shuffle(Unsafe.Add(ref vs0, (nuint)1), control);
+ Unsafe.Add(ref vd0, (nuint)2) = Vector512Utilities.Shuffle(Unsafe.Add(ref vs0, (nuint)2), control);
+ Unsafe.Add(ref vd0, (nuint)3) = Vector512Utilities.Shuffle(Unsafe.Add(ref vs0, (nuint)3), control);
Worth noting that you could do this like the following:
Vector512Utilities.Shuffle(Vector512.LoadUnsafe(ref sourceBase, i + 0), control).StoreUnsafe(ref destinationBase, i + 0);
Vector512Utilities.Shuffle(Vector512.LoadUnsafe(ref sourceBase, i + 1), control).StoreUnsafe(ref destinationBase, i + 1);
Vector512Utilities.Shuffle(Vector512.LoadUnsafe(ref sourceBase, i + 2), control).StoreUnsafe(ref destinationBase, i + 2);
Vector512Utilities.Shuffle(Vector512.LoadUnsafe(ref sourceBase, i + 3), control).StoreUnsafe(ref destinationBase, i + 3);
It shouldn't result in any codegen differences, but it avoids needing to manipulate the `byref` and uses direct hardware intrinsic APIs rather than the `Unsafe` helpers, so it can be easier to read/understand, at least IMO.
`sourceBase` is `Vector512<float>`, so `Vector512.LoadUnsafe(ref sourceBase, i + 0)` would yield `Vector512<Vector512<float>>`. I could use `float` as the base ref type, but that complicates the offsetting, which confuses me and causes me to make mistakes.
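To make the offsetting concern concrete, here is a minimal sketch of the `float`-based variant. It is not the PR's code: it uses `Vector128` and the built-in `Vector128.Shuffle` instead of the PR's `Vector512Utilities.Shuffle` to stay short, and `ShuffleSketch` and `ReverseEachVector` are hypothetical names. The point is that with a `ref float` base, `LoadUnsafe` and `StoreUnsafe` take an offset counted in elements, so the vector index has to be multiplied by `Vector128<float>.Count`.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class ShuffleSketch
{
    // Reverses the four floats inside each 128-bit group of the source.
    // Any tail shorter than one vector is left untouched in this sketch.
    public static void ReverseEachVector(ReadOnlySpan<float> source, Span<float> destination)
    {
        if (destination.Length < source.Length)
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }

        ref float sourceBase = ref MemoryMarshal.GetReference(source);
        ref float destinationBase = ref MemoryMarshal.GetReference(destination);

        Vector128<int> control = Vector128.Create(3, 2, 1, 0);
        nuint elementsPerVector = (nuint)Vector128<float>.Count;
        nuint vectorCount = (nuint)source.Length / elementsPerVector;

        for (nuint i = 0; i < vectorCount; i++)
        {
            // The offset is in floats, not in vectors.
            nuint offset = i * elementsPerVector;
            Vector128.Shuffle(Vector128.LoadUnsafe(ref sourceBase, offset), control)
                .StoreUnsafe(ref destinationBase, offset);
        }
    }
}
```

Keeping the base reference typed as the vector, as the PR does, trades the `LoadUnsafe`/`StoreUnsafe` style for offsets that are simply vector indices.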
LGTM, just left a couple comments on newer APIs that might help with readability
/// </summary>
/// <returns>The <see cref="PixelTypeInfo"/>.</returns>
#pragma warning disable CA1000
static abstract PixelTypeInfo GetPixelTypeInfo();
Just curious. Is there a specific reason to kick it from the interface and only have it in the implementations? Probably it was never called from the interface and only from specific implementations?
Actually, this is just temporary. I had to hack to get the solution to build because we were misusing ApproximateFloatComparer in the tests.
src/ImageSharp.ruleset
Outdated
@@ -1,6 +1,7 @@
<?xml version="1.0" encoding="utf-8"?>
<RuleSet Name="ImageSharp" ToolsVersion="17.0">
<Include Path="..\shared-infrastructure\sixlabors.ruleset" Action="Default" />
<Rules AnalyzerId="StyleCop.Analyzers" RuleNamespace="StyleCop.Analyzers">
<Rules AnalyzerId="Microsoft.CodeAnalysis.CSharp.Features" RuleNamespace="Microsoft.CodeAnalysis.CSharp.Features">
<Rule Id="IDE0290" Action="None" />
This should also be suppressible via `.editorconfig` using `csharp_style_prefer_primary_constructors = false:none` (or a different severity level if you want to block use of primary constructors altogether, such as `silent`, `suggestion`, `warning`, or `error`).
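As a sketch, the suggestion above might look like this in an `.editorconfig` file. The `[*.cs]` section header is an assumption about how the file is organised; the second entry is the generic rule-ID form, shown as an alternative way to silence IDE0290.

```ini
# Sketch only; the section header is an assumption.
[*.cs]
# Option form quoted in the comment above: preference followed by severity.
csharp_style_prefer_primary_constructors = false:none

# Alternative: set the severity of the rule ID directly.
dotnet_diagnostic.IDE0290.severity = none
```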
Thanks. Yeah, I’ll do that upstream. Not sold on primary constructors. The syntax doesn’t sit right with me.
@@ -492,7 +468,7 @@ private void UncompressRle4(BufferedReadStream stream, int w, Span<byte> buffer,
int max = cmd[1];
int bytesToRead = (int)(((uint)max + 1) / 2);

- Span<byte> run = bytesToRead <= 128 ? scratchBuffer.Slice(0, bytesToRead) : new byte[bytesToRead];
+ Span<byte> run = bytesToRead <= 128 ? scratchBuffer[..bytesToRead] : new byte[bytesToRead];
I am curious whether there is an analyzer to enforce this style. I know that Rider has an editorconfig setting for this.
https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/style-rules/ide0057
You can find most of the language style rules here: https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/style-rules/language-rules
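For reference, a minimal sketch of the two spellings that IDE0057 distinguishes. `RangeSketch` and its method names are hypothetical; for `Span<T>` the range form is compiled down to a `Slice` call, so the difference is purely stylistic.

```csharp
using System;

public static class RangeSketch
{
    // Older style: explicit Slice call.
    public static Span<byte> TakeWithSlice(Span<byte> buffer, int count) => buffer.Slice(0, count);

    // Range style preferred by IDE0057; equivalent to the Slice call above.
    public static Span<byte> TakeWithRange(Span<byte> buffer, int count) => buffer[..count];
}
```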
@@ -185,7 +185,7 @@ public ImageInfo Identify(BufferedReadStream stream, CancellationToken cancellat
{
uint frameCount = 0;
ImageFrameMetadata? previousFrame = null;
- List<ImageFrameMetadata> framesMetadata = new();
+ List<ImageFrameMetadata> framesMetadata = [];
This could be enforced by this editorconfig rule https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/style-rules/ide0028?pivots=dotnet-8-0
If you want, I can look at enforcing this style in a separate PR.
That would be nice thanks. The IDE has been making the suggestion for me, that and range usage also.
Vector4 ToVector4();
/// <param name="source">The <see cref="Abgr32"/> value.</param>
/// <returns>The <typeparamref name="TSelf"/>.</returns>
static abstract TSelf FromAbgr32(Abgr32 source);
This is creating a copy the JIT may not be able(?) to eliminate, which can hurt performance, i.e. I expect `pixelSpan[i] = TPixel.FromArgb(source)` to be slower than `pixelSpan[i].FromArgb(source)`.
Have you benchmarked the impact on (non-optimized) batch pixel span conversion before/after?
Edit: The issue might be nonexistent for 32/64-bit pixel types but present for stuff like `RgbaVector`. We have to benchmark and/or check codegen to see.
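The two call shapes being compared can be sketched as follows. Everything here is hypothetical and simplified for illustration (`DemoRgba`, `DemoArgb`, the two interfaces and `ConversionSketch` are not ImageSharp types): the by-ref form mutates the destination element in place, while the static by-val form returns a new value that is then stored into the span.

```csharp
using System;

// Hypothetical stand-ins for real pixel types; illustration only.
public readonly record struct DemoRgba(byte R, byte G, byte B, byte A);

public interface IInstanceConvertible
{
    // Older shape: mutate the destination pixel in place.
    void SetFromRgba(DemoRgba source);
}

public interface IStaticConvertible<TSelf>
    where TSelf : unmanaged, IStaticConvertible<TSelf>
{
    // Newer shape: return a new value, which the caller stores.
    static abstract TSelf FromRgba(DemoRgba source);
}

public struct DemoArgb : IInstanceConvertible, IStaticConvertible<DemoArgb>
{
    public byte A, R, G, B;

    public void SetFromRgba(DemoRgba source)
    {
        this.A = source.A;
        this.R = source.R;
        this.G = source.G;
        this.B = source.B;
    }

    public static DemoArgb FromRgba(DemoRgba source)
        => new() { A = source.A, R = source.R, G = source.G, B = source.B };
}

public static class ConversionSketch
{
    public static void ConvertByRef<TPixel>(ReadOnlySpan<DemoRgba> source, Span<TPixel> destination)
        where TPixel : unmanaged, IInstanceConvertible
    {
        for (int i = 0; i < source.Length; i++)
        {
            // The span indexer returns a ref, so this writes into the element directly.
            destination[i].SetFromRgba(source[i]);
        }
    }

    public static void ConvertStaticByVal<TPixel>(ReadOnlySpan<DemoRgba> source, Span<TPixel> destination)
        where TPixel : unmanaged, IStaticConvertible<TPixel>
    {
        for (int i = 0; i < source.Length; i++)
        {
            // Produces a value and copies it into the element.
            destination[i] = TPixel.FromRgba(source[i]);
        }
    }
}
```

Whether the extra copy in the second form survives optimization depends on the JIT and the size of the pixel type, which is the question the benchmarks in this thread are meant to answer.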
I've updated the benchmarks in `PixelConversion_ConvertFromRgba32` to test the static byval implementation vs byref and byval for `TestRgba`, `TestArgb`, and `TestRgbaVector`. All benchmarks look good, with the JIT doing a great job of normalizing the differences.
BenchmarkDotNet v0.13.10, Windows 11 (10.0.22631.3007/23H2/2023Update/SunValley3)
11th Gen Intel Core i7-11370H 3.30GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.200-preview.23624.5
[Host] : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
PixelConversion_ConvertFromRgba32_Compatible
Method | Count | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|
ByRef | 256 | 103.4 ns | 0.52 ns | 0.46 ns | 1.00 |
ByVal | 256 | 103.3 ns | 1.48 ns | 1.38 ns | 1.00 |
StaticByVal | 256 | 104.0 ns | 0.36 ns | 0.30 ns | 1.01 |
FromBytes | 256 | 201.8 ns | 1.30 ns | 1.15 ns | 1.95 |
Inline | 256 | 106.6 ns | 0.40 ns | 0.34 ns | 1.03 |
ByRef | 2048 | 771.5 ns | 3.68 ns | 3.27 ns | 1.00 |
ByVal | 2048 | 769.7 ns | 3.39 ns | 2.83 ns | 1.00 |
StaticByVal | 2048 | 773.2 ns | 3.95 ns | 3.50 ns | 1.00 |
FromBytes | 2048 | 1,555.3 ns | 9.24 ns | 8.19 ns | 2.02 |
Inline | 2048 | 799.5 ns | 5.91 ns | 4.93 ns | 1.04 |
PixelConversion_ConvertFromRgba32_Permuted_RgbaToArgb
Method | Count | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|---|
ByRef | 256 | 203.48 ns | 3.318 ns | 3.104 ns | 1.00 | 0.00 |
ByVal | 256 | 201.46 ns | 2.242 ns | 1.872 ns | 0.99 | 0.02 |
StaticByVal | 256 | 201.45 ns | 0.791 ns | 0.701 ns | 0.99 | 0.02 |
FromBytes | 256 | 200.76 ns | 1.365 ns | 1.140 ns | 0.99 | 0.01 |
InlineShuffle | 256 | 221.65 ns | 2.104 ns | 1.968 ns | 1.09 | 0.02 |
PixelConverter_Rgba32_ToArgb32 | 256 | 26.23 ns | 0.277 ns | 0.231 ns | 0.13 | 0.00 |
ByRef | 2048 | 1,561.54 ns | 11.208 ns | 8.751 ns | 1.00 | 0.00 |
ByVal | 2048 | 1,554.26 ns | 9.607 ns | 8.517 ns | 1.00 | 0.01 |
StaticByVal | 2048 | 1,562.48 ns | 8.937 ns | 8.360 ns | 1.00 | 0.01 |
FromBytes | 2048 | 1,552.68 ns | 7.445 ns | 5.812 ns | 0.99 | 0.01 |
InlineShuffle | 2048 | 1,711.28 ns | 7.559 ns | 6.312 ns | 1.10 | 0.01 |
PixelConverter_Rgba32_ToArgb32 | 2048 | 94.43 ns | 0.363 ns | 0.322 ns | 0.06 | 0.00 |
PixelConversion_ConvertFromRgba32_RgbaToRgbaVector
Method | Count | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|---|
ByRef | 256 | 448.5 ns | 4.86 ns | 4.06 ns | 1.00 | 0.00 |
ByVal | 256 | 447.0 ns | 1.55 ns | 1.21 ns | 1.00 | 0.01 |
StaticByVal | 256 | 447.4 ns | 1.67 ns | 1.30 ns | 1.00 | 0.01 |
ByRef | 2048 | 3,577.7 ns | 53.80 ns | 47.69 ns | 1.00 | 0.00 |
ByVal | 2048 | 3,590.5 ns | 43.59 ns | 36.40 ns | 1.00 | 0.02 |
StaticByVal | 2048 | 3,604.6 ns | 16.19 ns | 14.36 ns | 1.01 | 0.01 |
@antonfirsov Are there any additional concerns here or can I move on? I'm keen to keep moving.
@@ -14,14 +14,14 @@ namespace SixLabors.ImageSharp.PixelFormats;
public interface IPixel<TSelf> : IPixel, IEquatable<TSelf>
where TSelf : unmanaged, IPixel<TSelf>
{
#pragma warning disable CA1000 // Do not declare static members on generic types
I'm going to open an issue for this on dotnet/roslyn-analyzers. Declaring static members on generic types is kind of the purpose of static abstracts in interfaces 😆
Looks like we have an issue already: dotnet/roslyn-analyzers#6424
I think the reason that rule exists is to avoid unnecessary binary size bloat when you, e.g., accidentally declare members that don't actually depend on the type parameters at all. We had a few of those in ImageSharp, where the bokeh blur processor had a couple of static fields with some readonly buffers (like, constants) and a bunch of pixel-agnostic processing methods. We saved a nice amount of binary size by moving those out to a non-generic type. It would be nice if the analyzer were smarter and just avoided warning in cases where you do in fact use the type parameter(s), but kept warning otherwise.
Nah, the analyzer exists not because of binary bloat but because `MyClass.Foo(x)` (given `Foo<T>(T x)`) works with generic inference while `MyClass<T>.Foo(x)` does not, and since static members historically couldn't be `abstract`, it didn't make sense to ever have them there.
So the analyzer really just hasn't been updated to account for the existence of static abstract in interfaces. Once it's been made aware, I'd expect it to suggest such members be explicitly implemented on the type and to defer to the `MyClass.Foo<T>(...)` implementation.
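The inference point can be illustrated with a small sketch that reuses the `MyClass` and `Foo` names from the comment above. The method bodies and the `InferenceSketch` wrapper are made up for the example.

```csharp
using System;

// Non-generic type with a generic method: T is inferred from the argument.
public static class MyClass
{
    public static string Foo<T>(T x) => typeof(T).Name;
}

// Generic type with a non-generic static method: T must be written out at the call site.
public static class MyClass<T>
{
    public static string Foo(T x) => typeof(T).Name;
}

public static class InferenceSketch
{
    // T is inferred as int.
    public static string Inferred() => MyClass.Foo(42);

    // The type argument is required here; there is nothing for the compiler to infer it from.
    public static string Explicit() => MyClass<int>.Foo(42);
}
```

Both calls resolve `T` to `int`; the difference is only in how much the caller has to spell out, which is the usability concern CA1000 was written around.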
I’ve upvoted on the linked issue anyway
Let's merge this. It's awesome.
Prerequisites
Description
Fixes #2232 and #594
Optimizes and modernizes SIMD Vector4 packing operations. I'm leaving this for a follow-up PR. I don't expect anyone to reasonably find the time to analyze this line by line, as it's a massive piece of work. However, the changes on the whole are simple and the developer benefit is massive.
API change Elevator Pitch
The abomination that is.
Becomes, as nature intended.
All other conversion methods follow suit where applicable.