DenseMap performance on 'old' Intel CPU #43
Comments
Have to say, I love your resilience and dedication :) The main idea is to use vectorization and make sure it runs well on any CPU. This scalar method works fine on older CPUs, but it can’t match the performance of vectorization on newer ones. |
It seems to me that SIMD on old processors is a marketing ploy to sell them to ordinary people at a high price. Yes, it works, but with a lot of restrictions, and getting the code to actually run faster is not so easy. It's like with a GPU: load the data once and it is processed quickly, but if you constantly have to upload new data, the performance disappears somewhere. I suspect the problem is that in our case a byte array is processed rather than an array of words. I'll try to check this hypothesis. |
In DenseMapCPU I changed the probe step to jumpDistance += 4; (the comment on that line still says "Increase the jump distance by 16 to probe the next cluster"). Reducing the distance for EmplaceCPU gives these results:
|
I have a feeling that the performance hit of loading unaligned data on older CPUs is huge. I was considering using Vector128.ReadAligned(), where each index maps to the start of the nearest group of 16 (less than or equal to the index). |
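A minimal sketch of the "nearest group of 16, less than or equal to the index" idea, in case it helps the discussion. The helper names AlignDown16 and OffsetInGroup are hypothetical and not taken from the repository. Clearing the low four bits rounds the index down to a multiple of 16. Note that this only yields an aligned address if the first element of the array is itself 16-byte aligned.

```csharp
using System;

static class AlignSketch
{
    // Hypothetical helper: round an index down to the start of its
    // 16-element group by clearing the low four bits.
    public static uint AlignDown16(uint index) => index & 0xFFFFFFF0u;

    // Hypothetical helper: position of the original index inside
    // that group (always in the range 0..15).
    public static uint OffsetInGroup(uint index) => index & 0xFu;
}
```

For example, AlignSketch.AlignDown16(37) gives 32 and AlignSketch.OffsetInGroup(37) gives 5, so a probe starting at index 37 would read the group that begins at 32.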
The problem is not array alignment. I found which code slows down the insertion of new elements on older processors, but so far I don't understand why (:
|
Mind benchmarking the read-vector-aligned branch? :) Wondering if there is any difference. Anyway, your benchmark looks promising. |
.NET aligns all memory to 8 bytes, but we need 16. Even that is not enough (: we also need to align the index to 16 bytes: _lengthMinusOne = (Length-1) & 0xFFFFFFF0. I wrote a test suite just for the Add function (80_000_000):
|
I don't know why this happens (: I think it's a .NET bug.
|
I simplified the code a bit:
|
Wondering what will happen if you increase the load to 80-85% |
Easy, but I no longer have the 9700X (: I built it for a friend. I only have an i7-10700K. |
|
|
Can you run the benchmark on an Intel 12xxx? https://github.com/check4game/DotNetBug2 I want to report the problem to MS. |
Care to explain the problem first? Not really sure what I'm looking at. Trying to find some time :) |
The problem is the same one I started this topic with :)
AddOptimal: 1413 ms, and that is the problem! On modern CPUs AddOptimal is faster than AddOptimalFix1 or AddOptimalFix2.
    var emptyMask = Vector128.Equals(source, _emptyBucketVector).ExtractMostSignificantBits();
    // Check for empty buckets in the current vector.
    if (emptyMask != 0)
    {
        index += (uint)BitOperations.TrailingZeroCount(emptyMask);
        Find(_controlBytes, index) = h2;
        Find(_entries, index) = key;
        Count++;
        return;
    }
AddOptimalFix1: 567 ms
    var emptyMask = Vector128.Equals(source, _emptyBucketVector).ExtractMostSignificantBits();
    // Check for empty buckets in the current vector.
    if (emptyMask != 0)
    {
        // Only advance when the slot at index is not already empty.
        if (_emptyBucket != Find(_controlBytes, index))
        {
            index += (uint)BitOperations.TrailingZeroCount(emptyMask);
        }
        Find(_controlBytes, index) = h2;
        Find(_entries, index) = key;
        Count++;
        return;
    }
AddOptimalFix2: 570 ms
    var emptyMask = Vector128.Equals(source, _emptyBucketVector).ExtractMostSignificantBits();
    // Check for empty buckets in the current vector.
    if (emptyMask != 0)
    {
        // Scan forward one slot at a time until an empty control byte is found.
        while (_emptyBucket != Find(_controlBytes, index)) index++;
        Find(_controlBytes, index) = h2;
        Find(_entries, index) = key;
        Count++;
        return;
    } |
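For readers who want to try the mask step on its own, here is a small self-contained sketch of it. It assumes a hypothetical empty marker of 0x80 and a 16-byte control group passed in as a span; the class and method names are illustrative and not from the repository. If source in the snippets above was loaded from the control bytes starting at index, all three variants should end up writing to the same slot, namely the first empty one in the group, so they would differ only in how the offset is computed.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

static class EmptySlotSketch
{
    // Assumption: 0x80 marks an empty control byte. The real value
    // used by the library is not shown in the thread.
    const byte Empty = 0x80;

    // Returns the offset (0..15) of the first empty control byte in a
    // 16-byte group, or -1 when the group has no empty slot.
    public static int FirstEmptyOffset(ReadOnlySpan<byte> group)
    {
        // Load 16 control bytes; throws if the span is shorter than 16.
        Vector128<byte> source = Vector128.Create(group);

        // Matching lanes become 0xFF, so their most significant bit is set
        // and bit i of the mask corresponds to byte i of the group.
        uint emptyMask = Vector128.Equals(source, Vector128.Create(Empty))
                                  .ExtractMostSignificantBits();

        // The lowest set bit is the first empty slot.
        return emptyMask == 0 ? -1 : BitOperations.TrailingZeroCount(emptyMask);
    }
}
```

With a group that is full except for positions 5 and 9, FirstEmptyOffset returns 5. A group with no empty byte returns -1.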
I don't have a modern processor at the moment (: |
I haven't had time to analyze what you did, but here are the results:
|
Please add the BenchmarkDotNet header with the environment info. |
Is it valid? BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4602/23H2/2023Update/SunValley3) InvocationCount=1 IterationCount=5 LaunchCount=1 |
It is valid
On Wed, 15 Jan 2025, 19:18, check4game ***@***.***> wrote:
… Is it valid?
BenchmarkDotNet v0.14.0, Windows 11
(10.0.22631.4602/23H2/2023Update/SunValley3)
12th Gen Intel Core i5-12500H, 1 CPU, 16 logical and 12 physical cores
.NET SDK 9.0.100-rc.2.24474.11
[Host] : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
Job-UIXUPV : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
InvocationCount=1 IterationCount=5 LaunchCount=1
RunStrategy=Monitoring UnrollFactor=1 WarmupCount=2
|
thanks |
Releasing a Hybrid solution soonish... with some promising results BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4751/23H2/2023Update/SunValley3) InvocationCount=1 IterationCount=5 LaunchCount=1
|
Hi, again:)
MyVer.zip