-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize HTTP method lookup in routing jumptable #51803
Conversation
- Converting Dictionary to Frozen in the ctor of HttpMethodDictionaryPolicyJumpTable. - Adding performance tests for the ctor. dotnet#47569
Thanks for your PR, @ladeak. Someone from the team will get assigned to your PR shortly and we'll get it reviewed. |
@BrennanConroy I am not sure if the change worth merging based on the significant tradeoff at startup time. Does EDIT: thinking about it further, EDIT 2: original table removed, see the latest results at a later comment. |
So you almost doubled the perf by using a switch instead? Whereas frozen dictionary "only" got ~50% faster? Edit: This actually reminded me of a change to use perfect hashing for http methods: |
@BrennanConroy it was simply using 9 |
Using switch-expressions as committed.
I would like to re-use somehow the code in Kestrel (#44096), but not sure what common place I may move it to, and having reference to Kestrel does not seem right. |
@BrennanConroy and @adityamandaleeka , could you review the latest changes? |
@@ -7,11 +7,104 @@ namespace Microsoft.AspNetCore.Routing.Matching; | |||
|
|||
internal sealed class HttpMethodDictionaryPolicyJumpTable : PolicyJumpTable | |||
{ | |||
internal struct HttpMethodsJumpTable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if the struct could be readonly. Could the dictionary of destinations be passed into the ctor? The values could be set in the ctor, the struct made readonly, and struct fields readonly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, this could be a class. Did you make it a struct to avoid the allocation? Jump tables probably last the lifetime of an app. Allocations aren't a concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go all the way with optimizing this - https://github.com/dotnet/aspnetcore/pull/51803/files#r1382732662 - then this type is probably better as a class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I made it a struct because I did not want allocation. I can make it a class or a readonly struct too, as I see you prefer it as a class rather.
if (destinations != null) | ||
{ | ||
foreach (var item in destinations) | ||
{ | ||
_knownHttpMethodDestinations.Set(item.Key, item.Value); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the time, the destinations/corsPreflightDestinations dictionaries will only contain known destinations. Keeping the dictionary around becomes unnecessary. It would reduce memory usage to never save the dictionary on the jump table.
I wonder if HttpMethodsJumpTable
could be refactored into something that the builder uses directly here and is then passed to the jump table. There is a set method that either sets the value directly if it is a known header value or sets it into a lazily created dictionary (_unknownHttpMethodDestinations
) if it is unknown. That would avoid creating a dictionary during startup most of the time.
Reduce allocations on startup
Reduce overall memory usage
Improve route matching performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. Let me have a look and update the PR.
I have updated the PR to take a direction as suggested in previous comments. The "known jump table" is now created in the BuildPolicy method. I update the benchmarks here. However, please note that I have switched off the support preflight flag, hence a slight improvement in all benchmarks below compared to the ones previously commented on in this post. HttpMethodPolicyJumpTableBenchmarkBEFOREFor transparency (and because the ctor [GlobalSetup]
public void Setup()
{
_destinations.Add(HttpMethods.Connect, 1);
_destinations.Add(HttpMethods.Delete, 2);
_destinations.Add(HttpMethods.Head, 3);
_destinations.Add(HttpMethods.Get, 4);
_destinations.Add(HttpMethods.Options, 5);
_destinations.Add(HttpMethods.Patch, 6);
_destinations.Add(HttpMethods.Post, 7);
_destinations.Add(HttpMethods.Put, 8);
_destinations.Add(HttpMethods.Trace, 9);
_destinations.Add("MERGE", 10);
_dictionaryJumptable = new HttpMethodDictionaryPolicyJumpTable(0, _destinations, -1, _corsDestinations);
_singleEntryJumptable = new HttpMethodSingleEntryPolicyJumpTable(0, HttpMethods.Get, -1, supportsCorsPreflight: false, -1, 2);
_httpContext = new DefaultHttpContext();
_httpContext.Request.Method = TestHttpMethod;
}
AFTER
HttpMethodMatcherPolicyBenchmarkBEFORE
AFTER
@JamesNK is this the direction you would like to go? |
Mostly. I played around with the code myself and made some changes:
See 37e50c8 What do you think? Feel free to cherry-pick into this PR. |
Should probably rename |
Refactor HttpMethodDestinationsLookup
Cherry-picked, renamed, and I will re-run the tests today and update this comment. These look much better:
and the jump table is more or less in par too:
|
If I understand correctly the error why the build failed is a known error? Could you re-trigger the build? |
FYI, it looks good to me. I want someone else on the team to take a look and give a thumbs-up before merging. |
_ => (null, 0), | ||
}; | ||
|
||
if (matchedMethod is not null && method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can speed-up the method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase)
since we know we are targeting a well-known method. Maybe we can leverage the fact that it's very likely ASCII on the header to perform two memory comparisons and or the results for the individual characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to compare with upper-case and lower-case, OR the results and check if all values set to 1? I actually wonder if there is something like this in System.Text.Ascii already that I could use straight away. And I assume we cannot expect it as ASCII, so if the match fails, fall back to regular string comparison, right?
- I will implement it, measure it and come back to you with the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there is potential to speed this up:
- One option is to generate a trie to compare the strings. That's probably overkill.
- A simpler optimization would be to have a fast path that assumes upper case method (which basically everyone uses) and compares against hardcoded chars using
SequenceEqual
. If it doesn't match then fallback to string.Equals + OrdinalIgnoreCase.
However, this PR has a nice improvement and sets up good infrastructure. Merge now and iterate in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have the two versions of the method in a buffer, (upper and lowercase), compare against it and or the results. If done with simd it's about 4 ops.
Disclaimer that I haven't done it myself, but I know in other places we use things like SWAR (Simd within a register) so it's likely useful to apply it here too, rely on SIMD instructions directly or use any specialized helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to come explore @javiercn , I have written some vectorized code in the past, let me see if I can do something "simple". But Ascii type also uses vectorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have the two versions of the method in a buffer, (upper and lowercase), compare against it and or the results. If done with simd it's about 4 ops.
It would be a bit more complicated because mix case is valid, e.g. Post
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JamesNK and @javiercn I promised to come back to this thread. TLDR; none of the above suggestions were performing better to what is already merged.
Using ASCII
Relevant code:
if (matchedMethod is not null && (System.Text.Ascii.EqualsIgnoreCase(matchedMethod, method) || method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase)))
{
return destination;
}
ASCII Results
| Method | TestHttpMethod | Mean | Error | StdDev | Median | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |----------:|----------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
| DictionaryPolicyJumpTable | GET | 6.182 ns | 0.0793 ns | 0.0703 ns | 6.147 ns | 161,768,263.8 | - | - | - | - |
| SingleEntryPolicyJumpTable | GET | 2.721 ns | 0.0557 ns | 0.1344 ns | 2.674 ns | 367,483,825.2 | - | - | - | - |
| DictionaryPolicyJumpTable | Merge | 12.403 ns | 0.1286 ns | 0.1140 ns | 12.396 ns | 80,624,929.0 | - | - | - | - |
| SingleEntryPolicyJumpTable | Merge | 3.363 ns | 0.0221 ns | 0.0196 ns | 3.363 ns | 297,385,268.2 | - | - | - | - |
| DictionaryPolicyJumpTable | POST | 6.330 ns | 0.0457 ns | 0.0405 ns | 6.336 ns | 157,968,543.3 | - | - | - | - |
| SingleEntryPolicyJumpTable | POST | 3.442 ns | 0.0702 ns | 0.0721 ns | 3.444 ns | 290,548,813.4 | - | - | - | - |
Using SequenceEquals Ordinal comparison
Relevant code:
if (matchedMethod is not null && (method.AsSpan().SequenceEqual(matchedMethod)
|| method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase)))
{
return destination;
}
SequenceEqual Results
| Method | TestHttpMethod | Mean | Error | StdDev | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |----------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
| DictionaryPolicyJumpTable | GET | 5.325 ns | 0.0951 ns | 0.0794 ns | 187,810,373.8 | - | - | - | - |
| SingleEntryPolicyJumpTable | GET | 3.136 ns | 0.0436 ns | 0.0387 ns | 318,917,169.1 | - | - | - | - |
| DictionaryPolicyJumpTable | Merge | 12.382 ns | 0.1478 ns | 0.1383 ns | 80,761,722.6 | - | - | - | - |
| SingleEntryPolicyJumpTable | Merge | 3.360 ns | 0.0353 ns | 0.0330 ns | 297,646,602.7 | - | - | - | - |
| DictionaryPolicyJumpTable | POST | 4.894 ns | 0.0540 ns | 0.0505 ns | 204,328,749.4 | - | - | - | - |
| SingleEntryPolicyJumpTable | POST | 3.343 ns | 0.0078 ns | 0.0065 ns | 299,173,275.5 | - | - | - | - |
Vector
A slightly different implementation I take more optimistic approach that only compares with upper-case vectorized, and if no match falls back to regular approach. The problem with vectors is that they are a bit more difficult to create when the size of the vector is larger than the input.
So what I ended up doing is creating vectorized versions of the known HTTP verbs and a vector mask corresponding to their length.
// Set values to vConnect, vDelete, vGet etc. fields in ctor
vConnect = Vector256.LoadUnsafe<ushort>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(HttpMethods.Connect.AsSpan()))) & vMask7;
vDelete = Vector256.LoadUnsafe<ushort>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(HttpMethods.Delete.AsSpan()))) & vMask6;
vGet = Vector256.LoadUnsafe<ushort>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(HttpMethods.Get.AsSpan()))) & vMask3;
// ...
vMask3 = new Vector<ushort>(new ushort[16] { ushort.MaxValue, ushort.MaxValue, ushort.MaxValue, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }).AsVector256();
vMask4 = new Vector<ushort>(new ushort[16] { ushort.MaxValue, ushort.MaxValue, ushort.MaxValue, ushort.MaxValue, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }).AsVector256();
//...
}
public int GetDestination(string method)
{
if (Vector256.IsHardwareAccelerated)
{
(var vMatched, var vMask, var expectedLength, var destination) = (method[0]) switch
{
'C' => (vConnect, vMask7, 7, _connectDestination),
'D' => (vDelete, vMask6, 6, _deleteDestination),
'G' => (vGet, vMask3, 3, _getDestination),
'H' => (vHead, vMask4, 4, _headDestination),
'O' => (vOptions, vMask7, 7, _optionsDestination),
'P' => method.Length switch
{
3 => (vPut, vMask3, 3, _putDestination),
4 => (vPost, vMask4, 4, _postDestination),
_ => (vPatch, vMask5, 5, _patchDestination),
},
'T' => (vTrace, vMask5, 5, _traceDestination),
_ => (Vector256<ushort>.Zero, Vector256<ushort>.Zero, 0, 0),
};
if (expectedLength == method.Length)
{
ref var source = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(method.AsSpan()));
var vSource = Vector256.LoadUnsafe<ushort>(ref source);
vSource = vSource & vMask;
if (Vector256.Equals(vSource, vMatched) == Vector256<ushort>.AllBitsSet)
{
return destination;
}
}
}
// else fallback to non-vectorized approach
}
I was considering to implement the lower-case comparison too but given the performance tests indicate this solution is already slower to the string.Equals
even when the input is upper cased, extending it won't get any faster.
Vectorized Results:
| Method | TestHttpMethod | Mean | Error | StdDev | Median | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |----------:|----------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
| DictionaryPolicyJumpTable | GET | 4.669 ns | 0.0417 ns | 0.0369 ns | 4.669 ns | 214,196,679.8 | - | - | - | - |
| SingleEntryPolicyJumpTable | GET | 3.119 ns | 0.0193 ns | 0.0181 ns | 3.116 ns | 320,565,879.3 | - | - | - | - |
| DictionaryPolicyJumpTable | Merge | 13.416 ns | 0.1905 ns | 0.1688 ns | 13.446 ns | 74,538,092.7 | - | - | - | - |
| SingleEntryPolicyJumpTable | Merge | 3.375 ns | 0.0281 ns | 0.0249 ns | 3.367 ns | 296,253,388.9 | - | - | - | - |
| DictionaryPolicyJumpTable | POST | 4.382 ns | 0.0920 ns | 0.1682 ns | 4.292 ns | 228,209,040.8 | - | - | - | - |
| SingleEntryPolicyJumpTable | POST | 3.364 ns | 0.0283 ns | 0.0237 ns | 3.374 ns | 297,300,993.8 | - | - | - | - |
Thanks for this work @ladeak ! We'd welcome more contributions, if there's something that interests you. |
Hi @danmoseley. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context. |
(Not?) Using FrozenDictionary in HttpMethodDictionaryPolicyJumpTable
Commit 1-2:
After discussions Commit 3:
Description
My performance measurements indicate a quite significant increase in the startup time as well as allocations (vs the decrease in DictionaryPolicyJumpTable which is the goal of the PR).
Before:
After (using FrozenDictionary):
It is unclear for me if this change brings enough value to be merged.
After (using switch expression):
cc. @BrennanConroy
Fixes #47569