Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize HTTP method lookup in routing jumptable #51803

Merged
merged 8 commits into from
Nov 15, 2023

Conversation

ladeak
Copy link
Contributor

@ladeak ladeak commented Nov 1, 2023

(Not?) Using FrozenDictionary in HttpMethodDictionaryPolicyJumpTable

Commit 1-2:

  • Converting Dictionary to Frozen in the ctor of HttpMethodDictionaryPolicyJumpTable.
  • Adding performance tests for the ctor.

After discussions Commit 3:

  • Based on performance measurement commit 3 introduces knowledge on known HTTP methods and removes frozen dictionary. It is using a switch expression (like .NET runtime) to match the known verbs without allocation.

Description

My performance measurements indicate a quite significant increase in the startup time as well as allocations (vs the decrease in DictionaryPolicyJumpTable which is the goal of the PR).

Before:

// * Summary *

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.22621
12th Gen Intel Core i7-1255U, 1 CPU, 12 logical and 10 physical cores
.NET SDK=9.0.100-alpha.1.23524.8
  [Host]     : .NET 9.0.0 (9.0.23.52802), X64 RyuJIT
  Job-BZJOQC : .NET 9.0.0 (9.0.23.52313), X64 RyuJIT

Server=True  Toolchain=.NET Core 9.0  RunStrategy=Throughput

|                        Method | DestinationCount |     Mean |     Error |    StdDev |          Op/s |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |----------------- |---------:|----------:|----------:|--------------:|-------:|------:|------:|----------:|
| DictionaryPolicyJumpTableCtor |                2 | 8.300 ns | 0.0806 ns | 0.0715 ns | 120,475,250.4 | 0.0005 |     - |     - |      48 B |
|     DictionaryPolicyJumpTable |                2 | 9.422 ns | 0.0843 ns | 0.0747 ns | 106,140,068.2 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                2 | 4.637 ns | 0.0964 ns | 0.1382 ns | 215,642,963.9 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                4 | 8.117 ns | 0.0531 ns | 0.0497 ns | 123,195,634.7 | 0.0005 |     - |     - |      48 B |
|     DictionaryPolicyJumpTable |                4 | 9.397 ns | 0.0720 ns | 0.0673 ns | 106,415,958.1 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                4 | 4.526 ns | 0.0785 ns | 0.0696 ns | 220,966,690.7 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                9 | 8.232 ns | 0.0776 ns | 0.0726 ns | 121,480,877.5 | 0.0005 |     - |     - |      48 B |
|     DictionaryPolicyJumpTable |                9 | 9.494 ns | 0.0836 ns | 0.0653 ns | 105,328,310.6 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                9 | 4.435 ns | 0.0368 ns | 0.0326 ns | 225,503,508.5 |      - |     - |     - |         - |

After (using FrozenDictionary):

|                        Method | DestinationCount |       Mean |     Error |    StdDev |          Op/s |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |----------------- |-----------:|----------:|----------:|--------------:|-------:|------:|------:|----------:|
| DictionaryPolicyJumpTableCtor |                2 | 390.895 ns | 5.2786 ns | 4.6794 ns |   2,558,230.8 | 0.0129 |     - |     - |   1,200 B |
|     DictionaryPolicyJumpTable |                2 |   6.799 ns | 0.0935 ns | 0.0829 ns | 147,087,596.4 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                2 |   4.466 ns | 0.0404 ns | 0.0338 ns | 223,917,894.9 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                4 | 496.404 ns | 6.9354 ns | 6.4874 ns |   2,014,489.3 | 0.0153 |     - |     - |   1,472 B |
|     DictionaryPolicyJumpTable |                4 |   6.786 ns | 0.0474 ns | 0.0396 ns | 147,355,032.1 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                4 |   4.467 ns | 0.0423 ns | 0.0375 ns | 223,866,735.8 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                9 | 771.634 ns | 3.9767 ns | 3.3207 ns |   1,295,951.9 | 0.0191 |     - |     - |   1,824 B |
|     DictionaryPolicyJumpTable |                9 |   6.789 ns | 0.0456 ns | 0.0426 ns | 147,288,554.2 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                9 |   4.535 ns | 0.0375 ns | 0.0332 ns | 220,523,926.8 |      - |     - |     - |         - |

It is unclear for me if this change brings enough value to be merged.

After (using switch expression):

|                        Method | DestinationCount |      Mean |     Error |    StdDev |          Op/s |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |----------------- |----------:|----------:|----------:|--------------:|-------:|------:|------:|----------:|
| DictionaryPolicyJumpTableCtor |                2 | 27.427 ns | 0.2116 ns | 0.1875 ns |  36,460,073.0 | 0.0013 |     - |     - |     120 B |
|     DictionaryPolicyJumpTable |                2 |  5.771 ns | 0.0580 ns | 0.0514 ns | 173,274,603.3 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                2 |  4.408 ns | 0.0460 ns | 0.0431 ns | 226,851,759.1 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                4 | 41.707 ns | 0.8466 ns | 0.9410 ns |  23,976,899.9 | 0.0013 |     - |     - |     120 B |
|     DictionaryPolicyJumpTable |                4 |  5.663 ns | 0.0332 ns | 0.0310 ns | 176,582,577.0 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                4 |  4.361 ns | 0.0197 ns | 0.0175 ns | 229,311,321.4 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                9 | 91.236 ns | 0.4729 ns | 0.4192 ns |  10,960,590.1 | 0.0012 |     - |     - |     120 B |
|     DictionaryPolicyJumpTable |                9 |  5.637 ns | 0.0176 ns | 0.0138 ns | 177,390,232.5 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                9 |  4.366 ns | 0.0315 ns | 0.0295 ns | 229,050,294.5 |      - |     - |     - |         - |

cc. @BrennanConroy

Fixes #47569

- Converting Dictionary to Frozen in the ctor of HttpMethodDictionaryPolicyJumpTable.
- Adding performance tests for the ctor.

dotnet#47569
@ladeak ladeak requested a review from javiercn as a code owner November 1, 2023 13:01
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions label Nov 1, 2023
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 1, 2023
@ghost
Copy link

ghost commented Nov 1, 2023

Thanks for your PR, @ladeak. Someone from the team will get assigned to your PR shortly and we'll get it reviewed.

@ladeak
Copy link
Contributor Author

ladeak commented Nov 1, 2023

@BrennanConroy I am not sure if the change worth merging based on the significant tradeoff at startup time.

Does HttpMethodDictionaryPolicyJumpTable only allow HTTP method as the key of the lookup table? The type of the key is string, and currently it does not restrict to the known http verbs. I am asking to understand if it would be possible to use a different data type (ie. an array with a switch expression), given the HTTP verbs are well-known. Or does this type need to handle cases that are not known http verbs?

EDIT: thinking about it further, HttpMethodDictionaryPolicyJumpTable could handle the known HTTP verbs with a switch while the unknown ones could be handled similarly as today with a Dictionary. With such for known HTTP methods we could achieve the below results, but note that I have not yet pushed this commit.

EDIT 2: original table removed, see the latest results at a later comment.

@ladeak ladeak marked this pull request as draft November 1, 2023 15:02
@ladeak ladeak marked this pull request as ready for review November 1, 2023 16:10
@BrennanConroy
Copy link
Member

BrennanConroy commented Nov 2, 2023

EDIT: thinking about it further, HttpMethodDictionaryPolicyJumpTable could handle the known HTTP verbs with a switch while the unknown ones could be handled similarly as today with a Dictionary. With such for known HTTP methods we could achieve the below results, but note that I have not yet pushed this commit.

So you almost doubled the perf by using a switch instead? Whereas frozen dictionary "only" got ~50% faster?

Edit: This actually reminded me of a change to use perfect hashing for http methods:
#44096 maybe that would be even better here? Although switches did get better earlier this year for cases like this dotnet/roslyn#66081

@ladeak
Copy link
Contributor Author

ladeak commented Nov 2, 2023

@BrennanConroy it was simply using 9 if(string.Equals(...,..., OrdinalIgnoreCase)) for the known values and dictionary fallback for the unknown values. Using switch expression is trickier because of case insesitiveness, I would need to stackalloc to avoid extra allocation (stackalloc with upper limit of the longest known value).
I will take a look at the suggested PR, and measure both and come back to this thread.

@ladeak
Copy link
Contributor Author

ladeak commented Nov 2, 2023

Using switch-expressions as committed.

|                        Method | DestinationCount |      Mean |     Error |    StdDev |          Op/s |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |----------------- |----------:|----------:|----------:|--------------:|-------:|------:|------:|----------:|
| DictionaryPolicyJumpTableCtor |                2 | 27.427 ns | 0.2116 ns | 0.1875 ns |  36,460,073.0 | 0.0013 |     - |     - |     120 B |
|     DictionaryPolicyJumpTable |                2 |  5.771 ns | 0.0580 ns | 0.0514 ns | 173,274,603.3 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                2 |  4.408 ns | 0.0460 ns | 0.0431 ns | 226,851,759.1 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                4 | 41.707 ns | 0.8466 ns | 0.9410 ns |  23,976,899.9 | 0.0013 |     - |     - |     120 B |
|     DictionaryPolicyJumpTable |                4 |  5.663 ns | 0.0332 ns | 0.0310 ns | 176,582,577.0 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                4 |  4.361 ns | 0.0197 ns | 0.0175 ns | 229,311,321.4 |      - |     - |     - |         - |
| DictionaryPolicyJumpTableCtor |                9 | 91.236 ns | 0.4729 ns | 0.4192 ns |  10,960,590.1 | 0.0012 |     - |     - |     120 B |
|     DictionaryPolicyJumpTable |                9 |  5.637 ns | 0.0176 ns | 0.0138 ns | 177,390,232.5 |      - |     - |     - |         - |
|    SingleEntryPolicyJumpTable |                9 |  4.366 ns | 0.0315 ns | 0.0295 ns | 229,050,294.5 |      - |     - |     - |         - |

I would like to re-use somehow the code in Kestrel (#44096), but not sure what common place I may move it to, and having reference to Kestrel does not seem right.

@ladeak
Copy link
Contributor Author

ladeak commented Nov 3, 2023

@BrennanConroy and @adityamandaleeka , could you review the latest changes?

@@ -7,11 +7,104 @@ namespace Microsoft.AspNetCore.Routing.Matching;

internal sealed class HttpMethodDictionaryPolicyJumpTable : PolicyJumpTable
{
internal struct HttpMethodsJumpTable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if the struct could be readonly. Could the dictionary of destinations be passed into the ctor? The values could be set in the ctor, the struct made readonly, and struct fields readonly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, this could be a class. Did you make it a struct to avoid the allocation? Jump tables probably last the lifetime of an app. Allocations aren't a concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go all the way with optimizing this - https://github.com/dotnet/aspnetcore/pull/51803/files#r1382732662 - then this type is probably better as a class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I made it a struct because I did not want allocation. I can make it a class or a readonly struct too, as I see you prefer it as a class rather.

Comment on lines 119 to 125
if (destinations != null)
{
foreach (var item in destinations)
{
_knownHttpMethodDestinations.Set(item.Key, item.Value);
}
}
Copy link
Member

@JamesNK JamesNK Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time, the destinations/corsPreflightDestinations dictionaries will only contain known destinations. Keeping the dictionary around becomes unnecessary. It would reduce memory usage to never save the dictionary on the jump table.

I wonder if HttpMethodsJumpTable could be refactored into something that the builder uses directly here and is then passed to the jump table. There is a set method that either sets the value directly if it is a known header value or sets it into a lazily created dictionary (_unknownHttpMethodDestinations) if it is unknown. That would avoid creating a dictionary during startup most of the time.

Reduce allocations on startup
Reduce overall memory usage
Improve route matching performance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. Let me have a look and update the PR.

@ladeak
Copy link
Contributor Author

ladeak commented Nov 11, 2023

I have updated the PR to take a direction as suggested in previous comments. The "known jump table" is now created in the BuildPolicy method.

I update the benchmarks here. However, please note that I have switched off the support preflight flag, hence a slight improvement in all benchmarks below compared to the ones previously commented on in this post.

HttpMethodPolicyJumpTableBenchmark

BEFORE

For transparency (and because the ctor HttpMethodDictionaryPolicyJumpTable has changed) this is the Setup method of HttpMethodPolicyJumpTableBenchmark I used to measure the before performance. The after one is the one available in the PR.

[GlobalSetup]
public void Setup()
{
    _destinations.Add(HttpMethods.Connect, 1);
    _destinations.Add(HttpMethods.Delete, 2);
    _destinations.Add(HttpMethods.Head, 3);
    _destinations.Add(HttpMethods.Get, 4);
    _destinations.Add(HttpMethods.Options, 5);
    _destinations.Add(HttpMethods.Patch, 6);
    _destinations.Add(HttpMethods.Post, 7);
    _destinations.Add(HttpMethods.Put, 8);
    _destinations.Add(HttpMethods.Trace, 9);
    _destinations.Add("MERGE", 10);

    _dictionaryJumptable = new HttpMethodDictionaryPolicyJumpTable(0, _destinations, -1, _corsDestinations);
    _singleEntryJumptable = new HttpMethodSingleEntryPolicyJumpTable(0, HttpMethods.Get, -1, supportsCorsPreflight: false, -1, 2);
    _httpContext = new DefaultHttpContext();
    _httpContext.Request.Method = TestHttpMethod;
}
|                     Method | TestHttpMethod |     Mean |     Error |    StdDev |   Median |          Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |---------:|----------:|----------:|---------:|--------------:|------:|------:|------:|----------:|
|  DictionaryPolicyJumpTable |            GET | 8.338 ns | 0.0704 ns | 0.0658 ns | 8.341 ns | 119,936,347.4 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |            GET | 3.108 ns | 0.0225 ns | 0.0211 ns | 3.101 ns | 321,710,436.1 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |          Merge | 8.341 ns | 0.0575 ns | 0.0538 ns | 8.344 ns | 119,893,341.5 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |          Merge | 3.615 ns | 0.0768 ns | 0.1479 ns | 3.680 ns | 276,609,311.1 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |           POST | 8.290 ns | 0.0912 ns | 0.0809 ns | 8.283 ns | 120,633,371.2 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |           POST | 3.381 ns | 0.0556 ns | 0.0493 ns | 3.376 ns | 295,737,987.0 |     - |     - |     - |         - |

AFTER

|                     Method | TestHttpMethod |     Mean |     Error |    StdDev |          Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |---------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
|  DictionaryPolicyJumpTable |            GET | 3.807 ns | 0.0791 ns | 0.0739 ns | 262,683,263.4 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |            GET | 3.089 ns | 0.0116 ns | 0.0090 ns | 323,756,224.3 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |          Merge | 8.800 ns | 0.0454 ns | 0.0402 ns | 113,641,755.1 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |          Merge | 3.359 ns | 0.0210 ns | 0.0196 ns | 297,748,718.0 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |           POST | 3.743 ns | 0.0209 ns | 0.0186 ns | 267,176,079.6 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |           POST | 3.381 ns | 0.0300 ns | 0.0280 ns | 295,781,095.6 |     - |     - |     - |         - |

HttpMethodMatcherPolicyBenchmark

BEFORE

|         Method | DestinationCount |      Mean |    Error |   StdDev |         Op/s |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |----------------- |----------:|---------:|---------:|-------------:|-------:|------:|------:|----------:|
| BuildJumpTable |                3 |  83.29 ns | 0.551 ns | 0.430 ns | 12,005,749.0 | 0.0029 |     - |     - |     264 B |
| BuildJumpTable |                5 | 142.14 ns | 1.974 ns | 1.750 ns |  7,035,194.8 | 0.0055 |     - |     - |     512 B |
| BuildJumpTable |               11 | 278.49 ns | 2.404 ns | 2.131 ns |  3,590,761.2 | 0.0110 |     - |     - |   1,040 B |

AFTER

|         Method | DestinationCount |      Mean |    Error |   StdDev |         Op/s |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |----------------- |----------:|---------:|---------:|-------------:|-------:|------:|------:|----------:|
| BuildJumpTable |                3 |  96.43 ns | 1.016 ns | 0.848 ns | 10,370,136.0 | 0.0042 |     - |     - |     392 B |
| BuildJumpTable |                5 | 104.21 ns | 1.723 ns | 1.439 ns |  9,596,231.4 | 0.0042 |     - |     - |     392 B |
| BuildJumpTable |               11 | 149.29 ns | 2.491 ns | 2.330 ns |  6,698,388.4 | 0.0041 |     - |     - |     392 B |

@JamesNK is this the direction you would like to go?

@ladeak ladeak requested a review from JamesNK November 13, 2023 13:15
@JamesNK
Copy link
Member

JamesNK commented Nov 14, 2023

@JamesNK is this the direction you would like to go?

Mostly. I played around with the code myself and made some changes:

  • I think optimizing in the builder is too early. The logic to decide if there is a single entry doesn't feel good. Instead, I changed it to use a List<KeyValuePair<string, int>>. It's much lighter weight than a dictionary without adding too much complexity. Could optimize it further, but this is only at startup, so it's not a priority.
  • I merged multiple fields into the known HTTP lookup (known + extra + exit default). It cleans up the jump table a lot.

See 37e50c8

What do you think? Feel free to cherry-pick into this PR.
Note: I didn't rerun benchmarks.

@JamesNK
Copy link
Member

JamesNK commented Nov 14, 2023

Should probably rename HttpMethodDictionaryPolicyJumpTable to HttpMethodMultiplePolicyJumpTable

@ladeak
Copy link
Contributor Author

ladeak commented Nov 14, 2023

Cherry-picked, renamed, and I will re-run the tests today and update this comment.

These look much better:

|         Method | DestinationCount |      Mean |    Error |   StdDev |         Op/s |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------- |----------------- |----------:|---------:|---------:|-------------:|-------:|------:|------:|----------:|
| BuildJumpTable |                3 |  47.23 ns | 0.363 ns | 0.339 ns | 21,170,952.0 | 0.0023 |     - |     - |     216 B |
| BuildJumpTable |                5 |  55.81 ns | 0.253 ns | 0.237 ns | 17,918,140.1 | 0.0023 |     - |     - |     216 B |
| BuildJumpTable |               11 | 198.98 ns | 3.136 ns | 2.934 ns |  5,025,694.4 | 0.0093 |     - |     - |     864 B |

and the jump table is more or less in par too:

|                     Method | TestHttpMethod |      Mean |     Error |    StdDev |          Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |----------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
|  DictionaryPolicyJumpTable |            GET |  3.532 ns | 0.0203 ns | 0.0377 ns | 283,087,866.7 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |            GET |  3.114 ns | 0.0162 ns | 0.0152 ns | 321,089,261.4 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |          Merge | 12.173 ns | 0.0979 ns | 0.0868 ns |  82,148,783.6 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |          Merge |  3.375 ns | 0.0141 ns | 0.0125 ns | 296,294,147.0 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |           POST |  4.088 ns | 0.0230 ns | 0.0203 ns | 244,630,229.7 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |           POST |  3.368 ns | 0.0180 ns | 0.0160 ns | 296,949,234.1 |     - |     - |     - |         - |

@ladeak
Copy link
Contributor Author

ladeak commented Nov 14, 2023

If I understand correctly the error why the build failed is a known error? Could you re-trigger the build?

@JamesNK JamesNK changed the title 47569 - Using FrozenDictionary in HttpMethodDictionaryPolicyJumpTable Optimize HTTP method lookup in routing jumptable Nov 14, 2023
@JamesNK
Copy link
Member

JamesNK commented Nov 15, 2023

FYI, it looks good to me. I want someone else on the team to take a look and give a thumbs-up before merging.

_ => (null, 0),
};

if (matchedMethod is not null && method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can speed-up the method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase) since we know we are targeting a well-known method. Maybe we can leverage the fact that it's very likely ASCII on the header to perform two memory comparisons and or the results for the individual characters?

Copy link
Contributor Author

@ladeak ladeak Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to compare with upper-case and lower-case, OR the results and check if all values set to 1? I actually wonder if there is something like this in System.Text.Ascii already that I could use straight away. And I assume we cannot expect it as ASCII, so if the match fails, fall back to regular string comparison, right?

  • I will implement it, measure it and come back to you with the results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there is potential to speed this up:

  • One option is to generate a trie to compare the strings. That's probably overkill.
  • A simpler optimization would be to have a fast path that assumes upper case method (which basically everyone uses) and compares against hardcoded chars using SequenceEqual. If it doesn't match then fallback to string.Equals + OrdinalIgnoreCase.

However, this PR has a nice improvement and sets up good infrastructure. Merge now and iterate in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have the two versions of the method in a buffer, (upper and lowercase), compare against it and or the results. If done with simd it's about 4 ops.

Disclaimer that I haven't done it myself, but I know in other places we use things like SWAR (Simd within a register) so it's likely useful to apply it here too, rely on SIMD instructions directly or use any specialized helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to come explore @javiercn , I have written some vectorized code in the past, let me see if I can do something "simple". But Ascii type also uses vectorization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have the two versions of the method in a buffer, (upper and lowercase), compare against it and or the results. If done with simd it's about 4 ops.

It would be a bit more complicated because mix case is valid, e.g. Post.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JamesNK and @javiercn I promised to come back to this thread. TLDR; none of the above suggestions were performing better to what is already merged.

Using ASCII

Relevant code:

if (matchedMethod is not null && (System.Text.Ascii.EqualsIgnoreCase(matchedMethod, method) || method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase)))
{
    return destination;
}

ASCII Results

|                     Method | TestHttpMethod |      Mean |     Error |    StdDev |    Median |          Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |----------:|----------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
|  DictionaryPolicyJumpTable |            GET |  6.182 ns | 0.0793 ns | 0.0703 ns |  6.147 ns | 161,768,263.8 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |            GET |  2.721 ns | 0.0557 ns | 0.1344 ns |  2.674 ns | 367,483,825.2 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |          Merge | 12.403 ns | 0.1286 ns | 0.1140 ns | 12.396 ns |  80,624,929.0 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |          Merge |  3.363 ns | 0.0221 ns | 0.0196 ns |  3.363 ns | 297,385,268.2 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |           POST |  6.330 ns | 0.0457 ns | 0.0405 ns |  6.336 ns | 157,968,543.3 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |           POST |  3.442 ns | 0.0702 ns | 0.0721 ns |  3.444 ns | 290,548,813.4 |     - |     - |     - |         - |

Using SequenceEquals Ordinal comparison

Relevant code:

if (matchedMethod is not null && (method.AsSpan().SequenceEqual(matchedMethod)
        || method.Equals(matchedMethod, StringComparison.OrdinalIgnoreCase)))
{
    return destination;
}

SequenceEqual Results

|                     Method | TestHttpMethod |      Mean |     Error |    StdDev |          Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |----------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
|  DictionaryPolicyJumpTable |            GET |  5.325 ns | 0.0951 ns | 0.0794 ns | 187,810,373.8 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |            GET |  3.136 ns | 0.0436 ns | 0.0387 ns | 318,917,169.1 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |          Merge | 12.382 ns | 0.1478 ns | 0.1383 ns |  80,761,722.6 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |          Merge |  3.360 ns | 0.0353 ns | 0.0330 ns | 297,646,602.7 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |           POST |  4.894 ns | 0.0540 ns | 0.0505 ns | 204,328,749.4 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |           POST |  3.343 ns | 0.0078 ns | 0.0065 ns | 299,173,275.5 |     - |     - |     - |         - |

Vector

A slightly different implementation I take more optimistic approach that only compares with upper-case vectorized, and if no match falls back to regular approach. The problem with vectors is that they are a bit more difficult to create when the size of the vector is larger than the input.

So what I ended up doing is creating vectorized versions of the known HTTP verbs and a vector mask corresponding to their length.

       // Set values to vConnect, vDelete, vGet etc. fields in ctor

        vConnect = Vector256.LoadUnsafe<ushort>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(HttpMethods.Connect.AsSpan()))) & vMask7;
        vDelete = Vector256.LoadUnsafe<ushort>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(HttpMethods.Delete.AsSpan()))) & vMask6;
        vGet = Vector256.LoadUnsafe<ushort>(ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(HttpMethods.Get.AsSpan()))) & vMask3;
        // ...
        vMask3 = new Vector<ushort>(new ushort[16] { ushort.MaxValue, ushort.MaxValue, ushort.MaxValue, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }).AsVector256();
        vMask4 = new Vector<ushort>(new ushort[16] { ushort.MaxValue, ushort.MaxValue, ushort.MaxValue, ushort.MaxValue, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }).AsVector256();
         //...
    }

    public int GetDestination(string method)
    {
        if (Vector256.IsHardwareAccelerated)
        {
            (var vMatched, var vMask, var expectedLength, var destination) = (method[0]) switch
            {
                'C' => (vConnect, vMask7, 7, _connectDestination),
                'D' => (vDelete, vMask6, 6, _deleteDestination),
                'G' => (vGet, vMask3, 3, _getDestination),
                'H' => (vHead, vMask4, 4, _headDestination),
                'O' => (vOptions, vMask7, 7, _optionsDestination),
                'P' => method.Length switch
                {
                    3 => (vPut, vMask3, 3, _putDestination),
                    4 => (vPost, vMask4, 4, _postDestination),
                    _ => (vPatch, vMask5, 5, _patchDestination),
                },
                'T' => (vTrace, vMask5, 5, _traceDestination),
                _ => (Vector256<ushort>.Zero, Vector256<ushort>.Zero, 0, 0),
            };

            if (expectedLength == method.Length)
            {
                ref var source = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(method.AsSpan()));
                var vSource = Vector256.LoadUnsafe<ushort>(ref source);
                vSource = vSource & vMask;

                if (Vector256.Equals(vSource, vMatched) == Vector256<ushort>.AllBitsSet)
                {
                    return destination;
                }
            }
        }

       // else fallback to non-vectorized approach

}

I was considering to implement the lower-case comparison too but given the performance tests indicate this solution is already slower to the string.Equals even when the input is upper cased, extending it won't get any faster.

Vectorized Results:

|                     Method | TestHttpMethod |      Mean |     Error |    StdDev |    Median |          Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------------- |--------------- |----------:|----------:|----------:|----------:|--------------:|------:|------:|------:|----------:|
|  DictionaryPolicyJumpTable |            GET |  4.669 ns | 0.0417 ns | 0.0369 ns |  4.669 ns | 214,196,679.8 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |            GET |  3.119 ns | 0.0193 ns | 0.0181 ns |  3.116 ns | 320,565,879.3 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |          Merge | 13.416 ns | 0.1905 ns | 0.1688 ns | 13.446 ns |  74,538,092.7 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |          Merge |  3.375 ns | 0.0281 ns | 0.0249 ns |  3.367 ns | 296,253,388.9 |     - |     - |     - |         - |
|  DictionaryPolicyJumpTable |           POST |  4.382 ns | 0.0920 ns | 0.1682 ns |  4.292 ns | 228,209,040.8 |     - |     - |     - |         - |
| SingleEntryPolicyJumpTable |           POST |  3.364 ns | 0.0283 ns | 0.0237 ns |  3.374 ns | 297,300,993.8 |     - |     - |     - |         - |

@JamesNK JamesNK merged commit c7365a0 into dotnet:main Nov 15, 2023
@ghost ghost added this to the 9.0-preview1 milestone Nov 15, 2023
@danmoseley
Copy link
Member

Thanks for this work @ladeak ! We'd welcome more contributions, if there's something that interests you.

@ghost
Copy link

ghost commented Nov 16, 2023

Hi @danmoseley. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use FrozenDictionary where appropriate
6 participants