ODataWriter vs System.Text.Json performance

Overview

This experiment compares the performance of JsonSerializer (System.Text.Json) and ODataWriter. It's a follow-up to this issue.

The experiments are based on serializing a collection of Customer entities.

There's an IExperimentWriter interface that abstracts writing a collection of Customers to a stream (a minimal sketch of the interface appears after this list). We compare three implementations:

  • JsonExperimentWriter based on JsonSerializer.SerializeAsync
  • ODataExperimentWriter based on an asynchronous ODataWriter. This simulates the steps taken by WebApi's ODataResourceSetSerializer to serialize an entity set response.
  • ODataSyncExperimentWriter based on a synchronous ODataWriter. Similar to ODataExperimentWriter but uses synchronous write methods instead of async.
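
For reference, here's a minimal sketch of what that abstraction looks like conceptually; the exact signature in the repo may differ, and Customer is the entity type used throughout the experiments.

using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public interface IExperimentWriter
{
    // Writes the given customers to the destination stream using the serializer under test.
    Task WriteCustomers(IEnumerable<Customer> customers, Stream stream);
}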

The experiments compare writing to a memory stream, writing to a file, and writing an HTTP response from a simple server. The purpose of this is to account for potential I/O overhead that may affect the metrics.

The HTTP servers are based on a simple class (ExperimentServer) that takes an IExperimentWriter as input and sends a collection of Customers in response to every request.
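
The sketch below illustrates roughly how such a server could be wired up with HttpListener. This is an illustration of the idea rather than the repo's actual implementation; Customer and IExperimentWriter refer to the types described above, and the timing/query-option handling mirrors the behavior described later in this document.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Net;
using System.Threading.Tasks;

public class ExperimentServer
{
    private readonly IExperimentWriter writer;
    private readonly HttpListener listener = new HttpListener();

    public ExperimentServer(IExperimentWriter writer, string prefix)
    {
        this.writer = writer;
        this.listener.Prefixes.Add(prefix); // e.g. "http://localhost:8080/"
    }

    public async Task RunAsync(Func<int, IEnumerable<Customer>> dataFactory)
    {
        this.listener.Start();
        while (true)
        {
            HttpListenerContext context = await this.listener.GetContextAsync();

            // honor the ?count= query option, defaulting to 5000 entities
            int count = int.TryParse(context.Request.QueryString["count"], out int c) ? c : 5000;
            IEnumerable<Customer> customers = dataFactory(count);

            // time only the writing of the response, not the data generation
            Stopwatch stopwatch = Stopwatch.StartNew();
            await this.writer.WriteCustomers(customers, context.Response.OutputStream);
            context.Response.Close();
            Console.WriteLine($"Response written in {stopwatch.ElapsedMilliseconds}ms");
        }
    }
}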

Project setup

The main project includes Microsoft.OData.Core and Microsoft.OData.Edm as project references. So for the solution to build, you need to have a local copy of the odata.net repo and add the project references in Visual Studio.

Running experiments

Benchmarks

There are benchmarks that compare the three serializers across different data sizes and destinations.

To run them:

  • In the Main method of the project, make sure the RunBenchmarks() method is uncommented, and comment out all the remaining statements in the method.
  • Then in your terminal, navigate to the directory that contains the .csproj file and run:
dotnet run -c Release -- --filter *

Here are some sample results

| Method | Job | UnrollFactor | dataSize | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WriteJson | DefaultJob | 16 | 1000 | 1.988 ms | 0.0302 ms | 0.0268 ms | 1.995 ms | 1.00 | 0.00 | 109.3750 | 109.3750 | 109.3750 | 525.52 KB |
| WriteOData | DefaultJob | 16 | 1000 | 59.668 ms | 1.1631 ms | 1.6681 ms | 59.344 ms | 29.89 | 1.19 | 5000.0000 | - | - | 24490.41 KB |
| WriteODataSync | DefaultJob | 16 | 1000 | 39.740 ms | 0.5698 ms | 0.4448 ms | 39.915 ms | 20.01 | 0.43 | 4923.0769 | 153.8462 | 76.9231 | 20595.88 KB |
| WriteJsonFile | Job-ABHABG | 1 | 1000 | 3.771 ms | 0.3180 ms | 0.8917 ms | 3.494 ms | ? | ? | - | - | - | 98.05 KB |
| WriteODataFile | Job-ABHABG | 1 | 1000 | 64.677 ms | 2.2464 ms | 6.5173 ms | 63.439 ms | ? | ? | 5000.0000 | - | - | 23937.82 KB |
| WriteODataSyncFile | Job-ABHABG | 1 | 1000 | 57.521 ms | 2.1010 ms | 6.0281 ms | 56.991 ms | ? | ? | 4000.0000 | 1000.0000 | - | 20088.62 KB |
| WriteJson | DefaultJob | 16 | 5000 | 9.479 ms | 0.1632 ms | 0.1526 ms | 9.418 ms | 1.00 | 0.00 | 828.1250 | 718.7500 | 718.7500 | 4068.3 KB |
| WriteOData | DefaultJob | 16 | 5000 | 236.482 ms | 3.0738 ms | 2.8753 ms | 236.680 ms | 24.95 | 0.48 | 28000.0000 | 1000.0000 | - | 121955.65 KB |
| WriteODataSync | DefaultJob | 16 | 5000 | 210.549 ms | 4.0937 ms | 4.0206 ms | 210.272 ms | 22.22 | 0.62 | 24000.0000 | - | - | 104437.29 KB |
| WriteJsonFile | Job-ABHABG | 1 | 5000 | 15.386 ms | 0.9091 ms | 2.6520 ms | 14.725 ms | ? | ? | - | - | - | 410.66 KB |
| WriteODataFile | Job-ABHABG | 1 | 5000 | 341.139 ms | 24.7906 ms | 69.9224 ms | 322.275 ms | ? | ? | 28000.0000 | 2000.0000 | - | 119509.02 KB |
| WriteODataSyncFile | Job-ABHABG | 1 | 5000 | 275.562 ms | 8.6091 ms | 25.1130 ms | 275.487 ms | ? | ? | 24000.0000 | 1000.0000 | - | 100345.96 KB |
| WriteJson | DefaultJob | 16 | 10000 | 18.647 ms | 0.2563 ms | 0.2398 ms | 18.706 ms | 1.00 | 0.00 | 937.5000 | 750.0000 | 750.0000 | 8149 KB |
| WriteOData | DefaultJob | 16 | 10000 | 908.103 ms | 112.1642 ms | 330.7187 ms | 867.588 ms | 26.70 | 2.52 | 55000.0000 | 2000.0000 | - | 243963.89 KB |
| WriteODataSync | DefaultJob | 16 | 10000 | 699.142 ms | 31.6311 ms | 87.1212 ms | 690.979 ms | 37.54 | 2.97 | 49000.0000 | - | - | 208854.99 KB |
| WriteJsonFile | Job-ABHABG | 1 | 10000 | 34.629 ms | 1.6557 ms | 4.8558 ms | 32.933 ms | ? | ? | - | - | - | 816.33 KB |
| WriteODataFile | Job-ABHABG | 1 | 10000 | 704.676 ms | 39.3033 ms | 112.1344 ms | 673.265 ms | ? | ? | 55000.0000 | 2000.0000 | - | 238992.23 KB |
| WriteODataSyncFile | Job-ABHABG | 1 | 10000 | 621.102 ms | 36.1759 ms | 105.5269 ms | 601.100 ms | ? | ? | 49000.0000 | 1000.0000 | - | 200667.64 KB |

The benchmarks include scenarios for writing to a memory stream (WriteJson, WriteOData and WriteODataSync) as well as writing to a local temp file (WriteJsonFile, WriteODataFile, WriteODataSyncFile). There are also commented-out scenarios based on HTTP requests to a simple local server, but when running those, the ODataWriter-based serializers would hang at some point and throw a TaskCanceledException. I'm not sure what causes this.

From the results above, we clearly see that JsonSerializer is roughly 15x-30x faster than the ODataWriter-based implementations, regardless of data size or whether we're writing to memory or to a file.

It is expected that for a single request, WriteODataSync should perform better than WriteOData due to the extra async overhead in the async writers.

While this analysis primarily focuses on response time and CPU usage, we can also see that the memory overhead of the ODataWriter implementations is considerably higher.

Comparing Http Server response times

To compare the response times of HTTP requests based on the different serializers, go to the Main method of the main project, uncomment the TestServers() statement and comment out the remaining statements in the method. Then run the application (preferably in Release mode).

This will launch three local servers:

  • JsonSerializer server on http://localhost:8080
  • async ODataWriter server on http://localhost:8081
  • synchronous ODataWriter server on http://localhost:8082

All the servers work exactly the same way; they only differ in the serializer used to write the response (see ExperimentServer). By default, each server responds with a JSON collection of 5000 Customer entities. It also logs the response time to the console (note: the response time is the time taken to write the response; it doesn't include the time taken to generate the data). You can request a different data size by adding a count query option (e.g. http://localhost:8080?count=10000).

The first couple of requests take a bit of time (probably due to warmup). So make a few requests to each endpoint until the response times converge before you start comparing them.

Here are sample response times observed on my machine:

[Figure: Sample response times]

The JsonSerializer server is about 10x faster than the synchronous OData server and at least 25x faster than the async OData server.

Note that comparing response times of single requests gives us insight into latency, but not scalability/throughput. It's expected that async may result in slower response times for individual requests while improving overall scalability. However, in this case the JsonSerializer is also async (using the SerializeAsync method), so I believe this is a fair comparison and could also translate to throughput.

CPU Profiling

I used the Visual Studio performance profiler to drill down and try to find the sources of the bottlenecks. I used the profiler to collect performance metrics on the three servers, letting them "warm up" by making a bunch of requests until the response times converged. The profiler session used in the following analysis can be found in SystemJsonVsODataWriterAsyncVsODataWriterSync.diagsession.

The following graph shows CPU usage during requests against the three servers. The tiny bump on the left is a request to the JsonSerializer server, the large bump in the middle is a request to the async OData server, and the bump on the right is a request to the synchronous OData server.

[Figure: Servers CPU graph]

The JsonSerializer server took 21ms to complete the request. 18ms (85.71%) were spent in JsonSerializer.WriteAsyncCore(), 11ms (52.38%) in JsonSerializer.WriteCore(), and only 4ms (19.05%) were actually spent writing to the underlying network stream (Stream.WriteAsync()). We also see that the majority of the stream writing, 3ms (14.29%), is spent in the strange dynamicClass.IL_STUB_PInvoke method. Using ILSpy, I found that this is a stub defined in Interop.HttpApi as:

[DllImport("httpapi.dll", SetLastError = true)]
internal unsafe static extern uint HttpSendResponseEntityBody(SafeHandle requestQueueHandle, ulong requestId, uint flags, ushort entityChunkCount, HTTP_DATA_CHUNK* pEntityChunks, uint* pBytesSent, Microsoft.Win32.SafeHandles.SafeLocalAllocHandle pRequestBuffer, uint requestBufferLength, NativeOverlapped* pOverlapped, void* pLogData);

I believe it's a stub that calls into a native method defined in httpapi.dll that handles the low-level response transmission.

[Figure: JSON server CPU call tree]

The asynchronous ODataWriter request took 678ms while the synchronous version took 335ms. I focused my analysis on the synchronous version because the current implementation of the async ODataWriter consists of wrapping the synchronous calls in tasks, and the async API is currently being rewritten. Also, the synchronous writer represents the best-case ODataWriter and still performs poorly compared to the async JsonSerializer.

About 296ms (88.36%) were spent in the OData writer methods. It's a bit trickier to determine where the majority of the bottleneck is because the CPU time is scattered across so many methods and deep call trees:

[Figure: Synchronous OData server CPU call tree]

Switching to the functions view to make it easier to spot the most dominant methods, I quickly noticed that 29ms (8.66%) is spent in the native HTTP call. This accounted for only 3ms in the JsonSerializer case. Both serializers write about the same amount of data (the OData payload additionally contains an @odata.context URL property), but the OData serializer spends about 10x more time in the native HTTP stack. My suspicion is that this is due to the lack of buffering and the frequent calls to StreamWriter.Write and flushing.

[Figure: Synchronous OData server CPU functions view]

The following images show that the underlying ODataWriter write methods end up calling either StreamWriter.Write(string) (28ms) or StreamWriter.Write(char) (9ms). In both methods, the majority of the time is actually spent flushing to the underlying stream (24ms of 28ms for Write(string) and 8ms of 9ms for Write(char)). So the time the OData writer spent just flushing single characters to the HTTP stream is twice the time JsonSerializer spent writing the entire payload to its stream. This indicates that there's a significant overhead associated with sending a single character to the stream, and our frequent calls to the stream writer lead to a lot of inefficiency.

[Figure: StreamWriter flushing CPU time]

[Figure: StreamWriter Write(string) CPU time]

[Figure: StreamWriter Write(char) CPU time]
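
To make the buffering point concrete, here's an illustrative snippet (not the repo's code) contrasting a writer that flushes on every tiny write with one that lets characters accumulate in a larger buffer. The first version turns every character into a separate trip to the underlying stream; the second reaches the stream in a few large chunks.

using System.IO;
using System.Text;

internal static class FlushIllustration
{
    // Flushing after every write: each character becomes its own write to the stream.
    public static void WriteUnbuffered(Stream output, string payload)
    {
        using var writer = new StreamWriter(output) { AutoFlush = true };
        foreach (char c in payload)
        {
            writer.Write(c); // AutoFlush pushes every single character down to the stream
        }
    }

    // Buffered writes: characters accumulate in a 16KB buffer and only reach the stream in large chunks.
    public static void WriteBuffered(Stream output, string payload)
    {
        using var writer = new StreamWriter(output, Encoding.UTF8, bufferSize: 16 * 1024);
        writer.Write(payload);
    } // disposing the writer flushes the remaining buffered content once
}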

Another thing I quickly noticed from the functions view graph is that string concatenation accounted for 11ms (3.28%), which is also more than double the time the JsonSerializer spends writing to its stream.

So the flushing and string concatenation account for 48ms. I am curious to find out how the remaining 248ms are distributed: how much of that is spent in write-related activities, how much is spent in Edm-related activities, etc. But I haven't found the time to do that breakdown yet.

I also analyzed the CPU usage in the async OData server to try to understand why the response time in the async case is almost double that of the sync version when their implementations are largely identical, especially since the gap between sync and async OData in the in-memory and file benchmarks was not that significant. Surprisingly, it seems that over 50% of the CPU time was actually spent on event tracing. Apparently, Microsoft.OData.TaskUtils.FollowOnSuccessWithTask(this Task antecedentTask, Func<Task, Task> operation) leads to event writes to ETW (Event Tracing for Windows), which in this case seems to be quite expensive. I don't know why this happens; I haven't yet verified whether it happens consistently when no profiler is attached, and I don't know whether it's related to the specific implementation of our FollowOnSuccessWithTask or to the fact that it was making network requests. It could be worthwhile to investigate this further.

[Figure: ETW CPU time]

Comparing implementations

If we take another look at the JsonSerializer's call tree, we notice that out of the 18ms spent in the JsonSerializer.WriteAsyncCore method, 11ms are spent in JsonSerializer.Write and only 4ms in Stream.WriteAsync. The JsonSerializer.Write method is (as the name suggests) a synchronous method. It does not write to the stream; instead, it writes to a buffered writer that rents buffers from the shared ArrayPool (avoiding memory allocations). When the buffer is about 90% full, the buffer content is written to the stream. The buffered writer starts with an initial buffer size of 16KB by default and is designed not to expand the buffer more than 4 times (expanding the buffer involves renting a larger buffer from the pool and copying data from the old buffer). For reference, the source code of JsonSerializer.WriteAsync can be found here.
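
Here's a minimal sketch of that pooled, buffered-writer pattern. It's modeled loosely on the behavior described above; the class name, growth policy and flush method below are illustrative, not System.Text.Json's actual implementation.

using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

// Illustrative pooled buffer writer: synchronous writes land in pooled memory,
// and the stream is only touched when the caller explicitly flushes.
public sealed class PooledBufferWriter : IBufferWriter<byte>, IDisposable
{
    private byte[] rented;
    private int written;

    public PooledBufferWriter(int initialCapacity = 16 * 1024)
    {
        this.rented = ArrayPool<byte>.Shared.Rent(initialCapacity);
    }

    public void Advance(int count) => this.written += count;

    public Memory<byte> GetMemory(int sizeHint = 0)
    {
        this.EnsureCapacity(sizeHint);
        return this.rented.AsMemory(this.written);
    }

    public Span<byte> GetSpan(int sizeHint = 0)
    {
        this.EnsureCapacity(sizeHint);
        return this.rented.AsSpan(this.written);
    }

    // Write the buffered bytes to the stream in one go, then reuse the same buffer.
    public async Task FlushToStreamAsync(Stream stream)
    {
        await stream.WriteAsync(this.rented, 0, this.written);
        this.written = 0;
    }

    private void EnsureCapacity(int sizeHint)
    {
        sizeHint = Math.Max(sizeHint, 1);
        if (this.rented.Length - this.written >= sizeHint)
        {
            return;
        }

        // Rent a larger buffer, copy the existing data over and return the old buffer to the pool.
        byte[] larger = ArrayPool<byte>.Shared.Rent(Math.Max(this.rented.Length * 2, this.written + sizeHint));
        Array.Copy(this.rented, larger, this.written);
        ArrayPool<byte>.Shared.Return(this.rented);
        this.rented = larger;
    }

    public void Dispose() => ArrayPool<byte>.Shared.Return(this.rented);
}

Since Utf8JsonWriter can target any IBufferWriter<byte>, a writer like this keeps the hot path entirely in memory and limits stream I/O to explicit flush points.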

The JsonSerializer uses the Utf8JsonWriter to write the UTF-8-encoded JSON to the buffered writer. This writer is also designed not to allocate memory. It uses Span<> and ReadOnlySpan<> of char and byte instead of working with strings directly. When it needs a block of memory to write a string into, it either allocates an array on the stack using stackalloc (when the required length is at most 256) or rents one from the shared array pool. For example, when it needs to write a string property name or value, it first checks whether the string needs to be escaped. If it does, it will stackalloc or rent an array to store the escaped string in. The following excerpt from the code provides an example of this. I've annotated the code with comments and removed some debug assertions for clarity. You can check the actual source here.

private void WriteStringEscapeValue(ReadOnlySpan<char> value, int firstEscapeIndexVal)
{
    // this is later used to check whether the allocated block
    // is rented from the pool, so that it can be returned
    char[]? valueArray = null;

    // this determines the size to allocate for the escaped string
    // if it has found at least one character that needs to be escaped
    // (i.e. firstEscapeIndexVal >= 0), then it assumes that the
    // remaining characters could also be escaped and allocates
    // enough memory to store the characters if they were escaped.
    // In the worst case, escaping a 1-byte ASCII character could increase its size by 6x
    // (see: https://source.dot.net/#System.Text.Json/System/Text/Json/JsonConstants.cs,60)
    // So in the worst case, this length will be 6x the size of the input
    int length = JsonWriterHelper.GetMaxEscapedLength(value.Length, firstEscapeIndexVal);

    // the stackalloc threshold is 256
    // (see: https://source.dot.net/#System.Text.Json/System/Text/Json/JsonConstants.cs,54)
    // allocate the array on the stack if length <= 256 otherwise rent from pool
    Span<char> escapedValue = length <= JsonConstants.StackallocThreshold ?
        stackalloc char[length] :
        (valueArray = ArrayPool<char>.Shared.Rent(length));

    // copies the input string into the allocated array
    // escaping characters where necessary
    // this returns the actual size written because
    // the allocated array is likely larger than the actual length
    // of the escaped string
    JsonWriterHelper.EscapeString(value, escapedValue, firstEscapeIndexVal, _options.Encoder, out int written);

    // write the escaped string to the buffered writer's memory
    WriteStringByOptions(escapedValue.Slice(0, written));

    // return array if it was rented
    if (valueArray != null)
    {
        ArrayPool<char>.Shared.Return(valueArray);
    }
}

Another thing I noticed is that there seems to be no method in the Utf8JsonWriter that takes an object parameter or returns an object. This probably helps prevent boxing of primitive values. If this is indeed the case, I'd be curious to know how the serializer pulls values out of the object being serialized without boxing.
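
For illustration, the Utf8JsonWriter surface consists of typed overloads like the ones below (these are real System.Text.Json APIs), so primitive values never pass through object:

using System.IO;
using System.Text.Json;

using var stream = new MemoryStream();
using var writer = new Utf8JsonWriter(stream);

writer.WriteStartObject();
writer.WriteNumber("Id", 1);              // int overload, no boxing
writer.WriteString("Name", "Customer 1"); // string overload
writer.WriteBoolean("IsActive", true);    // bool overload
writer.WriteEndObject();
writer.Flush();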

Now let's look at OData's JsonValueUtils.WriteEscapedJsonStringValue, which accomplishes a similar purpose (it accounts for 22ms, or 6.57%, of CPU usage). I've also added some comments to the following code and removed some error-checking statements.

internal static void WriteEscapedJsonStringValue(
    TextWriter writer,
    string inputString,
    ODataStringEscapeOption stringEscapeOption,
    Ref<char[]> buffer,
    ICharArrayPool bufferPool)
{
    int firstIndex;
    if (!JsonValueUtils.CheckIfStringHasSpecialChars(inputString, stringEscapeOption, out firstIndex))
    {
        // no characters to escape, write the string as-is
        writer.Write(inputString);
    }
    else
    {
        int inputStringLength = inputString.Length;

        // this gets a char[] buffer. If the bufferPool is not null,
        // then it rents an array (fixed BufferLength of 128) from the pool;
        // if the bufferPool is null, then it allocates a new array
        // using new char[BufferLength]
        buffer.Value = BufferUtils.InitializeBufferIfRequired(bufferPool, buffer.Value);
        int bufferLength = buffer.Value.Length;
        int bufferIndex = 0;
        int currentIndex = 0;

        // Let's copy and flush strings up to the first index of the special char
        while (currentIndex < firstIndex)
        {
            int subStrLength = firstIndex - currentIndex;

            // If the first index of the special character is larger than the buffer length,
            // flush everything to the buffer first and reset the buffer to the next chunk.
            // Otherwise copy to the buffer and go on from there.
            if (subStrLength >= bufferLength)
            {
                inputString.CopyTo(currentIndex, buffer.Value, 0, bufferLength);
                writer.Write(buffer.Value, 0, bufferLength);
                currentIndex += bufferLength;
            }
            else
            {
                WriteSubstringToBuffer(inputString, ref currentIndex, buffer.Value, ref bufferIndex, subStrLength);
            }
        }

        // start writing escaped strings
        // this goes through a loop of writing escaped characters to the
        // buffer then flushing the buffer to the writer whenever
        // the buffer fills up
        WriteEscapedStringToBuffer(writer, inputString, ref currentIndex, buffer.Value, ref bufferIndex, stringEscapeOption);

        // write any remaining chars to the writer
        if (bufferIndex > 0)
        {
            writer.Write(buffer.Value, 0, bufferIndex);
        }
    }
}

This method writes to the writer a lot, and that accounts for 19ms of the 22ms. It allocates a buffer to store the escaped string, but the buffer might not be large enough, so it flushes the content of the buffer to the writer and then resets the buffer to accept new characters. I don't think preallocating a buffer large enough to hold the entire escaped string would result in noticeable performance gains if we still have to flush that buffer to the stream after escaping. I think the main gains in JsonSerializer come from the fact that the Utf8JsonWriter writes into memory rather than directly to a stream.

We can see that OData's JsonWriter also uses an array pool. ICharArrayPool is actually a custom interface defined in Microsoft.OData.Buffers, but there's no default implementation provided either in the core library or in WebApi, so it's not used by WebApi's formatter. In principle, an array pool should be passed to the ODataMessageWriterSettings.ArrayPool property. When it's not provided, the JsonWriter simply allocates a new array when BufferUtils.RentFromBuffer is called. Using an actual pool here could save memory allocations and reduce GC pressure.
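
As a sketch of what that could look like, an ICharArrayPool backed by the BCL's shared ArrayPool<char> might be wired in roughly as follows. I'm assuming the Rent/Return shape of the interface here; check the Microsoft.OData.Buffers definition for the exact contract.

using System.Buffers;
using Microsoft.OData;
using Microsoft.OData.Buffers;

// ICharArrayPool implementation backed by the shared ArrayPool<char>.
public class SharedCharArrayPool : ICharArrayPool
{
    public char[] Rent(int minSize) => ArrayPool<char>.Shared.Rent(minSize);

    public void Return(char[] array) => ArrayPool<char>.Shared.Return(array);
}

// usage: attach it to the writer settings so the JsonWriter rents buffers instead of allocating them
var settings = new ODataMessageWriterSettings
{
    ArrayPool = new SharedCharArrayPool()
};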

I also noticed that the JsonValueUtils.CheckIfStringHasSpecialChars method used in OData accounts for 2ms of CPU time. It's a simple loop that goes through each character and returns at the first character found in a map of special characters. JsonSerializer uses an encoder to find the first character that needs escaping; the default implementation is highly optimized, first trying SIMD where supported and then using an unrolled loop for the remaining characters (see source here). While 2ms may not be much on its own, I think it indicates that there are other areas where we could micro-optimize certain methods, and the overall result could add up to noticeable improvements.
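
As a small illustration of leaning on the BCL here: JavaScriptEncoder (the encoder System.Text.Json uses by default) already exposes that optimized scanning and escaping logic.

using System;
using System.Text.Encodings.Web;

// JavaScriptEncoder.Default applies the optimized escaping referenced above.
string escaped = JavaScriptEncoder.Default.Encode("value with \"quotes\" and \n newlines");
Console.WriteLine(escaped);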

Another thing to note is that the Utf8JsonWriter can write UTF-8 bytes directly (JSON is based on UTF-8), whereas C# chars are UTF-16. According to this video, supporting UTF-8 directly does lead to better performance, but I don't know whether it would make a difference in our case.
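
For example (using real System.Text.Json APIs), JsonEncodedText pre-encodes and pre-escapes UTF-16 text into UTF-8 once, so repeated writes of the same property name skip the per-write transcoding and escaping work:

using System.IO;
using System.Text.Json;

// encode the property name once, up front (typically cached in a static readonly field)
JsonEncodedText nameProperty = JsonEncodedText.Encode("Name");

using var stream = new MemoryStream();
using var writer = new Utf8JsonWriter(stream);
writer.WriteStartObject();
writer.WriteString(nameProperty, "Customer 1"); // no UTF-16 -> UTF-8 conversion for the name on this call
writer.WriteEndObject();
writer.Flush();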

Conclusion

While we should expect the OData writer to be less performant than JsonSerializer because it simply has to do more work, based on these findings I believe it's possible to gain noticeable performance improvements in ODataWriter without sacrificing the "OData expectations" (e.g. writing context URLs and annotations, possible Edm validations, etc.).

Here are a few things I think we should try in an effort to improve performance. Note that I have not tried any of them yet, so I'm not sure what kind of gains to expect or how effective they will be:

  • Pass an ICharArrayPool to the ODataMessageWriterSettings when setting up the writer
  • Consider using Utf8JsonWriter instead of our own JsonWriter if possible (this might require public API changes, or adapters)
  • Avoid frequent writes to the stream by using a buffered writer with a decently-sized buffer rented from an array pool
  • Use highly-optimized BCL methods whenever possible instead of reinventing the wheel
  • Scan our methods for micro-optimization opportunities like avoiding memory allocations, avoiding string concatenations, loop-unrolling, etc. wherever such optimizations could add up to decent performance gains

One concern I have regarding our async writer implementation, including the ongoing rewrite, is that we make all our write methods async all the way down to the bottom of the stack (even writing a single char is async). The JsonSerializer's writer, on the other hand, writes synchronously into the buffer's memory and only writes asynchronously to the underlying stream when the buffer fills up. I wonder whether we'll need to reconsider our writer's public APIs if we want to adopt a similar implementation, or maybe we could have a buffered writer below the AsyncJsonWriter layer that stays synchronous for most writes even when called from an async method; a rough sketch of that idea follows.
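
A rough, hypothetical sketch of that last idea is shown below. Nothing here is based on the actual rewrite; the class and method names are made up, and a real implementation would use pooled char[]/byte[] buffers rather than a StringBuilder.

using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical buffered layer: individual writes stay synchronous and in-memory,
// and only the periodic flush touches the stream asynchronously.
public sealed class BufferedAsyncTextWriter
{
    private readonly Stream stream;
    private readonly StringBuilder buffer = new StringBuilder();
    private readonly int flushThreshold;

    public BufferedAsyncTextWriter(Stream stream, int flushThreshold = 16 * 1024)
    {
        this.stream = stream;
        this.flushThreshold = flushThreshold;
    }

    // synchronous: no stream I/O, just append to the in-memory buffer
    public void Write(char value) => this.buffer.Append(value);

    public void Write(string value) => this.buffer.Append(value);

    // asynchronous only at the buffer boundary (or at the end of the payload when force is true)
    public async Task FlushIfNeededAsync(bool force = false)
    {
        if (!force && this.buffer.Length < this.flushThreshold)
        {
            return;
        }

        byte[] bytes = Encoding.UTF8.GetBytes(this.buffer.ToString());
        await this.stream.WriteAsync(bytes, 0, bytes.Length);
        this.buffer.Clear();
    }
}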