Byte Array Benchmarks

Motivation

These benchmarks were run to find ways to improve the performance of getting values for keys through the Java API of the RocksDB database.

Environment

Benchmarks were run on Windows 10 Home with an Intel(R) Core(TM) i7-6700K CPU with 4 cores and a clock speed of 4 GHz. The JVM version was as follows:

openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode)

Methodology

The JMH Java harness was used to run the benchmarks. The benchmarks ran in SampleTime mode with the time unit set to nanoseconds. There were 100 warm-up iterations and 500 measurement iterations. The whole benchmark suite was run with these settings 10 separate times, which produced 10 CSV result files. The files were processed and the results were visualized using a Python script with the Pandas and Matplotlib libraries.

Implementation

Two benchmark classes were implemented: ByteArrayFromNativeBenchmark, which compares different options for writing a byte array on the native side and returning it to Java, and ByteArrayToNativeBenchmark, which compares different options for passing a byte array to native code. All the benchmarks share a common piece of code, which mocks getting the data for the returned byte array:

std::string value = GetByteArrayInternal(key);

The function GetByteArrayInternal() copies the first 38 bytes of the input byte array and looks up the byte array to return in a std::unordered_map. Depending on the key, one of six values is returned, with a size of 10, 50, 512, 1024, 4096 or 16384 bytes.
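The lookup described above can be sketched in Java for readers who prefer to follow it outside of C++. This is only an analogue of the native mock, not the benchmark's code: the class name, the key layout ("key_<size>" zero-padded to 38 bytes) and the fill byte are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Java analogue of the native mock; the real benchmark does this in C++. */
final class MockValueStore {
    static final int KEY_LENGTH = 38;
    static final int[] VALUE_SIZES = {10, 50, 512, 1024, 4096, 16384};

    // ByteBuffer compares by content, so it can act as a map key for byte sequences.
    private final Map<ByteBuffer, byte[]> values = new HashMap<>();

    MockValueStore() {
        for (int size : VALUE_SIZES) {
            byte[] value = new byte[size];
            Arrays.fill(value, (byte) 'x'); // arbitrary payload
            values.put(ByteBuffer.wrap(keyFor(size)), value);
        }
    }

    /** Hypothetical key layout: "key_<size>" zero-padded to 38 bytes. */
    static byte[] keyFor(int size) {
        byte[] text = ("key_" + size).getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(text, KEY_LENGTH);
    }

    /** Copies the first 38 bytes of the input, then looks the value up (null if unknown). */
    byte[] getByteArrayInternal(byte[] input) {
        byte[] key = Arrays.copyOf(input, KEY_LENGTH);
        return values.get(ByteBuffer.wrap(key));
    }
}
```

Because only the first 38 bytes take part in the lookup, any bytes after them do not change which value is returned.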

ByteArrayFromNativeBenchmark

This part presents the performance of writing byte array data in native methods and returning it. Nine approaches were compared. In each of them the key is passed as a Java byte array (byte[]) and read using the JNI function GetByteArrayRegion, unless otherwise stated:

basicGetByteArray - a Java byte array for the value is allocated and written using the JNI functions NewByteArray and SetByteArrayRegion
preallocatedGetByteArray - a byte array for the value is allocated in Java and then passed to the native method, which writes to it using the JNI function SetByteArrayRegion
bufferGetByteArray - a ByteBuffer is instantiated in the native method, its array() method is called to get the underlying byte array, and GetPrimitiveArrayCritical and memcpy are used to write to it
directBufferGetByteArray - a direct ByteBuffer is instantiated with the value using the JNI function NewDirectByteBuffer and returned
directKeyAndValueBuffersPreallocatedGetByteArray - a direct ByteBuffer for the key and a direct ByteBuffer for the value are allocated on the Java side; in the native method the JNI function GetDirectBufferAddress is used to get a native pointer to the underlying memory, and memcpy is used to write to it
directValueBufferOnlyPreallocatedGetByteArray - a direct ByteBuffer for the value is allocated on the Java side; in the native method the JNI function GetDirectBufferAddress is used to get a native pointer to the underlying memory, and memcpy is used to write to it
bufferValueOnlyPreallocatedGetByteArray - a ByteBuffer for the value is allocated on the Java side; in the native method ByteBuffer.array() is called to get the underlying byte array, and then GetPrimitiveArrayCritical and memcpy are used to write to it
preallocatedGetByteArrayWithGetPrimitiveArrayCritical - similar to preallocatedGetByteArray, but GetPrimitiveArrayCritical and memcpy are used for writing
unsafeAllocatedGetByteArray - memory for the byte array is allocated using the sun.misc.Unsafe class in Java, then the memory address is passed to the native method as a long primitive
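Seen from the Java caller, the approaches above differ mainly in who allocates the destination for the value. The sketch below illustrates three of those shapes with a pure-Java stand-in for the native write, so it runs without any JNI library. The class and method names are hypothetical, and System.arraycopy and ByteBuffer.put only play the role that SetByteArrayRegion or memcpy play in the real native methods.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Caller-side shapes of three approaches; the "native" side is simulated in Java. */
final class ValueRetrievalShapes {
    // Stand-in for data that lives on the native side (16384 bytes, the largest value size).
    private static final byte[] NATIVE_DATA = new byte[16384];
    static {
        Arrays.fill(NATIVE_DATA, (byte) 7);
    }

    /** Shape of basicGetByteArray: a fresh array is allocated for every call. */
    static byte[] getNewArray(int size) {
        return Arrays.copyOf(NATIVE_DATA, size);
    }

    /** Shape of preallocatedGetByteArray: the caller owns and reuses the array. */
    static int getIntoArray(int size, byte[] dest) {
        int n = Math.min(size, dest.length);
        System.arraycopy(NATIVE_DATA, 0, dest, 0, n);
        return n; // number of bytes written
    }

    /** Shape of directValueBufferOnlyPreallocatedGetByteArray: the caller reuses a direct buffer. */
    static int getIntoDirectBuffer(int size, ByteBuffer dest) {
        dest.clear();
        int n = Math.min(size, dest.capacity());
        dest.put(NATIVE_DATA, 0, n);
        dest.flip(); // make the written bytes readable from position 0
        return n;
    }
}
```

With the preallocated shapes the caller has to size the destination for the largest expected value and use the returned length to find out how much of it is valid.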

ByteArrayToNativeBenchmark

This part presents the performance of passing byte array data to a native method for reading. Six approaches were compared. In each of them the value is written to a Java byte array that is allocated in the native method via JNI:

passKeyAsByteArray - a Java byte array is passed to the native method
passKeyAsByteArrayCritical - a Java byte array is passed to the native method and read using the JNI function GetPrimitiveArrayCritical
passKeyAsDirectByteBuffer - a direct ByteBuffer is passed to the native method for reading
passKeyAsDirectByteBufferWithAllocate - a direct ByteBuffer is allocated in Java and passed to the native method for reading
passKeyAsUnsafe - the key is passed as the address of off-heap memory allocated with sun.misc.Unsafe outside of the benchmark
passKeyAsUnsafeWithAllocate - the memory for the key is allocated using sun.misc.Unsafe and passed as an address
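The difference between the two direct ByteBuffer variants can be illustrated on the Java side alone. In the sketch below the names are hypothetical, and it assumes that the non-allocating variant reuses one buffer across calls, which the descriptions above suggest but do not state explicitly. It also shows the copy from a Java byte array into off-heap memory that any off-heap approach needs when the key starts out as a byte[].

```java
import java.nio.ByteBuffer;

/** Java-side preparation of a key held in a direct ByteBuffer. */
final class KeyPassingShapes {
    /** Allocate-per-call shape: a new direct buffer is created and filled every time. */
    static ByteBuffer toFreshDirectBuffer(byte[] key) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(key.length);
        buffer.put(key);
        buffer.flip(); // position 0, limit = key length
        return buffer;
    }

    /** Reuse shape: only the copy is paid per call; the buffer must be large enough for the key. */
    static void refill(ByteBuffer reusable, byte[] key) {
        reusable.clear();
        reusable.put(key); // throws BufferOverflowException if the key does not fit
        reusable.flip();
    }
}
```

A caller that already holds its keys in a direct buffer avoids the copy entirely, whereas a caller that starts from byte[] pays for it on every call.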

Results

For each byte array size, the results of all approaches were plotted, with the error shown as a vertical segment.

ByteArrayFromNativeBenchmark

Byte array with 10 bytes

For getting 10-byte arrays, preallocatedGetByteArrayWithGetPrimitiveArrayCritical proved to be the fastest method, just slightly better than unsafeAllocatedGetByteArray and preallocatedGetByteArray.

Byte array with 50 bytes

For getting 50-byte arrays, preallocatedGetByteArrayWithGetPrimitiveArrayCritical, unsafeAllocatedGetByteArray and preallocatedGetByteArray seem to be equally fast.

Byte array with 512 bytes

For getting 512-byte arrays, unsafeAllocatedGetByteArray proved to be the best method, with directBufferGetByteArray second.

Byte array with 1024 bytes

See “Byte array with 512 bytes” above.

Byte array with 4096 bytes

See “Byte array with 512 bytes” above.

Byte array with 16384 bytes

For getting 16384-byte arrays, unsafeAllocatedGetByteArray and directBufferGetByteArray are significantly better than the other methods. The difference between these two seems less significant, with unsafeAllocatedGetByteArray slightly ahead.

ByteArrayToNativeBenchmark

Byte array with 38 bytes

For passing 38-byte arrays, passKeyAsUnsafe, passKeyAsUnsafeWithAllocate, passKeyAsByteArrayCritical and passKeyAsDirectByteBuffer all have similarly good performance. passKeyAsByteArray is slightly worse, and passKeyAsDirectByteBufferWithAllocate is significantly worse than the other methods.

Byte array with 128 bytes

For passing 128-byte arrays, passKeyAsUnsafe seems to be slightly better than the other methods.

Byte array with 512 bytes

See “Byte array with 128 bytes” above.

Word on errors

The charts show the error of each result as a vertical segment. Outliers usually have larger errors, which suggests that these samples should be discarded. Methods with more outliers, or rather with samples that are more scattered along the y axis (time), should be treated as less stable and possibly more affected by JVM or system state.
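One way to make "more scattered" concrete is to compare the relative spread of a method's scores across the benchmarking runs. The sketch below computes the mean, the sample standard deviation and their ratio (the coefficient of variation). It assumes one score per run, in nanoseconds, for a given method and array size. This metric is a suggestion only; it is not stated that the original analysis used it, and the class name is hypothetical.

```java
/** Spread statistics over one method's per-run scores. */
final class RunStability {
    static double mean(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    /** Sample standard deviation (divides by n - 1); needs at least two scores. */
    static double sampleStdDev(double[] scores) {
        if (scores.length < 2) {
            throw new IllegalArgumentException("at least two scores are required");
        }
        double m = mean(scores);
        double squares = 0.0;
        for (double s : scores) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (scores.length - 1));
    }

    /** Relative spread: a higher value means less stable results across runs. */
    static double coefficientOfVariation(double[] scores) {
        return sampleStdDev(scores) / mean(scores);
    }
}
```

Because the coefficient of variation is relative to the mean, it allows comparing the stability of a fast method with that of a slow one.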

Conclusions

When getting a byte array from a native method, the fastest approach depends on the array size. For small arrays, allocating a byte array in Java and using the JNI function GetPrimitiveArrayCritical and memcpy to write to it seems to be the best option. The same approach with SetByteArrayRegion seems to be almost as good, as does using sun.misc.Unsafe to allocate the memory and memcpy to write to it. For bigger arrays, using sun.misc.Unsafe and memcpy becomes the better choice. For 16-kilobyte arrays, sun.misc.Unsafe and a direct ByteBuffer are almost equally good.
Benchmarks for reading byte arrays from different sources could be performed to better measure and compare the fitness of sun.misc.Unsafe for getting byte arrays.
When passing a byte array to a native method, the fastest approach also depends on the array size, but to a lesser degree. The results are also less stable across benchmarking runs. For smaller arrays the results seem to favor reading the data from a Java byte array using GetPrimitiveArrayCritical. For larger arrays (over 512 bytes), passing the byte array in memory allocated by sun.misc.Unsafe seems better. However, if the overhead of copying the data from a Java byte array to off-heap memory is taken into account, sun.misc.Unsafe is a worse option.
