We’ve investigated Apache CXF SOAP performance in our testing lab; now it’s time to focus on JAX-RS. We’ll be using the same systems, so we’ll get to see the throughput difference between these approaches.
We’ll also get to see some of the configuration and tuning required to ready the systems to run the performance suite stably. All of our testing will use Adoptium Eclipse Temurin 17 LTS, as it’s available for both PPC64LE and x64 Linux systems.
For our lab test we’ll be using the following hardware:
- Dell PowerEdge R250
  - Intel Xeon E-2378 (8c, 16t)
  - 128 GB DDR4 RAM
  - 1 Gigabit Ethernet
  - Ubuntu 22.04 LTS
- Raptor Blackbird
  - IBM POWER9 v2 SMT4 Sforza (8c, 32t)
  - 128 GB DDR4 RAM
  - 1 Gigabit Ethernet
  - CentOS Stream 9
- Dell PowerConnect 2808 (network switch)
The machines are co-located on the same switch, reducing the number of packet hops.
As of CXF 4.1 the binary distribution will contain a set of performance scripts in the samples folder. Options to test JAX-WS and JAX-RS are present.
At its core, the performance harness is a client-server request/response automation. On startup the script initializes and warms up the JVM for executing mass calls.
The client host runs a number of threads, each running a CXF client that calls the server host. For JAX-RS testing we can choose which HTTP verb to exercise (GET, POST, PUT, DELETE). The client-side harness runs N threads, each looping its calls for the specified duration.
Once the duration has elapsed, it stops the executing clients and tabulates the total calls.
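To make the flow concrete, here’s a minimal sketch of such a harness loop. This is a hypothetical example built on the standard JAX-RS client API, not the actual CXF benchmark code; the class name, thread count, and duration are illustrative (the endpoint matches the service used below):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import jakarta.ws.rs.client.Client;
import jakarta.ws.rs.client.ClientBuilder;
import jakarta.ws.rs.core.Response;

public class HarnessSketch {
    public static void main(String[] args) throws InterruptedException {
        int threads = 16;             // N simulated clients
        long durationMs = 60_000;     // 60-second quick test
        AtomicLong totalCalls = new AtomicLong();
        long deadline = System.currentTimeMillis() + durationMs;

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                Client client = ClientBuilder.newClient();
                // Each thread loops GET calls until the deadline
                while (System.currentTimeMillis() < deadline) {
                    try (Response r = client
                            .target("http://localhost:9000/customerservice/customers/123")
                            .request().get()) {
                        if (r.getStatus() == 200) {
                            totalCalls.incrementAndGet();
                        }
                    }
                }
                client.close();
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMs + 10_000, TimeUnit.MILLISECONDS);
        System.out.println("Total calls: " + totalCalls.get());
    }
}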
In our previous performance lab we were attempting to achieve 1 Billion invocations in an eight-hour period. Let’s see what JAX-RS can do.
Before we start our labs, we’ll run a few 60-second quick tests to dial in client counts for our systems (x64 client → PPC64LE server, PPC64LE client → x64 server). The target column below is the per-client rate needed to reach 1 billion calls in eight hours: 1,000,000,000 calls / 28,800 seconds ≈ 34,722 calls/second overall, divided by the client count.
Clients | Target Calls/Second per Client | Measured Calls/Second per Thread (x64 Client) | Measured Calls/Second per Thread (PPC64LE Client) |
---|---|---|---|
1 | 34722.2 | 1338.6 | 665.55 |
8 | 4340.27 | 2386.96 | 2325.85 |
16 | 2170.14 | 1728.17 | 1694.91 |
32 | 1085.07 | 1414.77 | 867.71 |
64 | 542.53 | 852.64 | 470.66 |
128 | 271.27 | 510.38 | 229.56 |
256 | 135.63 | 237.67 (sweet spot) | 117.10 |
512 | 67.81 | 116.97 | 57.68 |
1024 | 33.90 | 59.10 | 32.07 |
2048 | 16.95 | 30.58 | 16.98 (best fit) |
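To read the table: at the x64 sweet spot of 256 clients, 256 × 237.67 ≈ 60,843 aggregate calls/second, comfortably above the ≈34,722 calls/second pace required for 1 billion calls in eight hours. The PPC64LE client’s best fit lands at 2048 threads, where the measured per-thread rate (16.98) finally meets the per-client target (16.95).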
To run the performance harness we change directory into samples. Within this folder we’ll build the base harness and the various scenarios.
On each host we will open a terminal to the CXF distribution samples folder.
We’ll ensure the JAVA_HOME and MAVEN_HOME environment variables are set.
For our runs we’ll use Adoptium Eclipse Temurin 17 LTS as both the client- and server-side JVM.
We set our maximum heap size to 8 GB:
$ export MAVEN_OPTS="-Xms32m -Xmx8192m -Dmaven.artifact.threads=5"
$ cd samples
$ mvn clean install
$ cd performance/jaxrs
On the Server host we’ll execute the following Maven profile:
$ mvn -Pserver -Dhost=0.0.0.0 -Dprotocol=http
On the Client host we’ll execute the client profile, instructing it to use the GET operation with 256 threads (simulating 256 clients) over a time of 8 hours (60 × 60 × 8 = 28,800 seconds).
$ mvn -Pclient -Dhost=192.168.50.154 -Dprotocol=http -Doperation=get -Dthreads=256 -Dtime=28800
For the purposes of our lab test, we’ll allow the suite to execute without added agents to the JVM.
On our first iteration we quickly encountered a runtime error.
Client Side:
ConnectException invoking http://192.168.50.154:9000/customerservice/customers/123: Cannot assign requested address
Given our quick tests indicated a valid connection configuration between client and server, we’ll reduce the thread count on our second run.
$ mvn -Pclient -Dhost=192.168.50.154 -Dprotocol=http -Doperation=get -Dthreads=128 -Dtime=28800
Client Side:
ConnectException invoking http://192.168.50.154:9000/customerservice/customers/123: Cannot assign requested address
The "Cannot assign requested address" tends to indicate that we’re saturating the port with so many connections.
$mvn -Pclient -Dhost=192.168.50.154 -Dprotocol=http -Doperation=get -Dthreads=64 -Dtime=28800
This quickly failed as well.
Checking ulimits, the open-file count was restricted to 1024. We update this to 10240 and retest.
$ mvn -Pclient -Dhost=192.168.50.154 -Dprotocol=http -Doperation=get -Dthreads=256 -Dtime=28800
Server Side:
Aug 08, 2024 8:43:42 AM org.eclipse.jetty.server.AbstractConnector handleAcceptFailure
WARNING: Accept Failure
java.io.IOException: Too many open files
We need to increase the number of available file handles on our systems; each open socket consumes a file descriptor, so the default limit is quickly exhausted under load.
$ sudo vi /etc/security/limits.conf
* soft nofile 655350
* hard nofile 655350
Restart the system, then verify the new limit:
$ ulimit -n
655350
Let’s retry our initial test case:
$ mvn -Pclient -Dhost=192.168.50.154 -Dprotocol=http -Doperation=get -Dthreads=64 -Dtime=28800
Results in:
Cannot assign requested address
The server-side file handle exhaustion appears to be managed, but the client side is still experiencing bind exceptions. We’re going to resolve those bind exceptions and get this lab system rolling!
So the issue we’re hitting is called ephemeral port exhaustion.
[jgoodyear@localhost jaxrs]$ cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999
Our system’s local port range allows roughly 28K connections (60999 - 32768 = 28,231 ports). Our testing scenario has been attempting to push 256 threads × 237.67 calls/second ≈ 60,843 calls/second, so we exhaust the range, which is reported to us as a bind exception.
We have a few options to improve our performance:
- Increase the port range (capped at 65,535 ports for both IPv4 and IPv6)
- Tweak TIME_WAIT settings (not something we generally want to do)
- Add NIC ports to scale the range, load balancing clients over local addresses (see the sketch below)
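As a rough sketch of that third option (hypothetical code: the local client addresses are assumptions, and the CXF harness does not do this out of the box), a client can spread its connections across multiple local addresses so that each address contributes its own ephemeral port range:

import java.net.InetSocketAddress;
import java.net.Socket;

public class LocalBindSketch {
    public static void main(String[] args) throws Exception {
        // Assumed client-side addresses, one per NIC port
        String[] localAddrs = {"192.168.50.10", "192.168.50.11"};
        for (int i = 0; i < 4; i++) {
            try (Socket s = new Socket()) {
                // Bind to a chosen local address; port 0 selects a free ephemeral port
                s.bind(new InetSocketAddress(localAddrs[i % localAddrs.length], 0));
                s.connect(new InetSocketAddress("192.168.50.154", 9000));
                System.out.println("Connected from " + s.getLocalSocketAddress());
            }
        }
    }
}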
We extend our port range as follows:
$ sudo sysctl -w net.ipv4.ip_local_port_range="15000 64000"
net.ipv4.ip_local_port_range = 15000 64000
This provides us with some 49000 ephemeral ports.
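To sanity-check the active range from Java, a small helper (hypothetical, assuming the Linux /proc layout) can read it back:

import java.nio.file.Files;
import java.nio.file.Path;

public class PortRangeCheck {
    public static void main(String[] args) throws Exception {
        // The kernel exposes the range as two whitespace-separated values, e.g. "15000 64000"
        String[] range = Files.readString(Path.of("/proc/sys/net/ipv4/ip_local_port_range"))
                .trim().split("\\s+");
        int low = Integer.parseInt(range[0]);
        int high = Integer.parseInt(range[1]);
        System.out.printf("Ephemeral ports: %d-%d (%d available)%n", low, high, high - low + 1);
    }
}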
Now let’s re-run our table of values, with 49K ports as a ceiling (retaining the other configuration changes).
Clients | PPC64LE Server / x64 Client (Calls/Second) | New Connections/Second (Threads × Calls/Second) | x64 Server / PPC64LE Client (Calls/Second) | New Connections/Second (Threads × Calls/Second) |
---|---|---|---|---|
1 | 1196.50 | 1196.50 | 1264.84 | 1264.84 |
8 | 2448.19 | 19585.52 | 2182.08 | 17456.64 |
16 | 1886.69 | 30187.04 | 1590.65 | 25450.4 |
32 | 1449.83 | 46394.56 | 1019.92 | 32637.44 |
64 | 942.33 | 60309.12 | 553.47 | 35422.08 |
These numbers represent new connections opened within a one-second window. Many of those ports will still be in use, since closed connections linger in TIME_WAIT (60 seconds by default on Linux) before their ports can be reused, so we do not expect new connections/second to match the throughput sweet spot.
In theory, sustaining 49,000 connections/second would get us to 49,000 × 28,800 = 1,411,200,000 calls processed in an eight-hour period.
$ mvn -Pserver -Dhost=0.0.0.0 -Dprotocol=http
$ mvn -Pclient -Dhost=192.168.50.154 -Dprotocol=http -Doperation=get -Dthreads=16 -Dtime=28800
While running the perf suite, we observe:
Server Side:
[jgoodyear@localhost ~]$ ss -s
Total: 34371
TCP: 39980 (estab 16011, closed 6131, orphaned 0, timewait 6131)
Client Side:
jgoodyear@jgoodyear-PowerEdge-R250:~$ ss -s
Total: 41580
TCP: 40883 (estab 16010, closed 0, orphaned 0, timewait 0)
Several minutes later, however, we observed:
jakarta.ws.rs.ProcessingException: java.net.ConnectException: ConnectException invoking http://192.168.50.154:9000/customerservice/customers/123: Cannot assign requested address
We still ran out of ephemeral ports!
Our performance client is not closing out connections.
Our original client code:
try {
    Response respGet = webClient.get();
    Asserts.check(respGet.getStatus() == 200, "Get should have been OK");
} catch (Exception e) {
    // error handling elided
}
Can you spot the connection leak?
Here’s a hint - the Response object retains an input stream.
We update the test client code to force Response objects to close their underlying streams, using try-with-resources so each response is auto-closed and its connection released:
try (Response respGet = webClient.get()) {
    Asserts.check(respGet.getStatus() == 200, "Get should have been OK");
}
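This works because jakarta.ws.rs.core.Response implements AutoCloseable, so try-with-resources closes the response’s underlying input stream as soon as the block exits, releasing the connection; calling respGet.close() in a finally block would achieve the same effect.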
With this change in place, we set up another test run.
$ mvn -Pserver -Dhost=0.0.0.0 -Dprotocol=http
$ mvn -Pclient -Dhost=192.168.50.154 -Dprotocol=http -Doperation=get -Dthreads=16 -Dtime=28800
This resulted in:
=============Overall Test Result============
Overall Throughput: get 1772.9019709123222 (invocations/sec)
Overall AVG. response time: 0.5640469785734444 (ms)
8.16959077E8 (invocations), running 460803.29899999994 (sec)
============================================
In this run our system remained stable and processed 816,959,077 calls in an eight-hour period. Given we were running just 16 clients, this number is pretty good (compared with our JAX-WS perf testing).
During the test run we observed the following socket statistics:
PPC64LE Server Side:
[jgoodyear@localhost ~]$ ss -s
Total: 545
TCP: 23 (estab 18, closed 0, orphaned 0, timewait 0)
x64 Client Side:
jgoodyear@jgoodyear-PowerEdge-R250:~$ ss -s
Total: 720
TCP: 5013 (estab 18, closed 4990, orphaned 0, timewait 4990)
Our concurrent connections appeared to be stable.
We’re ready to ramp up connections!
Let’s run 32 client threads and check for stability and throughput.
This resulted in:
=============Overall Test Result============
Overall Throughput: get 1363.052828409949 (invocations/sec)
Overall AVG. response time: 0.7336472799565198 (ms)
1.256210783E9 (invocations), running 921615.624 (sec)
============================================
1,256,210,783 calls processed in an eight-hour period.
Our system still appears stable; let’s run 64 clients (PPC64LE server, x64 clients).
This resulted in client side exceptions:
jakarta.ws.rs.ProcessingException: java.net.ConnectException: ConnectException invoking http://192.168.50.154:9000/customerservice/customers/123: Cannot assign requested address
We hit the port range limit again.
One more configuration to try before swapping machine roles: in this run we’ll try 48 clients.
This resulted in:
=============Overall Test Result============
Overall Throughput: get 1092.5105255070816 (invocations/sec)
Overall AVG. response time: 0.9153229892552811 (ms)
1.510322517E9 (invocations), running 1382432.921 (sec)
============================================
This time we managed 1,510,322,517 calls in an eight-hour period.
Let’s turn the roles around, running the x64 as server and the PPC64LE as client, with the same configurations and tunings applied to each host. We will start with 32 clients.
This resulted in:
=============Overall Test Result============
Overall Throughput: get 831.0453771395204 (invocations/sec)
Overall AVG. response time: 1.203303727459535 (ms)
7.65905672E8 (invocations), running 921617.1500000001 (sec)
============================================
Our first run yielded 765,905,672 calls in an eight-hour period.
Let’s double the client count to 64.
This resulted in:
=============Overall Test Result============
Overall Throughput: get 413.4362013002846 (invocations/sec)
Overall AVG. response time: 2.4187528737322297 (ms)
7.62073003E8 (invocations), running 1843266.266 (sec)
============================================
This run processed fewer calls: 762,073,003 in eight hours.
Our prior tables suggested we need a higher number of threads to achieve maximum throughput, so we’ll try 256 clients for our next run. We’ll observe socket statistics for system pressure.
This resulted in:
=============Overall Test Result============
Overall Throughput: get 105.19034791566503 (invocations/sec)
Overall AVG. response time: 9.506575648953428 (ms)
7.75651656E8 (invocations), running 7373791.1450000005 (sec)
============================================
This time we processed 775,651,656 calls in eight hours. An improvement over 64 clients, but our average response time is starting to suffer for that small throughput gain.
The PPC64LE host acting as client did manage to keep its connections stable:
[jgoodyear@localhost ~]$ ss -s
Total: 787
TCP: 7012 (estab 260, closed 6747, orphaned 0, timewait 6747)
In our first foray into JAX-RS performance testing, we quickly learned which system resources would become bottlenecks. Once we adjusted those values we could start running our eight-hour test cases.
The key bottleneck on each system turned out to be client-side ephemeral port exhaustion. Ensuring our clients release in-use ports as quickly as possible was our first major improvement toward running our test cases; dialing in the total number of client threads was the second.
Parameter | Setting |
---|---|
MAVEN_OPTS | -Xms32m -Xmx8192m |
file handle ulimit | /etc/security/limits.conf hard & soft limits increased to 655350 |
ip_local_port_range | sysctl -w net.ipv4.ip_local_port_range="15000 64000" to allow ~49K ephemeral ports |
Compared to our JAX-WS performance testing, our JAX-RS runs managed to process more calls in total while running on modest-sized heaps.
The PPC64LE system, acting as client or server, appeared less sensitive to port exhaustion than the x64 machine; however, it still suffered the same bottleneck.
The x64 system appeared to run out of gas when asked to process more requests (32 → 256 clients yielded very similar results); it would be interesting to apply JVM tunings here to see if there is another bottleneck at play.
There are of course more scenarios we could test, which we intend to perform in follow-up posts:
- Retest on Java 21 LTS
- Larger heap spaces
- Adjust thread stack size
Please do not hesitate to reach out with questions and comments, here on the Blog, or through the Savoir Technologies website at https://www.savoirtech.com.
Thank you to the Apache CXF community.
(c) 2024 Savoir Technologies