P999 may not be accurate if Never Send is high. #15
Thanks! Indeed, with insufficient resources the client itself can become the bottleneck. We typically run the load generator with spinning kthreads (see here) and with many cores. When one client is insufficient to generate the load, we use multiple machines. What are the details of your machine, and what configuration are you using for your client?
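For reference, a client runtime config with spinning kthreads would look roughly like this (a sketch only; the addresses and counts are placeholders, not a recommendation for any particular hardware):

```
# Hypothetical client config: all kthreads guaranteed and spinning,
# so workers poll instead of yielding cores back to the iokernel.
host_addr 192.168.1.2
host_netmask 255.255.255.0
host_gateway 192.168.1.1
runtime_kthreads 16
runtime_guaranteed_kthreads 16
runtime_spinning_kthreads 16
```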
The CPU is an Intel Xeon at 2.20 GHz with 20 hyper-threads (10 physical cores), set to performance mode. The network is 100 Gb RDMA.
I also notice that even when the client runs at a low throughput (like 0.75 Mpps), where resources should be sufficient, the Never Send rate is still above 1%.
Can you post the output of a client here (and the parameters used to launch it)? Looking at some recent runs, I see that even at 1 Mpps my never-sent rate is < 0.1%.
Here is an example run at 0.8 Mpps:

```
synthetic --config synthetic.config 10.100.100.102:5190 --output=buckets --protocol memcached --mode runtime-client --threads 16 --runtime 32 --barrier-peers 1 --barrier-leader node151 --distribution=exponential --mpps=0.8 --samples=1 --transport tcp --nvalues=3200000
```
Hm, that is quite high. Can you post a log with many samples at lower loads (change the above command to --samples 20)? Also, can you try reducing the number of kthreads to 8 and see if that has any impact?
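That is, something like the following, assuming the rest of the command stays the same (the kthread count itself would live in synthetic.config, presumably via runtime_kthreads):

```
synthetic --config synthetic.config 10.100.100.102:5190 --output=buckets --protocol memcached --mode runtime-client --threads 16 --runtime 32 --barrier-peers 1 --barrier-leader node151 --distribution=exponential --mpps=0.8 --samples=20 --transport tcp --nvalues=3200000
```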
I run the server with:

It makes the cores

Would you please share your server configuration?
The server had 20 kthreads (20 guaranteed, 0 spinning). Does varying the server configuration impact the client behavior here?
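For concreteness, the corresponding server config would look roughly like this (a sketch with placeholder addresses):

```
# Hypothetical server config matching "20 kthreads, 20 guaranteed, 0 spinning".
host_addr 192.168.1.1
host_netmask 255.255.255.0
host_gateway 192.168.1.254
runtime_kthreads 20
runtime_guaranteed_kthreads 20
runtime_spinning_kthreads 0
```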
Yes. I think 20 kthreads can handle 1 Mpps. You can try my configuration.
The point here is not how many guaranteed kthreads the Caladan server uses. Instead, given a number of guaranteed kthreads (say 4 cores), we send client requests at a rate close to (but lower than) the maximum capacity the Caladan server can handle (say 1 Mpps). In this setup, no matter how many physical cores the client machines use (even one core per connection), the Never-Send rate is always high. As a result, the generated requests exhibit a distribution that is less bursty than expected. With our modified clients (scheduling disabled, and the softirq processing one packet at a time), the Never-Send rate is low. The generated requests then follow a distribution that is more consistent with a Poisson distribution, but Caladan's P999 latency becomes much higher.
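For concreteness, here is a minimal sketch of the open-loop pattern under discussion (illustrative only; the 100 µs lateness budget and the function shape are assumptions, not Caladan's actual code):

```rust
use rand_distr::{Distribution, Exp};
use std::time::{Duration, Instant};

// Illustrative open-loop client logic (NOT Caladan's actual code): send
// times are pre-drawn from exponential inter-arrivals, and a request whose
// worker wakes up too late is counted as "never sent" rather than sent
// late, so client-side queueing does not pollute the measurement.
fn run_worker(mpps: f64, runtime: Duration) -> u64 {
    let exp = Exp::new(mpps * 1e6).unwrap(); // target rate, requests/sec
    let mut rng = rand::thread_rng();
    let start = Instant::now();
    let mut offset = 0.0_f64; // seconds from start to the next send time
    let mut never_sent = 0u64;
    let budget = Duration::from_micros(100); // hypothetical lateness cutoff

    while start.elapsed() < runtime {
        offset += exp.sample(&mut rng);
        let target = start + Duration::from_secs_f64(offset);
        if Instant::now() > target + budget {
            // The worker was scheduled too late (e.g., it yielded and was
            // not run again in time). Dropping the request here thins out
            // exactly the bursty arrivals a Poisson process should contain.
            never_sent += 1;
            continue;
        }
        while Instant::now() < target {} // busy-wait until the send time
        // send_request(target) would go here in a real client
    }
    never_sent
}
```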
Does this behavior change if you use many connections to the server? Say 100?
I'm trying to understand the source of the delay that is causing so many never-sent packets. Please correct me if my understanding of the scenario is wrong: the server machine is being tested at a load point close to its peak throughput, and the client process/machine is not at full utilization and is not a bottleneck. Does this seem correct?
Yes, this is correct.
It seems the Never Send rate becomes higher as the number of connections grows.
I'd be interested in trying to reproduce these results, since they generally don't match what I've seen in my setup so far. Can you provide the commits that you are running for caladan and memcached, the configuration files for both clients and server, and the launch parameters and output logs for the iokernel, memcached, and the loadgen instances?
Configs for Replay

caladan-all: 37a3822be053c37275f0aefea60da26246fd01cb

Client

Server

Other Setups
Would you like to share your configuration and results? Specifically, the Never Send rate when the request rate is close to the maximum capacity that the Caladan server can handle.
Can you also share the outputs/logs from the various programs that you've launched? Also, caladan-all @ 37a3822b points to caladan @ 4a254bf, though some of your configurations imply a later version of caladan (i.e., using the directpath_pci config, etc.). Can you please confirm the version that you are running, and whether any modifications have been made to it?
We use Caladan @ 1ab79505 and memcached from caladan-all @ 37a3822b. |
Hi all,
I find that the `synthetic` client in Caladan endures a high Never-Send rate (above 1%) when clients issue requests at a relatively high rate, close to the server's capacity. This is especially problematic under a Poisson distribution: when two adjacent requests are generated within a short time window (i.e., a bursty period), the latter one is more likely to be dropped due to the Never-Send logic (see code). We have profiled Caladan's client logic and find that scheduling often delays a request (which already violates the Poisson distribution) until it is finally dropped.
We further designed an experiment to confirm this. We modify the Caladan client to disable the scheduling policy: specifically, workers are bound to different cores and cycle through `send`, `do_softirq` (directpath), `handle_timeout`, and `recv` without yielding (see the sketch at the end of this post). We equip the Caladan server with 4 kthreads and launch 16 client workers (each owning a TCP connection) that generate requests following a Poisson distribution, varying the request rate (each run lasts 32 seconds). The following table shows the experiment results:
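As referenced above, a minimal self-contained sketch of the pinned worker loop (the phase bodies are stubs, not Caladan's actual implementation; a real client would do network I/O in each phase):

```rust
// Sketch of the modified client worker: each worker is pinned to its own
// core and cycles through the four phases without ever yielding to the
// scheduler, so no request is delayed by a scheduling decision.
struct Worker;

impl Worker {
    fn send(&mut self) { /* transmit requests whose target time has arrived */ }
    fn do_softirq(&mut self) { /* directpath: process one incoming packet at a time */ }
    fn handle_timeout(&mut self) { /* expire requests that have waited too long */ }
    fn recv(&mut self) { /* record latencies for completed responses */ }

    fn run(&mut self) {
        loop {
            self.send();
            self.do_softirq();
            self.handle_timeout();
            self.recv();
        }
    }
}
```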