Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connectTimeout not fully respected #1397

Closed
ysi-camerona opened this issue Feb 9, 2024 · 11 comments · Fixed by #1399
Closed

connectTimeout not fully respected #1397

ysi-camerona opened this issue Feb 9, 2024 · 11 comments · Fixed by #1399

Comments

@ysi-camerona
Copy link
Contributor

Describe the bug
WebSocketClient takes in a connectTimeout parameter. This parameter is used for the initial socket connection, but not when setting up the websocket connection. Poor network connectivity may mean we can set up the socket connection but not the websocket, leading to scenarios where connectBlocking blocks indefinitely.

To Reproduce
I ran this test on a Debian 10 system, using iptables to simulate the poor connectivity condition.

  1. Set up iptables firewall rules to begin dropping TCP traffic after six packets for a connection have been transferred:
sudo iptables -F INPUT
# Allow direct SSH access
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow pings
sudo iptables -A INPUT -p icmp -j ACCEPT
# Allow all DNS lookups
sudo iptables -A INPUT -p udp --source-port 53 -j ACCEPT
# For all other connections, only allow the first six packets (both directions considered) to pass
sudo iptables -A INPUT -m connbytes --connbytes-mode packets --connbytes-dir both ! --connbytes 6 -j ACCEPT
# Drop all other traffic
sudo iptables -A INPUT --in-interface eth0 -j DROP
  1. Compile and run this sample: https://github.com/ysi-camerona/java-websocket-bug-feb-9-2024
  2. Observe that the graceful client (client with a fix to respect connectTimeout for the websocket) fails after five seconds, whereas the default client blocks indefinitely

Example application to reproduce the issue
https://github.com/ysi-camerona/java-websocket-bug-feb-9-2024

Expected behavior
I would have expected that if connectBlocking() is called, and connectTimeout has elapsed before onOpen(ServerHandshake) is called, connectBlocking would return false.

Debug log
Logs from the sample application:

INFO  | 2024-02-09 17:05:41 | main | WebsocketConnectionTest | 21 | Graceful client - connecting with 5000 ms timeout
INFO  | 2024-02-09 17:05:42 | WebSocketConnectReadThread-9 | GracefulTimeoutClient | 53 | Set socket timeout to 5000 ms
INFO  | 2024-02-09 17:05:47 | WebSocketConnectReadThread-9 | GracefulTimeoutClient | 41 | onError called
INFO  | 2024-02-09 17:05:47 | WebSocketWriteThread-10 | GracefulTimeoutClient | 36 | onClose called
INFO  | 2024-02-09 17:05:47 | main | WebsocketConnectionTest | 23 | Graceful client - finished connecting, result: false
INFO  | 2024-02-09 17:05:47 | main | WebsocketConnectionTest | 25 | Graceful client - closed connection
INFO  | 2024-02-09 17:05:47 | main | WebsocketConnectionTest | 30 | Default client - connecting with 5000 ms timeout

Nothing else had been logged by 2024-02-09 17:10:00.

Environment:

  • Version used: Java-Websocket 1.5.3, 1.5.5, 1.5.6
  • Java version: 1.8.0_332

Additional context
GracefulTimeoutClient shows a potential fix for this: set the socket SO timeout to the user-defined value in onSetSSLParameters, then set it back to zero after the websocket has connected.

@PhilipRoman
Copy link
Collaborator

Hi, thanks for the very detailed bug report and steps to reproduce! I wish more of our reports were so descriptive 😆

Does this behaviour cause real issues for you? My impression is that the timeout parameter to connectBlocking() should take care of this for most use cases. Either way, I will look into it, but there are multiple high priority issues currently on my backlog, so I can't promise anything.

I looked at your implementation of GracefulTimeoutClient, but I'm not sure why the fix is related to SSL parameters? Does the issue happen only with encrypted WSS? I assume it also affects regular unencrypted websockets.

@ysi-camerona
Copy link
Contributor Author

Hi Philip, thanks for your reply! Here are my responses:

Does this behaviour cause real issues for you?

Not often, but yes. I use Java-WebSocket on devices deployed in the field with cellular connections. The cellular quality can really vary, and I've seen this happen a number of times. When it happens, connectBlocking usually blocks for 10-20 minutes before returning.

My impression is that the timeout parameter to connectBlocking() should take care of this for most use cases

I had looked at this too, but I think using it would cause the connect attempt to continue in the background even after connectBlocking(long, TimeUnit) returns, whereas when the no-arg connectBlocking() returns false, the connection has been torn down. Does this understanding sound right?

I'm not sure why the fix is related to SSL parameters? Does the issue happen only with encrypted WSS? I assume it also affects regular unencrypted websockets.

Good question, I should have mentioned that the fix is unrelated to SSL parameters. I decided to hook my changes in there because I use WSS exclusively. It's a convenient callback that is invoked after the socket is created, but before the first read from the socket's InputStream. Yes, this also affects plain WS connections: I pointed my example project to ws://ws.ifelse.io/ and confirmed that the graceful client stalls out (makes sense since onSetSSLParameters is not invoked).

@PhilipRoman
Copy link
Collaborator

PhilipRoman commented Feb 9, 2024

using it would cause the connect attempt to continue in the background even after connectBlocking(long, TimeUnit) returns, whereas when the no-arg connectBlocking() returns false, the connection has been torn down.

Looks like you're right, it is a flaw in the function design. The private method reset seems like it would do what you expect, but it is only used in reconnect currently. If you don't call reconnect, there is no socket cleanup as far as i can tell.

While I can implement the fix similar to your provided demo, I think the correct way would be making connectBlocking with timeout properly clean up the connection threads on failure. IMO it is not good idea to rely on socket timeouts, as they have a hacky implementation in the JDK on some platforms. With connect timeout+cleanup we could be sure there are no corner cases left.

@ysi-camerona
Copy link
Contributor Author

Agree, having connectBlocking(long, TimeUnit) handle the cleanup on timeout seems like a cleaner option. Would you like me to test that approach and put together a PR?

@PhilipRoman
Copy link
Collaborator

That would be great!

@ysi-camerona
Copy link
Contributor Author

Great, I'll start on that in the next few days with the goal of having it done on Friday.

@marci4
Copy link
Collaborator

marci4 commented Feb 13, 2024

@PhilipRoman ping me if I should make a release for this

@ysi-camerona
Copy link
Contributor Author

@PhilipRoman I spiked out a fix this afternoon but have not tested it extensively: ysi-camerona@f7d51a8

Can you give it a once-over and let me know what you think? I want to make sure we're happy with this approach before testing it thoroughly. The commit message has some additional context. Thanks!

@PhilipRoman
Copy link
Collaborator

I think it is good. I recommend adding null check (and also a comment) before socket.close, as while calling reconnect() on a failed-to-connect socket is questionable, I would still expect it to work.

@ysi-camerona
Copy link
Contributor Author

Thanks, will do! I'll make those changes, add a unit test, and then aim to have a PR ready in the next day or so.

@ysi-camerona
Copy link
Contributor Author

PR submitted: #1399

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants