Ignoring PRESENTATION every minute for long time #230

Open
HenryNe opened this issue Nov 19, 2020 · 8 comments

Comments

@HenryNe
Contributor

HenryNe commented Nov 19, 2020

After running for a long time (30 days) and then a short network problem, the connection between 2 of the 8 nodes is no longer established.
The following messages appear every minute on both nodes:

Accepting PRESENTATION from 4.3.2.7:12000 ..
Session established with 4.3.2.7:12000.
Cipher suite: ecdhe_rsa_aes256_gcm_sha384
Elliptic curve: sect571k1
Added system route: eth0 - 4.3.2.7/32 => 188.138.112.1 - metric 0
Ignoring PRESENTATION from 4.3.2.2:12000 as an active session currently exists with this host.
Error deciphering data message from 4.3.2.7:12000: error:00000000:lib(0):func(0):reason(0)
Error deciphering data message from 4.3.2.7:12000: error:00000000:lib(0):func(0):reason(0)
Session with 4.3.2.7:12000 lost (timeout).

All 6 other connections (of 7 in total for 8 nodes) worked perfectly the whole time.
After ~2 hours the connection was stable again.

The same issue occurred some days earlier, and I fixed it by restarting freelan on one host.

I suspect both nodes start the connection at the same time. Each then sees the other's connection, and both terminate the session. After exactly 1 minute they start the same thing again.

Is it possible to add a random delay before they try to reconnect?
Can I set a unique delay for every host, so that they do not attempt connections on the same time interval (1 minute)?
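
The random delay asked about here could look like the sketch below. It illustrates the general jitter idea only and is not freelan's actual reconnect logic; the names `next_retry_delay` and `attempt_times`, the fixed 60-second base period, and the 25% jitter fraction are all assumptions made for illustration.

```python
import random
from typing import List

BASE_PERIOD = 60.0  # seconds; assumed from the one-minute pattern in the log


def next_retry_delay(rng: random.Random, base: float = BASE_PERIOD,
                     jitter: float = 0.25) -> float:
    """Return the base period shifted by a random offset of up to +/- jitter * base."""
    return base + rng.uniform(-jitter * base, jitter * base)


def attempt_times(rng: random.Random, attempts: int, jitter: float) -> List[float]:
    """Cumulative times (seconds) at which one node would retry."""
    now, times = 0.0, []
    for _ in range(attempts):
        now += next_retry_delay(rng, jitter=jitter)
        times.append(now)
    return times


if __name__ == "__main__":
    # Two nodes with independent random sources (hypothetical seeds).
    node_a, node_b = random.Random(1), random.Random(2)
    fixed = zip(attempt_times(node_a, 5, 0.0), attempt_times(node_b, 5, 0.0))
    jittered = zip(attempt_times(node_a, 5, 0.25), attempt_times(node_b, 5, 0.25))
    print("fixed period, gap between nodes:", [round(abs(a - b), 1) for a, b in fixed])
    print("with jitter, gap between nodes: ", [round(abs(a - b), 1) for a, b in jittered])
```

With a fixed period, two nodes that start together stay in lockstep on every retry; with jitter, their retry times drift apart, which is the effect the question is after.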

@richman1000000

richman1000000 commented Nov 19, 2020

I think this is an issue with your internet connection. I had a similar issue.
These two messages
Ignoring PRESENTATION from 4.3.2.2:12000
and
Error deciphering data message from 4.3.2.7:12000: error:00000000:lib(0):func(0):reason(0)
are not related.

@HenryNe
Contributor Author

HenryNe commented Nov 19, 2020

I should add that I use UDP and one of the hosts is behind a NAT.
Is the error a side effect of concurrent PRESENTATION messages and the UDP port over NAT?

Both hosts have good connections to 6 other nodes at this time.

If I stop freelan for 5 or more seconds on one of the hosts and start it again, then everything works.
But if I use "restart", the connection does not come back.

@HenryNe
Contributor Author

HenryNe commented Nov 19, 2020

Here are the last lines of the syslogs, from the 2 minutes before the connection was established. The timestamps on the hosts are in sync.

15:58:14 host_6: Accepting PRESENTATION from host 1
15:58:14 host_1: Ignoring PRESENTATION from host 6
15:58:24 host_1: Session lost (timeout)

15:58:46 host_1: Accepting PRESENTATION from host 6
15:58:46 host_6: Ignoring PRESENTATION from host 1
15:58:47 host_6: Session lost (timeout)

15:59:14 host_6: Accepting PRESENTATION from host 1
15:59:14 host_1: Ignoring PRESENTATION from host 6
15:59:24 host_1: Session lost (timeout)

15:59:46 host_1: Accepting PRESENTATION from host 6
15:59:46 host_6: Ignoring PRESENTATION from host 1
15:59:47 host_6: Session lost (timeout)

host_1.txt
host_6.txt
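
The timing pattern in the excerpt above can be checked with a short script. This is a minimal sketch that parses only the simplified lines quoted in this comment (not the real freelan syslog format); the helper names `accept_times` and `gaps_seconds` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

LOG = """\
15:58:14 host_6: Accepting PRESENTATION from host 1
15:58:14 host_1: Ignoring PRESENTATION from host 6
15:58:24 host_1: Session lost (timeout)
15:58:46 host_1: Accepting PRESENTATION from host 6
15:58:46 host_6: Ignoring PRESENTATION from host 1
15:58:47 host_6: Session lost (timeout)
15:59:14 host_6: Accepting PRESENTATION from host 1
15:59:14 host_1: Ignoring PRESENTATION from host 6
15:59:24 host_1: Session lost (timeout)
15:59:46 host_1: Accepting PRESENTATION from host 6
15:59:46 host_6: Ignoring PRESENTATION from host 1
15:59:47 host_6: Session lost (timeout)
"""


def accept_times(log):
    """Map each host to the times at which it logged 'Accepting PRESENTATION'."""
    times = defaultdict(list)
    for line in log.splitlines():
        stamp, host, message = line.split(" ", 2)
        if message.startswith("Accepting PRESENTATION"):
            times[host.rstrip(":")].append(datetime.strptime(stamp, "%H:%M:%S"))
    return times


def gaps_seconds(stamps):
    """Seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    per_host = accept_times(LOG)
    for host, stamps in sorted(per_host.items()):
        print(host, "accept interval (s):", gaps_seconds(stamps))
    merged = sorted(s for stamps in per_host.values() for s in stamps)
    print("interval between any two accepts (s):", gaps_seconds(merged))
```

Each host accepts a PRESENTATION exactly 60 seconds after its previous one, and the two hosts alternate with 32 and 28 seconds between them, so neither side's retry timer ever shifts relative to the other.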

@richman1000000

This error is definitely a network issue: "Error deciphering data message from 4.3.2.7:12000: error:00000000:lib(0):func(0):reason(0)"

Did you set up port forwarding on your NAT routers, or are you using dynamic contacts?

@HenryNe
Contributor Author

HenryNe commented Nov 21, 2020

Only one host is behind a NAT with a static port forward.
All 8 hosts have 7 entries in "contact=", and all have a static IP address. See config for node1:
freelan.conf.txt

@HenryNe
Contributor Author

HenryNe commented Nov 21, 2020

I don't believe it is a network problem, because the 6 other connections between the other 7 hosts do not have this issue. They typically reconnect 1 minute after a problem.

  • host_0: no errors
  • host_1: 14:08 session lost with host_4, 6; 14:09 reconnected with host_4; 16:00 reconnected with host_6
  • host_2: 14:08 session lost with host_4, 5, 6, 7; 14:48 reconnected with all
  • host_3: no errors
  • host_4: 14:08 session lost with host_1, 2; 14:09 reconnected with all
  • host_5: 14:08 session lost with host_2; 14:47 reconnected
  • host_6: 14:08 session lost with host_1, 2; 14:08 reconnected with host_2; 16:00 reconnected with host_1
  • host_7: 14:08 session lost with host_2; 14:09 reconnected with host_2
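
To make the difference between the affected pair and the others concrete, the outage lengths can be computed from the times in the list above. This is a small sketch using three of the entries as reported by the named host; the `REPORTED` table and the helper `minutes_between` are illustrative names, not part of freelan.

```python
from datetime import datetime

# (reporting host, peer, session lost, reconnected), copied from the list above.
REPORTED = [
    ("host_1", "host_4", "14:08", "14:09"),
    ("host_1", "host_6", "14:08", "16:00"),
    ("host_5", "host_2", "14:08", "14:47"),
]


def minutes_between(start, end):
    """Whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    for host, peer, lost, back in REPORTED:
        print(f"{host} -> {peer}: {minutes_between(lost, back)} min")
```

The host_1 to host_6 outage comes out at 112 minutes, which matches the roughly 2 hours mentioned in the first comment, while host_1 to host_4 recovered after 1 minute.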

@richman1000000

OK. Try checking the PMTU between both nodes, or set a fixed MTU in the freelan config.

@HenryNe
Contributor Author

HenryNe commented Dec 11, 2020

I have checked the MTU between all nodes. It is 1500 everywhere.
I used a command like this:

# ping -c 1 -M do -s 1472 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1472(1500) bytes of data.
76 bytes from 8.8.8.8: icmp_seq=1 ttl=119 (truncated)

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 22.291/22.291/22.291/0.000 ms

A negative check:

# ping -c 1 -M do -s 1473 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1473(1501) bytes of data.
ping: local error: Message too long, mtu=1500

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
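
The sizes in the two ping runs above follow from the header overhead: the `-s` value is the ICMP payload, to which ping adds an 8-byte ICMP header and a 20-byte IPv4 header. The sketch below shows the arithmetic, assuming IPv4 without options; the function names are illustrative.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumed)
ICMP_HEADER = 8   # bytes, ICMP echo header


def packet_size(payload):
    """Total IP packet size for a ping with the given -s payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_ping_payload(mtu):
    """Largest -s value that still fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for payload in (1472, 1473):
        size = packet_size(payload)
        print(f"-s {payload}: {size} bytes,", "fits" if size <= 1500 else "too long")
    print("max payload for MTU 1500:", max_ping_payload(1500))
```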


2 participants