serve expired fails when DNSSEC is enabled #994
Comments
After reproducing the issue I can clearly see that the issue is not related to DNSSEC.
This happens because in step 3 Unbound gets the new message from upstream, which includes both A and NS records. In the last step, the SERVFAIL happens because of the broken upstream connection.
(The TTL values are relative to the current time; they are not the actual values.) One second later all of this is resolved, since the message is then considered expired and the sanity check does not complain.
gthess@ thanks for identifying the transient SERVFAIL issue, which recovers quickly. But I don't think it's related to this issue. It seems the manual reproduction I provided above may not fail consistently; sometimes it succeeds. Therefore I uploaded some supporting data here to show a new one-line command that has a better chance of reproducing it. Inside the zip you will find
Looking at the verbose Unbound log, the issue is likely related to DNSSEC (the validator module). It seems that when delegations expire and have to be refreshed because of an incoming query (and/or possibly prefetch-key), and there is an outage to upstream (i.e. the .com servers in this case), DNSSEC is unable to do the validation because it reaches the single delegation point limit
Because the validator module comes before the iterator, it doesn't give the query a chance to be answered from the stale cache. This explains why turning off the validator module seems to work consistently. Can you please take a look and confirm this theory when you get a chance?
I am afraid I still can't reproduce with the provided script. The serve-expired logic (looking for expired records before replying) when I am digging through the logfile in the meantime in case something pops up.
You may need to run the script multiple times if you haven't already. This is the Unbound config I used. These are the settings I modified from the default config that comes with the install.
I am confused now :) This config has no serve-expired options activated.
Ah, my mistake! The Unbound config I provided was hand jammed and got mixed up with multiple installations of Unbound on my box. Was the Unbound log I provided earlier helpful? I think I found a possible reason why it's hard to reproduce. I got a new box, installed Unbound, and was unable to reproduce it there as well. When comparing with the past boxes I noticed a subtle difference in the environments. On the older boxes, Unbound was getting many queries in the background from other on-host clients, whereas on the new box it was not. So here is another reproduction attempt that seems to work consistently for me on the new box, which I presume is more similar to your testing environment. This time I happened to use Unbound 1.12.0
Unbound log file for this run
I was able to reproduce it at the end with the new information and I discovered the following when using
For this specific issue what was happening based on timing was either:
I'll need to schedule and address all of the above, since more scenarios could be problematic depending on timing and the combination of modules.
#1143 fixed the generic case of preferring expired answers to resolution/validation errors. The specific case mentioned here (an insecure zone affected by parent DNSSEC errors) was also added as an explicit test case for this behavior.
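To make the idea of "preferring expired answers to resolution/validation errors" concrete, here is a toy model in Python. It is only a sketch of the policy as described in this thread, not Unbound's actual code: `CacheEntry`, `ResolutionError`, and `reply` are hypothetical names invented for illustration, and the real implementation involves timers, modules, and TTL limits that are not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class ResolutionError(Exception):
    """Hypothetical stand-in for any upstream resolution or validation failure."""


@dataclass
class CacheEntry:
    answer: str
    expires_at: float  # absolute expiry time, in seconds


def reply(
    entry: Optional[CacheEntry],
    now: float,
    resolve: Callable[[], str],
    serve_expired: bool = True,
) -> Tuple[str, Optional[str]]:
    """Return (rcode, answer) for one query under the toy policy."""
    # A fresh cache entry is always answered directly.
    if entry is not None and now < entry.expires_at:
        return ("NOERROR", entry.answer)
    try:
        # Cache miss or expired entry: try to resolve upstream.
        return ("NOERROR", resolve())
    except ResolutionError:
        # Resolution or validation failed: fall back to the expired
        # entry if one exists, instead of returning an error.
        if serve_expired and entry is not None:
            return ("NOERROR", entry.answer)
        return ("SERVFAIL", None)


def broken_upstream() -> str:
    # Simulates the upstream outage from the reproduction steps.
    raise ResolutionError("upstream unreachable")
```

With an expired entry and `broken_upstream` as the resolver, `reply` returns the stale answer when `serve_expired` is true and SERVFAIL otherwise, which mirrors the difference between the behaviour reported here and the behaviour after the fix.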
Describe the bug
Unbound's serve-expired feature seems to break when DNSSEC is enabled.
To reproduce
Steps to reproduce the behavior:
/etc/unbound/unbound.conf
and ensure DNSSEC and serving stale are enabled. Also reduce the cache max TTL to make the issue easier to reproduce. See sample config below.
unbound-control -c /etc/unbound/unbound.conf
dig s3.amazonaws.com
Expected behavior
Repeat steps 1 - 5 with DNSSEC disabled. Change module-config to:
module-config: "iterator"
With DNSSEC disabled Unbound responds from stale cache. This is confirmed by the 30 second TTL on answers.
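The TTL check above can be scripted. The following Python sketch parses the answer section of saved `dig` output and reports whether every answer carries the expired-reply TTL. It assumes the reply TTL is 30 seconds, as observed in this report; `answer_ttls` and `looks_served_expired` are hypothetical helper names, and a matching TTL is only a hint, since a fresh record can happen to have the same remaining TTL.

```python
import re
from typing import List

# Matches a resource record line such as:
#   example.com.   30   IN   A   192.0.2.1
_RR_LINE = re.compile(r"^(\S+)\s+(\d+)\s+IN\s+(\S+)\s+(.+)$")


def answer_ttls(dig_output: str) -> List[int]:
    """Collect the TTLs of all records in the ANSWER SECTION of dig output."""
    ttls: List[int] = []
    in_answer = False
    for line in dig_output.splitlines():
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or a new ";;" header ends the answer section.
            if not line.strip() or line.startswith(";"):
                break
            match = _RR_LINE.match(line)
            if match:
                ttls.append(int(match.group(2)))
    return ttls


def looks_served_expired(dig_output: str, reply_ttl: int = 30) -> bool:
    """Heuristic: True if there are answers and all carry the expired-reply TTL."""
    ttls = answer_ttls(dig_output)
    return bool(ttls) and all(ttl == reply_ttl for ttl in ttls)
```

To use it, capture the output of `dig s3.amazonaws.com` into a string (for example with `subprocess.run(..., capture_output=True, text=True)`) and pass it to `looks_served_expired`.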
System:
Linux 5.10.201-191.748.amzn2.aarch64 GNU/Linux
unbound -V
output:
Both Unbound 1.12.0 and 1.160.0 seem to have the same issue.
Additional information
prefetch: no
and prefetch-key: no
configs.