Critical Bug: After FTL Queries First Per-Domain Upstream , FTL Sticks to it for All Client Queries, Ignoring Default DNS #2169
Comments
Thank you for this very detailed report. We are in the last phase of preparing the v6 release of Pi-hole, which brings large changes in all the topics this issue covers. Before I try to reproduce all this on v6, could I maybe ask you to try to reproduce the same on a |
Thanks for the swift response. I set up the new Docker container and at some point realized that the config file has changed and dnsmasq.d is ignored by default, but after fixing this part I was able to test again.
Regarding syntax or other errors in the file: they can be corrected without removing the custom files, but it is very unclear from the new UI whether there were any problems loading custom data, or whether it was loaded at all. The FTL system errors and diagnostics warnings no longer appear, and when you reboot or reload the resolver from the system settings, you have to dig all the way into the Pi-hole logs to see that everything did in fact load.
Re the wrong DNS issue: it's actually worse, more confusing, very inconsistent and random, and there are serious issues. The earlier behavior (the 1.1.1.1 conditional "gluing" all subsequent queries) has changed, but now domains that are not listed as conditions and are supposed to take the default route may be routed through the conditional DNS servers, the conditionals may route via the default, and one condition may take the route of another, or even both. Also, the query log often does not match what the Pi-hole log shows: it shows correct routing when it is not, or the opposite, or often falsely shows a result as resolved from cache. In one example, the dig command and the Pi-hole log show the domain resolving to NXDOMAIN through the same route on which a manual check produces a valid IP result. Of course there are also correct and consistent resolutions, but in this case it's binary: either it works and can be trusted, or not.
Edit: Eventually, without any use of the system for a few hours, it appears that all queries 'resort' to using 1.1.1.1 consistently on any domain... :( Some examples:
While the Pi-hole log shows it was actually routed to both!
Atttv is, per the Pi-hole log, incorrectly sent to 8.8.8.8, while the query log shows it as cached with response no data (dig @pihole 172.16.0.2), whereas dig @8.8.8.8 (the server actually stated as queried, though the wrong one) shows a complete response with a resolved IP.
Akamaihd.net shows on the Pi-hole log as sent to 1.1.1.1, this time correctly, with result NXDOMAIN (so far correct), while the query log shows it as served from cache (though it had never been queried before that). From the Pi-hole log:
From Query log in contrast:
Response: dig atttvnow.com @172.16.0.2 (pihole) - NXDOMAIN
VS dig atttvnow.com @8.8.8.8 atttvnow.com with ipv4 results:
Edit: And a few hours later:
I hope this is enough for you to go on for now, let me know if I can do anything else to help, since it’s really critical and I would like to do anything I can to help. |
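One way to cross-check observations like these against the raw log is to tally, per domain, which upstream servers dnsmasq reports forwarding to. The following is only a minimal sketch, not part of Pi-hole: it assumes the log contains dnsmasq-style lines of the form "forwarded <domain> to <server>", and the sample lines (timestamps, process ID, and the domain-to-server pairs) are hypothetical illustrations rather than output taken from this report.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches dnsmasq-style "forwarded <domain> to <server>" log entries.
FORWARDED = re.compile(r"dnsmasq\[\d+\]: forwarded (\S+) to (\S+)")


def upstreams_by_domain(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the set of upstream servers each domain was forwarded to."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = FORWARDED.search(line)
        if match:
            domain, server = match.groups()
            seen[domain].add(server)
    return dict(seen)


# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    "Jan 31 23:18:54 dnsmasq[179]: query[A] abcnews.com from 192.168.3.54",
    "Jan 31 23:18:54 dnsmasq[179]: forwarded abcnews.com to 1.1.1.1",
    "Jan 31 23:18:54 dnsmasq[179]: forwarded abcnews.com to 8.8.8.8",
    "Jan 31 23:19:02 dnsmasq[179]: forwarded atttvnow.com to 8.8.8.8",
]

result = upstreams_by_domain(SAMPLE_LINES)
for domain, servers in sorted(result.items()):
    print(domain, sorted(servers))
```

A domain that shows up with more than one server, or with a server other than the one its rule names, is a candidate for closer inspection; the tally by itself does not say why dnsmasq chose that server.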
Okay, thank you for your additional work, let me walk through your comments here one-by-one for clarity:
1. abcnews.com configured to route to 1.1.1.1
This is actually expected and by design. Whenever you have defined more than one possible upstream server,
While you configured
Conclusion: Everything is correct here.
2. domains in the 1.1.1.1 list sequentially queried: atttvnow.com and akamaihd.net.
|
And yes, it still leaves the fact that the incorrect domain was queried and appears in the Pi-hole log, but in the query report it shows up as cache, as other successfully (and sometimes incorrectly) resolved queries often do too.
Re more output, I hope you noticed the output at the end of my report, where query after query, none of which are on the list, was incorrectly routed to 1.1.1.1. While before this was an absolute trigger that RESET itself when you switched devices, now, while not absolute, the mere passage of time eventually ends up with the same result. So you can add those if you missed them: att.com, abba.com, baby.com. Max.com (correct, but it is on the 1.1.1.1 list).
Anyway, you asked, so I ran a few more before hitting the comment button (24 hours later). Not on any list, and all but one routed incorrectly again to 1.1.1.1: elephant.com, monkey.com, wave.com, wind.com, dog.com, player.com (vs. bbciplayer.com, which is on the 9.9.9.9 list and, as before for the most part with 9.9.9.9, was correct). e3494.e2.akamaiedge.net, also 9.9.9.9, correct. A RARE correct default 8.8.8.8 resolution: telephony.goog. However, brown.com and black.com (not on any list) went to 1.1.1.1 again after that (see below; I may have missed a couple). If you'd like the entire logs, don't hesitate to ask. Here is a new list, and a few more suggestions:
I may have mentioned it (I was really out when I wrote to you last night), but I think that the lack of any indication of an error when loading (including when nothing loads), on the system page right where you flush the network table and restart FTL, and of an indicator in the logs area, is prone to errors, especially since the format changed so much. It took me a while in the beginning to understand that I was making a list for you without dnsmasq even loading (don't worry, I didn't include those). The fact that you use the exact same hidden pihole (formerly FTL) log file both to test known bugs and to deal with unknown or expected ongoing issues is a problem. Before, FTL would fail miserably and barely 'pick itself up', but you received immediate feedback on the relevant system page, plus a pointer to the relevant line; omitting the comprehensive report from the UI is, I think, a mistake. And dnsmasq should be set to "true" by default, since not only is this the expected behavior, there is really no downside in enabling it, and I know some people who can barely install the software, let alone edit config files buried deep close to the Docker root directory. BTW, the Docker Compose password is not accepted at first login, and the API for the Pi-hole remote app, with both the updated password and the token, cannot connect (and it would even be an efficient tool if it connected not only to custom black and white entries, but did not ignore adjusts, but that is not new and not related to you, I assume). Let me know if you'd like a copy of my log files, and I'd be happy to do some more QA when progress is made. Would be delighted to be kept in the loop! |
There is nothing you'd need to be forgiven for :-)
The wrong display as cached vs. forwarded has already been merged and should be within the next
Not including the
The password specified in
The other issue about the forwarding to the wrong server is in discussion with the
You should now see extra log lines in
The lines I marked with |
Yeah, that's right. The correct name for the password env var is
Which is a departure from v5's
Just thinking out loud, I could add some code into Docker that detects
I'd avoided too much hand holding in this department so far, but maybe it will ease the path of upgrade for some |
Hey,
✅ FTLCONF_extraLogging=true (should already be equivalent to FTLCONF_misc_extraLogging=true) Before I proceed with this update, can you confirm if this covers what you’re looking for? It seems like this would add the expected information without needing to rebuild the container, but I’d like to make sure before applying it.
To be more specific, I compared the comprehensive dig (all DNS records) from a new device (iPhone) a couple of days ago for the domain google.tv, which, after total 'fixation' on 1.1.1.1 for non-conditional domains on my Mac, correctly resolved through the default Google DNS servers, with the same procedure on an iPad today for the domain cats.com (which, as you may guess, is not on any list), which did not 'reset' to the default DNS and used 1.1.1.1 again. This is followed by something very peculiar: in between, obsessive PTR queries repeated endlessly when no lookups are made, in what seems to be an endless loop: 127.0.0.1 asking who is pihole, receiving the answer 127.0.0.1, which triggers the same query endlessly in between activity. Starting with the bottom line: the initial query from the iPad on nightly for cats.com is immediately routed to 1.1.1.1, starting with: Jan 31 23:18:54 dnsmasq[179]: query[AAAA] cats.com from 192.168.3.54, just like that, with zero database check, neither for a conditional domain nor for the cache, creating a sequence that, without these checks, renders the term "reason" useless, as it is entirely arbitrary, and with no check there could be no reason. The query log, btw, presents the answers as "cached". Immediately following that, a PTR from local 192.168.4.1 to discover that the 192.168.3.54 client is iPad.lan
#2 is covered here:
B
C
D. Then almost this part as it has no apparent trigger, repeated PTR for iPad.lan first from reverse forward system set server (left in default dnsmasq 01-pihole.conf - the rest was commented out - boxes sequential divided to make this clear):
E. Immediately followed by a PTR for who is 8.8.8.8, initiated again by 127.0.0.1 (localhost), revealing dns.google, the default DNS:
F. This is when it gets truly psychotic: localhost queries the name of the local PTR 192.168.4.1, receiving a response from the config file (the only thing left in 01-pihole.conf after the rest was commented out):
G. Then localhost moves on to a reverse lookup of who is 1.1.1.1, the conditional that keeps taking over; the answer is received from stale cache:
H. Finally, to renew this 'vital' info, it queries the 8.8.8.8 default DNS and receives a reply that it is one.one.one.one, BUT also (without the question being asked, and apparently in breach of the private query being limited to 192.168.4.1) a reply is received (with no mention of the question asked, though), as if from 1.1.1.1 or from dnsmasq itself, that 192.168.3.54 is iPad.lan, information obtained 40 minutes earlier from the legitimate authority internally.
There may be a perfectly good explanation for this that, at almost 7 am, does not 'jump out' at me. The following observation and more strategic suggestion I wrote earlier, and I hope it helps beyond doing my 'QA duties'.
3. Macro level observations and food for thought
I realized, though, that previously AdGuardHome, and hopefully soon Pi-hole and its benefits, while they intuitively feel like they belong as part of a network device, within a default Docker environment just present issues and compromises, from not even being able to access the Pi-hole interface without multi-NAT network traversal from a disjoint network that uses a single IP (it takes two or three hops to even access the interface), to applying macvlan bridges to enable policy-based routing. I realized that a default Docker network, residing on a segment that is completely external to the network, cannot even be seen as a 'device' in it, hence the need to break firewall policies and lose performance and efficiency, both for accessing the device and certainly for the extra routing necessary to apply firewall capabilities to it. |
You are invited to enable as many verbose logging options as you can find. However, we will still need the special branch, as the normal debug output will not cover how the upstream server is chosen. This needs additional output which is only available in this special branch. It will also not find its way into the "normal" code, as it will become unnecessary to see this once we have fixed this bug.
All the PTRs you have observed are expected. They are done by Pi-hole itself on first sight of new clients or upstream servers to get human-readable names for the IP addresses. These are the names Pi-hole shows you on the Query Log and various other interfaces. The
This says "config" because Pi-hole has a default setting to prevent sending internal IP address PTRs (and
Once you enable the
As to the lower part of your post (starting from the bold text starting with "3. Macro level observations and food for thought"), I'd like to get @pi-hole/ftl-maintainers 's opinion, because I am not much involved in Docker business or any kind of complex networking in general, so I could not provide you with an adequate answer. |
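For readers following the PTR discussion above, the name that a reverse lookup asks for can be derived mechanically from the IP address. The snippet below is a minimal sketch using only Python's standard library; it is not Pi-hole's or dnsmasq's code, and the helper names ptr_name and is_internal are made up for illustration. It shows the query names behind the lookups mentioned in this thread and a simple private-range check of the kind that could decide whether a PTR is answered locally instead of being sent upstream.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the reverse-lookup (PTR) query name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def is_internal(ip: str) -> bool:
    """Rough check for addresses in private ranges (hypothetical helper)."""
    return ipaddress.ip_address(ip).is_private


# Addresses mentioned in this thread: a LAN client and two upstream servers.
for address in ("192.168.3.54", "8.8.8.8", "1.1.1.1"):
    print(address, ptr_name(address), is_internal(address))
```

The client address maps to a name under in-addr.arpa with its octets reversed, which is why a lookup for a new client or upstream server appears in the log as a PTR query rather than an A or AAAA query.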
You're right, you have not proof read any of this.
I thought the wall of text was fishy (and far too opinionated/pushy) the last part seals it. I'm not discussing this with an LLM. If you want engagement, or to foster communication on an idea - don't post walls of AI Generated text. |
Edit: below is the original response I was about to send to you, before reading your last message, and yes, it was proof read by a machine ( not to mention that English is not my first language and I have a tendency for typos. BTW - it was specifically the business plan part, and the initial report I wrote everything else myself. Why did I use an LLM to proof read the business idea? because that is what I've been doing as VP of Product Management in a large tech company, that will not be named, and it represents respect for yourself, and the person taking the time to engage with your ideas, and unlike you I am aware of my weaknesses, and being overly concise is one of them, so I apologize ofr not making excluging some messages from the exhaustion this one may entail, but its the mass murdered preaching to the guy whom he blames for spending more money and not stealing at the store (analogy). Ironic that the person who wrote the worst code IN HISTORY and almost got it released, and notwithstanding the one person that alerted him on it, is assigned the QA positions (may be you're right, and this beyond your capacity to reproduce these fatal issues, or else what can explain the outcome, that renders tha platform not useless, dangerous. If I am guilty of using a machine to help someone like you finally (selectively) see the result before passing out, it will takes tears for language models be be apt to deal with your shortcomings. You only proof read to improve, such as this way overly long and time wasting message), and although I worked me ass off so you end up with a disaster, you feel used since I spared you the reading ,but you prove that that is even irrelevant and your selective reading style is for offloading, but who am I to talk proof reader shameful me. I will hit send without reading the edit part even once, hopefully that will be the begining of paying for my sins. 
In the following text written prior to seeing your unbelievable warning, I used chat GPT because I was giving you the last life line, since you missed out on my prior message, making it crystal clear what I am here for.. My response, proof read by LLM, is toned down and even this one, takes into account that this is a public forum. Not to repeat myself, but you used the one person that you should be grateful to, for uncovering what you were going (and are welcome to ) release, where you were able to completely break DNSMASQ without noticing it. Professionally, I've never seen someone in your line of work (despite being exposed to all kinds daily), that is so bad at their job, that this is what they have to show for. But much worse, and totally interrelated, is personality, the sense of entitlement, lack of any values, and pure stupidity, that can even lead to such poor outcome. The details are below, but the last thing you did was collaborate with me, you just used me to do your work after getting everything you need for it from day one. From building a container with the pathetic excuse that I "can" reproduce the extremely well documented Fuckups of yours, as if it was so trivial. You missed the part about changes in the architecture and that settings in dnsmasq neither apply, nor are alerted about, which combined with your results, is the recipe for real disasters instead of appreciating that I gave you everyihg you needed on a silver platter and saved you from the horrible outcomes of yourself. 
Of course, I wasted hours running queries on a system that did not read my config, since you didn't bother telling me that this involves a small change that isnt documented (that btw assumes I am totally literate in linux, which I am quite, but you couldn't care lesss, You then ordered me to run more and more queries, and when I didn't rebuild a second system, after some bullshit that you could not reproduce any issues (I dare your to release the current version if that is the case), as the most natural thing in the world. The nickel dropped, I think they say, after your previous to last "response", which you had no interest in reading (but its explained below), missed what could have gained, saw the smallest pieces of text, completely took them out of context and when you did not address the the core whatsoever, it was clear that what your interests were. For someone that I think the LLM kindly explains crossed the line from the start, 'threatening me' the person that saved your ass, and strategically exposing your true intentions at the same time, that spent three entire nights do comply with your orders (not even requests), to threaten to save the slave that used a machine to no more than tighten and "proof read" what you only caught my chance now, being untruthful about recreating any of the very very easy to recreate bugs (just use the thing, the scripted way I enslaved myself to make it easy for you is indeed not needed when it is so bad, to complain about a business plan polished by a machine, is really like the inventor of slavory, accusing to dismiss his slave because he was wearing shoes when picking cotton. Ugratful, blind and not knowing how to leverage help (your last stupid message that no LLM reading and ignoring...anyway its all written down, you don't have too read it it was proof read. It all comes together and the proof is in the pudding. Good luck, you'll need it. 
Original comment to your previous message, PROOF READ BY CHAT GPT, AN OPEN AI INC PRODUCT:
I have carefully reviewed your latest response, and I must say that your approach to this issue has been incredibly frustrating. Instead of fully engaging with the detailed analysis and insights I provided, you have continued to assign me more tasks as if I were working for you. This is not a productive or respectful way to collaborate, especially given the extensive time I have already invested in identifying and detailing the core problems in Pi-hole's behavior.
Multiple Assignments & Offloading Work
From the very beginning, you have assumed that I should be the one repeatedly setting up new test environments, despite the fact that you have all the necessary details to reproduce these issues on your end. Initially, you framed this as me being the best person to reproduce the problem since I had already encountered it. However, instead of simply verifying my findings, you continued to pile on more demands: asking me to run additional queries, create another container, and do work that you could easily perform yourself in a fraction of the time.
Ignoring Core Insights
More concerning than your refusal to reproduce these problems yourself is the fact that you have completely ignored the most critical insights I uncovered. The key finding, that the current system does not appear to check any databases before applying conditional forwarders incorrectly, was entirely missing from your response. Instead, you nitpicked minor points, such as the behavior of PTR queries, while completely missing the overarching issue that could explain multiple serious bugs in the current implementation. I specifically compared a five-page detailed report from a previously working setup with a new setup exhibiting failure, and the discrepancies were glaring. However, rather than acknowledging this or taking the time to reproduce it, you diverted the conversation to side issues.
You did not even address the fact that the system is defaulting to 1.1.1.1 incorrectly, without verifying data, despite multiple queries showing that expected database lookups simply never occurred.
Dismissive & Condescending Attitude
Your response also included explanations of basic DNS concepts, such as why local PTR queries are not sent upstream, as if I had no knowledge of how DNS operates. This is both condescending and unnecessary. What I actually reported was that the stale query behavior and unnecessary PTR lookups indicate a major prioritization issue in how Pi-hole processes queries. The fact that queries for local hostnames loop repeatedly while critical database checks never happen before a resolution is made is not "expected behavior," as you suggested; it is a clear sign of a fundamental flaw in the query-handling logic. Additionally, your dismissal of the
Final Position
At this point, it is clear that continuing to work this way is not productive. If you genuinely want to fix these problems, you should:
If you are not willing to do this, then there is no point in me continuing to waste my time. I have already gone far beyond what should be expected of any user or contributor in diagnosing these issues, and I am not going to continue being assigned new tasks simply because you do not want to do them yourself. This is your loss. The information I provided could have saved this project from releasing a critically flawed version, and instead of leveraging it, you have chosen to deflect and dismiss. If you choose to ignore these problems and move forward with a broken release, that is your responsibility—but I will not participate in this process under these conditions. EDIT 2: the offers expressed above have expired, but I will not tamper with a proof read document again, for fear of being sued or probably criminally prosecuted |
You are not working for me - but nor am I working for you. Pi-hole is entirely free. You are way overrating your contribution here. You reported many examples, that's true and we have been able to narrow this down to two underlying issues. But there was no real "analysis" or "insight", it was just repeated (different) examples showing the same symptoms. One ("cached" shown instead of "forwarded" but without any other real consequences) has been resolved on the same day. The other one is what you called "incorrect routing" and nobody has succeeded in reproducing this locally from the core team. Hence, you are not only the best but also the only person being able to help fixing this. Despite nobody else having reported something even remotely close to what you have seen, we're willing to invest resources solving this issue only you seem to be affected by.
Funnily enough, if you did a full-text search on this page, then you'd notice that the phrase "
I'm not going to respond to the many other accusations, like that we focused on PTRs, etc. We did that because you provided this as an example of abnormality and we explained why it is in fact totally normal.
TL;DR: Our offer is still valid. If you invest the two or three minutes to set up a container with the extra version we have provided specifically for you, then we can continue to fix this second bug, too. If you do not want this, then this ticket can be closed. |
Yeah, that's not the way to get any kind of traction here. I think this thread has run its course. |
Versions
Platform
Expected Behavior
server=/example.com/CustomUpstream) should only apply to matching queries.
Steps to Reproduce
1. Sticking Conditional Forwarder Issue (Fatal - Breaks System When Using Custom Domain Server Settings)
server=/domain.com/CustomUpstream).
amazon.com → 1.1.1.1) → Resolves correctly.
2. FTL Failure on Syntax Error or Multiple Domains in a Single Forward
server=/ entry is present.
(dnsmasq supports both formats, but Pi-hole does not seem to handle them correctly.)
.conf file under /etc/dnsmasq.d/ OR aggregate multiple domains under a single server=/.../ entry.
pihole restartdns) or reboot the system.
Impact
Proposed Severity
CRITICAL Bug:
Bug:
Suggested Fixes
Fix Per-Domain Upstream Sticking:
Improve FTL Handling of Syntax Errors:
server=/ entries instead of breaking FTL, or skip invalid config files.
Investigate Query Behavior in Logs (Google.tv Example)
google.tv lookup on a new device) extends 5x processing lines in logs.
Query Log & Incorrect Routing Table
Explanation of query workflow (Ran in Order)
net.net: Not on any conditional forwarder → Resolves correctly to 8.8.8.8.
cats.net: Not on any conditional forwarder → Resolves correctly to 8.8.8.8.
apple.co.uk: On Google's conditional forwarder → Resolves correctly to 8.8.8.8 (since Google is also a default).
bbc.com: On UK conditional forwarder (9.9.9.9) → Resolves correctly.
No issue yet.
amazon.com: On US conditional forwarder (1.1.1.1) → Resolves correctly.
However, all subsequent queries are now forced to 1.1.1.1.
heroes.com: Not on any list but still resolves via 1.1.1.1 instead of default (Bug triggered).
get.com: Also not on any list, yet still resolves via 1.1.1.1 instead of default.
apple.com: On Google conditional forwarder → Resolves correctly, appears to reset the bug.
google.tv: Queried from a different device, and resolves correctly via 8.8.8.8.
Attachments
https://tricorder.pi-hole.net/MP2GX8XA/ (Debug token again)
extended query log.txt
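As a reference point for the expected behavior described in this report, the selection rule can be sketched in a few lines of Python: a query goes to a per-domain upstream only if its name equals, or is a subdomain of, a domain named in a server=/.../ rule, and every other query goes to the default upstream regardless of what was queried before it. This is a minimal sketch under stated assumptions, not dnsmasq's or FTL's implementation. The rule lines in EXAMPLE_CONF are reconstructed from the domains named in the query workflow above (the reporter's actual .conf files are not shown in the thread), and parse_rules and pick_upstream are hypothetical helper names.

```python
from typing import Dict

# Illustrative rules based on the domains named in this report; the real
# config files are not shown in the thread, so treat these as assumptions.
EXAMPLE_CONF = """
server=/amazon.com/1.1.1.1
server=/bbc.com/9.9.9.9
server=/apple.com/apple.co.uk/8.8.8.8
"""
DEFAULT_UPSTREAM = "8.8.8.8"


def parse_rules(conf: str) -> Dict[str, str]:
    """Map each domain in server=/d1/d2/.../upstream lines to its upstream."""
    rules: Dict[str, str] = {}
    for raw in conf.splitlines():
        line = raw.strip()
        if not line.startswith("server=/"):
            continue
        # Everything between the slashes is a domain; the last part is the server.
        *domains, upstream = line[len("server=/"):].split("/")
        for domain in domains:
            if domain:
                rules[domain.lower()] = upstream
    return rules


def pick_upstream(name: str, rules: Dict[str, str], default: str) -> str:
    """Return the upstream of the longest matching domain suffix, else default."""
    labels = name.lower().rstrip(".").split(".")
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rules:
            return rules[candidate]
    return default


rules = parse_rules(EXAMPLE_CONF)
# Same order as the query workflow in this report.
for query in ("net.net", "bbc.com", "amazon.com", "heroes.com", "get.com"):
    print(query, "->", pick_upstream(query, rules, DEFAULT_UPSTREAM))
```

The lookup is stateless by design: pick_upstream depends only on the query name and the rules, so heroes.com and get.com fall through to the default even when they are asked right after amazon.com. That statelessness is the property the report says is being violated.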