[Bug]: Macro files on certain systems remain open leading to stack corruption #205
Comments
Here are 4 syslogs (gzipped) from instances when this bug occurred. For each syslog, there is a corresponding .txt file. The first two have some notes and/or thoughts I had when it occurred and the other two are just the same log messages (with the timestamps). syslog.error.1.gz |
Thanks for the logs, but I'm afraid there is no debug info in it at all. To obtain a proper debug log, please do the following:
|
@chrishamm Did you look at syslog.error.x.gz or did you look at syslog.error.x.txt, for which I said "For each syslog, there is a corresponding .txt file. The first two have some notes and/or thoughts I had when it occurred and the other two are just the same log messages (with the timestamps)."? Last 10 lines of syslog.error.1.gz: Sep 28 10:34:30 ender-5 DuetControlServer[2772]: [debug] Trigger: Finished code M409 K"network" I1 => here is what I see in: Oct 06 20:26:33 ender-5.dooey.org DuetControlServer[8333]: [debug] Trigger: Starting code M409 K"network" I1 which look to contain pretty similar debug information. |
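A quick way to check whether a journal really contains DCS debug output is to count its `[debug]` lines and see whether every `Starting code` has a matching `Finished code`. The sketch below assumes the line layout of the two excerpts quoted above; the function name `summarize_debug_lines` and the regular expression are illustrative and not part of DSF.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 28 10:34:30 ender-5 DuetControlServer[2772]: [debug] Trigger: Finished code M409 K"network" I1
# The layout is an assumption taken from the excerpts quoted in this thread.
DEBUG_LINE = re.compile(
    r"DuetControlServer\[\d+\]: \[debug\] (?P<channel>\w+): "
    r"(?P<event>Starting|Finished) code (?P<code>.+)$"
)


def summarize_debug_lines(lines):
    """Return (debug_line_count, unfinished).

    unfinished maps (channel, code) to the number of starts without a finish.
    """
    debug_count = 0
    open_codes = Counter()
    for line in lines:
        if "[debug]" in line:
            debug_count += 1
        match = DEBUG_LINE.search(line.rstrip())
        if not match:
            continue
        key = (match["channel"], match["code"])
        # A start opens the code on its channel, a finish closes it again.
        open_codes[key] += 1 if match["event"] == "Starting" else -1
    unfinished = {key: n for key, n in open_codes.items() if n != 0}
    return debug_count, unfinished
```

To run it on one of the attached logs, the lines can be read with `gzip.open("syslog.error.1.gz", "rt")` and passed straight to the function. A debug count of zero would mean debug logging was not enabled when the log was captured.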
Ah right, I only looked at the txt files, I missed your previous comment. I am looking into it. |
Thanks for the logs. AFAICS, this isn't a problem in DSF but a potential bug in RRF causing the stack to get messed up. Can you please install the attached firmware build and get me one more debug log? I'm still unable to reproduce it here. |
I installed the zip and it now shows the following firmware versions: Duet 3 Mini 5+ | Mini5plus | 3.5.3+1 |
Here's the journal for a failed print: journal1.txt You can ignore the "Tool 0 was not driven" warnings. I'm "printing" a 1mmx1mmx0.2mm cylinder with no actual heating so that I can reproduce the problem faster. I think the logging increases the likelihood of the problem occurring. After turning on debug logging, I had the problem occur about 50% of the time. This failure occurred on the very first print I ran after installing your firmware. The corresponding console log is:
|
@chrishamm Any progress on this? |
@crpalmer Unfortunately, no. No matter how hard I try, I cannot reproduce this bug on my setup(s). From the debug log I can tell the requested variable isn't deleted before it's requested by the meta G-code expression so I suspect the firmware is on a wrong stack level when the evaluation request comes in. Can you check if it still happens when you insert |
PS: I've got one more idea, please check if the attached DCS package fixes it (install via |
Here's the journal for an instance of the error occurring with the new DCS package installed (installed the package, rebooted the raspberry pi to ensure I didn't miss something that needed to be restarted, emergency stop on the duet): |
And it still occurs with a M400 immediately after M98 P"/sys/retractprobe-forced.g". I removed the M400 and inserted the two M122 commands. Here's the diagnostic output after var probing_temp = ... and here's the M122 output after the M98 P"/sys/retractprobe-forced.g" and before the echo: |
And for good measure, here's the gcode file that I'm using to reproduce it in case you want to verify anything in the M122 output against what is being executed: |
Thanks! So it looks like one of the stack items actually remains open when it should be closed already. M122-starting (stack level 2):
M122-after-retractprobe (stack level 3):
Could you repeat the exercise and put Meanwhile I'll dig through the debug logs again and will report back if I find something of interest. |
@chrishamm I can do this but would it be easier for you to instrument the code and get exactly the information you want logged? I can reproduce quite easily (with the current code) so if you want to insert targeted debugging messages to help you narrow it down I'd be happy to run the instrumented code for you. At the moment, I'm fine with this printer being non-operational while we figure this out. |
Ideally, you'd put statements like
in your macros around M401/M402/M98 and then send me another debug log. Sadly, I'm still unable to reproduce it here so it's rather difficult to debug. The nicest solution would be to have a set of macros that let me reproduce it on a bench setup but I understand that may be difficult to provide. |
I made these changes: crpalmer/3d-printing@master...rrf-bug-205 and the full configuration of what I'm running is on this branch: https://github.com/crpalmer/3d-printing/tree/rrf-bug-205/ender5/sys and here's the console output (it doesn't look like the M122 output is in the journal): |
Thanks for that. I looked through that debug log again and I could track it down to this section:
After M401 it's still OK but after retractprobe-forced.g the stack level is off. You could try to comment out each of those lines one by one and check when the macro completes. Once you figure out which line causes it, you'd have to repeat the exercise with echo and M122 in the corresponding (sub-)macros, i.e., G28 Z -> homez.g, G32 -> bed.g, or M98 -> retractprobe-forced.g. Note that T-codes also call macros (tpreN.g/tpostN.g/tfreeN.g). Btw, is there any reason why you don't put everything from |
I don't have everything in homez-common.g in homez.g because I guess I never finished fixing an issue on this printer. Right now when you G28 it lifts z and then homes x/y and then lifts z some more and then homes z. I started changing my printers to only lift z once (so homez.g should lift z then call homez-common.g). I can't comment out the first G28 Z, but with the G32 commented out I was able to run the test gcode 20 times without any errors. Instrumenting bed.g: I was able to run another 20 times without error. I removed all the instrumentation in bed.g and was able to reproduce it again but at the moment I'm out of time. Any suggestions for next steps? |
Thanks again, that sounds promising - so it must have something to do with G32, possibly also G30, because that uses the stack as well. Please add some I'm wondering if running G32 outside a file also messes with the stack. |
I'll ask again about getting you to instrument the firmware. Is it not possible to add some debug messages to the firmware which log why it is pushing onto/popping off of the stack and dumps the current stack each time it pushes or pops a stack frame? That way you don't need me to keep trying to add much heavier instrumentation to the gcode which may be disrupting the timing. |
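To illustrate the kind of trace being asked for here, the toy model below logs every push and pop together with a reason and a snapshot of the stack. It is only a sketch: the class `LoggedStack` and the frame and reason strings are hypothetical, and the real firmware is C++ with its own stack implementation.

```python
class LoggedStack:
    """Toy stack that records every push and pop with a reason and a snapshot."""

    def __init__(self):
        self._frames = []
        self.log = []

    def push(self, frame, reason):
        self._frames.append(frame)
        self._record("push", frame, reason)

    def pop(self, reason):
        frame = self._frames.pop()
        self._record("pop", frame, reason)
        return frame

    def _record(self, action, frame, reason):
        # Keep the action, its reason, and the full stack contents after the change.
        self.log.append(
            f"{action} {frame} ({reason}) -> depth {len(self._frames)}: {self._frames}"
        )


# Hypothetical sequence: bed.g is opened by G32, then deployprobe.g by M401.
stack = LoggedStack()
stack.push("bed.g", "G32")
stack.push("deployprobe.g", "M401")
stack.pop("end of file")
for entry in stack.log:
    print(entry)
```

With a trace like this, a frame that stays open shows up as a push with no later pop, without any extra M122 or echo calls in the macros that might change the timing.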
Yes, I'll do that after the next debug log, provided I still cannot identify the cause by then. |
I was not able to reproduce the problem with M122's inserted into bed.g so I'm not sure I'll be able to generate one with even more instrumentation in there. I'm currently suspicious of the M99 in deployprobe.g. If I remove the M99 and replace it with an "else" clause: then I can't reproduce the problem. If I put the M99 back and also add it to retract-probe.g: it doesn't seem any more reproducible if that means anything. |
Interesting, that's good to know. I'll check if I can reproduce it with M99 in deployprobe.g. |
Yes, there seems to be a problem. I'm working on a fix. |
Thank you very much for reporting and helping me track down the underlying bug. I've got fixes ready for v3.5 and v3.6. |
Great! If you want to prepare a build, I'll install it and run my testing on it. |
Duet Forum Discussion Thread
https://forum.duet3d.com/topic/36466/sbc-mode-random-errors-in-macros-with-3-5-2/12?_=1727389195411
Which Duet products are you using?
Firmware Version
3.5.3
Duet Web Control Version
3.5.3
Are you using a Single Board Computer (RaspberryPi) with your Duet?
Please upload the results of sending M122 in the gcode console.
Please upload the content of your config.g file.
Please upload the content of any other relevant macro files.
No response
Details specific to your printer.
No response
Links to additional info.
No response
What happened?
Expected result
Observed result
Steps to reproduce