-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] MK4 with Octoprint - M486 A<long name> crashes the printer #4123
Comments
Cannot upload a file, so here is some Gcode that triggers the problem. ; length 80 |
@DerMouse thanks for the report. I just tried this and wasn't able to reproduce it. I don't know much about octoprint, I just followed this guide. I also tried reproducing this via serial port terminal and also didn't get any crash. |
I also can't reproduce any "crash" related to gcode "M486 A" so far. @DerMouse did you see a BSOD or Red screen when the crash happened ? And if yes, can you please take a picture of the screen ? or was the serial port just not responsive ? |
Sorry for the late answer. |
Here is the data from the serial.log file when sending the file :
|
Thanks for the screenshot and logs. Looks like redscreen is just a symptom of octoprint sending M112 as a part of error recovery. It seems that octoprint doesn't like |
@danopernis If you need any more data, or some tests to be done, just send a comment here. |
Yeah, seems to be an issue with octoprint and not the prusa firmware at first sight. I will try some tests against the checksum algorithm just to be sure. |
It seems that octoprint treats checksum errors as no real error and does resend the line assuming that the checksum error happened due to serial issues. With the PR #4168 this should be handled differently by octoprint instead, causing the print to error out. |
Forgive me if I've missed something here, perhaps not having seen everything in detail. But why should OctoPrint stop sending commands in this situation? Could you explain this a bit more? Generally, the point of checksums is to react to communication errors. So normally a checksum mismatch would result in a resend request, so communication could recover. Stopping sending any further commands means that communication is never recoverable, which is not what we want. |
This is also implying to me that OctoPrint is sending the right thing. Apologies again if I'm being dense, but could you clarify what OctoPrint is doing wrong here (or what it should be doing instead)? I'm not very clear. |
From my understanding of this issue here, the problem is indeed not "with OctoPrint", but rather with the checksum validation on firmware side failing due to a buffer overrun in the firmware. And as far as I understand the core issue here, the slicer (also by prusa) is generating a line (with a prusa specific command?) that is SO close to the firmware's buffer sizes that problems are just waiting to happen, as witnessed here. We currently don't have a mechanism to communicate buffer sizes from the firmware to OctoPrint, least of all something firmware agnostic, and checksum mismatches due to serial communication issues are indistinguishable from checksum issues due to firmware side buffer overruns. The correct way to handle checksum mismatches in the (way more common) first case is to resend the line. In the second case, I guess all that can be done is abort the print, or maybe reset the line numbers in the hopes that shortens the line length enough so that the firmware doesn't mangle the checksum calculation. But without a way to distinguish one from the other, there's nothing that can be changed in OctoPrint that won't entirely break the first case, and break things for everyone else out there. As you said yourself, what OctoPrint is sending is correct, it's the firmware that can't handle it and triggers the failure mode. From your explanation above, the firmware should be reading the whole line from last new line to next new line, not just the part that fits into its arbitrarily (and apparently undersized) line buffer. |
@foosel thanks for the insight, I agree with your points. It is a shame that there is no standard way to distinguish those cases, nor to communicate the actual buffer size. As it stands, firmware is limited in memory and we can't just keep increasing buffers whenever somebody hits the limit. Maybe the slicer should limit the size of cancel object names, we will discuss it. Worth mentioning, there are actually two limits: One is for the actual line of the gcode and other is for this weird serial protocol which adds line number prefixes and checksum suffixes. Obviously the second one needs to be larger. Is there a way to turn off this serial protocol in OctoPrint? That could be a workaround @DerMouse could use, if the serial line is not noisy. |
I hit this behaviour when slicing (and then sending to print) some STL with a very long file name (90+ characters). Once I understood the problem came from the M486 line, and that it was built from the file name, I shortened the name and everything went back to normal. Some warning from the slicer, or limiting the M486 A size, would indeed be great to avoid the problem. |
I was just about to point out that said "somebody" was probably sitting some rooms further on your corridor ;) This is a very much prusa-internal problem, created by your slicer in your firmware. It would have arised with any host software adhering to the de-factor standard communication protocol as an intermediary, that it happened with OctoPrint was only thanks to its market share.
Oooooof. Ok, I have to admit, this sentence isn't what I'd have expected to see coming from a firmware dev 😅 This "weird serial" protocol is the established industry standard for serial communication in consumer 3d printers, which is notoriously sensitive to EMI thanks to ridiculous cable lengths being used in the field, plus low quality wiring, crossing wires with motor cabling, even refrigerators or other home appliances on the same noisy mains line, down to firmware abusing interrupts, etc. In short: this communication is surprisingly error prone, the communication protocol hence includes ways to implement an error correction at least on the sending side (sadly not on the receiving side, but that's a different story), and that's line numbers and checksums. And that has been the case ever since consumer 3d printing became a thing and significantly predates the existence of OctoPrint and even Marlin. I'm frankly a bit reluctant about telling you that yes, those CAN in fact be disabled, because I'm afraid that the next thing that will happen is an article in the Prusa Knowledgebase, telling MK4 users who want to use their printer with OctoPrint to disable that crucial error correction "because of an issue with OctoPrint". It wouldn't be the first time that the blame for an issue with your firmware is shifted to me, and the disgruntled users then screaming at me for not properly interoperating with their expensive hardware, when in fact it's the in-house developed firmware that still has some serious bugs to iron out. So instead of telling you how to disable this error correction as a work around, I'm instead encouraging you to solve the issue at the (in-house) source and use more sensible cancel object labels in the slicer. |
Side note, given that this is caused by the slicer not limiting the line length and just using the user supplied STL file name to generate the label here, if I were you I'd take a close look at what happens if the line grows beyond your buffer length even without line numbers and checksums, when coming from a file stored on the printer. My money is on that at least causing label corruption, if not outright crashes as well due to run away quoted strings... |
It could even be worse : a buffer overflow may lead to code execution on the printer (dunno if M486 A may include binary data, though, so several assumptions here). A hand-written malicious GCODE file could then make for interesting results. |
Indeed. A crash in such a case would indicate a potential attack vector. |
I am not really in the mood of discussing how the entire "established industry standard for serial communication in consumer 3d printers" is just a pile of hacks which are put on top of other hacks which miraculously work most of the time. It resembles how the evolution created totally useless 4.5m nerve in giraffes because it can't go back and fix it because of backwards compatibility 😄 . Proper engineering solution would involve using proper link layer on top of physical layer, containing proper mechanism for error detection, retransmission and what not. Also not parsing character-oriented instructions like crazy and using proper binary format, with backward compatibilities, versioning, deprecation, proper vendor extensions etc. But it is too late for that, just like it is late for fixing the giraffes, so that's just me ranting. Not sure if that's what you'd expect from firmware developer or not 😉 . We will fix this, eventually, as we do. It is just tiresome and your Schadenfreude is just not helping to be honest 😫 . P.S. Crashes are not happening, and the attack vector is not there. Firmware is properly refusing the string which is too long, not overwriting anything. I have no idea what you are talking about. |
There's no Schadenfreude here anywhere, I'm not sure where you are getting that from. I got an issue reported on OctoPrint about this, I went here, explained how things are, gave pointers how to fix this and where else to look for potential related issues and also explained why I didn't want to go on the record with a potential workaround due to those having been abused in communication in the past to shift blame. Try to look at this from my perspective. Something in this firmware misbehaves, an analysis is posted that even comes to the conclusion that OctoPrint is doing everything correctly, and yet the reporting user is sent to OctoPrint's repository to open a ticket about it. If this would be an isolated case, shit happens, but this has sadly become a bit of a pattern with this and the former firmware, and so I feel the need to be very clear on where the issue lies. That is not meant as a personal attack, Schadenfreude or anything like that. I just want to see this fixed for the sake of the users, without OctoPrint getting the blame for it (which - sadly - indeed has happened before, repeatedly).
We agree on that, however, as you said, it is what it is, and for the sake of interoperability we should see how we can make it work. Throughout the years I've pushed things in the a better direction whenever I could, but with the firmware landscape being as fragmented as it is, there's not much that can be done without fragmenting it even further. |
@foosel It's not about blaming Octoprint doing anything wrong here at all. It does make sense that in case a checksum fails, the transmission is being resent. It is more about resending an already failed message multiple times again (more than 10 times if I saw it correctly). But maybe this is intentional behaviour by Octoprint, but imho, this clearly should be limited to be less than 5 retries, especially if lines are numbered and checksumed. And I wouldn't expect that a M112 is being send just because a sent message obviously failed and pausing/aborting a print would be sufficient. I didn't dig deep enough in the octoprint repo to see if that is default by design, so sorry for that, if I missed anything obvious. It is pretty clear that the Prusa printer firmware has to send an error message indicating that the gcode line is too long, that's why I made a PR in order to fix that instead of claiming the checksum is wrong although it was right. As @danopernis already said, there is no vulnerability at all, which I can confirm as I tried to use fuzzing and source debugging to find the root cause of this issue. The printer firmware as of now simply ignores the message and throws a checksum failed error. |
Thank you for your contribution to our project. This issue has not received any updates for 60 days and may be considered "stale." If this issue is still important to you, please add an update within the next 7 days to keep it open. Administrators can manually reopen the issue if necessary. |
This issue has been closed due to lack of recent activity. Please consider opening a new one if needed. |
Printer model
MK4
Firmware version
6.0.1=14848
Upgrades and modifications
No response
Printing from...
USB/Octoprint
Describe the bug
PruscaSlicer writes M486 A into the gcode file. If is longer thant 83 characters, sending the file to print through Octoprint USB connection will crash/reboot the printer. No problem if same file printed through regular USB key.
How to reproduce
Slice a (very long named) STL file. Send it to MK4 through Octoprint. No need to have any real printing commands in the Gcode file.
Expected behavior
Should not crash anything.
Files
No response
The text was updated successfully, but these errors were encountered: