cockroach version panic in schema change #83864
Comments
Hello, I am Blathers. I am here to help you get the issue triaged. Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
cc @cockroachdb/bulk-io |
This crash seems unreasonable without something wild going on. Maybe some sort of memory corruption? The code is here:
This is init time code processing some narrowly defined structures. There's not a ton of leeway here. The only instantiations of that struct are in the below file and given the rest of the error, we know that the structs must have been one of these. cockroach/pkg/sql/schemachanger/scplan/rules.go Lines 732 to 749 in 11787ed
|
They shouldn't. See the latest comment in #83706; I now suspect this particular Go runtime / OS combination is borked in a way independent of crdb. |
Given the evidence of runtime corruption, I'm closing this. |
If the root cause is memory corruption, isn't this still a bug? |
If the memory corruption is in the Go runtime, is it a cockroach bug? We generally write code assuming the Go runtime works according to the language specification. I think in this case it doesn't. |
I didn't think we had any evidence that the corruption was caused by the Go runtime. (Obviously if anything in the process is corrupting memory, then the corruption can appear in any subsystem -- and indeed we've seen corruption in CockroachDB as well as a few different parts of the Go runtime.) But if it's preferable, I can track this issue outside this project and reopen it if/when I have more data. |
I interpreted #83706 (comment) as an indication of widespread memory corruption. I can't make a claim as to whether this corruption is in I hope we all can get to the bottom of it. |
Just to close the loop on this: see oxidecomputer/omicron#1146 for gory details, but at this point I believe the corruption in that issue was due to this illumos OS issue. The behavior there is that memory that is supposed to be zero'd by the Go runtime may not be properly zero'd. It looks to me like that could explain this bug, so that seems the likely culprit. |
thank you. |
(originally filed as #83706, moving here as requested)
Describe the problem
While trying to reproduce #82958, I found a case where cockroach version exited with status 2. Thinking maybe it would be reproducible, I started running that in a loop. I saw this panic:
To Reproduce
Just run cockroach version in a loop until it exits with a non-zero status. At first, I just did:
Now I'm using this:
Unfortunately I lost the status code but I've fixed the above script to avoid that.
It took just over 53h and 1.4M iterations to hit this.
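The scripts mentioned in these steps are not preserved in this copy of the issue. As a rough sketch only, and not the reporter's actual script, a loop of this kind might look like the following. The function name run_until_failure, the LOOP_LOG variable and the last_run.log file name are all hypothetical. The sketch saves the exit status right away so it is not lost, and keeps the output of the failing run so the panic text can be inspected afterwards.

```shell
#!/bin/sh
# Hypothetical reproduction loop (an illustration, not the original script).
# Runs the given command repeatedly until it exits with a non-zero status.
run_until_failure() {
    # Output of each run overwrites this file; after a failure it holds
    # the output (e.g. a panic trace) of the failing iteration.
    log=${LOOP_LOG:-last_run.log}
    iterations=0
    while :; do
        iterations=$((iterations + 1))
        status=0
        # Record the exit status immediately so no later command clobbers it.
        "$@" > "$log" 2>&1 || status=$?
        if [ "$status" -ne 0 ]; then
            echo "iteration $iterations exited with status $status (output in $log)"
            return "$status"
        fi
    done
}

# Example invocation (assumes cockroach is on PATH):
#   run_until_failure cockroach version
```

Since the failure reported here took more than a million iterations to appear, only the most recent run's output is kept rather than an ever-growing log.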
Expected behavior
This should run indefinitely without issue.
Environment:
This is on helios helios-1.0.21004.
CC @knz
Jira issue: CRDB-17319