concurrent map iteration and map write panic #243
Okay, finally getting around to debugging this. I'm not sure how accurate the thread dump from a concurrent map iteration and map write is, though. k8s.io/[email protected]/pkg/apis/meta/v1/zz_generated.deepcopy.go:689 is copying the Workspace annotations. It would appear that the controller work queue is only mostly deduplicated. I found some PRs:
I think the only thing we could do to fix this would be to disable the cache for Workspace objects (by adding Workspace to the …). It's worth noting that while crossplane-runtime does configure the controller to recover from panics, "concurrent map iteration and map write" is not a panic: it's a fatal runtime error, which is unrecoverable.
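For context on why the controller's panic recovery doesn't help here: Go's `recover` only intercepts panics. Below is a minimal sketch (the `safeCall` helper is hypothetical and is not crossplane-runtime's actual recovery code) that turns an ordinary panic into an error:

```go
package main

import "fmt"

// safeCall is a hypothetical helper: it runs fn and converts an ordinary
// panic into an error via recover. It cannot intercept runtime fatal errors
// such as "concurrent map iteration and map write"; those terminate the
// whole process regardless of any deferred recover.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := safeCall(func() { panic("reconcile failed") })
	fmt.Println(err) // recovered: reconcile failed
}
```

If you replace the `panic` with two goroutines that write to and range over an unguarded map, the process instead dies with the fatal map error, and `recover` cannot intercept it.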
@toastwaffle Thanks for looking into this! Just to make sure I understand the scenario:
Are the read and write operating on the same shared client cache? It's unfortunate that it's a fatal error and not a panic; even two panics, one on each goroutine, would be preferable to an unrecoverable error.
I'm suddenly doubting my hypothesis: if it were two concurrent reconciles of the same Workspace, the first must have DeepCopy'd the Workspace as part of the cached Get call, so any writes to the annotations should be on a different map from the one in the cache. That makes me wonder whether something is reading from the same indexer without doing the DeepCopy, though I don't know why such a thing would be writing to the annotations. My only remaining hypotheses are that something is wrong with the DeepCopy implementation, or that something is wrong with Go's concurrent iteration/write detection; both are exceedingly unlikely! I'll think about this some more tomorrow to see if I can come up with anything better...
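To illustrate the reasoning above: if every cached Get hands back a deep copy, concurrent reconciles each mutate their own annotations map and never the cached one. Here is a sketch with hypothetical `Object` and `Cache` types standing in for the Workspace and the controller's cache. These are not the real client-go or controller-runtime types:

```go
package main

import (
	"fmt"
	"sync"
)

// Object is a hypothetical stand-in for a Workspace's metadata.
type Object struct {
	Name        string
	Annotations map[string]string
}

// DeepCopy ranges over the source map (the "iteration" side) and builds
// a new map, so the copy never shares storage with the original.
func (o *Object) DeepCopy() *Object {
	out := &Object{Name: o.Name}
	if o.Annotations != nil {
		out.Annotations = make(map[string]string, len(o.Annotations))
		for k, v := range o.Annotations {
			out.Annotations[k] = v
		}
	}
	return out
}

// Cache is a hypothetical read cache that only ever hands out deep copies.
type Cache struct {
	mu    sync.RWMutex
	items map[string]*Object
}

func NewCache() *Cache { return &Cache{items: map[string]*Object{}} }

// Add stores a private copy so the caller's object can't alias the cache.
func (c *Cache) Add(o *Object) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[o.Name] = o.DeepCopy()
}

// Get returns a copy, so callers never share the cached Annotations map.
func (c *Cache) Get(name string) (*Object, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	o, ok := c.items[name]
	if !ok {
		return nil, false
	}
	return o.DeepCopy(), true
}

func main() {
	c := NewCache()
	c.Add(&Object{Name: "ws", Annotations: map[string]string{"a": "1"}})

	// Two "reconciles" each get their own copy and mutate it freely.
	first, _ := c.Get("ws")
	second, _ := c.Get("ws")
	first.Annotations["a"] = "changed"
	second.Annotations["b"] = "2"

	cached, _ := c.Get("ws")
	fmt.Println(cached.Annotations) // map[a:1]
}
```

Because `DeepCopy` ranges over the source map, a fatal error raised at the deepcopy line would suggest that some other goroutine was writing to the *cached* map itself, that is, something holding a reference to it without copying.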
Okay, I have not been able to come up with any other hypotheses. I'm going to try running the provider under the race detector, but apparently "memory usage may increase by 5-10x and execution time by 2-20x" 😬
@toastwaffle Any update on this?
Hi @bobh66, sorry for dropping the ball on this. Unfortunately other priorities took over, and I've since moved to a different team at my company that isn't using Crossplane. I've lost most of my permissions, but I can see from the logs I still have access to that it has happened 4 times in the past 2 weeks. I'll see if I can get somebody on my old team to look into this some more.
What happened?
Full thread dump is here. I intend to do some debugging myself, but creating the issue now.
How can we reproduce it?
Absolutely no idea
What environment did it happen in?