What happened?
This has been previously reported here and here. Unfortunately, it hasn't been resolved yet.
The issue is most prevalent in large environments where the provider manages many resources. The end result is that the Upjet-based provider pod restarts, which affects the stability of large Crossplane-managed environments that use Upjet-generated providers.
How can we reproduce it?
This is observed in large Upjet-managed environments, for example one managing a few hundred SQS queues.
Root cause
I looked into this a bit and ran the Go race detector against Upjet itself; it fails with:
```
==================
WARNING: DATA RACE
Write at 0x000103c39580 by goroutine 82:
github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
/Users/[email protected]/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
<autogenerated>:1 +0x20
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Update()
/Users/[email protected]/25feb/upjet/pkg/controller/external_tfpluginsdk.go:718 +0x530
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update.func1()
/Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:199 +0x2a0
Previous write at 0x000103c39580 by goroutine 79:
github.com/crossplane/upjet/pkg/resource/fake.(*Observable).SetObservation()
/Users/[email protected]/25feb/upjet/pkg/resource/fake/terraformed.go:32 +0x3c
github.com/crossplane/upjet/pkg/resource/fake.(*Terraformed).SetObservation()
<autogenerated>:1 +0x20
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKExternal).Create()
/Users/[email protected]/25feb/upjet/pkg/controller/external_tfpluginsdk.go:660 +0x5c0
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create.func1()
/Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:166 +0x2a0
Goroutine 82 (running) created at:
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Update()
/Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:178 +0x244
github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKUpdate.func3()
/Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:286 +0x94
testing.tRunner()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40
Goroutine 79 (finished) created at:
github.com/crossplane/upjet/pkg/controller.(*terraformPluginSDKAsyncExternal).Create()
/Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk.go:145 +0x244
github.com/crossplane/upjet/pkg/controller.TestAsyncTerraformPluginSDKCreate.func3()
/Users/[email protected]/25feb/upjet/pkg/controller/external_async_tfpluginsdk_test.go:242 +0x94
testing.tRunner()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1792 +0x180
testing.(*T).Run.gowrap1()
/opt/homebrew/Cellar/go/1.24.0/libexec/src/testing/testing.go:1851 +0x40
==================
--- FAIL: TestConnect (0.00s)
testing.go:1490: race detected during execution of test
```
This looks like it could be the cause of the problem: the controller's Create and Update call SetObservation concurrently, and the observation is stored in a vanilla Go map, which is not safe for concurrent use.
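For illustration, here is a minimal, self-contained sketch of that pattern (the type and method names are made up for this example and are not the actual Upjet code): two goroutines, analogous to the async Create and Update callbacks, write to the same plain map through a shared setter. Running it with `go run -race` reports the same kind of data race.

```go
// Hypothetical illustration of the race pattern, not Upjet code.
package main

import "sync"

// observable mimics an object whose observation lives in a plain Go map.
type observable struct {
	observation map[string]any
}

// setObservation mutates the map without any locking, so concurrent
// callers race on the same memory.
func (o *observable) setObservation(data map[string]any) {
	for k, v := range data {
		o.observation[k] = v
	}
}

func main() {
	o := &observable{observation: map[string]any{}}

	var wg sync.WaitGroup
	wg.Add(2)
	// Analogous to the async Create callback.
	go func() {
		defer wg.Done()
		o.setObservation(map[string]any{"id": "from-create"})
	}()
	// Analogous to the async Update callback.
	go func() {
		defer wg.Done()
		o.setObservation(map[string]any{"id": "from-update"})
	}()
	wg.Wait()
}
```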
Solution
Introduce synchronization (for example, guard the observation map with a mutex) or use a thread-safe data structure like sync.Map.
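A minimal sketch of the mutex-based option, assuming the observation stays a plain map guarded by a lock (the safeObservable type and its method signatures here are illustrative, not the actual Upjet API):

```go
// Hypothetical illustration of the proposed fix, not Upjet code.
package main

import (
	"fmt"
	"sync"
)

// safeObservable guards its observation map with a mutex so that
// concurrent Create and Update callbacks no longer race.
type safeObservable struct {
	mu          sync.Mutex
	observation map[string]any
}

// SetObservation replaces the stored observation under the lock.
func (o *safeObservable) SetObservation(data map[string]any) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.observation = data
}

// GetObservation returns a copy so callers cannot mutate the shared
// map outside the lock.
func (o *safeObservable) GetObservation() map[string]any {
	o.mu.Lock()
	defer o.mu.Unlock()
	out := make(map[string]any, len(o.observation))
	for k, v := range o.observation {
		out[k] = v
	}
	return out
}

func main() {
	o := &safeObservable{}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); o.SetObservation(map[string]any{"id": "from-create"}) }()
	go func() { defer wg.Done(); o.SetObservation(map[string]any{"id": "from-update"}) }()
	wg.Wait()

	fmt.Println(o.GetObservation()) // no race reported with -race
}
```

sync.Map is an alternative, but a mutex around the existing map is usually the smaller change and keeps ordinary map semantics.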