-
-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(vm): do not delete a VM during retry on create
#1711
Conversation
Signed-off-by: Pavel Boldyrev <[email protected]>
WalkthroughThe pull request introduces changes to two files: a test file for VM resources and a VM creation method in the Proxmox provider. In the test file, a minor modification was made to the expected error message verification. In the VM creation method, a new retry mechanism was implemented to handle specific error scenarios during VM creation, particularly when dealing with already existing VMs or worker-related issues. Changes
Sequence DiagramsequenceDiagram
participant Client
participant VMCreationMethod
participant ProxmoxAPI
Client->>VMCreationMethod: Create VM
VMCreationMethod->>ProxmoxAPI: Request VM Creation
alt VM Creation Fails
ProxmoxAPI-->>VMCreationMethod: Error
alt Error is "already exists" and retrying
VMCreationMethod-->>Client: nil (ignore error)
else Error is "got no worker upid"
VMCreationMethod->>ProxmoxAPI: Retry VM Creation
else Other Error
VMCreationMethod-->>Client: Error
end
else VM Creation Succeeds
ProxmoxAPI-->>VMCreationMethod: Success
VMCreationMethod-->>Client: VM Created
end
Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
proxmox/nodes/vms/vms.go (1)
106-107
: Consider adding debug logging for the retry state change.While the code is correct, adding a debug log when setting
retrying = true
would improve observability.retrying = true +tflog.Debug(ctx, "VM creation entered retry state")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fwprovider/test/resource_vm_test.go
(1 hunks)proxmox/nodes/vms/vms.go
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- fwprovider/test/resource_vm_test.go
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: qodana
🔇 Additional comments (2)
proxmox/nodes/vms/vms.go (2)
82-82
: LGTM! The retry logic improvement looks solid.The changes effectively address the VM creation reliability issues by:
- Using the
retrying
flag to differentiate between first attempt and retries- Gracefully handling "already exists" errors during retries
- Not deleting VMs unnecessarily during retries
This makes the VM creation process more robust and predictable.
Also applies to: 84-94
108-110
: Verify retry condition coverage.The retry condition now specifically targets "got no worker upid" errors. Let's verify there are no other transient errors we should handle.
✅ Verification successful
The retry condition is appropriately scoped.
The current retry condition specifically targets task creation failures with "got no worker upid", while other transient errors are already handled by dedicated retry mechanisms in their respective operations (task waiting, configuration locking, etc.).
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for other error patterns in VM creation that might need retry handling rg -A 2 "error creating VM" --type goLength of output: 173
Script:
#!/bin/bash # Search for error patterns in VM operations and retry conditions echo "=== Error patterns in VM operations ===" rg "error.*VM" --type go -B 2 -A 2 echo -e "\n=== Retry conditions in codebase ===" rg "RetryIf.*error" --type go -B 2 -A 2 echo -e "\n=== Proxmox API error patterns ===" rg "proxmox.*error" --type go -B 2 -A 2Length of output: 18388
Contributor's Note
Another attempt to make the VM
create
operation more reliable. In PVE, thecreate
process isn’t idempotent and can sometimes fail with transient errors, such asgot no worker upid - start worker failed
.This change removes
delete
from the retry flow and enables retries only for specific errors. As a result, if we encounter an error like "unable to create VM 100 - VM 100 already exists on node 'pve'," it would indicate an actual attempt to "re-write" an existing VM rather than a fluke caused by a retry./docs
for any user-facing features or additions./fwprovider/tests
for any new or updated resources / data sources.make example
to verify that the change works as expected.Proof of Work
I was able to reliable reproduce this error in acceptance tests after scaling down my test node:
Not reproducible after the fix. All acceptance tests pass.
Community Note
Relates #0000
Summary by CodeRabbit
Bug Fixes
Tests