Couldn't deleted snapshot data #7826

rtjdamen · 2024-07-11T04:54:22Z

Are you using XOA or XO from the sources?

XOA

Which release channel?

latest

Provide your commit number

No response

Describe the bug

A CBT snapshot backup finishes with the warning "Couldn't deleted snapshot data".

Error message

Couldn't deleted snapshot data

error
{"code":"VDI_IN_USE","params":["OpaqueRef:79798f68-e639-494b-8adf-cade435be5fb","data_destroy"],"call":{"method":"VDI.data_destroy","params":["OpaqueRef:79798f68-e639-494b-8adf-cade435be5fb"]}}
vdiRef
"OpaqueRef:79798f68-e639-494b-8adf-cade435be5fb"
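For readers who want to tell this failure apart from other backup warnings, here is a minimal sketch that parses the payload above and checks whether it is a `VDI_IN_USE` error raised by `VDI.data_destroy`. The helper name `is_data_destroy_in_use` is hypothetical and not part of Xen Orchestra; only the JSON fields shown in the report are assumed to exist.

```python
import json

# Error payload copied verbatim from the report above.
RAW_ERROR = (
    '{"code":"VDI_IN_USE",'
    '"params":["OpaqueRef:79798f68-e639-494b-8adf-cade435be5fb","data_destroy"],'
    '"call":{"method":"VDI.data_destroy",'
    '"params":["OpaqueRef:79798f68-e639-494b-8adf-cade435be5fb"]}}'
)


def is_data_destroy_in_use(error: dict) -> bool:
    """Hypothetical helper: True when the error is VDI_IN_USE from VDI.data_destroy."""
    return (
        error.get("code") == "VDI_IN_USE"
        and error.get("call", {}).get("method") == "VDI.data_destroy"
    )


error = json.loads(RAW_ERROR)
print(is_data_destroy_in_use(error))
```

Matching on both the error code and the calling method keeps the check narrow, so a `VDI_IN_USE` raised by another operation is not mistaken for this case.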

To reproduce

Random behavior. It seems to be related to specific VMs, as it recurs on the same VMs every time.

Expected behavior

If VDI.data_destroy fails, I would expect a retry; a manual retry does work. Maybe a timing issue?

Screenshots

No response

Node

18.20.2

Hypervisor

XCP-ng 8.2

Additional context

It happens on some VMs.

rtjdamen · 2024-08-31T19:18:06Z

The issue is still occurring on the latest version. It seems like a timing issue: the snapshot is still in use. After a few seconds, the command can be run by hand.

olivierlambert · 2024-09-01T15:30:46Z

Thanks for your feedback, @rtjdamen!

fbeauchamp · 2024-09-03T16:04:03Z

Hi @rtjdamen ,

After a careful analysis with the XCP-ng team, we found that this error is raised when XAPI fails to unplug the VDI within a fixed, non-modifiable delay of 4 s.

We patched your installation with a retry on the XO side. If it's OK with you, we'll monitor tonight's jobs and see if it's enough to handle this edge case.

Regards

Sometimes XAPI takes too long to detach the VDI. In this case, the timeout is fixed at 4 s and cannot be modified. When the timeout is reached, XAPI raises a VDI_IN_USE error; this is an internal process of XAPI. This commit adds a retry on the XO side to give XAPI more room to work through this process, as XO already does when destroying a VDI. Fixes #7826
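The retry described in this commit message can be sketched roughly as follows. This is an illustration of the idea only, not the code merged in Xen Orchestra: `XapiError`, `retry_on_vdi_in_use`, the retry count, and the delay are all assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class XapiError(Exception):
    """Hypothetical stand-in for an error returned by XAPI."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def retry_on_vdi_in_use(
    call: Callable[[], T],
    retries: int = 3,      # assumed value, not taken from the actual fix
    delay: float = 5.0,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying only when XAPI answers VDI_IN_USE."""
    attempt = 0
    while True:
        try:
            return call()
        except XapiError as err:
            # Any other error, or an exhausted retry budget, is re-raised.
            if err.code != "VDI_IN_USE" or attempt >= retries:
                raise
            attempt += 1
            sleep(delay)


# Demo: a fake data_destroy that fails twice before succeeding.
attempts = []


def fake_data_destroy() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise XapiError("VDI_IN_USE")
    return "destroyed"


waits = []  # records the delays instead of actually sleeping
result = retry_on_vdi_in_use(fake_data_destroy, sleep=waits.append)
print(result, len(attempts), waits)
```

Retrying only on `VDI_IN_USE` means unrelated failures still surface immediately, while the extra waits give XAPI time beyond its fixed 4 s unplug timeout.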

rtjdamen · 2024-09-03T16:24:43Z

Yes, no problem, we will keep an eye on it too!

sblanchouin · 2024-09-03T18:32:55Z

Hi,

Can you patch our installation too, ASAP?

Thanks!

rtjdamen · 2024-09-04T04:54:29Z

@fbeauchamp Unfortunately, it seems the snapshot data is still not destroyed in every case. I think 4 s is still too short. Maybe we need to increase it to 10 s to start with?

fbeauchamp · 2024-09-04T17:08:24Z

The patch has been redeployed on the proxy. Waiting for tonight's run to be sure.

rtjdamen · 2024-09-05T07:19:25Z

> The patch has been redeployed on the proxy. Waiting for tonight's run to be sure.

Seems like that did the trick! No more orphan VDIs this morning! Also no more VDI_IN_USE destroy messages. Are these related or not?

fbeauchamp · 2024-09-05T08:40:19Z

Yes, because the VDIs were not deleted (VDI_IN_USE) and stayed as orphans. Now we purge them correctly.

rtjdamen · 2024-09-05T08:41:53Z

So 2 issues fixed!

rtjdamen · 2024-09-16T17:50:47Z

The issue is not completely resolved: the original fix solved it, but the version now active in XOA does not.

julien-f · 2024-09-17T13:43:25Z

@rtjdamen Are you sure both your XOA and your XO Proxies are up to date on the latest channel?

If they are, we need to take a look at them.

rtjdamen · 2024-09-17T14:06:09Z

According to the GUI, they are.

julien-f · 2024-09-24T08:42:26Z

@rtjdamen After looking at your infra, it seems that there is still a major improvement, there are very few VDI_IN_USE errors now 🙂

The only problem we saw comes from the fact that one of your VDIs is still attached to the control domain and our XCP-ng team is still investigating this issue.

We will still continue to monitor this problem.

rtjdamen · 2024-09-24T08:47:02Z

OK, I hope they find a solution for that issue soon!

Regards from mobile, Robin Damen

rtjdamen · 2024-09-24T09:04:45Z

I just checked, but the one that failed yesterday is not attached to the control domain, so this is incorrect.

rtjdamen added the type: bug 🐛 label Jul 11, 2024

marcungeschikts assigned marcungeschikts and fbeauchamp and unassigned marcungeschikts Jul 12, 2024

fbeauchamp mentioned this issue Sep 3, 2024

feat(backups/CBT): retry data_destroy when error is VDI_IN USE #7960

Merged

rtjdamen mentioned this issue Sep 5, 2024

VDI_IN_USE(OpaqueRef:d525416f-65c7-48b6-ad1c-200e8b315014, destroy) #7956

Closed

julien-f closed this as completed in #7960 Sep 10, 2024

julien-f closed this as completed in 9504132 Sep 10, 2024

marcungeschikts reopened this Sep 17, 2024

marcungeschikts assigned julien-f Sep 17, 2024
