Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Win11 VM can't stop after ~5min #14816

Closed
gpelz-graph-it opened this issue Jan 20, 2025 · 5 comments · Fixed by #14850
Closed

Win11 VM can't stop after ~5min #14816

gpelz-graph-it opened this issue Jan 20, 2025 · 5 comments · Fixed by #14850
Assignees
Milestone

Comments

@gpelz-graph-it
Copy link

Using Ubuntu 22.04.5 LTS (5.15.0-130-generic) with lxd from snap

snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     6.1/stable

I setup a Windows 11 VM following https://ubuntu.com/tutorials/how-to-install-a-windows-11-vm-using-lxd which worked fine.
Sometimes I can't shutdown the vm using lxc stop wm11test.
After some testing I noticed that with lxc start win11test && sleep 260 && lxc stop win11test or lower sleep times starting and shutting down the vm works. Using a time >= 300 like lxc start win11test && sleep 330 && lxc stop win11test it seems like Windows doesn't get or react to the shutdown request. (I can still connect to the vm and shut it down manually.)
Here's what I see using lxc monitor --pretty:

DEBUG  [2025-01-16T13:14:52Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=root
DEBUG  [2025-01-16T13:14:52Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/instances/win11test username=root
DEBUG  [2025-01-16T13:14:52Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/events username=root
DEBUG  [2025-01-16T13:14:52Z] Event listener server handler started         id=beec0abf-889a-4d4b-be26-376fe3962443 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG  [2025-01-16T13:14:52Z] Handling API request                          ip=@ method=PUT protocol=unix url=/1.0/instances/win11test/state username=root
DEBUG  [2025-01-16T13:14:53Z] New operation                                 class=task description="Starting instance" operation=dc9acf02-bbc2-4ed0-bc0f-1e2682a343c3 project=default
DEBUG  [2025-01-16T13:14:53Z] Started operation                             class=task description="Starting instance" operation=dc9acf02-bbc2-4ed0-bc0f-1e2682a343c3 project=default
DEBUG  [2025-01-16T13:14:53Z] Instance operation lock created               action=start instance=win11test project=default reusable=false
DEBUG  [2025-01-16T13:14:53Z] Start started                                 instance=win11test instanceType=virtual-machine project=default stateful=false
INFO   [2025-01-16T13:14:53Z] ID: dc9acf02-bbc2-4ed0-bc0f-1e2682a343c3, Class: task, Description: Starting instance  CreatedAt="2025-01-16 13:14:52.994712085 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/win11test]]" Status=Running StatusCode=Running UpdatedAt="2025-01-16 13:14:52.994712085 +0000 UTC"
INFO   [2025-01-16T13:14:53Z] ID: dc9acf02-bbc2-4ed0-bc0f-1e2682a343c3, Class: task, Description: Starting instance  CreatedAt="2025-01-16 13:14:52.994712085 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/win11test]]" Status=Pending StatusCode=Pending UpdatedAt="2025-01-16 13:14:52.994712085 +0000 UTC"
DEBUG  [2025-01-16T13:14:53Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/operations/dc9acf02-bbc2-4ed0-bc0f-1e2682a343c3 username=root
DEBUG  [2025-01-16T13:14:53Z] MountInstance started                         driver=btrfs instance=win11test pool=default project=default
DEBUG  [2025-01-16T13:14:53Z] MountInstance finished                        driver=btrfs instance=win11test pool=default project=default
DEBUG  [2025-01-16T13:14:53Z] Skipping lxd-agent install as unchanged       installPath=/var/snap/lxd/common/lxd/virtual-machines/win11test/config/lxd-agent instance=win11test instanceType=virtual-machine project=default srcPath=/snap/lxd/30130/bin/lxd-agent
DEBUG  [2025-01-16T13:14:53Z] Starting device                               device=eth0 instance=win11test instanceType=virtual-machine project=default type=nic
DEBUG  [2025-01-16T13:14:53Z] Scheduler: network: tapa1dde8a0 has been added: updating network priorities 
DEBUG  [2025-01-16T13:14:53Z] Starting device                               device=root instance=win11test instanceType=virtual-machine project=default type=disk
DEBUG  [2025-01-16T13:14:53Z] Starting device                               device=vtpm instance=win11test instanceType=virtual-machine project=default type=tpm
WARNING[2025-01-16T13:14:53Z] Using writeback cache I/O                     devPath=/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/win11test/root.img device=root fsType=btrfs instance=win11test instanceType=virtual-machine project=default
DEBUG  [2025-01-16T13:14:53Z] UpdateInstanceBackupFile started              driver=btrfs instance=win11test pool=default project=default
DEBUG  [2025-01-16T13:14:53Z] UpdateInstanceBackupFile finished             driver=btrfs instance=win11test pool=default project=default
DEBUG  [2025-01-16T13:14:53Z] QMP monitor started                           path=/var/snap/lxd/common/lxd/logs/win11test/qemu.monitor
DEBUG  [2025-01-16T13:14:53Z] Scheduler: virtual-machine win11test started: re-balancing 
INFO   [2025-01-16T13:14:53Z] Action: instance-started, Source: /1.0/instances/win11test, Requestor: unix/root (@) 
INFO   [2025-01-16T13:14:53Z] ID: dc9acf02-bbc2-4ed0-bc0f-1e2682a343c3, Class: task, Description: Starting instance  CreatedAt="2025-01-16 13:14:52.994712085 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/win11test]]" Status=Success StatusCode=Success UpdatedAt="2025-01-16 13:14:52.994712085 +0000 UTC"
DEBUG  [2025-01-16T13:14:53Z] Instance operation lock finished              action=start err="<nil>" instance=win11test project=default reusable=false
DEBUG  [2025-01-16T13:14:53Z] Start finished                                instance=win11test instanceType=virtual-machine project=default stateful=false
DEBUG  [2025-01-16T13:14:53Z] Success for operation                         class=task description="Starting instance" operation=dc9acf02-bbc2-4ed0-bc0f-1e2682a343c3 project=default
DEBUG  [2025-01-16T13:14:53Z] Event listener server handler stopped         listener=beec0abf-889a-4d4b-be26-376fe3962443 local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG  [2025-01-16T13:20:23Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0 username=root
DEBUG  [2025-01-16T13:20:23Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/events username=root
DEBUG  [2025-01-16T13:20:23Z] Event listener server handler started         id=03d3ccdc-c362-432a-a3db-4c0614566d9f local=/var/snap/lxd/common/lxd/unix.socket remote=@
DEBUG  [2025-01-16T13:20:23Z] Handling API request                          ip=@ method=PUT protocol=unix url=/1.0/instances/win11test/state username=root
DEBUG  [2025-01-16T13:20:23Z] New operation                                 class=task description="Stopping instance" operation=ae3b9800-210e-458f-b257-3f1efd290410 project=default
INFO   [2025-01-16T13:20:23Z] ID: ae3b9800-210e-458f-b257-3f1efd290410, Class: task, Description: Stopping instance  CreatedAt="2025-01-16 13:20:23.745657275 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/win11test]]" Status=Running StatusCode=Running UpdatedAt="2025-01-16 13:20:23.745657275 +0000 UTC"
DEBUG  [2025-01-16T13:20:23Z] Started operation                             class=task description="Stopping instance" operation=ae3b9800-210e-458f-b257-3f1efd290410 project=default
INFO   [2025-01-16T13:20:23Z] ID: ae3b9800-210e-458f-b257-3f1efd290410, Class: task, Description: Stopping instance  CreatedAt="2025-01-16 13:20:23.745657275 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/win11test]]" Status=Pending StatusCode=Pending UpdatedAt="2025-01-16 13:20:23.745657275 +0000 UTC"
DEBUG  [2025-01-16T13:20:23Z] Shutdown started                              instance=win11test instanceType=virtual-machine project=default timeout=10m0s
DEBUG  [2025-01-16T13:20:23Z] Instance operation lock created               action=stop instance=win11test project=default reusable=true
DEBUG  [2025-01-16T13:20:23Z] Handling API request                          ip=@ method=GET protocol=unix url=/1.0/operations/ae3b9800-210e-458f-b257-3f1efd290410 username=root
DEBUG  [2025-01-16T13:20:23Z] Shutdown request sent to instance             instance=win11test instanceType=virtual-machine project=default

Startup works fine, but no response to the shutdown request. I first thought this might be an issue with older virtio drivers, so I installed https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/archive-virtio/virtio-win-0.1.266-1/virtio-win-0.1.266.iso and restarted the VM, but that didn't help either.

When it's working (smaller sleep timer), one can see a reply a few seconds later.

DEBUG  [2025-01-16T12:12:33Z] Shutdown request sent to instance             instance=win11test instanceType=virtual-machine project=default
DEBUG  [2025-01-16T12:12:38Z] Instance stopped                              instance=win11test instanceType=virtual-machine project=default reason=guest-shutdown target=stop
DEBUG  [2025-01-16T12:12:38Z] Waiting for VM process to finish              instance=win11test instanceType=virtual-machine project=default
DEBUG  [2025-01-16T12:12:38Z] Instance operation lock inherited for stop    action=stop instance=win11test instanceType=virtual-machine project=default
DEBUG  [2025-01-16T12:12:38Z] onStop hook started                           instance=win11test instanceType=virtual-machine project=default target=stop
DEBUG  [2025-01-16T12:12:38Z] QMP monitor stopped                           path=/var/snap/lxd/common/lxd/logs/win11test/qemu.monitor
DEBUG  [2025-01-16T12:12:39Z] VM process finished                           instance=win11test instanceType=virtual-machine project=default
DEBUG  [2025-01-16T12:12:39Z] Stopping device                               device=vtpm instance=win11test instanceType=virtual-machine project=default type=tpm
DEBUG  [2025-01-16T12:12:39Z] Stopping device                               device=root instance=win11test instanceType=virtual-machine project=default type=disk
DEBUG  [2025-01-16T12:12:39Z] Stopping device                               device=eth0 instance=win11test instanceType=virtual-machine project=default type=nic
DEBUG  [2025-01-16T12:12:39Z] UnmountInstance started                       driver=btrfs instance=win11test pool=default project=default
DEBUG  [2025-01-16T12:12:39Z] UnmountInstance finished                      driver=btrfs instance=win11test pool=default project=default
INFO   [2025-01-16T12:12:39Z] Action: instance-shutdown, Source: /1.0/instances/win11test
DEBUG  [2025-01-16T12:12:39Z] onStop hook finished                          instance=win11test instanceType=virtual-machine project=default target=stop
DEBUG  [2025-01-16T12:12:39Z] Instance operation lock finished              action=stop err="<nil>" instance=win11test project=default reusable=true
DEBUG  [2025-01-16T12:12:39Z] Shutdown finished                             instance=win11test instanceType=virtual-machine project=default timeout=10m0s

When it's not working it runs in the 10 minute timeout:

DEBUG  [2025-01-16T13:30:23Z] Shutdown finished                             instance=win11test instanceType=virtual-machine project=default timeout=10m0s
DEBUG  [2025-01-16T13:30:23Z] Failure for operation                         class=task description="Stopping instance" err="Failed shutting down instance, status is \"Running\": context deadline exceeded" operation=ae3b9800-210e-458f-b257-3f1efd290410 project=default
INFO   [2025-01-16T13:30:23Z] ID: ae3b9800-210e-458f-b257-3f1efd290410, Class: task, Description: Stopping instance  CreatedAt="2025-01-16 13:20:23.745657275 +0000 UTC" Err="Failed shutting down instance, status is \"Running\": context deadline exceeded" Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/win11test]]" Status=Failure StatusCode=Failure UpdatedAt="2025-01-16 13:20:23.745657275 +0000 UTC"
DEBUG  [2025-01-16T13:30:23Z] Event listener server handler stopped         listener=03d3ccdc-c362-432a-a3db-4c0614566d9f local=/var/snap/lxd/common/lxd/unix.socket remote=@

Am I missing something? Should I install something? Is it a bug? Is there way to specify the fail timeout (10min)?

@tomponline
Copy link
Member

tomponline commented Jan 21, 2025

Perhaps lxc/incus#1043 and lxc/incus#1280 could help.

@tomponline tomponline added Bug Confirmed to be a bug and removed Bug Confirmed to be a bug labels Jan 21, 2025
@tomponline tomponline added this to the lxd-6.3 milestone Jan 21, 2025
@gpelz-graph-it
Copy link
Author

lxc exec win11test -- dir also does not work and reports: Error: LXD VM agent isn't currently running.
Calling in a powershell: Get-Service QEMU-GA does show that the service is running:

Status   Name               DisplayName
------   ----               -----------
Running  QEMU-GA            QEMU Guest Agent

unstalling latest qemu-guest-agent from virtio-win-0.1.266.iso doesn't change this either.
=> maybe these are related.

@tomponline
Copy link
Member

lxc exec win11test -- dir also does not work and reports: Error: LXD VM agent isn't currently running. Calling in a powershell: Get-Service QEMU-GA does show that the service is running:

Status   Name               DisplayName
------   ----               -----------
Running  QEMU-GA            QEMU Guest Agent

unstalling latest qemu-guest-agent from virtio-win-0.1.266.iso doesn't change this either. => maybe these are related.

Well thats expected, as there is no Windows version of the lxd-agent. LXD doesn't communicate with the qemu-guest-agent.

@gpelz-graph-it
Copy link
Author

Ah, okay. Wasn't sure about that. :)

@gpelz-graph-it
Copy link
Author

Sending the shutdown notice twice, like suggested in lxc/incus#1043 (see: lxc/incus@5c9f4dc)
does work.
Tested it by calling (win11test = name of my vm):
lxc query -X PUT -d '{"action":"stop","timeout":1}' /1.0/instances/win1test/state && sleep 1 && lxc stop win11test
and
lxc query -X PUT -d '{"action":"stop"}' /1.0/instances/win11test/state && sleep 1 && lxc stop win11test
so once lxc/incus#1043 is integrated, the problem should be fixed.

kadinsayani pushed a commit to kadinsayani/lxd that referenced this issue Jan 24, 2025
This is mostly needed to get Windows to perform a clean shutdown, Linux
seems to be fine with just the one event.

Closes canonical#14816.

Signed-off-by: Stéphane Graber <[email protected]>
(cherry picked from commit 5c9f4dcaebada4648251c39176821d4238c5c1fc)
Signed-off-by: Kadin Sayani <[email protected]>
License: Apache-2.0
tomponline added a commit that referenced this issue Jan 24, 2025
…ress tracking to export (from Incus) (#14850)

Cherry-picks from lxc/incus#1043.

Closes #14816.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants