Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

guest/net: New implementation of network setup with SLAAC and own DHC… #111

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sbrivio-rh
Copy link
Contributor

…P client

The existing implementation has a couple of issues:

  • it doesn't support IPv6 or SLAAC

  • it relies on either dhclient(8) or dhcpcd(8), which need a significant amount of time to configure the network as they are rather generic DHCP clients

  • on top of this, dhcpcd, by default, unless --noarp is given, will spend five seconds ARP-probing the address it just received before configuring it

Replace the IPv4 part with a minimalistic, 73-line DHCP client that just does what we need, using option 80 (Rapid Commit) to speed up the whole exchange.

Add IPv6 support (including IPv4-only, and IPv6-only modes) relying on the kernel to perform SLAAC. Safely avoid DAD (we're the only node on the link) by disabling router solicitations, starting SLAAC, and re-enabling them once addresses are configured.

Instead of merely triggering the network setup and proceeding, wait until everything is configured, so that connectivity is guaranteed to be ready before any further process runs in the guest, say:

$ ./target/debug/muvm -- ping -c1 2a01:4f8:222:904::2
PING 2a01:4f8:222:904::2 (2a01:4f8:222:904::2) 56 data bytes
64 bytes from 2a01:4f8:222:904::2: icmp_seq=1 ttl=255 time=0.256 ms

--- 2a01:4f8:222:904::2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.256/0.256/0.256/0.000 ms

The whole procedure now takes approximately 1.5 to 2 ms (for both IPv4 and IPv6), with the DHCP exchange and configuration taking somewhere around 300-500 µs out of that, instead of hundreds of milliseconds to seconds.

Matching support in passt for option 80 (RFC 4039) and for the DHCP "broadcast" flag (RFC 2131) needs this series:

https://archives.passt.top/passt-dev/[email protected]/

[I'll update this commit message once we have an upstream release with it]

@sbrivio-rh
Copy link
Contributor Author

Supersedes #64. Test passt static builds, along with RPMs and Debian packages (x86 only, sorry) at https://passt.top/builds/latest/x86_64/

crates/muvm/Cargo.toml Outdated Show resolved Hide resolved
@sbrivio-rh sbrivio-rh force-pushed the main branch 2 times, most recently from ca7ea9c to e1eb9df Compare November 25, 2024 18:42
@sbrivio-rh
Copy link
Contributor Author

Hmm, I can't test this anymore after rebasing to latest upstream. I'm getting one of these two errors ("Failed to create the microVM" about 30% of the times, Failed to execute muvm-server as child process about 70%). These are attempts in a relatively tight loop:

$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)
$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)
$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)
$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)
$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)

With RUST_LOG=debug:

[2024-11-25T19:05:58Z ERROR krun] Building the microVM failed: GuestMemoryMmap(UnsortedMemoryRegions)

or:

[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 3
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] do_lookup: "muvm-server"
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::worker] Fs: queue event: 1
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 15
[2024-11-25T19:08:04Z DEBUG devices::virtio::fs::linux::passthrough] read: 56
Error: [2024-11-25T19:08:04Z DEBUG devices::virtio::fs::server] opcode: 25
Failed to execute `muvm-server` as child process

I reverted a few commits but I can't seem to get this to work anymore.

@sbrivio-rh
Copy link
Contributor Author

Oh, okay, I didn't pull for a while. It works if I do:

284b520 (HEAD) Revert "Share /dev/shm as a separate mount with DAX"
44ef0b6 Revert "Protect muvm-server with a cookie"
cdf7027 Revert "Start a privileged muvm-server"
df6c549 Revert "Implement a host memory monitor"

...posting another version of that commit with two functions taken out of configure_network() now that I can test things. Those other issues I'm facing, I have no idea where to start debugging them...

@sbrivio-rh
Copy link
Contributor Author

$ ./target/debug/muvm -- /bin/false
Error: Failed to execute `muvm-server` as child process

Caused by:
    No such file or directory (os error 2)

Fixed by #112

$ ./target/debug/muvm -- /bin/false
Error: Failed to create the microVM

Caused by:
    Invalid argument (os error 22)

Fixed by updating libkrun.

@sbrivio-rh
Copy link
Contributor Author

This is now supported by passt 2024_11_27.c0fbc7e, matching Fedora updates passt-0^20241127.gc0fbc7e-1.fc40, passt-0^20241127.gc0fbc7e-1.fc41, passt-0^20241127.gc0fbc7e-1.fc42, as well as Debian's passt-0.0~git20241127.c0fbc7e-1.

@slp
Copy link
Collaborator

slp commented Nov 28, 2024

Thanks a lot @sbrivio-rh , I really like this approach. A couple questions:

  • What should we do with resolv.conf? Perhaps I'm missing it, but doesn't seem like it's doing nameserver resolution.
  • This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

@sbrivio-rh
Copy link
Contributor Author

  • What should we do with resolv.conf? Perhaps I'm missing it, but doesn't seem like it's doing nameserver resolution.

Ah, right, I thought resolv.conf could simply be the one from the host, but with systemd-resolved it's common to have a loopback address as resolver, and that wouldn't work.

So, yes, I should also configure resolv.conf here from DHCP options (6 and 119) and NDP (RDNSS, option 25), passt already sends the right ones. Let me fix this.

  • This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Oops, I didn't check. I'm not exactly the right person as I barely understand what a crate is (do I?) and I don't even use Fedora regularly, but yes, I can probably do that, it should be low effort.

I found this abandoned Copr by the way, https://copr.fedorainfracloud.org/coprs/zurdo/i3status-rs-update/package/rust-neli/, it looks pretty easy. Let me give it a try, but if you know somebody else who could be interested...

@sbrivio-rh
Copy link
Contributor Author

This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Naive question: if it's statically linked, does it really become a dependency? Or is it just a build dependency? Does that matter also if the crate is downloaded as needed...?

@slp
Copy link
Collaborator

slp commented Nov 28, 2024

This introduces a dependency on neli, but it's not yet packaged into Fedora. Do you want to package it yourself?

Naive question: if it's statically linked, does it really become a dependency? Or is it just a build dependency? Does that matter also if the crate is downloaded as needed...?

It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

@sbrivio-rh
Copy link
Contributor Author

It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

Oh, oops, I just had a look at https://src.fedoraproject.org/user/slp/projects... let me package that. :)

@sbrivio-rh
Copy link
Contributor Author

So, yes, I should also configure resolv.conf here from DHCP options (6 and 119) and NDP (RDNSS, option 25), passt already sends the right ones. Let me fix this.

I just added support for nameservers over DHCP (option 6), omitting for the moment:

  • handling of the domain search list (option 119) also implemented by passt. It's at least 10-20 LoCs due to domain compression encoding from RFC 1035, and I doubt it's of any use for muvm... we can add it later for completeness anyway but I'd rather fix the current situation first
  • handling of NDP option 25 (RDNSS), implemented by passt as well, for DNS resolvers over SLAAC. That's also some extra bit of complexity. I think this one makes sense no matter what, for IPv6-only setups, but I'd rather avoid to make this patch explode at the moment

@sbrivio-rh
Copy link
Contributor Author

Oops, I just noticed the cargo fmt -- --check warnings, I thought cargo clippy would be enough. Fixing those as well...

@sbrivio-rh
Copy link
Contributor Author

Oops, I just noticed the cargo fmt -- --check warnings, I thought cargo clippy would be enough. Fixing those as well...

Gosh, the reformatted version looks horrible, with 100 columns that don't fit pretty much anywhere and things wildly misaligned. Is cargo fmt enforced?

@sbrivio-rh
Copy link
Contributor Author

It's just a build dependency. In Fedora, every crate you depend on must be independently packaged, and builds are done offline. Luckily, rust2rpm helps a lot with this.

Oh, oops, I just had a look at https://src.fedoraproject.org/user/slp/projects... let me package that. :)

Package reviews:
https://bugzilla.redhat.com/show_bug.cgi?id=2329411
https://bugzilla.redhat.com/show_bug.cgi?id=2329412

@slp
Copy link
Collaborator

slp commented Dec 3, 2024

Well, current spec file and corresponding error:

https://hastebin.skyra.pw/hudaqicari.prolog

We had this problem with some of the rust-vmm packages. If the LICENSE file is in the root of the workspace, it doesn't get included in the crate. The trick here is adding the file as a source as done here:

https://src.fedoraproject.org/rpms/rust-virtio-queue/blob/rawhide/f/rust-virtio-queue.spec

@sbrivio-rh
Copy link
Contributor Author

Well, current spec file and corresponding error:
https://hastebin.skyra.pw/hudaqicari.prolog

We had this problem with some of the rust-vmm packages. If the LICENSE file is in the root of the workspace, it doesn't get included in the crate. The trick here is adding the file as a source as done here:

https://src.fedoraproject.org/rpms/rust-virtio-queue/blob/rawhide/f/rust-virtio-queue.spec

That's what I did there (I think?):

Source1:        https://raw.githubusercontent.com/Nugine/const-str/00ce0d143164e1e393a5dcdda6ab2f0dbae9a243/LICENSE

[...]

%license %{crate_instdir}/LICENSE

...there must be some subtle detail I'm missing.

@sbrivio-rh
Copy link
Contributor Author

That's what I did there (I think?):

Source1:        https://raw.githubusercontent.com/Nugine/const-str/00ce0d143164e1e393a5dcdda6ab2f0dbae9a243/LICENSE

[...]

%license %{crate_instdir}/LICENSE

...there must be some subtle detail I'm missing.

Grr, of course. From the rust-virtio-queue spec file:

%prep
%autosetup -n %{crate}-%{version} -p1
%cargo_prep
cp %{SOURCE1} .
cp %{SOURCE2} .

Thanks @slp for the example.

@sbrivio-rh
Copy link
Contributor Author

Fedora package review request for rust-const-str at: https://bugzilla.redhat.com/show_bug.cgi?id=2330150

@teohhanhui
Copy link
Collaborator

Looks like the merge conflicts have not been resolved... 🙈

@sbrivio-rh
Copy link
Contributor Author

sbrivio-rh commented Jan 2, 2025

Looks like the merge conflicts have not been resolved... 🙈

I just clicked around randomly, I'm anyway not able to test it anymore because of #117 (comment) and a new Error: Address already in use (os error 98) coming out of nowhere (no EADDRINUSE from strace, and extra difficult to bisect because of that commit I need to keep reverted plus Cargo.lock committed in the repository... now writing some scripts to handle that at least for my git sanity).

I'll push fixed conflicts once I can test this again.

@sbrivio-rh
Copy link
Contributor Author

and a new Error: Address already in use (os error 98) coming out of nowhere

Oh, that's because my usual trick of commenting out setgid(gid).context("Failed to setgid")?; and setuid(uid).context("Failed to setuid")?; in setup_user() to get a root shell conveniently (I need to do stuff with iproute2 in the guest) doesn't work anymore. The second instance of muvm-server fails to start.

On the other hand, if I skip starting it, then everything hangs...

@WhatAmISupposedToPutHere
Copy link
Collaborator

WhatAmISupposedToPutHere commented Jan 2, 2025

Looks like the merge conflicts have not been resolved... 🙈

I just clicked around randomly, I'm anyway not able to test it anymore because of #17 (comment) and a new Error: Address already in use (os error 98) coming out of nowhere (no EADDRINUSE from strace, and extra difficult to bisect because of that commit I need to keep reverted plus Cargo.lock committed in the repository... now writing some scripts to handle that at least for my git sanity).

I'll push fixed conflicts once I can test this again.

Try using --sommelier, it might on a headless machine that way (sorry for close/reopen, literally a misclick)

@sbrivio-rh
Copy link
Contributor Author

sbrivio-rh commented Jan 2, 2025

Try using --sommelier, it might on a headless machine that way

...thanks, yes, that also works because it reverts the effect of commit b40bf03d2ce4 ("Default to using x11bridge instead of sommelier"), and I guess that the sommelier path has a proper fallback path (I didn't really check): I don't have sommelier installed.

On the other hand I guess we should have the same fallback with no options at all (at least as long as muvm is supposed to be a generic tool).

@slp
Copy link
Collaborator

slp commented Jan 9, 2025

Fedora package review request for rust-const-str at: https://bugzilla.redhat.com/show_bug.cgi?id=2330150

Looks like the package review got stuck because it depends on packaging rust-const-str-proc-macro first (same repo, different crate). Do you need a hand with that?

@sbrivio-rh
Copy link
Contributor Author

Fedora package review request for rust-const-str at: https://bugzilla.redhat.com/show_bug.cgi?id=2330150

Looks like the package review got stuck because it depends on packaging rust-const-str-proc-macro first (same repo, different crate). Do you need a hand with that?

Thanks for the offer! But actually, this is stuck because I'm waiting for jbaublitz/neli#241. It's not blocking, but I thought it makes sense to wait a bit anyway as the author is quite close to merge it. Once it's merged, I would then spend some "hours of fun" on packaging matters (not just that package) and try to sort all those.

List of things (kind of) blocking this at the moment:

@WhatAmISupposedToPutHere
Copy link
Collaborator

  • I can't simply comment out the setuid()/setgid() calls anymore to get a guest root shell. That would be pretty convenient to test the whole thing, see

There is now an "official" way to getting the root shell, by running sth like muvm -p 3335 -it bash from a separate terminal tab.

@sbrivio-rh
Copy link
Contributor Author

There is now an "official" way to getting the root shell, by running sth like muvm -p 3335 -it bash from a separate terminal tab.

Ah, thanks, I didn't know that. For some reason it hangs for a while, then aborts for me:

$ target/debug/muvm -p 3335 -it bash
Error: could not request launch to server: could not connect to muvm server: Connection reset by peer (os error 104)

with the guest started as target/debug/muvm --sommelier bash.

On the other hand, it's much less usable for me than something like muvm -r / --root (you know, I'm using a terminal for it already...), so I'd rather invest a moment into implementing that if I ever find the time, rather than debugging this.

@sbrivio-rh
Copy link
Contributor Author

sbrivio-rh commented Jan 15, 2025

Now superseded by jbaublitz/neli#256 (merged), so I can proceed with everything. I'll continue as soon as I find a couple of hours in a row.

…P client

The existing implementation has a couple of issues:

- it doesn't support IPv6 or SLAAC

- it relies on either dhclient(8) or dhcpcd(8), which need a
  significant amount of time to configure the network as they are
  rather generic DHCP clients

- on top of this, dhcpcd, by default, unless --noarp is given, will
  spend five seconds ARP-probing the address it just received before
  configuring it

Replace the IPv4 part with a minimalistic, 90-line DHCP client that
just does what we need, using option 80 (Rapid Commit) to speed up
the whole exchange.

Add IPv6 support (including IPv4-only, and IPv6-only modes) relying
on the kernel to perform SLAAC. Safely avoid DAD (we're the only
node on the link) by disabling router solicitations, starting SLAAC,
and re-enabling them once addresses are configured.

Instead of merely triggering the network setup and proceeding, wait
until everything is configured, so that connectivity is guaranteed to
be ready before any further process runs in the guest, say:

  $ ./target/debug/muvm -- ping -c1 2a01:4f8:222:904::2
  PING 2a01:4f8:222:904::2 (2a01:4f8:222:904::2) 56 data bytes
  64 bytes from 2a01:4f8:222:904::2: icmp_seq=1 ttl=255 time=0.256 ms

  --- 2a01:4f8:222:904::2 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.256/0.256/0.256/0.000 ms

The whole procedure now takes approximately 1.5 to 2 ms (for both
IPv4 and IPv6), with the DHCP exchange and configuration taking
somewhere around 300-500 µs out of that, instead of hundreds of
milliseconds to seconds.

Configure nameservers received via DHCP option 6 as well: passt
already takes care care of translating DNS traffic directed to
loopback addresses read from resolv.conf, so we can just write those
to resolv.conf in the guest.

At least for the moment being, for simplicity, omit handling of
option 119 (domain search list), as I doubt it's going to be of much
use for muvm.

I'm not adding handling of the NDP RDNSS option (25, RFC 8106) either,
for the moment, as it involves a second netlink socket subscribing to
the RTNLGRP_ND_USEROPT group and listening to events while we receive
the first router advertisement. The equivalent userspace tool would be
rdnssd(8), which is not called before this change anyway. I would
rather add it at a later time instead of making this patch explode.

Matching support in passt for option 80 (RFC 4039) and for the DHCP
"broadcast" flag (RFC 2131) needs at least passt 2024_11_27.c0fbc7e:

  https://archives.passt.top/passt-user/20241127142126.3c53066e@elisabeth/

Signed-off-by: Stefano Brivio <[email protected]>
Co-authored-by: Teoh Han Hui <[email protected]>
@sbrivio-rh
Copy link
Contributor Author

Now it should be all clean again. I also added handling of option 26 (Interface MTU), as passt can conveniently use 65520 bytes (the host kernel does segmentation for TCP anyway). That's one small step for a DHCP client, one giant leap for throughput.

I can't fathom for the life of me how you all can live without version logs for patches.

@sbrivio-rh
Copy link
Contributor Author

I sent a new version for (package) review. Now we have an additional problem though:

The updated version, with jbaublitz/neli#182, needs derive-builder 0.11. Fedora packages 0.20.2. I'll pretend I know what I'm doing and send a patch for https://github.com/jbaublitz/neli/, but generally speaking this doesn't seem like a viable approach to development.

@sbrivio-rh
Copy link
Contributor Author

Additional review request: rust-const-str-proc-macro, https://bugzilla.redhat.com/show_bug.cgi?id=2338679.

@sbrivio-rh sbrivio-rh mentioned this pull request Jan 17, 2025
@teohhanhui
Copy link
Collaborator

@sbrivio-rh It's specified by Fedora's Rust packaging guidelines: https://docs.fedoraproject.org/en-US/packaging-guidelines/Rust/#_packaging_multiple_versions

@sbrivio-rh
Copy link
Contributor Author

@sbrivio-rh It's specified by Fedora's Rust packaging guidelines: https://docs.fedoraproject.org/en-US/packaging-guidelines/Rust/#_packaging_multiple_versions

Sure, that makes sense, I just think that having a dependency manager and in general an ecosystem happily pulling in whatever new crate du jour (that's just Rust and it's all fine per se) along with the requirement that every and each dependency crate is packaged separately (and that's Fedora's policy) poses a significant barrier for contributors (especially occasional ones), and not in a good way, without technical merit or quality benefit whatsoever. Cargo already enforces what's needed.

This thing was pretty much ready three months ago. But okay, sure, also libnl needed to be packaged for all distributions eventually. I hope we're almost done adding packages (for this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants