kbs/plugins: Add ID_KEY plugin #698

Closed

Conversation

tylerfanelli
Contributor

@tylerfanelli tylerfanelli commented Feb 6, 2025

The ID_KEY resource backend creates a new key tied to a Trustee server instance and mostly pertains to workloads that want a key unwrapped as a result of a successful attestation.

Admins can POST to the repository giving a base64 encoded byte vector as the resource tag. This resource tag will then be encrypted with the key and returned to the admin. Upon successful attestation, attestation clients can then send the encrypted data that was provisioned by the admin to the backend via a GET request. The resource tag in this case must be a base64 encoding of the encrypted data's bytes. The data will then be decrypted and sent back to the attestation client.
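A minimal sketch of how such a tag could be built and placed in the request path. It assumes URL-safe base64 (the description does not say which base64 alphabet the plugin expects) and uses the `/kbs/v0/id-key/{...}` route quoted from the plugin docs later in this review. The helper names `tag_for`, `bytes_from_tag`, and `request_path` are hypothetical.

```python
import base64

ID_KEY_ROUTE = "/kbs/v0/id-key"  # route as quoted in the plugin docs under review


def tag_for(data: bytes) -> str:
    """Encode raw bytes as a path-safe tag (assumption: URL-safe base64)."""
    return base64.urlsafe_b64encode(data).decode("ascii")


def bytes_from_tag(tag: str) -> bytes:
    """Inverse of tag_for: recover the raw bytes from a tag."""
    return base64.urlsafe_b64decode(tag.encode("ascii"))


def request_path(data: bytes) -> str:
    """Path for the admin POST (plaintext bytes) or the client GET (encrypted bytes)."""
    return f"{ID_KEY_ROUTE}/{tag_for(data)}"
```

Standard base64 can emit `/` and `+`, which is why the sketch picks the URL-safe alphabet for a value that lives in the path; the real plugin may choose differently.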

I will likely provide another PR in the near future allowing the server's encryption key to be derived from an HSM for security purposes. While it won't be required, it will be preferred.


Client use-case

SVSM will require attestation to establish trust and unlock persistent vTPM state to run CVMs. Virtual disks are made available to accommodate this and bootstrap persistent vTPMs. A block device will include:

  • OS rootfs sealed with vTPM
  • vTPM NVDATA (required for unsealing) encrypted with an "NVDATA Encryption Key" (NEK).
  • NEK wrapped by a key on an attestation server.
┌────────────────────────────┐
│                            │
│  ┌──────────────────────┐  │
│  │vTPM-sealed rootfs    │  │
│  └──────────────────────┘  │
│  ┌──────────────────────┐  │
│  │Encrypted vTPM NVDATA │  │
│  │(wrapped by NEK)      │  │
│  └──────────────────────┘  │
│  ┌──────────────────────┐  │
│  │NEK (wrapped by key on│  │
│  │attestation server)   │  │
│  └──────────────────────┘  │
│                            │
│                Virtual disk│
└────────────────────────────┘

A simple unlocking and boot would be as follows:

  • SVSM attests, gives encrypted NEK to attestation server, server unwraps it with ID-key.

  • SVSM uses unwrapped NEK to decrypt vTPM NVDATA, sets up vTPM.

  • SVSM boots OS with vTPM.

  • OS initrd (systemd-cryptenroll) unseals and mounts rootfs.

With this chain, an OS will not be able to boot without SVSM first attesting itself.

Admin set-up

Initialization and setup of the virtual disks should be done by the guest owners themselves. They should also be the ones in control (with admin privileges/credentials) of the Trustee server with which their CVMs will attest. As such, the admin will:

  • Initialize a "fresh" vTPM
  • Seal their OS's rootfs to the newly-created vTPM
  • Create a new symmetric NEK
  • Encrypt the vTPM's NVDATA with the newly-created NEK
  • Wrap the NEK with a key on an attestation server

Once completed, the guest owner can supply this virtual disk to untrusted parties for running CVMs.

The highlighted portion pertains to this PR. Because the NEK (and the corresponding key stored on the attestation server that wraps it) is sensitive, only the KBS admin should be able to wrap NEKs with the KBS's key. I also plan to add support for deriving the wrap key from an HSM at some point, so an admin could just configure any KBS to use a key stored on the HSM as the wrap key.

Using the existing resource backends

Recall how the virtual disks were presented to CVMs:

┌────────────────────────────┐
│                            │
│  ┌──────────────────────┐  │
│  │vTPM-sealed rootfs    │  │
│  └──────────────────────┘  │
│  ┌──────────────────────┐  │
│  │Encrypted vTPM NVDATA │  │
│  │(wrapped by NEK)      │  │
│  └──────────────────────┘  │
│  ┌──────────────────────┐  │
│  │NEK (wrapped by key on│  │
│  │attestation server)   │  │
│  └──────────────────────┘  │
│                            │
│                Virtual disk│
└────────────────────────────┘

As an alternative, we could store the NEK on the KBS as a retrievable resource. We could then provide the resource ID of the NEK so that the client would know which KBS resource to fetch once an attestation completes. The virtual disk would then look like this:

┌────────────────────────────┐
│                            │
│  ┌──────────────────────┐  │
│  │vTPM-sealed rootfs    │  │
│  └──────────────────────┘  │
│  ┌──────────────────────┐  │
│  │Encrypted vTPM NVDATA │  │
│  │(wrapped by NEK)      │  │
│  └──────────────────────┘  │
│  ┌──────────────────────┐  │
│  │KBS resource ID of NEK│  │
│  │                      │  │
│  └──────────────────────┘  │
│                            │
│                Virtual disk│
└────────────────────────────┘

With this, we could use the existing resource backends in Trustee to complete our attestation. However, there are a few principles that prevent us from doing this:

  • Although the resource ID would be given in the virtual disk, there is nothing preventing a motivated bad actor from fetching resources that pertain to other VMs, such as their NEKs. With some inside knowledge, an attacker could steal other VMs' NEKs and impersonate those VMs with their vTPM endorsement keys simply by knowing which resource ID corresponds to another VM's NEK.

  • As we cannot guarantee that all clients using persistent vTPMs in this manner will use Trustee to attest, the virtual disk format cannot tie itself to any Trustee-specific mechanism (like giving a resource ID tag of the NEK). The virtual disk format must be server-agnostic. This applies for all attestation protocols, for that matter.

  • As the virtual disk must be server-agnostic, the result of an attestation should only be the unwrapping of the NEK. No other information is supplied to the virtual disk.
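The first bullet can be illustrated with a toy model. This is only a sketch of the claim as stated, not of Trustee's actual resource backend: `ResourceStore` and its methods are hypothetical, and the model deliberately ignores resource policies, which reviewers bring up later in the thread.

```python
class ResourceStore:
    """Toy key-value backend: any caller with a valid token may read any ID."""

    def __init__(self):
        self._resources = {}
        self._valid_tokens = set()

    def put(self, resource_id: str, secret: bytes) -> None:
        """Admin side: store a secret under a resource ID."""
        self._resources[resource_id] = secret

    def issue_token(self, token: str) -> None:
        """Stand-in for a successful attestation producing a token."""
        self._valid_tokens.add(token)

    def get(self, token: str, resource_id: str) -> bytes:
        """Client side: the token is checked, but not who the resource belongs to."""
        if token not in self._valid_tokens:
            raise PermissionError("attestation token rejected")
        return self._resources[resource_id]
```

In this model, a client that attested as one VM and learns the resource ID of another VM's NEK can fetch it, because the lookup is keyed on the ID alone.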

ID-key Plugin

Recall the point made above: the result of an attestation should only be the unwrapping of the NEK. That is exactly what this plugin provides.

  • A way for admins to provision and encrypt NEKs
  • An endpoint where CVMs can POST their NEKs for decryption

It allows Trustee to be used for vTPM NVDATA encryption in a way that the normal resource backend mechanisms do not support. It still requires that an attested client present its attestation token when decrypting an NEK (much like when fetching a resource), but also allows the POSTing of plaintext NEK data.

Preventing replay attacks

To prevent replay attacks (a third party snooping on the encrypted NEK in order to replay it and get it decrypted), the plugin implements an ECDH mechanism that wraps the encrypted NEK before it is sent to the plugin. With this, every time an encrypted NEK is sent to the KBS, it is wrapped with an ephemeral EC key such that no third party can derive the original encrypted NEK and replay it.
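The key agreement behind this can be sketched in pure Python. Several things here are assumptions rather than details from the PR: the curve (NIST P-256), hashing the shared x-coordinate with SHA-256 to get a key, and the function names. The arithmetic is hand-rolled for illustration and is not constant time; real code should use a vetted cryptography library.

```python
import hashlib
import secrets

# NIST P-256 domain parameters (the curve choice is an assumption).
P = 0xffffffff_00000001_00000000_00000000_00000000_ffffffff_ffffffff_ffffffff
A = P - 3
B = 0x5ac635d8_aa3a93e7_b3ebbd55_769886bc_651d06b0_cc53b0f6_3bce3c3e_27d2604b
N = 0xffffffff_00000000_ffffffff_ffffffff_bce6faad_a7179e84_f3b9cac2_fc632551
G = (0x6b17d1f2_e12c4247_f8bce6e5_63a440f2_77037d81_2deb33a0_f4a13945_d898c296,
     0x4fe342e2_fe1a7f9b_8ee7eb4a_7c0f9e16_2bce3357_6b315ece_cbb64068_37bf51f5)


def add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its negation
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def mul(k, point):
    """Double-and-add scalar multiplication (illustrative, not constant time)."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def keypair():
    """Generate a private scalar in [1, N-1] and its public point."""
    d = secrets.randbelow(N - 1) + 1
    return d, mul(d, G)


def session_key(private, peer_public):
    """ECDH: hash the shared x-coordinate into a 32-byte key (KDF is an assumption)."""
    x, _ = mul(private, peer_public)
    return hashlib.sha256(x.to_bytes(32, "big")).digest()
```

If the client generates a fresh key pair per request, `session_key(client_private, server_public)` on the client equals `session_key(server_private, client_public)` on the server, and a second request with a new client key pair yields a different key, so the bytes seen on the wire differ each time.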

@tylerfanelli tylerfanelli requested a review from a team as a code owner February 6, 2025 02:49
@fitzthum
Member

fitzthum commented Feb 7, 2025

I'll take a look at the code tomorrow.

On a high level, what does this give you that you can't get from an existing backend? Let's say I'm an admin. I post some bytes to the plugin and get back the wrapped bytes. I give those wrapped bytes to my guest. Then the guest can get the unwrapped version from Trustee. How is that different from an admin simply uploading a secret to Trustee and then giving the client the name of the secret?

Also, are you worried about replay attacks?

@tylerfanelli
Contributor Author

tylerfanelli commented Feb 7, 2025

I'll take a look at the code tomorrow.

On a high level, what does this give you that you can't get from an existing backend? Let's say I'm an admin. I post some bytes to the plugin and get back the wrapped bytes. I give those wrapped bytes to my guest. Then the guest can get the unwrapped version from Trustee. How is that different from an admin simply uploading a secret to Trustee and then giving the client the name of the secret?

The issue with this is that there's no way of preventing the client from fetching a secret that they're not supposed to. If there is more than one VM attempting to attest, there's no stopping them from requesting the secrets of another VM. This simplifies it by making only that VM's secret available to it. We only care about one secret in this scenario.

In other words, if an attacker knows the secret name, it can't just try to request it itself. The standard backend is a bit susceptible to this. This offers isolation, as it doesn't even really store any secrets, but relies on the client to store them.

It also allows a key to be derived from an HSM (not a part of this series). If the admin also has access to the HSM, they can verify their "root key" is never compromised yet was used to encrypt data (in our case, block storage).

Also, are you worried about replay attacks?

Yep. A man-in-the-middle could sniff the URL and just replay it to get the decrypted secret. This is functional but unsafe, so I'm thinking of adding an ECDH (ephemeral key) exchange. The public keys could be supplied via the request's body.

@tylerfanelli tylerfanelli force-pushed the resource-idkey branch 4 times, most recently from a10c871 to cadd86d Compare February 7, 2025 07:00
@tylerfanelli tylerfanelli marked this pull request as draft February 7, 2025 07:01
@tylerfanelli
Contributor Author

This is now ready for review, but has a few blockers before it can be considered for merging. You will notice the final commit is prefixed by a FIXME tag. To facilitate the ECDH exchange, I've added a {de}serializable type to kbs-types for the data needed. The PR is here: virtee/kbs-types#57

I've included the type (as well as the associated {de}serialization methods usually found in kbs-types) in the id-key module, but once the kbs-types PR is merged, it can be removed. However, the PR is also based on a later version of kbs-types that broke the API, so I'm also blocked by #597 (which updates the API and kbs-types version) as well. Once both of these are handled, this will be ready for merging.

@fitzthum When you have the chance, would you mind reviewing?

@fitzthum
Member

The issue with this is that there's no way of preventing the client from fetching a secret that they're not supposed to. If there is more than one VM attempting to attest, there's no stopping them from requesting the secrets of another VM. This simplifies it by making only that VM's secret available to it. We only care about one secret in this scenario.

The best mechanism for this kind of thing is probably init-data. With EAR we now have really graceful support for exposing the init-data to the policy. We still need some piping to get it from the KBS to the AS, but that will come soon.

Even with ECDH I'm a little unclear how this helps differentiate between two guests. How does the wrapped key get into the guest? If this is not provided via a measured channel, then can't a malicious host generate multiple guests with the same key, switch keys between guests, etc.

@fitzthum
Member

Can you add a little description of this backend to this doc https://github.com/confidential-containers/trustee/blob/main/kbs/docs/resource_storage_backend.md

@tylerfanelli
Contributor Author

The issue with this is that there's no way of preventing the client from fetching a secret that they're not supposed to. If there is more than one VM attempting to attest, there's no stopping them from requesting the secrets of another VM. This simplifies it by making only that VM's secret available to it. We only care about one secret in this scenario.

The best mechanism for this kind of thing is probably init-data. With EAR we now have really graceful support for exposing the init-data to the policy. We still need some piping to get it from the KBS to the AS, but that will come soon.

Init-data certainly helps with attestation reference values in the RVPS, but I'm not convinced it helps here. Keep in mind that any policy would need to have full understanding of what resources are available, what VM is entitled to a respective resource, etc. I want this to be done dynamically, therefore I can't rely on a policy to enforce this isolation.

Even with ECDH I'm a little unclear how this helps differentiate between two guests.

The ECDH does nothing to differentiate between guests, it merely wraps the encrypted key (which is used to differentiate guests) to prevent replay attacks.

How does the wrapped key get into the guest?

A virtual disk.

If this is not provided via a measured channel, then can't a malicious host generate multiple guests with the same key, switch keys between guests, etc.

For my use-case, this resource backend cannot solve this problem standalone. For instance, generating multiple guests with the same key can be detected later on with a monotonic vTPM boot counter.

Switching keys between guests would be a denial of service, which I'm not concerned about since it doesn't leak any secrets or sensitive info.

@tylerfanelli
Contributor Author

Can you add a little description of this backend to this doc https://github.com/confidential-containers/trustee/blob/main/kbs/docs/resource_storage_backend.md

Sure.

Member

@fitzthum fitzthum left a comment

Code looks clean but I have a few high-level-ish questions.

@fitzthum
Member

Switching keys between guests would be a denial of service, which I'm not concerned about since it doesn't leak any secrets or sensitive info.

The main guarantee that I see from this approach is that the KBS knows that it is releasing a secret to someone who has already in some way (maybe indirectly) been in contact with it. This is something, but I don't see how it differentiates one guest from another.

I know it's clunky to update the policy dynamically, but if you're limiting which guests can access which resources, won't the KBS have to be in the loop with that somehow? Additionally it seems like an identity is probably something you want to measure.

Either way, it doesn't hurt much to add a plugin. Let's see what Ding thinks once the changes are ready.

@tylerfanelli
Contributor Author

tylerfanelli commented Feb 10, 2025

Switching keys between guests would be a denial of service, which I'm not concerned about since it doesn't leak any secrets or sensitive info.

The main guarantee that I see from this approach is that the KBS knows that it is releasing a secret to someone who has already in some way (maybe indirectly) been in contact with it. This is something, but I don't see how it differentiates one guest from another.

When you say "indirectly" are you alluding to the attestation token that was provided from a successful attestation?

I know it's clunky to update the policy dynamically, but if you're limiting which guests can access which resources, won't the KBS have to be in the loop with that somehow? Additionally it seems like an identity is probably something you want to measure.

I don't see why the KBS needs to be in the loop. The randomness of the encryption key itself is sufficient for acting as an identity. In my case, we're verifying that a guest has attested its boot environment, and now wants to unlock some storage that it's carrying locally. The beginning step of that is unwrapping the storage encryption key that it also has locally.

The dynamic nature of this makes it scalable. An admin can provision as many keys as needed for their VM fleet.

@fitzthum
Member

When you say "indirectly" are you alluding to the attestation token that was provided from a successful attestation?

I say indirectly because as soon as the encrypted key passes through the untrusted host it could be stolen and used by other guests. Anyone who shows up with that key and a valid enclave will get the secret.

I don't see why the KBS needs to be in the loop. The randomness of the encryption key itself is sufficient for acting as an identity. In my case, we're verifying that a guest has attested its boot environment, and now wants to unlock some storage that it's carrying locally. The beginning step of that is unwrapping the storage encryption key that it also has locally.

Maybe you should spell out the full flow. I don't think we're on the same page here.

@tylerfanelli tylerfanelli changed the title kbs/plugins/resource: Add ID_KEY resource backend kbs/plugins: Add ID_KEY plugin Feb 11, 2025
@tylerfanelli tylerfanelli marked this pull request as ready for review February 11, 2025 06:11
@tylerfanelli tylerfanelli force-pushed the resource-idkey branch 2 times, most recently from 0ca3a21 to f1d86d1 Compare February 11, 2025 06:13
The ID_KEY plugin creates a new key tied to a Trustee server instance
and mostly pertains to workloads that want a key unwrapping as a
result of a successful attestation.

Admins can POST to the plugin giving a base64 encoded byte vector as
the tag. This byte vector will then be encrypted with the key and
returned to the admin. Upon successful attestation, attestation
clients can then send the encrypted byte vector that was wrapped with
the server's encryption key to the plugin via a GET request. The tag
in this case must be a base64 encoding of the encrypted data's bytes.
The data will then be decrypted and sent back to the attestation
client.

Signed-off-by: Tyler Fanelli <[email protected]>
@tylerfanelli
Contributor Author

tylerfanelli commented Feb 18, 2025

Maybe you should spell out the full flow. I don't think we're on the same page here.

Sure, I've added some documentation showing the flow of how this plugin is expected to be used. It doesn't touch on some of the details I'll be adding in another patch series such as HSM support. I can amend the docs to include this info at that time.

Let me know if you have questions or concerns.

@fitzthum
Member

Thanks for adding the doc. I still don't totally understand the purpose, although this is starting to remind me of a "remote TPM." In the past we've talked a little bit about using Trustee as a proxy for a vTPM. This didn't go anywhere because it would require major changes to the KBS protocol (because writes to the TPM are sensitive and would need to be encrypted).

Can we get review from some other @confidential-containers/trustee-maintainers

Also, this should probably be behind a feature flag.

Contributor

@mkulke mkulke left a comment

I also have problems understanding the rationale behind the plugin, even after reading the docs and the comments. there seems to be some conceptual overlap with sealed secrets with the addition of a resource creating endpoint.

afaik we attempt to group stateful operations within the realm of an admin hierarchy, which is guarded by some form of auth. Can we do something similar with the POST endpoint here?

Will this plugin work with stateless KBS deployments (i.e. you cannot perform write ops on the API)?

Would you be able to add a simple use-case and describe how the existing facilities are lacking and how the plugin will address this to the PR description?


To prevent replay attacks, the CVM must wrap the encrypted data in an ECDH key cipher. To support this, the ID-key plugin presents an endpoint `ecdh-pub-sec1` that a VM can use to fetch the EC public key in order to wrap the encrypted data.

`GET /kbs/v0/id-key/ecdh-pub-sec1`
Contributor

ecdh-pub-sec1 seems to be a magic string on the /kbs/v0/id-key/{string} route, should we move it to another hierarchy?

Contributor Author

@tylerfanelli tylerfanelli Feb 20, 2025

To prevent replay attacks, the encrypted data is wrapped with an ECDH cipher. To build this cipher, the client needs the server's EC public key. This endpoint allows for fetching that public key.

I chose SEC1 encoding because it's a "standard" format for encoding EC public keys, found in Elliptic-Curve-Point-to-Octet-String encoding described in SEC 1: Elliptic Curve Cryptography (Version 2.0) section 2.3.3 (page 10).

What exactly do you mean by "hierarchy"? I think the existing way it's done is fine, but am open to suggestions if you think there could be improvements.
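For readers unfamiliar with the format, the uncompressed form of that encoding is a `0x04` prefix followed by the big-endian X and Y coordinates. The sketch below assumes a 256-bit curve (the thread does not name the curve) and handles only the uncompressed form; `sec1_encode` and `sec1_decode` are hypothetical names, and no on-curve validation is performed.

```python
FIELD_BYTES = 32  # assumption: a 256-bit curve such as P-256


def sec1_encode(x: int, y: int) -> bytes:
    """Uncompressed SEC1 point: 0x04 || X || Y, each coordinate big-endian."""
    return b"\x04" + x.to_bytes(FIELD_BYTES, "big") + y.to_bytes(FIELD_BYTES, "big")


def sec1_decode(blob: bytes) -> tuple:
    """Parse an uncompressed SEC1 point; compressed forms (0x02/0x03) are rejected."""
    if len(blob) != 1 + 2 * FIELD_BYTES or blob[0] != 0x04:
        raise ValueError("not an uncompressed SEC1 point of the expected size")
    x = int.from_bytes(blob[1:1 + FIELD_BYTES], "big")
    y = int.from_bytes(blob[1 + FIELD_BYTES:], "big")
    return x, y
```

A client would typically also check that the decoded point lies on the expected curve before using it in a key agreement.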

Contributor

ah, this is also just a question about API design. it's not optimal to have a string with special meaning on the same hierarchy as resource descriptions: like /apple/count-apples and /apple/1, /apple/2, etc....

Contributor Author

Oh ok, I can modify for the public key to be fetched at:
/kbs/v0/id-key/ecdh-pub-sec1

While the key decryption is at an endpoint like:
/kbs/v0/id-key/decrypt/...

Would this be preferred?
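A small sketch of what that split could look like on the routing side. It is not the plugin's router: the `route` function and the handler labels it returns are hypothetical, and only the two paths proposed above are modeled.

```python
from typing import Optional, Tuple

PREFIX = "/kbs/v0/id-key/"


def route(path: str) -> Tuple[str, Optional[str]]:
    """Map a request path to (handler label, payload tag)."""
    if not path.startswith(PREFIX):
        return ("not-found", None)
    rest = path[len(PREFIX):]
    if rest == "ecdh-pub-sec1":
        # Fixed name: returns the server's EC public key.
        return ("public-key", None)
    if rest.startswith("decrypt/") and len(rest) > len("decrypt/"):
        # Everything after "decrypt/" is treated as the opaque payload tag.
        return ("decrypt", rest[len("decrypt/"):])
    return ("not-found", None)
```

With the payload under its own `decrypt/` segment, a tag that happens to read `ecdh-pub-sec1` is still routed to decryption rather than being mistaken for the public-key endpoint.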

@@ -0,0 +1,84 @@
# ID-key Plugin
Contributor

what is the rationale behind the name? it's not obvious to me what an ID-Key is, maybe spell that out in the doc.

Contributor Author

It was sort-of a reference to @fitzthum's The Mystery of the KBS Identity. Since the NEK was encrypted from a known key provisioned on a known-trusted source, the very fact that it could be decrypted indicated that the client was communicating with the "true" KBS server.

Not saying it's a good reference, just a reference 😄 . If you feel another name is more appropriate, I'm open to changing it.

Contributor

but isn't the identity of the KBS settled by the KBS cert that we inject into a guest in init-data?


A credentialed guest owner is the only one authorized to request that keys be encrypted by the ID-key plugin. To do this, the guest owner will POST some plaintext data (created locally) to be encrypted. This plaintext data could be a LUKS passphrase, SecureBoot keys, etc. The actual purpose of the data is of no concern to the KBS.

`POST /kbs/v0/id-key/{base64-plaintext-data}`
Contributor

would POST /kbs/v0/id-key w/ the plaintext in the body make more sense?

Contributor Author

I fail to see the difference in supplying it in the body versus the URL itself.

Contributor

it's a bit unusual/counter-intuitive for rest-ish APIs to use payload as part of the resource uri in such a request.

Contributor Author

Ok, I can make the change for this. Perhaps it can just be /kbs/v0/id-key/register with the plaintext in the body?


### Provisioning encrypted data

A credentialed guest owner is the only one authorized to request that keys be encrypted by the ID-key plugin. To do this, the guest owner will POST some plaintext data (created locally) to be encrypted. This plaintext data could be a LUKS passphrase, SecureBoot keys, etc. The actual purpose of the data is of no concern to the KBS.
Contributor

can you specify guest-owner? would the POST endpoint be called from within a CVM? or would this be performed by someone who has admin access to Trustee?

Contributor Author

I use "guest owner" as the one who has admin access to Trustee, and will be running their "owned" CVMs on an untrusted platform.

@tylerfanelli
Contributor Author

@fitzthum @mkulke

Sorry for the delay.

I've updated the PR description outlining the use-case and how existing facilities are lacking. I'll address the individual review questions under their respective conversations.

@mkulke
Contributor

mkulke commented Feb 20, 2025

@fitzthum @mkulke

Sorry for the delay.

I've updated the PR description outlining the use-case and how existing facilities are lacking. I'll address the individual review questions under their respective conversations.

Thank you for providing extensive context, it wasn't obvious to me that this is supporting a non-CoCo use case, so we don't have init-data or sealed secrets facilities. I'll attempt to review this more thoroughly with that in mind (but it's not trivial, and I have limited cycles atm, I'd hope someone else can chime in: cc @confidential-containers/trustee-maintainers).

I do have the feeling that the problem could/should either be generalized so it can be supported in the plain kbs API or, on the other end, that the requirements are so specific that they would warrant a /coconut plugin.

@fitzthum
Member

fitzthum commented Feb 20, 2025

Ok. Thanks for adding the details about the use case. It helps to see the bigger picture.

I think you can use a standard resource request for this. You mention a few reasons why you didn't go that way.

Although the resource ID would be given in the virtual disk, there is nothing preventing a motivated bad actor from fetching resources that pertain to other VMs, such as their NEKs. With some inside knowledge, an attacker could steal other VMs' NEKs and impersonate those VMs with their vTPM endorsement keys simply by knowing which resource ID corresponds to another VM's NEK.

To pass an attestation check, the guest must be running the expected SVSM code. Via attestation we know that this guest will have an SVSM that correctly emulates a vTPM and safeguards its endorsement key. The SVSM is protected both from the host and the guest. Probably you would already agree that we aren't worried that the nvdata will be stolen or extricated regardless of whether we use id-key or standard resources.

The concern is that one guest might be able to "impersonate" another. To me this sounds like we're heading for the tricky territory of identity. There's a lot that could be said here, but the most important thing is that I don't think id-key resolves this. You mention that an attacker could impersonate another guest if they know the resource uri of the NEK. This is equivalent to an attacker knowing the wrapped id-key. Consider a malicious host that simply switches two virtual disks. The result would be exactly the same with id-key or standard resources. It might seem like it's easier to guess the name of a resource than an id-key, but this isn't something to bank on (and you could always randomly generate your resource ids).

In your design it looks like you are not measuring anything related to the virtual disk. It seems like all your guests will look exactly the same in terms of TCB. In that case, there is very little you can do to distinguish them confidentially. I don't know exactly how you can provide guarantees about the identity, but a good first step is to measure the virtual disk somehow. For instance, you could put a hash of the nvdata into the init-data.

Possibly a better option is to simply not worry about guest identity. If I start a guest with the wrong nvdata, it simply becomes a different guest. Sure maybe someone can unlock my root disk if they steal my nvdata, but my root disk has my ssh public key inside of it so it's no big deal (you can't get into it). If my guest is designed correctly, it shouldn't be leaking secrets no matter who runs it. I think this approach is very reasonable and much simpler.

As we cannot guarantee that all clients using persistent vTPMs in this manner will use Trustee to attest, the virtual disk format cannot tie itself to any Trustee-specific mechanism (like giving a resource ID tag of the NEK). The virtual disk format must be server-agnostic. This applies for all attestation protocols, for that matter.

I don't see how id-key is any more generic than resource uris. In fact the resource uri thing is just a key value store, which seems easier for someone else to implement than id-key.

As the virtual disk must be server-agnostic, the result of an attestation should only be the unwrapping of the NEK. No other information is supplied to the virtual disk.

I don't quite understand this point.

@tylerfanelli
Contributor Author

tylerfanelli commented Feb 21, 2025

Ok. Thanks for adding the details about the use case. It helps to see the bigger picture.

I think you can use a standard resource request for this. You mention a few reasons why you didn't go that way.

Although the resource ID would be given in the virtual disk, there is nothing preventing a motivated bad actor from fetching resources that pertain to other VMs, such as their NEKs. With some inside knowledge, an attacker could steal other VMs' NEKs and impersonate those VMs with their vTPM endorsement keys simply by knowing which resource ID corresponds to another VM's NEK.

To pass an attestation check, the guest must be running the expected SVSM code. Via attestation we know that this guest will have an SVSM that correctly emulates a vTPM and safeguards its endorsement key.

It's not SVSM who the KBS is talking to. Since SVSM does not have network access, it uses a proxy on the untrusted host for network access.

The concern is that one guest might be able to "impersonate" another. To me this sounds like we're heading for the tricky territory of identity.

It's not that a guest can get another's NEK (at that point, what could it do with it?). It's that the untrusted proxy could get ahold of one of its VM's (the proxy serves all VMs running on the system) NEK.

There's a lot that could be said here, but the most important thing is that I don't think id-key resolves this. You mention that an attacker could impersonate another guest if they know the resource uri of the NEK. This is equivalent to an attacker knowing the wrapped id-key.

Not quite. With the resource URI, the proxy knows the exact resource name of a guest's NEK (it literally is responsible for building the resource URL). It can then use another guest's token to replay that very same resource fetch. This is impossible with id-key, as the encrypted NEK is wrapped with an ephemeral EC key such that the proxy can never replay it.

Consider a malicious host that simply switches two virtual disks. The result would be exactly the same with id-key or standard resources. It might seem like it's easier to guess the name of a resource than an id-key, but this isn't something to bank on (and you could always randomly generate your resource ids).

There are other mitigations for detecting this scenario, such as vTPM monotonic boot counters and rolling hashes of vTPM EKs on boot for some specific NVDATA. This is not being solved by id-key.

In your design it looks like you are not measuring anything related to the virtual disk. It seems like all your guests will look exactly the same in terms of TCB. In that case, there is very little you can do to distinguish them confidentially. I don't know exactly how you can provide guarantees about the identity, but a good first step is to measure the virtual disk somehow. For instance, you could put a hash of the nvdata into the init-data.
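As a rough illustration of the init-data suggestion, here is a minimal Python sketch. It assumes the measured NVDATA blob is stable between provisioning and boot, and that the attestation evidence carries a 32-byte host-supplied field (such as HOST_DATA on SEV-SNP). The function names `nvdata_digest` and `disk_matches_evidence` are hypothetical and not part of Trustee or SVSM.

```python
import hashlib
import hmac


def nvdata_digest(nvdata: bytes) -> bytes:
    """Return the SHA-256 digest of the NVDATA blob (32 bytes)."""
    return hashlib.sha256(nvdata).digest()


def disk_matches_evidence(nvdata: bytes, host_data: bytes) -> bool:
    """Check that the disk's NVDATA matches the digest bound into the evidence.

    `host_data` stands in for the 32-byte value reported in the attestation
    evidence (an assumption made for this sketch).
    """
    return hmac.compare_digest(nvdata_digest(nvdata), host_data)
```

With a binding like this, two guests with otherwise identical TCBs would report different evidence, which gives a policy something to distinguish them by.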

The NEK is how the KBS distinguishes VMs to an extent. I do like the idea of hashing the NVDATA into the init-data, but I cannot always guarantee that HOST_DATA will be settable (on a cloud instance, for example).

As we cannot guarantee that all clients using persistent vTPMs in this manner will use Trustee to attest, the virtual disk format cannot tie itself to any Trustee-specific mechanism (like giving a resource ID tag of the NEK). The virtual disk format must be server-agnostic. This applies for all attestation protocols, for that matter.

I don't see how id-key is any more generic than resource uris. In fact the resource uri thing is just a key value store, which seems easier for someone else to implement than id-key.

I see your point, I suppose the data in the "NEK buffer" could be arbitrary, and treated differently based on how the server is implemented. My above points still stand though.

@fitzthum
Member

It's not that a guest can get another's NEK (at that point, what could it do with it?). It's that the untrusted proxy, which serves all VMs running on the system, could get ahold of the NEK of one of its VMs.

If the KBS protocol is terminated properly in the guest, the proxy is not relevant here.

Not quite. With the resource URI, the proxy knows the exact resource name of a guest's NEK (it literally is responsible for building the resource URL). It can then use another guest's token to replay that very same resource fetch. This is impossible with id-key, as the encrypted NEK is wrapped with an ephemeral EC key such that the proxy can never replay it.

It's true that id-key protects the request as it travels from the guest to the KBS, but that really doesn't matter since the virtual disk is exposed to the host.

There are other mitigations for detecting this scenario, such as vTPM monotonic boot counters and rolling hashes of vTPM EKs on boot for some specific NVDATA. This is not being solved by id-key.

If these approaches solve the impersonation problem, then you definitely don't need id-key.

The NEK is how the KBS distinguishes VMs to an extent. I do like the idea of hashing the NVDATA into the init-data, but I cannot always guarantee that HOST_DATA will be settable (on a cloud instance, for example).

If the TCB doesn't change, you can't make any confidential guarantees here.

I feel like we're going in circles a bit. Is there an institutional blocker here? Like a review from another community or a client? Why not just start with simple resources until someone complains, if they do?

@tylerfanelli
Contributor Author

tylerfanelli commented Feb 24, 2025

I feel like we're going in circles a bit. Is there an institutional blocker here? Like a review from another community or a client? Why not just start with simple resources until someone complains, if they do?

The idea that guests could request resources that "weren't for them" was a big concern. However, you mentioned randomly-generating resource IDs such that they wouldn't be "guessable". I hadn't thought of that before.

The approach would be to generate a UUID (or something randomized) per NEK and embed that UUID in the virtual disk, so that no other guest (without the correct disk) could know what to request. I think that could work pretty securely with the existing resource implementation, and id-key wouldn't be necessary. The "swapping disks" problem is another issue entirely.
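A minimal Python sketch of that idea, assuming resources are addressed as `<repository>/<type>/<tag>`. The function name `new_nek_resource_path` and the default repository and type values are hypothetical, chosen only for illustration.

```python
import uuid


def new_nek_resource_path(repository: str = "default",
                          resource_type: str = "nek") -> str:
    """Build a resource path whose tag is a random (version 4) UUID.

    The tag is what would be embedded in the virtual disk, so a guest
    without that disk has no practical way to guess it.
    """
    tag = uuid.uuid4()  # random UUID with 122 random bits
    return f"{repository}/{resource_type}/{tag}"


if __name__ == "__main__":
    print(new_nek_resource_path())
```

This only makes the resource name hard to guess. It does not address the case where the host swaps two virtual disks.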

Do you agree?

@fitzthum
Member

Do you agree?

Yeah, I think one thing that could really help would be to make some tooling that could do all this automatically. This would be on the front-end of things. It could just be a bash script or two (we could create an integrations or examples dir), but we also have an option to extend the kbs-client. For instance we could add a plugin interface to the client allowing people to create their own logic for registering resources and creating policies.

@tylerfanelli
Contributor Author

One desired thing that id-key does provide is the ability to generate the encryption key from an HSM. There's the PKCS11 plugin, but that just takes a label of something in an HSM and returns it (which AFAIU is not recommended), rather than encrypting and decrypting data in-place while the key never leaves the HSM.

This is useful, because it means the secret decryption itself is not tied to a specific KBS instance. That's not available in id-key yet, but I can add it.

@fitzthum
Member

One desired thing that id-key does provide is the ability to generate the encryption key from an HSM. There's the PKCS11 plugin, but that just takes a label of something in an HSM and returns it (which AFAIU is not recommended), rather than encrypting and decrypting data in-place while the key never leaves the HSM.

Yeah the PKCS11 backend is very basic. There are some rough plans to improve it but they are low priority for me atm. The PKCS11 backend was developed just before the plugin infrastructure existed. It is a resource backend instead, which creates some limitations. We can't really pass sophisticated parameters to it. At the moment it uses the generic secret mechanism to store secrets as blobs. This is a valid PKCS11 flow but it's not what most people think of when using an HSM.

For a resource backend, keywrapping doesn't add a whole lot of value. It would be one thing if we could unwrap a key provided from the guest (more on this in a second) but the resource backend and the KBS protocol don't really support this. The main thing we miss out on currently is that it's tricky for people to bring their own HSM with keys already in it if they don't happen to be using the right key type. Note that for confidential containers, we have another method of accessing an HSM from inside the guest (sealed secrets) that allows people to use keywrapping.

I think it could be valuable to add more sophisticated HSM support. I mentioned earlier the idea of a remote vTPM provided by Trustee. In some ways that's not so different from the id-key stuff, but it's a little more generic. If we do want to introduce something like that I think we should consider adding two-way encryption to the KBS protocol itself rather than providing it in a plugin. There are some open questions here, but I think it's a good area to explore.

@tylerfanelli
Contributor Author

tylerfanelli commented Feb 24, 2025

Thanks for your input.

I'm closing this as I now believe it'd be more worthwhile to add more sophisticated HSM support. As there are restrictions when it is used strictly as a resource backend, I'll try to explore it as its own plugin. It could perhaps replace the PKCS11 resource backend, but that's for others to decide.

If need be, I'll also look into suggesting some two-way encryption to the KBS protocol. I'm not entirely sure how that'll be done yet. What I'm thinking is generating the server's TeePubKey (per-client) after each successful attestation, and returning it alongside the JWT. On normal resource backends it can remain unused, but for plugins that want to use it, they could at that point.

Doesn't really require changes to the KBS protocol, as what is actually returned from a KBS attestation is undefined. What do you think?

@fitzthum
Member

Doesn't really require changes to the KBS protocol, as what is actually returned from a KBS attestation is undefined. What do you think?

I'm not sure the best way to implement tbh. Currently we allow plugins to specify if they require encryption or admin authentication. It could be cool to have another field where they can specify that they require requests to be encrypted. Then the KBS would take care of it.
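To make the shape of that idea concrete, here is a hedged Python sketch of a per-plugin requirements record with an extra flag for encrypted requests. This is not Trustee's plugin API. `PluginRequirements`, its field names, and `check_request` are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginRequirements:
    """What a plugin asks the KBS to enforce (hypothetical model)."""
    admin_auth: bool = False        # request must come from an admin
    encrypt_response: bool = False  # response must be encrypted to the client
    encrypt_request: bool = False   # proposed: request body must be encrypted


def check_request(reqs: PluginRequirements, *, is_admin: bool,
                  body_is_encrypted: bool) -> None:
    """Reject a request that does not meet the plugin's stated requirements."""
    if reqs.admin_auth and not is_admin:
        raise PermissionError("plugin requires admin authentication")
    if reqs.encrypt_request and not body_is_encrypted:
        raise ValueError("plugin requires an encrypted request body")
```

The point of the extra field is that the KBS, rather than each plugin, would be the one to enforce request encryption.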
