Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugins: add sgx-epc plugin. #156

Merged
merged 1 commit into from
Oct 23, 2023

Conversation

klihub
Copy link
Collaborator

@klihub klihub commented Oct 16, 2023

plugins: add sgx-epc plugin.

Add a plugin for limiting SGX encrypted page cache usage with
pod annotations. Note that the plugin requires a patched cgroup
v2 misc controller with support for something like
echo 'sgx_epc 65536' > /sys/fs/cgroup/$CGRP/misc.max
to work.

@klihub klihub force-pushed the devel/sgx-config-plugin branch from 2719f2e to 8b8e77e Compare October 16, 2023 12:17
@klihub
Copy link
Collaborator Author

klihub commented Oct 16, 2023

@mythi Here is the sgx-config. It is the original NRI PoC sample plugin. The code is updated for latest NRI main HEAD, retrofitted to this repo, and extended with the necessary bits for image building and a kustomizable Helm chart for easier deployment. PTAL.

Copy link
Collaborator

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as clean as it gets with the current stuff.

Maybe the name of the plugin could be just nri-sgx, if we can see this plugin to be extended to handle other SGX properties later on. Or nri-sgx-epc it's going to be EPC and nothing else.

The current project structure makes you write three daemonset specs for a single plugin. I guess we could make this somehow simpler, but did not get any good ideas when writing memory plugins. Yet another layer of templating on top of templating did not feel good idea at that point, but this repetition with slight differences is not very nice either.

cmd/plugins/sgx-config/nri-sgx-config-deployment.yaml.in Outdated Show resolved Hide resolved
deployment/helm/sgx-config/values.yaml Outdated Show resolved Hide resolved
fmuyassarov added a commit that referenced this pull request Oct 17, 2023
[release-0.2]: cherry-pick #156: fix image tagging scheme in publishing workflow.
cmd/plugins/sgx-config/sgx-config.go Outdated Show resolved Hide resolved
docs/memory/sgx-config.md Outdated Show resolved Hide resolved
@klihub
Copy link
Collaborator Author

klihub commented Oct 18, 2023

The current project structure makes you write three daemonset specs for a single plugin. I guess we could make this somehow simpler, but did not get any good ideas when writing memory plugins. Yet another layer of templating on top of templating did not feel good idea at that point, but this repetition with slight differences is not very nice either.

@askervin I think we should be able to get this down to one per plugin, the one in the helm chart templates, with relative ease. We have now extra copies to be able to wire up plugins differently for e2e tests which tend to use config files and not ConfigMaps. I think we should (be able to) change this once we reworked the configuration handling bits (IOW, manage to iron out the worst monkey puke from #164 and add the missing bits for e2e test there). That will get us rid of the fallback configuration file altogether.

At that point it would be nice to use helm and a local unpacked helm chart directory for e2e tests, too (and make sure that all the necessary {k,c}ustomizability and other templating is correctly in place to support the e2e test cases).

@klihub klihub force-pushed the devel/sgx-config-plugin branch from 25a3dbc to d25d66e Compare October 18, 2023 20:09
@klihub
Copy link
Collaborator Author

klihub commented Oct 18, 2023

I think the SGX EPC NRI plugin should also have an UpdateContainer method (similar to the already implemented CreateContainer method).

The use case for this is that a user defines an EPC limit for a pod after it has already been created. Or a user updates a defined limit because it was too low/high.

So does that really work already without recreating the whole pod and container ? With in-place vertical pod scaling ?

@klihub klihub changed the title plugins: add sgx-config plugin. plugins: add sgx-epc plugin. Oct 18, 2023
@mythi
Copy link
Contributor

mythi commented Oct 19, 2023

I think the SGX EPC NRI plugin should also have an UpdateContainer method (similar to the already implemented CreateContainer method).
The use case for this is that a user defines an EPC limit for a pod after it has already been created. Or a user updates a defined limit because it was too low/high.

So does that really work already without recreating the whole pod and container ? With in-place vertical pod scaling ?

To me it looks this is not possible right now. I tried patching sgx.intel.com/epc and I get an error:

* spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)

@ffuerste
Copy link
Contributor

I think the SGX EPC NRI plugin should also have an UpdateContainer method (similar to the already implemented CreateContainer method).
The use case for this is that a user defines an EPC limit for a pod after it has already been created. Or a user updates a defined limit because it was too low/high.

So does that really work already without recreating the whole pod and container ? With in-place vertical pod scaling ?

After thinking it completely through, I don't think so. If the annotations change on the pod level (e.g. in .spec.template.metadata.annotations for a Deployment) then Kubernetes is creating a new ReplicaSet. I forgot that when I was writing, sorry. So please ignore my comment. I also deleted it.

@mythi
Copy link
Contributor

mythi commented Oct 19, 2023

@ffuerste I think that would be a reasonable feature for the future if we can get rid of the annotations and in-place autoscaling can be made to work etc but for know I'd keep the focus on the CreateContainer flows

@klihub klihub force-pushed the devel/sgx-config-plugin branch from d25d66e to 66c8f29 Compare October 20, 2023 14:25
@klihub klihub marked this pull request as ready for review October 20, 2023 14:25
@klihub klihub force-pushed the devel/sgx-config-plugin branch from 66c8f29 to 5470767 Compare October 20, 2023 14:39
Copy link
Collaborator

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

docs/memory/sgx-epc.md Outdated Show resolved Hide resolved
docs/deployment/sgx-epc.md Show resolved Hide resolved
Add a plugin for limiting SGX encrypted page cache usage with
pod annotations. Note that the plugin requires a patched cgroup
v2 misc controller with support for something like
    echo 'sgx_epc 65536' > /sys/fs/cgroup/$CGRP/misc.max
to work.

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the devel/sgx-config-plugin branch from 5470767 to cf79983 Compare October 23, 2023 07:31
@klihub klihub requested a review from marquiz October 23, 2023 07:33
Copy link
Collaborator

@marquiz marquiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klihub
LGTM

@marquiz marquiz merged commit e09ca9b into containers:main Oct 23, 2023
Copy link
Collaborator

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo... otherwise LGTM.


## Installing the Chart

Path to the chart: `nri-sg-epc`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/nri-sg-epc/nri-sgx-epc

Comment on lines +15 to +18
# for all containers in the pod
epc-limit.nri.io/pod: "32768"
# alternative notation for all containers in the pod
epc-limit.nri.io: "8192"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

users are expected to set per-container sgx.intel.com/epc limits which we'd take to epc-limit.nri.io/container.c0 annotations. Are these about pod-level misc setting (e.g., a sum of all per-container requests within the pod)?

Copy link
Collaborator Author

@klihub klihub Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the '/pod' (and its alternative) notation applies to any container in the pod. It's simply a shorthand notation that can be used instead of adding for every container CTR in the pod a container-specific epc-limit.nri.io/container.$CTR: $limit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, hmm. There's probably no use for that functionality.

I could think of a case where we'd allow a policy that takes the sum of epc resource requests and adds that to a sandbox level shared resource...

@klihub klihub deleted the devel/sgx-config-plugin branch October 23, 2023 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants